Pend. Users at Promon, r&d, 1 (status displays), 3 (servers) - Forum - OpenEdge RDBMS - Progress Community

This question is not answered

What causes values to show in pend. in servers status of promon?

The pending user counts add up to the number of open connection slots that I expect to have. At that point users get error 748:

The server or the system has no more resources. Please contact Progress Technical Support. (748)

Maybe related, maybe not... In the same time frame users are reporting error 9407, where the host name is the name of the database server and the port is 2000 (TCP).

Connection failure for host <host_name> port <port> transport <transport_name>. (9407)

What does this make you think of as the root cause? What should I look into next? 

Mark 

OS: Windows server 2012

Progress OE11.5.1  

All Replies
  • Are you specifying -minport/-maxport on your connection brokers?  You should have unique, non-overlapping port ranges for each one.  It makes firewall configuration and troubleshooting of issues like this one much more straightforward.

    If a client connects to the broker (-S port) but can't connect to the assigned server port (between -minport and -maxport inclusive) then you will have a pending connection.  Check that the firewall(s) between clients and server allow connections to the server on all required broker and server ports.
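The non-overlap rule above is easy to check mechanically. A minimal sketch in Python, assuming you have already collected each broker's -minport/-maxport values by hand; the broker names and port ranges below are hypothetical and not taken from this thread:

```python
from itertools import combinations

def overlapping_ranges(brokers):
    """Return pairs of broker names whose inclusive port ranges overlap."""
    clashes = []
    for (name_a, (lo_a, hi_a)), (name_b, (lo_b, hi_b)) in combinations(brokers.items(), 2):
        # Two inclusive ranges overlap when each one starts before the other ends.
        if lo_a <= hi_b and lo_b <= hi_a:
            clashes.append((name_a, name_b))
    return clashes

# Hypothetical brokers: name -> (minport, maxport)
brokers = {
    "broker_a": (4000, 4099),
    "broker_b": (4100, 4199),
}

print(overlapping_ranges(brokers))
```

An empty list means no two brokers share a server port. The same ranges are what the firewall rules between clients and server need to allow, in addition to each broker's -S port.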

  • Also check promon to ensure that all servers that should be able to spawn are able to.  Check the DB log for errors related to spawning servers.  Note that the broker won't spawn a 4GL server on a port if it is defined in the OS services file, even if it isn't currently in use.  Check that there aren't defined services with port numbers within your minport/maxport ranges.
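The services-file check can be scripted rather than done by eye. A rough sketch, assuming the standard `name port/protocol` layout of the services file (on Windows it normally lives at `C:\Windows\System32\drivers\etc\services`); the sample entries and the 4100-4106 range are illustrative only:

```python
def services_in_range(services_text, minport, maxport):
    """Return (name, port, protocol) for services defined inside [minport, maxport]."""
    hits = []
    for line in services_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue  # not a "name port/protocol" entry
        port_str, proto = fields[1].split("/", 1)
        if port_str.isdigit() and minport <= int(port_str) <= maxport:
            hits.append((fields[0], int(port_str), proto))
    return hits

# Hypothetical services file content
sample = """
# comment line
http        80/tcp    www
someapp   4102/tcp            # hypothetical entry inside the range
otherapp  5000/udp
"""

print(services_in_range(sample, 4100, 4106))
```

To run it against the real file, read the file into a string and pass it in place of `sample`. Any hit is a port the broker may refuse to use for a server.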

  • Thank you Rob for the input. In promon I find the number of servers I expect. All of the servers have Cur. Users values that account for the connected users.

    It looks like the number of pending users adds up to what I expect are open slots. So new connections get the "no more resources" error (748).

    I need to look into the min/max port, service file, and run a netstat to see what is in use.

  • Also check that -n is high enough given your anticipated connection count.

  • > What causes values to show in pend. in servers status of promon?

    The login broker sends the port of a started server to the new client. Then it checks whether the number of clients served by that server has increased. In other words, the login broker checks whether the client successfully connected to the server. Until it has, the client is counted as a pending connection.

    So it's always a port inside the -minport/-maxport range that was used to start a remote client server. Most likely there is a problem with that port at the OS level.

    Can you post the output from promon?
    Example:

     Sv                                     Pend.   Cur.   Max.   Port
     No       Pid  Type   Protocol  Logins  Users  Users  Users    Num

    361  50988512  Auto   TCP          339      0      2     30  10375
    362  15599228  Auto   TCP          460      0      2     30  10376
    363  66258402  Auto   TCP            0      2      0     30  10405
    364  16449868  Auto   TCP          306      0      2     30  10378
    365  42468244  Auto   TCP          362      0      2     30  10379
    

    Server 363 had no successful logins. It was a newly started server on port 10405, and the clients were unable to connect to that port.
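One way to test whether a server port is reachable is to try a plain TCP connection to it from a client machine. The sketch below uses only the Python standard library; the host name and port list in the usage note are placeholders. A successful connect only proves the network path is open, not that a database login would succeed.

```python
import socket

def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False

def unreachable_ports(host, ports, timeout=3.0):
    """Return the subset of ports that could not be reached on host."""
    return [p for p in ports if not port_reachable(host, p, timeout)]
```

For example, `unreachable_ports("dbserver", range(4100, 4107))` (with a hypothetical host name) would list the ports in that range that a firewall or the OS is blocking. Run it from the same network segment as the affected clients, since the result depends on where the probe starts.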

  • Connections can be made on each of the ports and then connections seem to get stuck in pending.  

    11/21/17        Status: Servers  07:37:58

    Sv                                  Pend.   Cur.   Max.   Port
    No    Pid  Type   Protocol  Logins  Users  Users  Users    Num

     0   5484  Login  TCP            0      0      0      2   2021
     1  16956  Auto   TCP            1      0      1      2   4002
     2   7256  Login  TCP        20312      0      0      6   2020
     3   8132  Auto   TCP         2088      4      2      6   4100
     4   6276  Auto   TCP         3762      4      2      6   4101
     5   6116  Auto   TCP         2127      4      2      6   4102
     6  18496  Auto   TCP          790      0      1      2   4006
     7   4848  Auto   TCP         6070      2      4      6   4103
     8   8424  Auto   TCP         2061      3      3      6   4104
     9   7772  Auto   TCP         2067      5      1      6   4105
    10   7472  Auto   TCP         2112      3      3      6   4106
    11  13372  Auto   TCP          105      0      1      2   4003
    12  14000  Auto   TCP          600      0      1      2   4004
    13  17196  Auto   TCP          461      0      0      2   4005
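The numbers in this display can be checked against the "no more resources" symptom. The sketch below takes the seven Auto servers on ports 4100-4106 and assumes that a pending user occupies a slot in the same way a connected user does; the thread suggests this but does not confirm it.

```python
# (sv, pend_users, cur_users, max_users) copied from the Status: Servers display
servers = [
    (3, 4, 2, 6), (4, 4, 2, 6), (5, 4, 2, 6), (7, 2, 4, 6),
    (8, 3, 3, 6), (9, 5, 1, 6), (10, 3, 3, 6),
]

def free_slots(pend, cur, max_users):
    # Assumption: pending users count against Max. Users just like current users.
    return max_users - cur - pend

full = [sv for sv, pend, cur, mx in servers if free_slots(pend, cur, mx) <= 0]
total_pending = sum(pend for _, pend, _, _ in servers)
total_current = sum(cur for _, _, cur, _ in servers)
total_capacity = sum(mx for _, _, _, mx in servers)

print("servers with no free slots:", full)
print("pending:", total_pending, "current:", total_current, "capacity:", total_capacity)
```

Under that assumption every one of these seven servers is full: 25 pending plus 17 current users account for all 42 slots, which would be consistent with new connections being rejected with error 748.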

  • Servers that are too busy may also result in pending connections.

    What about db performance?

    R&D/Activity/Summary/Commits per sec and Active trans?

    Activity/Performance Indicators/Latch timeouts per sec?

  • Is the -PendConnTime parameter set?

  • R&D/Activity/Summary/Commits per sec and Active trans?

    11/21/17        Activity: Summary
    10:05:50        08/08/17 19:41 to 11/21/17 10:05 (2511 hrs 23 min)

    Event                  Total  Per Sec |Event                  Total  Per Sec
    Commits               98954K     11.2 |DB Reads            1414600K    160.2
    Undos                  6138       0.0 |DB Writes          18149892       2.0
    Record Reads         346135M  40144.7 |BI Reads             701766       0.1
    Record Updates       104761K     11.9 |BI Writes           5625586       0.6
    Record Creates      5450812       0.6 |AI Writes           5345318       0.6
    Record Deletes      2844443       0.3 |Checkpoints           76869       0.0
    Record Locks        1929763K    218.6 |Flushed at chkpt   16441621       1.8
    Record Waits            125       0.0 |Active trans              0

    Rec Lock Waits     0 %    BI Buf Waits      1 %    AI Buf Waits      1 %
    Writes by APW      0 %    Writes by BIW     0 %    Writes by AIW     0 %
    DB Size:          14 GB   BI Size:       1263 MB   AI Size:        376 K
    Empty blocks:  91305      Free blocks:  41426      RM chain:      2814
    Buffer Hits       99 %    Primary Hits     99 %    Alternate Hits    0 %

    8 Servers, 24 Users (5 Local, 19 Remote, 9 Batch), 0 Apws

  • 11/21/17        Activity: Performance Indicators
    10:15:28        08/08/17 19:41 to 11/21/17 10:15 (2511 hrs 33 min)

                                  Total     Per Min     Per Sec    Per Tx
    Commits                      98955K         672       11.21      1.00
    Undos                          6138           0        0.00      0.00
    Index operations            271070M     1886198    31436.64   2805.08
    Record operations           346274M     2409489    40158.15   3583.30
    Total o/s i/o              1444471K        9816      163.59     14.60
    Total o/s reads            1416032K        9622      160.37     14.31
    Total o/s writes           29121304         193        3.22      0.29
    Background o/s writes       2121117          14        0.23      0.02
    Partial log writes           856362           6        0.09      0.01
    Database extends             118336           1        0.01      0.00
    Total waits                   23008           0        0.00      0.00
    Lock waits                      125           0        0.00      0.00
    Resource waits                22883           0        0.00      0.00
    Latch timeouts              262567K        1784       29.74      2.65

    Buffer pool hit rate:  99 %     Primary pool hit rate:  99 %     Alternate pool hit rate:   0 %

  • > 10:05:50        08/08/17 19:41 to 11/21/17 10:05 (2511 hrs 23 min)

    The interval is longer than 100 days, so these averages can't help to identify the current bottleneck.

  • Is there a way to refresh the counters without a DB stop and start?

  • Enter <return>, A, L, R, S, U, Z, P, T, or X (? for help): ?
    
    Entry    Action taken
    
    A        Activate auto repeat mode. The current display then restarts
             after the number of seconds specified by the current display
             pause time. Stop after the number of times specified by the
             auto repeat count or when you press control-c.
    S        Sample activity counters for the number of seconds specified
             by the sampling interval. The current display then restarts.
             The data shown include only activity which occurred during the
             sample interval.
    U        Update activity counters. The current display then restarts.
             The data shown include changes which occurred since the initial
             set of data was collected.
    Z        Zero the activity counters so updates show changes from now.

    I prefer: Z, wait a few seconds, U

  • 11/21/17        Activity: Performance Indicators
    10:40:55        11/21/17 10:31 to 11/21/17 10:31 (9 sec)

                                  Total     Per Min     Per Sec    Per Tx
    Commits                           0           0        0.00      0.00
    Undos                             0           0        0.00      0.00
    Index operations             953316     6355440   105924.00      0.00
    Record operations            958636     6390907   106515.11      0.00
    Total o/s i/o                   497        3313       55.22      0.00
    Total o/s reads                 497        3313       55.22      0.00
    Total o/s writes                  0           0        0.00      0.00
    Background o/s writes             0           0        0.00      0.00
    Partial log writes                0           0        0.00      0.00
    Database extends                  0           0        0.00      0.00
    Total waits                       0           0        0.00      0.00
    Lock waits                        0           0        0.00      0.00
    Resource waits                    0           0        0.00      0.00
    Latch timeouts                  800        5333       88.89      0.00

    Buffer pool hit rate:  99 %     Primary pool hit rate:  99 %     Alternate pool hit rate:   0 %

  • Latch timeouts are at 88.89 per sec. I suspect they are mainly MTX naps.

    Check promon/R&D/debghb/6/11. Latch Counts

    Update: there were no commits or undos in the 9-second sample! It's unlikely that the MTX latch was used at all. Which latches have non-zero naps?
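For reference, the per-second and per-minute figures in the 9-second sample are simply the totals divided by the sample interval. A small sketch of that arithmetic, using the latch timeout total from the sample and the 29.74 per second average from the earlier 2511-hour display:

```python
def per_second(total, seconds):
    """Average rate per second over the sample interval."""
    return total / seconds

def per_minute(total, seconds):
    """Average rate per minute over the sample interval."""
    return total / seconds * 60

sample_seconds = 9
latch_timeouts = 800      # total from the 9-second sample
lifetime_rate = 29.74     # per-second average from the 2511-hour display

sample_rate = per_second(latch_timeouts, sample_seconds)
print(f"sample: {sample_rate:.2f}/sec, {per_minute(latch_timeouts, sample_seconds):.0f}/min")
print(f"ratio to long-run average: {sample_rate / lifetime_rate:.1f}x")
```

The comparison only says that latch timeouts in this short window ran well above the long-run average. It does not say which latch is involved, which is why the Latch Counts display is the next thing to look at.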