
Troubleshooting why PROSHUT does not remove the userid from the PROMON User Control Table


Information

 
Title: Troubleshooting why PROSHUT does not remove the userid from the PROMON User Control Table
URL Name: P69391
Article Number: 000141997
Environment
Product: Progress
Version: 9.x
Product: OpenEdge
Version: All Supported Versions
OS: All Supported Platforms
Question/Problem Description
Troubleshooting why a client session hangs after being disconnected with PROMON or PROSHUT
Troubleshooting why after being disconnected, the process's UserID is not removed from the shared-memory 
Troubleshooting why a user is still showing in the PROMON User Control Table after being disconnected
  • User is not shown in PROSHUT User List: proshut <dbname> -C list
  • User is still seen in the PROMON User Control list
PROSHUT -C DISCONNECT was used to disconnect an active client session.
Creating, updating or deleting a record causes the client to hang, and the client then cannot be disconnected from the database.
Client process disconnects from the database but is not removed from the database shared-memory.
The client process (_progres, prowin, _proapsv) is hung and cannot be terminated.
Client process is still running in the UNIX process list (ps -ef) or Windows tasklist
The Client process PID is still found in the UNIX process list (ps -ef) or Windows Task manager (tasklist)
The parent PID of the client process may be 1 or Init
Steps to Reproduce
Clarifying Information
Error Message
Defect Number: Enhancement PSC00142267 / OE00092240
Enhancement Number
Cause
There are many reasons why a process may not disconnect properly. For this reason the best approach is to gather as much information as possible (specifically, multiple stacktraces of the process) before killing the process at the operating system level (with caution). It is virtually impossible to determine the cause of the hung process without this information.

Additional signal logging was added in Progress 9.1D09, 9.1E and OpenEdge 10.1A to help development determine the state a hung process is in. This logging shows which signals are being sent to a process and how the process reacts to those signals. For example:
  • When a remote client is disconnected, lines 2-4 are additional debug messages
  • When a local client is disconnected, lines 2 and 4 are additional debug messages
$   proshut dbname -C disconnect 24

1. SHUT    6: (-----) User 24 disconnect initiated
2. BROKER  0: (-----) Notifying srvr 1, conn 1, to terminate remote user 24
3. BROKER  0: (-----) Sending signal <n> to pid <pid> for user 24
4. SRV     1: (1166)  Server disconnecting user 24.
5. SRV     1: (739)   Logout usernum 24, userid <>, on <>
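
The messages above can be pulled out of the database log for a single user with a small filter. This is a minimal sketch, assuming the log contains the message texts exactly as shown in the example; the function name `disconnect_trace` is hypothetical and not part of any Progress utility.

```shell
# Print the database log lines that document the disconnect of one user.
# $1 = path to the database log file, $2 = user number.
# Assumption: the message texts match the example shown in this article.
disconnect_trace() {
    logfile=$1
    usernum=$2
    grep -E \
        -e "User ${usernum} disconnect initiated" \
        -e "to terminate remote user ${usernum}([^0-9]|\$)" \
        -e "Sending signal [^ ]+ to pid [0-9]+ for user ${usernum}([^0-9]|\$)" \
        -e "Server disconnecting user ${usernum}\." \
        -e "Logout usernum ${usernum}," \
        "$logfile"
}
```

For example, `disconnect_trace dbname.lg 24` would list the disconnect request, any signal that was sent, and the logout message if the logout completed. A missing "Logout usernum" line suggests the process never finished disconnecting.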
 
 
Resolution
Troubleshooting why after being disconnected, the UserID is not removed from the shared-memory:

When a hung process does not disconnect, gather the following information before killing the process (with caution), then log a support call with Progress Technical Support.
  1. Gather PROMON latch information:
  • The UserID that was initially used to disconnect the client with PROSHUT or PROMON is important. 
  • If that userid is shown in the "Owner" column for any of the latches in the "Latch Counts" output, then killing that user will cause the database to shut down (by design). Choose "U" to update the latch screen and confirm whether the user is holding the latch for longer than a millisecond (you may have just caught the user holding a latch at that instant), then re-confirm that the latch has not been acquired again. Unless the promon display page length has been adjusted, there are two pages of latches to check; press <Enter> to view the second page.
$   promon -NL <dbname> | tee promon.txt
1.  User Control -> 1.  Display all entries -> q
4. Record Locking Table -> 1. Display all entries -> q
-> R&D -> 1. Status Displays -> 4. Processes/Clients -> 2. Blocked Clients -> p (previous menu)
[-> R&D -> 1. Status Displays -> 4. Processes/Clients ] -> 3. Active Transactions -> t (top menu)
[-> R&D -> ] 1. Status Displays -> 6. Lock Table -> 1. Display all lock entries -> t (top menu)
[-> R&D -> ] debghb -> 6 (hidden option) -> 5. Locked Buffers -> p
[-> R&D -> debghb -> 6 (hidden option) ] -> 6. Buffer Locks -> p
[-> R&D -> debghb -> 6 (hidden option) ] -> 15. Buffer Lock Queue -> p
[-> R&D -> debghb -> 6 (hidden option) ] -> 8. Resource Queues -> p
[-> R&D -> debghb -> 6 (hidden option) ] -> 16.  Semaphores -> p
[-> R&D -> debghb -> 6 (hidden option) ] -> 9. TXE Lock Activity -> p
[-> R&D -> debghb -> 6 (hidden option) ] -> 11. Latch Counts -> p
x
  1. Find the process ID (PID) 
Find the PID of the hung process from the PROMON User Control screen.
This information has already been gathered above; otherwise:
$   promon -NL <dbname>
1. User Control -> 2. Match a User Number -> <enter the hung user's user number>
  1. Use proGetStack to determine what the process is doing

Running proGetStack <PID> on the machine where the user is connecting from generates a protrace.<PID> file for the process associated with the executable. This may require root/Administrator permissions. If the process no longer exists, an error indicates this. If the process is still running, run the command a few more times, about 5-10 seconds apart. Each call appends a new stack trace to the bottom of the same file.
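
The repeated calls can be scripted. The following is a minimal sketch, assuming proGetStack is on the PATH and is run with sufficient permissions; the function name `collect_stacks` and the `STACK_CMD` override variable are hypothetical helpers for this sketch only.

```shell
# Take several stack snapshots of one process, a few seconds apart.
# $1 = PID, $2 = number of snapshots (default 3), $3 = seconds between them (default 5).
# STACK_CMD is a hypothetical override for this sketch; it defaults to proGetStack.
collect_stacks() {
    pid=$1
    count=${2:-3}
    interval=${3:-5}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "snapshot $i of $count for PID $pid"
        # Stop if the stack command fails, for example because the process has gone.
        "${STACK_CMD:-proGetStack}" "$pid" || return 1
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

For example, `collect_stacks 4321 3 10` would request three snapshots of PID 4321 ten seconds apart.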

Review the output for ABL Stack Trace, looking at the top line, to see if it changes at all.

** ABL Stack Trace **

--> foo.p at line 25369  (foo.r)
    bar.p at line 4367 (bar.r)
Compile the program referenced in the top line of the ABL stack with the DEBUG-LIST option to find out what is happening at the line referenced (25369 in this example).
COMPILE foo.p DEBUG-LIST foo.dbg.
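
To see whether the top line of the ABL stack changes between snapshots, the "-->" lines can be extracted from the protrace file and compared. This is a rough sketch, assuming each snapshot marks its top ABL frame with "-->" as in the example above; the function names are hypothetical.

```shell
# Print the top ABL frame (the "-->" line) of every snapshot in a protrace file.
top_abl_frames() {
    grep -e '^[[:space:]]*-->' "$1"
}

# Report whether the top ABL frame is the same in all snapshots.
abl_stack_state() {
    distinct=$(top_abl_frames "$1" | sort -u | wc -l | tr -d ' ')
    case $distinct in
        0) echo "no ABL stack found" ;;
        1) echo "static" ;;
        *) echo "changing" ;;
    esac
}
```

For example, `abl_stack_state protrace.4321` prints "static" when every snapshot stops at the same program and line, which points at a single statement to inspect in the debug listing.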
  1. Find what resources the process is currently using
Verify whether this process is using CPU resources. Replace "<PID>" with the PID obtained in the "Find the process ID (PID)" step above.
UNIX: "ps -ef | grep <PID>"
WINDOWS: "tasklist -v | FINDSTR <PID>"
Disconnecting what appears to be a rogue process consuming a lot of CPU may be difficult: if the process is holding a latch and has somehow entered a loop, this may account for the inability to stop it.
If the client process has a Parent Process ID of 1, it is bound to the INIT process and the only resolution is to shut down the database. Refer to Article:
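
As a rough sketch of the checks in this step on UNIX, the parent PID and CPU usage can be read directly from ps. It assumes a ps that supports the POSIX `-o` and `-p` options; the function names are hypothetical.

```shell
# Classify a parent PID: 1 means the process has been re-parented to init.
classify_ppid() {
    if [ "$1" -eq 1 ]; then
        echo "orphaned"
    else
        echo "has-parent"
    fi
}

# Print the parent PID and CPU usage of one process.
# Assumption: ps supports the POSIX -o and -p options.
process_summary() {
    pid=$1
    ppid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    cpu=$(ps -o pcpu= -p "$pid" | tr -d ' ')
    if [ -z "$ppid" ]; then
        echo "PID $pid not found"
        return 1
    fi
    echo "pid=$pid ppid=$ppid ($(classify_ppid "$ppid")) cpu=${cpu}%"
}
```

Running `process_summary <PID>` a few times in a row gives a quick view of whether the process is consuming CPU and whether its parent is still present.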
  1. Get at least 3 Stack traces
Try to get a stacktrace from the hung process. These are a few of the more basic methods; the commands available depend on the operating system:
  • (most UNIX platforms): kill -16 PID > pid.txt
  • (Solaris): pstack PID > pid.txt
  • (AIX): procstack PID > pid.txt
  • Use a different debugger, such as gstack, to generate the C stack
Since OpenEdge 10.1C, the following generate a protrace.<pid> file, including the ABL stack if available:
  • UNIX: kill -SIGUSR1 PID 
  • WINDOWS: "%DLC%\bin\_debugConfig.exe" -getstack PID
  • On Windows, the same version of %DLC%\pdbfiles as the currently installed version (including service pack/update and hotfix) must be present in order to unwind the stack properly. Refer to Article: What is the purpose of .PDB files?
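
The choice of UNIX command can be sketched as a small lookup keyed on the operating system name. This is only an illustration of the options listed above, assuming OpenEdge 10.1C or later on platforms other than Solaris and AIX; the function name `stack_command` is hypothetical, and it prints the command rather than running it.

```shell
# Print (do not run) one of the stack-trace commands listed in this article.
# $1 = PID, $2 = OS name as reported by `uname -s` (defaults to the local OS).
stack_command() {
    pid=$1
    os=${2:-$(uname -s)}
    case $os in
        SunOS) echo "pstack $pid" ;;        # Solaris
        AIX)   echo "procstack $pid" ;;     # AIX
        *)     echo "kill -USR1 $pid" ;;    # protrace.<pid>, OpenEdge 10.1C or later
    esac
}
```

Printing the command first gives the administrator a chance to review it before anything is sent to the hung process.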
     
  1. If a stacktrace is produced:
Run the command again at least 3 times to different files (or append to current file), so that the process state can be determined as static or changing between stacktrace snapshots.
  1. If a stacktrace is not produced or the command hangs:
On UNIX: In the absence of stack information, generate a core file by using "kill -8 pid" as a last resort to gather information. As long as ulimit -c is not disabled and there are no suid restrictions or hard limits preventing core file generation, a core file also provides information about the state of local and global variables in addition to the C stack.
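
Before sending the signal, the core file limit can be checked. This is a minimal sketch with a hypothetical function name. Note the assumption: `ulimit -c` reports the limit of the shell it is run in, which only reflects the hung process if that process was started from an environment with the same limits.

```shell
# Interpret a value reported by `ulimit -c`.
# $1 = the reported limit (0, a number, or "unlimited").
core_limit_status() {
    case $1 in
        0)         echo "disabled" ;;
        unlimited) echo "enabled (unlimited)" ;;
        *)         echo "enabled (limit $1)" ;;
    esac
}
```

For example, `core_limit_status "$(ulimit -c)"` prints "disabled" when the soft limit is 0, in which case "kill -8" would terminate the process without leaving a core file.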

On Windows use Microsoft Process Explorer. (https://technet.microsoft.com/en-gb/sysinternals/bb896653.aspx)
  1. Identify if the hung process is still attached to shared memory or a database file descriptor
     On UNIX:
$   lsof -p <pid> 
Review the list of file descriptors that the process has open. If any of the files listed is a database file, killing the process once information collection is complete may well cause the database to crash. Start by running "kill -15 <pid>" (SIGTERM) to ask the process to terminate gracefully; otherwise more aggressive kill options might be needed.
On Windows, one way to identify the process shared-memory or file handles is the Microsoft Process Explorer referred to above:
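
The lsof output can be narrowed to database files with a simple filter. This is a rough sketch, assuming the database files share a common path prefix such as /db/prod/sports (a made-up example path); the function name `db_handles` is hypothetical.

```shell
# Read `lsof -p <pid>` output on stdin and print the lines that mention
# the database path or name given as $1 (matched as a fixed string).
# Returns non-zero when no database file is open.
db_handles() {
    grep -F -- "$1"
}
```

For example, `lsof -p <pid> | db_handles /db/prod/sports` lists the open database files, and prints nothing if the process no longer has any of them open.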
View > Show Lower Pane [CTRL + L]
View > Lower Pane View > Handles [CTRL + H ]
  1. From the 'client-side' perspective: 
Find out why the client session needed to be disconnected in the first place. This should not be a 'normal' action, especially when the client was doing work at the time. This information is not always easy to obtain, but the effort has proven to be key to ensuring that the situation does not repeat.
  • What was the last thing the user remembers doing in their application workflow? 
  • What was the application session doing which caused the user to terminate their session or caused their session to hang?
  • Exactly how was their session terminated? The termination actions the user tried at the time help to diagnose corrective actions:
    • Did the user simply turn off their machine or exit their telnet session without first exiting their application session?
    • Were untrappable kill signals involved, or did something else unexpectedly terminate the session (timeouts on the network / firewall / terminal)?
  1. Retain the database log file.  
Inclusive of the database startup prior to the issue all the way until after the database shutdown (if applicable).
  1. Finally terminate the process, or shut down the database.
If any latches were held by this user, the database will crash when this user dies. This will not corrupt the database, provided no other corruption is at play that would cause the BI redo and undo processing to fail when the database is next accessed. It also means that all currently running users will be kicked off and the database will shut down. Therefore consider gracefully disconnecting current users first. An ABL script example is provided in Article:
The longer the database is left running, the longer BI recovery will take. For OpenEdge 11.6.1 or later databases, when the database is next started, first truncate the BI file with the -crStatus and -crTXDisplay parameters before restarting multi-user. For further information refer to Article:
Workaround
Notes
Keyword Phrase
Last Modified Date: 9/16/2021 1:52 PM
