
USR latch remains after user logs out preventing shutdown


Information

 
Title: USR latch remains after user logs out preventing shutdown
URL Name: usr-latch-remains-after-user-logs-out-preventing-shutdown-000108261
Article Number: 000154587
Environment
Product: OpenEdge
Version: 10.2B03 and later, 11.x to 11.7.5, 12.0 to 12.2
OS: All supported platforms
Question/Problem Description

An orphan USR latch remains in the database after the user's logout message (453).
A shared-memory client leaves a USR latch held in shared memory after the user logs out, preventing shutdown.
The database becomes unstable, preventing all other users (local and remote) from connecting or properly disconnecting.

A client logging out from a database leaves a USR latch held. The latch is not cleaned up before the logout, and no supporting error messages are written.

11:00:10.168+0600] P-115869  Usr 3338: (453)  Logout by <name> on batch.

promon: Activity: Latch Counts shows the USR latch being held by the user that logged out

                        ----- Locks -----
Time      Owner         Total    /Sec
11:04:35  USR 3338      27487      91
11:09:38  USR 3338          0       0

After the logout, the process (PID 115869) of user 3338 (the latch owner) no longer exists at the OS level or in the Usrctl Table.
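The check described above can be sketched in Python. This is a minimal illustration, not a Progress tool: the function names `parse_logout` and `pid_exists` are hypothetical, the regular expression is written against the abbreviated log line quoted in this article (a real database log line may carry additional fields), and the PID probe is POSIX-only.

```python
import os
import re

# Assumption: matches the abbreviated form shown in this article, e.g.
# "11:00:10.168+0600] P-115869  Usr 3338: (453)  Logout by <name> on batch."
LOGOUT_RE = re.compile(r"P-(?P<pid>\d+)\s.*?(?P<usr>\d+):\s+\(453\)\s+Logout by")


def parse_logout(line):
    """Return (pid, user_number) for a 453 logout line, else None."""
    m = LOGOUT_RE.search(line)
    if m is None:
        return None
    return int(m.group("pid")), int(m.group("usr"))


def pid_exists(pid):
    """Best-effort check that a PID exists, using signal 0 (POSIX only)."""
    if os.name != "posix":
        # On Windows os.kill() would terminate the process, so refuse to run.
        raise NotImplementedError("signal-0 probing is POSIX-only")
    try:
        os.kill(pid, 0)  # signal 0 performs the permission check, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


sample = "11:00:10.168+0600] P-115869  Usr 3338: (453)  Logout by <name> on batch."
print(parse_logout(sample))  # (115869, 3338)
```

If `pid_exists` returns False for the PID of a user that promon still shows as the USR latch owner, the situation matches the symptom described in this article.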


The USR latch stays locked until the database is eventually shut down:

  • PROSHUT hangs while the USR latch is locked
  • PROSHUT -byF is unable to stop the database while the USR latch is locked
Steps to Reproduce
Clarifying Information
All other users (local and remote) are unable to connect or properly disconnect while the orphan USR latch remains.
The LOGOUT (453) message does not mean that the process has already disconnected from shared memory; the disconnect happens immediately after the message.
This situation has not recurred in over 3 months, unlike a similar recurring problem that was previously encountered in 10.2B.
Error Message
Defect Number: OCTA-19414
Enhancement Number
Cause
The 10.2B03 fixes (OE00200486 / PSC00227240) closed most of the windows in which a latch could remain held, such that:
  • If a process holds any latch when it exits, the latch is released before the process exits.
  • A very small window remains, because this window is different:
If a process is killed while holding a latch, the watchdog usually steps in and cleans up:
  • The watchdog periodically scans the Usrctl Table to find sessions that no longer exist at the OS level. If it finds such a session, the watchdog gets the identity of the dead user. At this point the watchdog locks the USR latch to release the resources locked by the dead user. However, in this window the watchdog cannot lock the USR latch, because the latch is still held by the dead user.
  • This window occurs after the "Logout" message. The behavior of this window closely matches the log file evidence and the code review.
  • In this window, the user control is already cleaned. As a consequence the watchdog does not respond, and there is no message in the log file indicating that this user was killed (session terminated). This window can only be hit if the user is killed at this exact spot (no message and no watchdog response).
  • This tiny window is essentially caused by a kill or a similar untrappable termination that "looks like a normal exit".
  • At this stage nothing can clean up the latch, because the user control is already cleaned. Although the watchdog, the broker, and _mprshut can access this user control, they need the user latch to do so; they cannot get a fast lock for either the user latch acquire or release operation, so they go into the "sleep stage".
  • This window is not abnormal. It exists for every user that logs out, and that is normally harmless. It becomes an issue only if the process is killed inside the window, which makes the window permanent.
  • This is not a regression but a timing issue. _mprshut -F was never able to shut down the database gracefully in this situation.
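The reasoning above can be illustrated with a toy model. It is not OpenEdge code: `SharedMemory`, `logout`, and `watchdog_scan` are hypothetical names, and the exact order of steps inside a logout is an assumption made only to show why a watchdog that scans the user table cannot see a process killed after its user control entry is cleaned.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SharedMemory:
    """Toy stand-in for database shared memory (not OpenEdge internals)."""
    usrctl: Dict[int, int] = field(default_factory=dict)  # user number -> PID
    usr_latch_owner: Optional[int] = None                 # user holding the USR latch


def logout(shm, usr, killed_in_window=False):
    """Model a logout: take the latch, clean the user slot, release the latch."""
    shm.usr_latch_owner = usr      # latch acquired for logout cleanup
    del shm.usrctl[usr]            # user control entry is cleaned
    if killed_in_window:
        return                     # process dies here: latch is never released
    shm.usr_latch_owner = None     # normal exit releases the latch


def watchdog_scan(shm, live_pids):
    """Return the users a table-scanning watchdog would clean up."""
    return [usr for usr, pid in shm.usrctl.items() if pid not in live_pids]


shm = SharedMemory(usrctl={3338: 115869, 12: 200})
logout(shm, 3338, killed_in_window=True)
dead = watchdog_scan(shm, live_pids={200})
print(dead, shm.usr_latch_owner)  # [] 3338
```

In the model the scan finds no dead user because the entry for user 3338 is already gone, yet the latch still names 3338 as its owner, which is the "looks like a normal exit" case described above.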
Resolution
Fixed version(s): 11.7.6.0, 12.2.1.0, 11.7.5.024, 11.7.5.025, OpenEdge 12.3
Workaround
When the USR latch is held, the database broker cannot stop. Attempt to terminate all remaining client sessions at the operating system level, or reboot the machine.
Notes
In the fixed versions, PROSHUT -F has been improved to ensure shutdown by being allowed to bypass getting the user latch. However, PROSHUT -by is unchanged.

Additional messages have been added to the code and are reported in the database log to confirm whether known situations have occurred when a forced shutdown is required.

Example PROSHUT -F additional logging:
// The issue is caught before _mprshut -F takes the user control slot (5). This user "5" is not _mprshut -F itself.

SHUT 5: (16613) Warning: The Force Access (-F) Options has been specified.
// Diagnosis message is this second message:
SHUT 5: (-----) WARNING: User 5 is not in use, but holds Usr latch.
SHUT 5: (5316) Emergency shutdown initiated...
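A database log can be searched for the diagnostic message shown above. The following is a minimal sketch: the function name `find_orphan_latch_users` is hypothetical, and the pattern assumes the message text appears exactly as quoted in this article.

```python
import re

# Assumption: the message wording matches the example quoted in this article.
ORPHAN_RE = re.compile(r"WARNING: User (\d+) is not in use, but holds Usr latch")


def find_orphan_latch_users(lines):
    """Return the sorted user numbers reported as holding an orphan USR latch."""
    found = set()
    for line in lines:
        m = ORPHAN_RE.search(line)
        if m:
            found.add(int(m.group(1)))
    return sorted(found)


log_lines = [
    "SHUT 5: (16613) Warning: The Force Access (-F) Options has been specified.",
    "SHUT 5: (-----) WARNING: User 5 is not in use, but holds Usr latch.",
    "SHUT 5: (5316) Emergency shutdown initiated...",
]
print(find_orphan_latch_users(log_lines))  # [5]
```

To scan a real log, pass an open file object instead of the list, since the function only iterates over lines.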

promon -F also skips the USR latch at logout.
Additional checks at login and logout reduce the likelihood of this issue happening again.

If the USR latch is again left in an acquired state by a user number that is not connected to the database in a fixed version, additional diagnostic information is written to the lg file.
Please open a case and provide this information to technical support for further diagnostic research.



Progress Articles:

When a PROMON -F session is disconnected the database sometimes becomes connectionless and won't shutdown  
Client process (batch or interactive) crashing after being blocked on record lock when dbnotification is enabled
Clients connected to dead remote server process cannot be disconnected   
Troubleshooting why PROSHUT does not remove the userid from the PROMON User Control Table  
Keyword Phrase
Last Modified Date: 12/4/2023 6:23 PM
