Closing a Windows CMD shell running a hung DSRUTIL process causes an abnormal database shutdown

Information

 
Title: Closing a Windows CMD shell running a hung DSRUTIL process causes an abnormal database shutdown
URL Name: 000040624
Article Number: 000161427
Environment
Product: OpenEdge
Version: 10.x, 11.1
OS: Windows
Other: OpenEdge Replication
Question/Problem Description
On Windows, when a "DSRUTIL -C terminate server" command issued to stop the Replication Server (RPLS) hangs because the pica queue is full, the source database shuts down abnormally.

The abnormal shutdown occurs when the Windows console running the hung DSRUTIL -C terminate server command is closed using the X in the corner of the window, which sends a CTRL_CLOSE_EVENT to the process.
Steps to Reproduce
1. Configure a replication-enabled source and target environment.

2. Start the source and target databases with a Lock Table value large enough to accommodate a large transaction (-L 1000000).
The target database must be started with -bithold 4 -bistall to force the situation.

3. Run a large transaction on the source side that causes the target BI to stall and subsequently fills the source pica queue.

4. On the source side, terminate the RPLS:
$ dsrutil <sourcedb> -C terminate server

This produces the following message and then returns to the PROENV prompt:
The Replication Server has been instructed to shutdown.

The RPLS has been instructed to stop, but it will not terminate because the rpserver is blocked waiting for a free slot in the RPLS-Q.

5. If you exit the command prompt at this point, the CTRL_CLOSE_EVENT does not occur. The RPLS still does not stop, but the database does not crash either.

6. From another PROENV command prompt, terminate the RPLS again. This time the command hangs: it produces no message and does not return to the PROENV prompt.
$ dsrutil <sourcedb> -C terminate server

7. Close the command prompt running DSRUTIL. This time the CTRL_CLOSE_EVENT is sent, which results in the source database throwing an ABNORMAL shutdown.
Clarifying Information
The rprepl.exe process (the DSRUTIL utility) hangs.

Closing a hung dsrutil -C monitor session while the RPLS-Q is full does not cause the source database to crash.

Terminating the RPLS hangs when the RPLS-Q is full, regardless of whether the database is managed by the AdminServer or started with PROSERVE.
Error Message
(9436) CTRL_CLOSE_EVENT console event received.
(2249) Begin ABNORMAL shutdown code 2
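
To confirm after the fact that an abnormal shutdown followed a console close, the two message numbers above can be searched for in a saved log. The following is a minimal sketch, assuming the messages appear one per line in a text log; the function name and the `window` value are illustrative choices, not part of OpenEdge.

```python
# Message numbers taken from the Error Message section of this article.
CONSOLE_CLOSE_MSG = "(9436)"
ABNORMAL_SHUTDOWN_MSG = "(2249)"


def find_console_close_shutdowns(lines, window=5):
    """Return (close_line_no, shutdown_line_no) pairs, 1-based.

    A pair is reported when a (2249) line appears within `window` lines
    after a (9436) line. `window` is an arbitrary illustrative value.
    """
    hits = []
    last_close = None
    for line_no, text in enumerate(lines, start=1):
        if CONSOLE_CLOSE_MSG in text:
            last_close = line_no
        elif ABNORMAL_SHUTDOWN_MSG in text and last_close is not None:
            if line_no - last_close <= window:
                hits.append((last_close, line_no))
            last_close = None
    return hits


# Sample lines reuse the message texts quoted in this article.
sample = [
    "The Replication Server has been instructed to shutdown.",
    "(9436) CTRL_CLOSE_EVENT console event received.",
    "(2249) Begin ABNORMAL shutdown code 2",
]
print(find_console_close_shutdowns(sample))
```

For a real log, the lines of the opened file can be passed in place of `sample`.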
Defect Number: Defect PSC00258804
Cause
Terminating the RPLS with "dsrutil dbname -C terminate server" fails to shut down the Replication Server when the pica queue is full. This is a known issue (PSC00217061), documented in the article "Cannot terminate replication server process when pica buffer is full".

This issue is a further consequence of that failure. When the RPLS is instructed to terminate a second time, the command hangs. Terminating the console session running the hung process sends a CTRL_CLOSE_EVENT to the rpserver.exe process, and the source database shuts down abnormally as a result. The first instruction to terminate the RPLS is expected to complete. When it does not, the second instruction hangs because it is queued behind the first. Terminating the Windows console causes the first instruction to end abnormally while it holds a critical latch in shared memory for the RPLS process. Because this latch is held when the process dies, the database throws an ABNORMAL shutdown.
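
The blocking behaviour behind this can be illustrated with a toy model. The sketch below is not OpenEdge code and makes no claim about its internals; it only assumes a worker that checks for a stop request between writes to a bounded queue, and shows that such a worker cannot react to the request while it is blocked waiting for a free slot. All names and sizes are hypothetical.

```python
import queue
import threading


def simulate_blocked_shutdown():
    """Return (alive_while_full, alive_after_slot_freed) for the toy worker."""
    q = queue.Queue(maxsize=2)       # stand-in for a small, bounded queue
    q.put("note-1")
    q.put("note-2")                  # the queue is now full

    stop = threading.Event()         # stand-in for the terminate request
    about_to_block = threading.Event()

    def worker():
        # The stop request is only examined between queue operations.
        while not stop.is_set():
            about_to_block.set()
            q.put("note-3")          # blocks while the queue is full

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    about_to_block.wait()            # worker has committed to put()
    stop.set()                       # "terminate" is requested...
    t.join(timeout=0.2)
    alive_while_full = t.is_alive()  # ...but the worker is stuck in put()

    q.get()                          # a consumer frees one slot
    t.join(timeout=2)
    alive_after_slot_freed = t.is_alive()
    return alive_while_full, alive_after_slot_freed


print(simulate_blocked_shutdown())   # (True, False)
```

In this model the worker only notices the stop request once a slot is freed, which mirrors the described situation in which the terminate instruction cannot complete until the queue drains.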
 
Resolution
1. When the pica queue is full (and only then), use the Task Manager to kill the rpserver.exe process (RPLS) instead of using DSRUTIL to end the Replication Server process. To determine whether the RPLS-Q is full before terminating the RPLS process, refer to the article "How to monitor the message replication queue set with the -pica parameter".

2. Upgrade to OpenEdge 10.2B08, 11.2, 11.3 or later, where the maximum value for -pica has been increased to 1000000.
Using the maximum value is not recommended, as it causes longer synchronization times at startup or at reconnection after a failure. Instead, recalculate the optimum -pica value for the environment's periods of high write activity. A higher -pica value reduces the likelihood of the pica queue becoming full, which avoids both being unable to terminate the RPLS process and stalled OLTP activity on the source database.
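
For resolution 1, this article prescribes the Task Manager. Where a scripted equivalent is needed, the standard Windows taskkill command can force-terminate a process by ID. Using it here is an assumption, not something this article states. The sketch below only builds the command; the helper name and the PID 4321 are hypothetical, and the correct PID of the rpserver.exe process for the affected database must be looked up first.

```python
def build_kill_command(pid):
    """Build a forced-termination command for one process ID."""
    if not isinstance(pid, int) or pid <= 0:
        raise ValueError("pid must be a positive integer")
    # /PID selects the process by ID; /F forces termination.
    return ["taskkill", "/PID", str(pid), "/F"]


# 4321 is a made-up PID used only for illustration.
print(build_kill_command(4321))
```

On Windows, the returned list can be passed to `subprocess.run`. Selecting by PID rather than by image name avoids killing rpserver.exe processes that belong to other databases on the same machine.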
Last Modified Date: 11/20/2020 7:11 AM
