
What procedure to use when RPLS is stuck in status 1100 "RP STATE CONNECTING"?

Title: What procedure to use when RPLS is stuck in status 1100 "RP STATE CONNECTING"?
URL Name: 000056141
Article Number: 000174388
Environment
Product: OpenEdge Replication
Version: 10.x, 11.x
OS: All supported platforms

Question/Problem Description
What is the procedure to use when the RPLS is stuck in status 1100 "RP STATE CONNECTING"?

When the RPLS remains in CONNECTING status, can the Replication Server process be terminated with KILL while the database is in use?

If the Replication Server is killed and restarted using DSRUTIL -C restart server, will replication catch up, or could the target be corrupted as a result?

What should be done when the RPLS is stuck in "CONNECTING" status after a network failure?
 
Resolution
Status 1100 "RP STATE CONNECTING" 

This status means that the RPLS is still trying to establish contact with the RPLA, first within the connect-timeout period and after that during the defer-agent time. The task is to find out why the RPLS cannot establish a connection with the RPLA, bearing in mind that there are 5-minute intervals between attempts.
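To judge how long the RPLS may legitimately stay in CONNECTING, it can help to lay out when the retries would fall. The sketch below is only an illustration: the function name rpls_retry_schedule and both arguments are hypothetical, and it assumes that the first retry comes 5 minutes after connect-timeout expires and that the defer-agent window is given in minutes.

```shell
# Hypothetical helper: print the elapsed time (in seconds since the RPLS
# started connecting) of each retry, ASSUMING one retry every 5 minutes
# from the moment connect-timeout expires until the defer-agent window ends.
#   $1  connect-timeout, in seconds
#   $2  length of the defer-agent window, in minutes (assumed unit)
rpls_retry_schedule() {
    connect_timeout=$1
    defer_minutes=$2
    interval=300    # 5 minutes between attempts, as stated above
    window_end=$((connect_timeout + defer_minutes * 60))
    t=$((connect_timeout + interval))
    while [ "$t" -le "$window_end" ]; do
        echo "$t"
        t=$((t + interval))
    done
}
```

For example, rpls_retry_schedule 120 15 lists the retry points for a 120-second connect-timeout followed by a 15-minute defer-agent window.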

The quick and easy fix is to restart the source database whenever network failure conditions precede the CONNECTING status and keep it from clearing.
Assuming that the agent-shutdown-action=recovery property is set in the replication.properties file, the RPLA will still be listening and a new RPLS will start with the source database.

It is nevertheless a good idea to first verify that the RPLA is in this state with: DSRUTIL <target> -C monitor
If the RPLA is not listening, then prior to OpenEdge 11.6 the target database needs to be restarted in order to restart the RPLA.
Since OpenEdge 11.6, the agent can be restarted online with: DSRUTIL <target> -C restart agent
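The restart approach above depends on agent-shutdown-action=recovery being set, so it may be worth confirming that before restarting the source. A minimal sketch follows; the function name agent_recovers_on_shutdown is hypothetical, the path to the properties file is left to the caller, and the check does not look at which section of the file the property appears in.

```shell
# Hypothetical helper: exit 0 only if the given properties file contains a
# line that sets agent-shutdown-action to "recovery". Leading and trailing
# whitespace and spaces around "=" are tolerated; sections are not checked.
#   $1  path to the replication properties file
agent_recovers_on_shutdown() {
    props_file=$1
    grep -Eq '^[[:space:]]*agent-shutdown-action[[:space:]]*=[[:space:]]*recovery[[:space:]]*$' "$props_file"
}
```

A call such as agent_recovers_on_shutdown <properties-file> && echo "RPLA should keep listening" gives a quick yes/no answer without opening the file.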

Since DSRUTIL status 1100 means "RP STATE CONNECTING", work through the following steps:

1.  First confirm that the RPLA is running on target(s):
$   DSRUTIL <target> -C monitor

2.  Use network utilities to check whether anything is preventing communication with the target's -S listening port.

3.  The source RPLS cannot be terminated until the initial connect-timeout has expired and it has moved on to defer-agent timeouts.
            
The connect-timeout specifies how many seconds the Replication Server will wait for connection to the Replication Agent before the Replication Server enters defer-agent timeouts or simply terminates if defer-agent is not in use.

If in defer-agent time, first try to force a connection to the Replication Agent: 
$   dsrutil <source> -C startAgent
              
Otherwise, the RPLS can be terminated during defer-agent-startup time, after first checking that the RPLS-Q is not full (see Step 4):
$   dsrutil <source> -C terminate server
 
4.  Check the pica queue (RPLS-Q) in PROMON:
$   promon <source> -> R&D -> 1. Status Displays -> 16. Database Service Manager

If the RPLS-Q is full:

The source database needs to be stopped and restarted (see quick and easy above).
Note that closing a Command Prompt that is running a hung DSRUTIL command (dsrutil -C terminate server) causes an abnormal database shutdown.

If the RPLS-Q is not full:

If the connect-timeout has expired, the RPLS is in defer-agent time and the RPLA is still listening, but RPLS-RPLA contact cannot be completed, then terminate the RPLS and, only once it is confirmed as stopped, release waits:
$   dsrutil <source> -C terminate server
$   dsrutil <source> -C RELWAITS

If this fails, then try disconnecting the RPLS process through:
$   promon <source> -> 8 -> 1 disconnect user

Then restart the replserv process:
$   dsrutil <source> -C restart server

As a last resort, kill the RPLS pid and restart the RPLS:
  • Killing the RPLS when it is not connected to the RPLA is safe.
  • Only when the RPLS and RPLA are communicating at the time of the kill is there a remote possibility that synchronisation will fail on restart.
  • Target corruption is not possible if synchronisation has completed successfully and replication is in normal processing.
$   dsrutil <source> -C restart server

5. Otherwise, shut the source database down and restart it, first ensuring that the RPLA is still listening (otherwise the target will also need to be started).
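The order of checks in the steps above can be summarised as a small decision helper. This is a simplified sketch, not part of any OpenEdge tooling: the function name rpls_next_action and the action labels it prints are hypothetical, and the three yes/no inputs have to be gathered by hand from DSRUTIL and PROMON as described in the steps.

```shell
# Hypothetical helper: map three observations to the next action suggested
# by the steps above. Each argument is "yes" or "no":
#   $1  is the RPLA still listening?          (DSRUTIL <target> -C monitor)
#   $2  has connect-timeout expired, so that the RPLS is in defer-agent time?
#   $3  is the RPLS-Q full?                   (PROMON, Database Service Manager)
rpls_next_action() {
    rpla_listening=$1
    in_defer_agent=$2
    queue_full=$3
    if [ "$rpla_listening" != "yes" ]; then
        # The RPLA must be brought back before the RPLS can connect.
        echo "start-target-or-restart-agent"
    elif [ "$queue_full" = "yes" ]; then
        # A full RPLS-Q calls for a source database restart.
        echo "restart-source-database"
    elif [ "$in_defer_agent" != "yes" ]; then
        # The RPLS cannot be terminated before connect-timeout expires.
        echo "wait-for-connect-timeout"
    else
        # Try startAgent first, then terminate server and RELWAITS.
        echo "startAgent-then-terminate-server"
    fi
}
```

For instance, rpls_next_action yes yes no corresponds to the case where the RPLA is listening, the RPLS is in defer-agent time and the RPLS-Q is not full. Disconnecting through PROMON, killing the RPLS pid and a full source restart remain the fallbacks if that action fails.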
Last Modified Date: 9/29/2021 3:48 PM
