How to change the connection timeout for the replication server to agent connection

Information

 
Title: How to change the connection timeout for the replication server to agent connection
URL Name: P114424
Article Number: 000169067
Environment
Product: Progress
Version: 9.x
Product: OpenEdge
Version: 10.x, 11.x, 12.x
OS: All supported platforms
Question/Problem Description
How to change the connection timeout for the Replication Server to Agent connection
How to stop the Replication Server process from shutting down too soon when there's a delay connecting to the Replication Agents
Where to set the connect-timeout of the Replication Server process
What is the maximum setting for connect-timeout in source.repl.properties
Is there anything that can be done to make OpenEdge Replication better able to deal with short network "hiccups"?
Resolution
OpenEdge Replication has two distinct connect-timeout parameters, configured in the source and target .repl.properties files:

1. In the [control-agent.name] section of the source database properties file, there is a connect-timeout parameter.
  • Specifies the number of seconds the Replication Server will attempt to connect to the configured target database Agent(s). When this time expires, the Replication Server either shuts itself down or, if defer-agent-startup is configured, enters the defer-agent-startup period. In the latter case the Replication Server shuts down when the defer-agent-startup period expires or when DSRUTIL -C canceldefer is issued. For further discussion, refer to the article "How does defer agent startup work?"
  • The source Replication Server (RPLS) cannot be terminated before the initial connect-timeout has expired.
  • This property is also used by the RPLS while reconnecting to the Replication Agent (RPLA) after communications have been lost. For further discussion, refer to the article "What happens with replication when there are network problems?"
2. In the [agent] section of the target database properties file, there is a connect-timeout parameter.
  • Specifies the number of seconds the Replication Agent waits for a connection from the Replication Server before the Replication Agent shuts itself down or, when agent-shutdown-action=recovery is configured, goes into PRE-TRANSITION mode.
The minimum (and default) connect-timeout value is 120 seconds; the maximum value is 86,400 seconds (24 hours).
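The documented range can be checked before a value is written to a properties file. The following is a minimal Python sketch, not part of OpenEdge; the function name check_connect_timeout is hypothetical, and it assumes the value is supplied as a whole number of seconds.

```python
# Bounds as stated in this article (seconds).
MIN_CONNECT_TIMEOUT = 120      # minimum, and also the default
MAX_CONNECT_TIMEOUT = 86_400   # maximum (24 hours)


def check_connect_timeout(value: str) -> int:
    """Return the timeout as an int, or raise ValueError if it is out of range.

    Hypothetical helper for illustration only; OpenEdge does not ship it.
    """
    seconds = int(value.strip())  # raises ValueError for non-numeric input
    if not MIN_CONNECT_TIMEOUT <= seconds <= MAX_CONNECT_TIMEOUT:
        raise ValueError(
            f"connect-timeout={seconds} is outside "
            f"{MIN_CONNECT_TIMEOUT}..{MAX_CONNECT_TIMEOUT} seconds"
        )
    return seconds
```

For example, check_connect_timeout("600") returns 600, while check_connect_timeout("60") raises ValueError because 60 is below the minimum.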

Example:
[server] 
control-agents=agent1, agent2 
defer-agent-startup=1200
agent-shutdown-action=recovery 
... 
[control-agent.agent1] 
name=agent1 
connect-timeout=600 
... 

[control-agent.agent2] 
name=agent2 
connect-timeout=1200 
... 
 
[agent] 
name=agent1 
database=target1 
connect-timeout=720 
... 

[agent] 
name=agent2 
database=target2 
connect-timeout=1440

Ensure that the control-agent connect-timeout setting does not exceed the time it would take for the pica buffer (RPLS-Q) to fill; otherwise the source database will freeze while it waits for data to be replicated.

When the Replication Server process is running, it uses a buffer (often referred to as the pica buffer or pica queue) to store pointers to AI blocks in the AI files. When this buffer is full, the source database must wait until an AI block has been replicated and its pointer freed before it allows further writes. If the pica buffer fills because network communications are interrupted or cannot be established while the Replication Server is still running, further filled AI blocks cannot be replicated to the target database until a pointer is freed, so the source database freezes.

When the Replication Server is shut down, it no longer uses the pica buffer. AI transaction notes accumulate in the AI files for block-level replication processing at a later time, once the Replication Server and Agents have been (re)started and connected and a synchronisation point has been established.

In other words, while the Replication Server is running the maximum amount of AI data that can be backlogged is based upon how many pointers to AI blocks can fit within the pica buffer.  However when the Replication Server is not running, the amount of AI data that can be backlogged is based upon the AI file capacity available.

The connect-timeout should therefore not exceed the time it takes for the pica buffer to fill. That way, the Replication Server process shuts down automatically when the connect-timeout expires, before the pica buffer is full, and transaction notes build up in LOCKED AI files instead.
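The reasoning above can be turned into a rough estimate. The following Python sketch assumes two inputs that this article does not supply and that must be measured for each site: the number of AI block pointers the pica buffer can hold (pointer_capacity) and the rate at which the source database fills AI blocks (ai_blocks_per_second). The function names and the safety margin are hypothetical.

```python
import math

# Bounds as stated in this article (seconds).
MIN_CONNECT_TIMEOUT = 120
MAX_CONNECT_TIMEOUT = 86_400


def seconds_until_pica_full(pointer_capacity: int, ai_blocks_per_second: float) -> float:
    """Estimate how long an empty pica buffer takes to fill at a steady rate."""
    if pointer_capacity <= 0 or ai_blocks_per_second <= 0:
        raise ValueError("capacity and rate must both be positive")
    return pointer_capacity / ai_blocks_per_second


def suggest_connect_timeout(pointer_capacity: int,
                            ai_blocks_per_second: float,
                            safety_margin: float = 0.8) -> int:
    """Suggest a connect-timeout that expires before the pica buffer fills.

    safety_margin is an assumed fraction of the fill time, not a Progress
    recommendation.
    """
    fill_seconds = seconds_until_pica_full(pointer_capacity, ai_blocks_per_second)
    candidate = math.floor(fill_seconds * safety_margin)
    if candidate < MIN_CONNECT_TIMEOUT:
        # Even the smallest allowed timeout would outlast the buffer.
        raise ValueError(
            f"pica buffer would fill in about {fill_seconds:.0f} s, before the "
            f"minimum connect-timeout of {MIN_CONNECT_TIMEOUT} s elapses"
        )
    return min(candidate, MAX_CONNECT_TIMEOUT)
```

With an assumed capacity of 100,000 pointers and 50 AI blocks filled per second, the estimated fill time is 2,000 seconds, and suggest_connect_timeout(100_000, 50) returns 1600. Peak load, not average load, is the safer rate to use, since the buffer fills fastest when the database is busiest.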

To monitor the status of the Replication Server, the replication monitor can be used to query the current status value:
$   dsrutil <dbname> -C status -detail

Please refer to the OpenEdge Replication Users Guide for valid return codes.
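The status query can also be scripted for periodic checks. The following Python sketch wraps the dsrutil command shown above; it assumes dsrutil is on the PATH of the account running the script, and it returns the raw output without interpreting it, because the meaning of each status value is documented in the guide rather than here. The function names are hypothetical.

```python
import subprocess


def build_status_command(dbname: str) -> list[str]:
    """Build the argument list for: dsrutil <dbname> -C status -detail"""
    return ["dsrutil", dbname, "-C", "status", "-detail"]


def query_replication_status(dbname: str) -> str:
    """Run the status query and return whatever dsrutil printed.

    Assumes dsrutil is on PATH; the output is returned uninterpreted.
    """
    result = subprocess.run(
        build_status_command(dbname),
        capture_output=True,
        text=True,
        check=False,  # leave handling of a non-zero exit code to the caller
    )
    return result.stdout.strip()
```

A scheduler could call query_replication_status with the source database name and compare the returned value against the codes listed in the guide.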
Workaround
Refer to the Progress article:
 What happens with replication when there are network problems?
The Workaround section of that article describes a manual recovery option.
Last Modified Date: 12/21/2021 2:56 PM
