Information

Title	Unpredictable consequences of Network Maintenance on replication

URL Name	unpredictable-consequences-Network-Maintenance-on-replication

Article Number	000125729

Environment	Product: Progress OpenEdge Version: All supported versions OS: All supported platforms

Question/Problem Description

Scheduled Network Maintenance causes unpredictable consequences in an OpenEdge Replication environment:
The source database stalls and clients hang
The source database accepts no new client connections; connection attempts terminate with "pending connection"
AI files fill up in LOCKED status while the Replication Server (RPLS) is still running with "Normal Processing"
Sometimes the Replication Agent (RPLA) is in PRE-TRANSITION, but the RPLS is still running
Sometimes the RPLA and RPLS are both still running, but no transaction notes are being updated and replication falls further behind until the RPLS-Q is exhausted
The RPLS cannot be stopped when the RPLS-Q is full

Steps to Reproduce

Clarifying Information

Network Maintenance is carried out at a specific time every weekend on the remote target database
The OpenEdge replication environment usually recovers from scheduled Network Maintenance

Error Message

Defect Number

Enhancement Number

Cause

Depending on the exact nature of the Network Maintenance that is carried out, existing socket communications (RPLS > RPLA) are not notified of the interruption and are not refreshed.

For example:

The target databases were notified that they could no longer receive communications from the RPLS:

RPLA: (11699) A TCP/IP failure has occurred. The Agent's will enter PRE-TRANSITION, waiting for connection from the Replication Server.

The RPLS, however, had no reason to believe that it could no longer send packets to the target, or that it needed to re-establish its connection to the RPLA. This is shown by the absence of similar messages on the source database side at that time. The RPLS continued queueing source database transaction notes in the RPLS-Q (-pica) as AI blocks filled, for the network layers to send.

This is confirmed by the message the RPLA logged when the target database was restarted while the RPLS was still running:

RPLA: (11700) This Replication Agent has never been contacted by a Replication Server. The Agent is ending so the Target database can be shutdown normally.

Only once the RPLS socket was also reset was the RPLS able to contact the remote listening RPLA socket.

Resolution

Because the consequences of Network Maintenance are unpredictable but the event itself is scheduled, manage the replication environment in a controlled way:

1. To prevent the source database from stalling:

Increase the RPLS-Q size with the -pica database startup parameter on the source database. Refer to the Article "How to calculate the optimum -pica setting for OpenEdge Replication".
Ensure there are sufficient AI files to hold the source transaction notes until the source and targets can be synchronized later.

2. Restart the RPLS
When the agents are waiting in PRE-TRANSITION, restart the RPLS after the Network Maintenance.
This resets an RPLS that was still running but unaware that its targets had disconnected, and was therefore still relying on existing sockets that no longer had an endpoint.

Alternatively, schedule a replication shutdown before the scheduled Network Maintenance and restart replication after the maintenance has been carried out.
Whatever the maintenance entailed, this ensures that new socket communications are established, for example with new firewall rules applied.

Stop repl-server process on source: dsrutil source -C terminate server
Stop both target databases: proshut target -by -shutdownTimeout 60
Start both target databases
Start repl-server process on source: dsrutil source -C restart server
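
The sequence above could be scripted around the maintenance window. The following Python sketch only assembles the command lines listed in this Article in the right order and, by default, prints them without running anything. The function names are hypothetical, and the command used to start each target database is left to the caller because this Article does not specify it. In practice the source and target commands are normally run on different hosts, so the lists would be split accordingly.

```python
import subprocess


def shutdown_steps(source_db, target_dbs):
    """Commands to run BEFORE the maintenance window, in order."""
    # Stop the Replication Server on the source first.
    steps = [["dsrutil", source_db, "-C", "terminate", "server"]]
    # Then shut down each target database.
    for db in target_dbs:
        steps.append(["proshut", db, "-by", "-shutdownTimeout", "60"])
    return steps


def restart_steps(source_db, target_start_cmds):
    """Commands to run AFTER the maintenance window, in order.

    target_start_cmds is supplied by the caller (assumption: the
    target startup command line is site-specific and not given here).
    """
    # Targets must be up before the Replication Server is restarted.
    steps = [list(cmd) for cmd in target_start_cmds]
    steps.append(["dsrutil", source_db, "-C", "restart", "server"])
    return steps


def run_steps(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cmd in steps:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(cmd, check=True)
```

For example, run_steps(shutdown_steps("source", ["target1", "target2"])) prints the three shutdown commands for review; passing dry_run=False would execute them on the local host.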

3. Additional Considerations

Ensure that network administrators are aware of the ports that replication communicates on:

The Target Database Broker ports are configured in the 'port' argument of source.repl.properties, and the Source Database Broker port is the value of the Service (-S) database startup parameter.
The Listener ports are configured as a range in the respective target.repl.properties files. The range can be narrowed as long as the ports in it are available. Example:

T1 listener-minport=4900 .. listener-maxport=4905
T2 listener-minport=5200 .. listener-maxport=5205

For further information refer to Article What ports need to be open for OpenEdge Replication through a firewall
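
To hand network administrators an explicit port list, the listener ranges can be expanded and, after maintenance, spot-checked with a fresh TCP connection. This is a minimal Python sketch using only the standard library; the function names are hypothetical. A successful new connection shows only that the path is open now. It says nothing about sockets that existed before the maintenance, and a refused connection may simply mean that nothing is listening on that port at the moment.

```python
import socket


def listener_ports(minport, maxport):
    """Expand an inclusive listener-minport..listener-maxport range."""
    if minport > maxport:
        raise ValueError("listener-minport must not exceed listener-maxport")
    return list(range(minport, maxport + 1))


def port_reachable(host, port, timeout=3.0):
    """Return True if a NEW TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

With the example values above, listener_ports(4900, 4905) yields the six ports 4900 through 4905 for T1, and listener_ports(5200, 5205) the six ports for T2.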

Workaround

Notes

Progress Article:

OpenEdge Replication Agent and Server terminates after connection failure

Keyword Phrase

Last Modified Date	11/20/2020 7:02 AM
