Unpredictable consequences of Network Maintenance on replication

Information

Title: unpredictable consequences Network Maintenance on replication
URL Name: unpredictable-consequences-Network-Maintenance-on-replication
Article Number: 000125729
Environment:
Product: Progress OpenEdge
Version: All supported versions
OS: All supported platforms
Question/Problem Description
Scheduled Network Maintenance causes unpredictable consequences in an OpenEdge replication environment:
The source database stalls and clients hang.
The source database accepts no new client connections; connection attempts terminate with "pending connection".
AI files fill up in LOCKED status while the RPLS is still running and reports "Normal Processing".
Sometimes the RPLA is in PRE-TRANSITION, but the RPLS is still running.
Sometimes the RPLA and RPLS are both still running, but no transaction notes are being processed; replication falls further behind until the RPLS-Q is full.
The RPLS cannot be stopped when the RPLS-Q is full.
Clarifying Information
Network Maintenance is carried out at a fixed time every weekend at the remote target database site.
The OpenEdge replication environment usually recovers from scheduled Network Maintenance
Cause
Depending on the exact nature of the Network Maintenance that is carried out, the existing socket connections (RPLS to RPLA) may not be notified of the change and are not re-established.

For example:

The target databases were notified that they could no longer receive communications from the RPLS:
RPLA: (11699) A TCP/IP failure has occurred. The Agent's will enter PRE-TRANSITION, waiting for connection from the Replication Server.

No corresponding message was logged on the source database side at that time, so, for all intents and purposes, the RPLS had no reason to believe that it could no longer send packets to the target, and therefore no reason to re-establish its connection to the RPLA. It continued queueing source database transaction notes, as AI blocks filled, in the RPLS-Q (-pica) for the network layer to send.

Restarting the target database while the RPLS was still running confirmed this:
RPLA: (11700) This Replication Agent has never been contacted by a Replication Server. The Agent is ending so the Target database can be shutdown normally.

Only when the RPLS socket was also reset was the RPLS able to contact the remote RPLA listening socket.
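To illustrate why the source eventually stalls, here is a toy Python model; it is not OpenEdge code. A fixed-capacity queue stands in for the RPLS-Q, each loop iteration stands in for one filled AI block whose notes are queued, and a sender whose peer is unreachable (the half-dead socket described above) never drains the queue. The class and function names and the capacity are hypothetical and arbitrary; the capacity is not derived from -pica.

```python
from __future__ import annotations

from collections import deque


class ToyReplicationQueue:
    """Toy stand-in for the RPLS-Q: a fixed-capacity FIFO of AI block notes.

    The capacity is an arbitrary illustrative number, not derived from -pica.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.notes: deque[int] = deque()

    def enqueue(self, block_id: int) -> bool:
        """Queue the notes for a filled AI block; return False when the queue is full."""
        if len(self.notes) >= self.capacity:
            return False
        self.notes.append(block_id)
        return True

    def send(self, peer_reachable: bool) -> int | None:
        """Send one entry if the peer is reachable; a dead socket sends nothing."""
        if peer_reachable and self.notes:
            return self.notes.popleft()
        return None


def simulate(capacity: int, blocks: int, peer_reachable: bool) -> int | None:
    """Return the block number at which the toy source would stall, or None."""
    queue = ToyReplicationQueue(capacity)
    for block_id in range(1, blocks + 1):
        if not queue.enqueue(block_id):
            return block_id  # queue full: in the toy model, writers would now wait
        queue.send(peer_reachable)
    return None


if __name__ == "__main__":
    print(simulate(capacity=4, blocks=10, peer_reachable=True))   # None: queue keeps draining
    print(simulate(capacity=4, blocks=10, peer_reachable=False))  # 5: fifth block finds the queue full
```

The point of the model is that nothing in the sender fails loudly: it simply stops draining, so the only visible symptom on the source side is the queue filling up.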
Resolution
Given the unpredictable consequences of this Network Maintenance, and because the event is scheduled, manage the replication environment in a controlled way:

1. To prevent the source database from stalling, restart the RPLS
When the agents are waiting in PRE-TRANSITION, restart the RPLS after the Network Maintenance event.
This resets the RPLS, which was still running but unaware of the state of the targets, and therefore still using existing sockets that no longer had an endpoint.

2. Alternatively, schedule a replication shutdown before the Network Maintenance event and restart replication after the maintenance has been carried out
Whatever the maintenance entailed, this ensures that new socket connections are established, for example with any new firewall rules applied:
  • Stop the replication server process on the source: dsrutil source -C terminate server
  • Stop both target databases: proshut target -by -shutdownTimeout 60
  • Start both target databases
  • Start the replication server process on the source: dsrutil source -C restart server
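The shutdown sequence in option 2 can be captured as an ordered, print-only checklist so that it is reviewed in the right order before the maintenance window. This is a minimal Python sketch, not a Progress tool: it only prints the dsrutil and proshut commands from the steps above, the database names passed in are hypothetical placeholders (as "source" and "target" are in the article), and the command to start a target database is left as a placeholder because the article does not give its startup parameters. Each command must still be run on the host where that database lives.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    host: str     # where the command must be run (illustrative label only)
    command: str  # command line, or a placeholder to fill in for your site


def scheduled_maintenance_plan(source_db: str, target_dbs: list[str]) -> list[Step]:
    """Ordered steps around a scheduled Network Maintenance event (print-only)."""
    steps = [Step("source host", f"dsrutil {source_db} -C terminate server")]
    steps += [
        Step(f"{db} host", f"proshut {db} -by -shutdownTimeout 60") for db in target_dbs
    ]
    steps.append(Step("-", "# ... Network Maintenance is carried out here ..."))
    # Hypothetical placeholder: start each target with your usual startup
    # command and parameters; the article does not specify them.
    steps += [Step(f"{db} host", f"<start target database {db}>") for db in target_dbs]
    steps.append(Step("source host", f"dsrutil {source_db} -C restart server"))
    return steps


def render(steps: list[Step]) -> str:
    """Format the plan as a numbered checklist."""
    return "\n".join(f"{i}. [{s.host}] {s.command}" for i, s in enumerate(steps, 1))


if __name__ == "__main__":
    print(render(scheduled_maintenance_plan("source", ["target1", "target2"])))
```

Keeping the plan print-only is deliberate: the targets are remote, and the checklist makes the order (RPLS down first, RPLS up last) explicit without assuming how commands reach each host.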
3. Additional Considerations

Ensure that network administrators are aware of the ports that replication communicates on:
  • The target database broker ports are configured with the 'port' property in source.repl.properties, and the source database broker port is the value of the Service name (-S) database startup parameter.
  • The listener ports in each target's target.repl.properties fall within the range set by listener-minport and listener-maxport (the range can be narrowed, as long as those ports are available). For example:
T1 listener-minport=4900 .. listener-maxport=4905
T2 listener-minport=5200 .. listener-maxport=5205
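As a minimal sketch for handing the full port list to network administrators, the Python helper below collects the ports from property file text. It assumes simple key=value lines (section headers in [brackets] and # comments are skipped), that every 'port' entry in source.repl.properties is a target broker port, that -S is given as a port number rather than a service name, and that listener-maxport is inclusive. The function names and the broker port numbers in the example are hypothetical; the listener ranges are the T1 and T2 examples above.

```python
from __future__ import annotations


def parse_properties(text: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs from simple key=value lines.

    Skips blank lines, '#' comment lines and [section] headers.
    """
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if sep:
            pairs.append((key.strip(), value.strip()))
    return pairs


def target_broker_ports(source_props: str) -> list[int]:
    """Assumed: every 'port' entry in source.repl.properties is a target broker port."""
    return [int(v) for k, v in parse_properties(source_props) if k == "port"]


def listener_ports(target_props: str) -> range:
    """Listener range listener-minport .. listener-maxport, treated as inclusive."""
    values = dict(parse_properties(target_props))
    low = int(values["listener-minport"])
    high = int(values["listener-maxport"])
    return range(low, high + 1)


def replication_ports(source_s_port: int, source_props: str, target_props: list[str]) -> list[int]:
    """All ports replication uses: source -S port, target brokers, target listeners."""
    ports = {source_s_port, *target_broker_ports(source_props)}
    for props in target_props:
        ports.update(listener_ports(props))
    return sorted(ports)


if __name__ == "__main__":
    t1 = "listener-minport=4900\nlistener-maxport=4905\n"
    t2 = "listener-minport=5200\nlistener-maxport=5205\n"
    # Hypothetical broker ports, for illustration only.
    src = "[control-agent.agent1]\nport=20001\n[control-agent.agent2]\nport=20002\n"
    print(replication_ports(20000, src, [t1, t2]))
```

Narrowing listener-minport/listener-maxport directly shrinks the list this produces, which is the trade-off the article mentions: fewer ports to open, provided those ports stay available.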
Last Modified Date: 11/20/2020 7:02 AM