All after imaging (AI) areas are in LOCKED state after network failure

Information

Title: All after imaging (AI) areas are in LOCKED state after network failure
URL Name: P122873
Article Number: 000116853
Environment
Product: Progress
Version: 9.1D, 9.1E
Product: OpenEdge
Version: 10.x, 11.x, 12.x
OS: All supported platforms
Other: Fathom Replication
Question/Problem Description
All after-imaging (AI) areas on a replication-enabled source database are in LOCKED state after a network failure.
The source database log file shows connection and communication failure errors at the time of the network failure.
The Replication Server stops (10698), leaving the source database running.
PROBKUP fails because it can't switch to an empty AI extent.
Eventually the current AI file fills and there are no available AI files to switch to (3775).
When the database is started with -aistall, all online transaction processing is stalled (12288).
When the database is not started with -aistall, the source database shuts down.
Steps to Reproduce
Clarifying Information
Error Message
Connection failure for host <host> port <port> transport TCP. (9407).
A communications error -4008 in rpCOM_RecvMsg. (11713).
A communications error -157 occurred in function rpNLS_SendAIBlockToAgent while sending AIBLOCK. (10491).
The Fathom Replication Server is beginning recovery for agent agent1. (10661).
Connecting to Fathom Replication Agent agent1. (10842).
The Fathom Replication Agent agent1 cannot be contacted by the database broker on host <host>, port -1. (10496).
The connection attempt to the Fathom Replication Agent agent1 failed. (10397).
The Fathom Replication Server was unable to reconnect to agent agent1. Recovery for this agent will not be performed. (10697).
The Fathom Replication Server will shutdown but the source database will remain active. (10698).
The Fathom Replication Server is ending. (10505).
There are no available EMPTY AI extents. Database activity is stalled until an AI extent becomes available. (12288).
Can't switch to after-image extent <extent> it is full. (3775).
!!! ERROR - Database backup utility FAILED !!! (8563).
Defect Number
Enhancement Number
Cause
The network outage lasted longer than the value configured for the connect-timeout parameter.

When the connect-timeout value is exceeded, the Replication Server process (RPLS, on the source database) shuts down automatically. This stops OpenEdge/Fathom Replication from replicating, but normal database activity can continue, and all transaction activity is still written to the available After Image (AI) files. If replication is not restarted so that RPLS to RPLA communication resumes before AI space runs out, every AI extent is eventually set to LOCKED status as it is switched, and the database can no longer accept new transactions because there is no AI space left to record transaction notes.

Progress backups (PROBKUP) fail because PROBKUP automatically attempts to switch to an empty AI extent, which it cannot do while all AI extents are in LOCKED status and therefore unavailable. These AI extents are released only once their full AI transaction note content has been applied to the target database through RPLS -> RPLA communications.
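
The lock-up described above can be illustrated with a small model. The Python sketch below is not Progress code. It is a simplified simulation that assumes a filled extent is marked LOCKED until its notes have been replicated, and that while replication is running a LOCKED extent is released and emptied before it is needed again. The function name `simulate_switches` and the status handling are hypothetical.

```python
def simulate_switches(extent_count, switches, replication_running):
    """Attempt a number of AI extent switches in a ring of extents.

    Returns (statuses, ok). ok is False as soon as the next extent in the
    ring is not EMPTY, i.e. the point at which the database would stall
    (-aistall) or shut down.
    """
    statuses = ["EMPTY"] * extent_count
    busy = 0
    statuses[busy] = "BUSY"

    for _ in range(switches):
        nxt = (busy + 1) % extent_count
        if statuses[nxt] != "EMPTY":
            return statuses, False  # no EMPTY extent to switch to

        # The filled extent stays LOCKED until it has been replicated.
        statuses[busy] = "LOCKED"
        if replication_running:
            # Assumption: the agent applies the notes and the extent is
            # archived and emptied before the ring wraps around.
            statuses[busy] = "EMPTY"

        busy = nxt
        statuses[busy] = "BUSY"

    return statuses, True


if __name__ == "__main__":
    print(simulate_switches(3, 5, replication_running=True))
    print(simulate_switches(3, 5, replication_running=False))
```

With replication stopped, the model runs out of EMPTY extents after one pass around the ring, which mirrors the symptom reported in this Article.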
 
Resolution
To make AI extents available for source transaction processing, the following three options are available:

Option 1:  Restart both the Replication Server and Replication Agent.

Once the databases have synchronized, it may take some time for all of the data in the oldest AI extent to be replicated.  Only after all of that data has been replicated can the extent's status transition from LOCKED -> FULL -> EMPTY.  Once an EMPTY extent is available, the current filled BUSY extent is marked LOCKED, after-imaging switches to the next AI extent in sequence, and normal transaction activity can resume.
 
Pro:  No additional AI areas need to be added to the database.
Pro:  Replication does not need to be re-created / re-initialized.
Con:  All data from the oldest AI sequence needs to be replicated before normal database transaction activity can resume.

To restart the Replication Server, if the source database is still online:
$  dsrutil <dbname> -C restart server

To restart the Replication Agent:

Prior to OpenEdge 11.6: shut down and restart the target database.
OpenEdge 11.6 and later: if the target database is still online:
$   dsrutil <targetdb> -C restart agent
For further instructions, refer to the Article "How to restart RPLS, RPLA and target database when the source database is running".

If the source database is not currently running and there is enough AI space to record the BI undo/redo transaction notes, the Replication Server will restart when the source database is started.
If there is no space left in the current BUSY AI extent, the source database will fail to start, in which case Option 2 or Option 3 below needs to be considered.


Option 2:  Add additional AI areas to the database. 

There are some caveats to using this option depending on the version of Progress being used.

 
Pro:  Database activity can resume as soon as new AI areas have been added.
Con:  Disk space may not be readily available for additional AI areas.

A.   Progress pre-9.1E, 10.0B, or 10.1A: adding additional AI areas will only resolve the problem if the current BUSY AI extent is the last physical AI extent.

After Imaging works sequentially in a sequence ring.  For example, suppose there are three AI extents A, B and C, where A and C are LOCKED and B is BUSY, and new extents D and E are then added:

AI extent A: sequence 4, Status: LOCKED
AI extent B: sequence 5, Status: BUSY
AI extent C: sequence 3, Status: LOCKED
AI extent D: sequence 0, Status: EMPTY
AI extent E: sequence 0, Status: EMPTY

Although the new AI extents D and E have been added, after imaging can't jump from B to D to use the new empty extents, because the next extent in the ring after B is C, which is LOCKED.

In this scenario, if AI extent C were the BUSY extent, the new EMPTY extents would be available:

AI extent A: sequence 4, Status: LOCKED
AI extent B: sequence 5, Status: LOCKED
AI extent C: sequence 6, Status: BUSY
AI extent D: sequence 0, Status: EMPTY
AI extent E: sequence 0, Status: EMPTY
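
The two layouts above differ only in what physically follows the BUSY extent. The following is a minimal Python sketch of that check, assuming the extents are listed in physical order and that after imaging can only switch to the extent that follows the BUSY one, wrapping around at the end of the list. The function name `can_switch` and the tuple layout are illustrative and are not part of any Progress utility.

```python
def can_switch(extents):
    """Check whether the extent following the BUSY extent is EMPTY.

    extents: list of (name, status) tuples in physical order.
    Returns (switch_possible, name_of_next_extent).
    """
    statuses = [status.upper() for _, status in extents]
    busy = statuses.index("BUSY")        # position of the current extent
    nxt = (busy + 1) % len(extents)      # ring: wrap around at the end
    return statuses[nxt] == "EMPTY", extents[nxt][0]


# First layout: B is BUSY, so the next extent is C, which is LOCKED.
layout_b_busy = [("A", "LOCKED"), ("B", "BUSY"), ("C", "LOCKED"),
                 ("D", "EMPTY"), ("E", "EMPTY")]

# Second layout: C is BUSY, so the next extent is the new EMPTY extent D.
layout_c_busy = [("A", "LOCKED"), ("B", "LOCKED"), ("C", "BUSY"),
                 ("D", "EMPTY"), ("E", "EMPTY")]

if __name__ == "__main__":
    print(can_switch(layout_b_busy))
    print(can_switch(layout_c_busy))
```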

The current status of AI extents can be found by running: 
 
$  rfutil dbname -C aimage list

Example instructions for adding new AI extents offline are provided below. After adding the AI extents, restart the databases so that replication communications resume once synchronization has completed.

B.    Progress 9.1E, OpenEdge 10.1B03 or later

Before proceeding, refer to Article 000021663, Unable to switch to new ai extent after adding a new ai extent to the database   

New AI extents can be added when the current BUSY AI extent is not the last physical AI extent by reordering the AI extents offline after they have been added. This will place the new EMPTY AI extents immediately after the current BUSY extent.

a)  Create a structure file containing only the AI areas that need to be added:

# addai.st example
a . v 500000
a . f 500000
a .

b) Add the new AI extents to the database structure:
 
$   prostrct add dbname addai.st
$   prostrct list dbname
$   rfutil dbname -C aimage list > preorder.out
 
c)  If the current BUSY AI extent is not the last physical AI extent, re-order the AI extents:
 
$  prostrct reorder ai <dbname>
$   rfutil dbname -C aimage list > postreorder.out
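
As an illustration of what the reorder step is meant to achieve, the following Python sketch models the extent list before and after reordering. It assumes, per the description above, that the EMPTY extents end up immediately after the BUSY extent while the remaining extents keep their relative order. It does not reproduce the actual internals of PROSTRCT REORDER AI, and the function `model_reorder` is hypothetical.

```python
def model_reorder(extents):
    """Return a new list with all EMPTY extents placed right after BUSY.

    extents: list of (name, status) tuples in physical order.
    """
    busy_index = next(
        i for i, (_, status) in enumerate(extents) if status == "BUSY"
    )
    # Non-empty extents up to and including the BUSY extent stay in place.
    head = [e for e in extents[: busy_index + 1] if e[1] != "EMPTY"]
    # Every EMPTY extent is moved to directly follow the BUSY extent.
    empties = [e for e in extents if e[1] == "EMPTY"]
    # Remaining non-empty extents follow in their original order.
    tail = [e for e in extents[busy_index + 1:] if e[1] != "EMPTY"]
    return head + empties + tail


before = [("A", "LOCKED"), ("B", "BUSY"), ("C", "LOCKED"),
          ("D", "EMPTY"), ("E", "EMPTY")]

if __name__ == "__main__":
    print(model_reorder(before))
```

In this model the extent that follows BUSY extent B is the EMPTY extent D, so the next switch can succeed. Comparing preorder.out with postreorder.out shows the real ordering produced by the utility.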

Optionally, the LOCKED AI extents can be manually applied to the target database, particularly when there is high latency between the source and target databases.  For further instructions, refer to the Article "How to manually apply a LOCKED AI extent to the target database?".

After adding available AI extents, restart the databases so that replication communications resume once synchronization has completed.

Option 3:  Disable replication and then re-initialize replication.

By disabling replication, all LOCKED AI extents become FULL; once archived, they can be emptied with "RFUTIL -C AIMAGE EMPTY".  Once the network failure has been addressed, replication must be re-enabled on the source database and the replication-enabled target database must be recreated.
 
Pro:  Quick to re-establish normal database transaction activity.
Pro:  No additional AI areas need to be added to the database.
Con:  Normal source database transaction activity is running without replication until replication can be re-enabled and the target database re-initialised.

To disable replication on the source database:
$   dsrutil <dbname> -C disablesitereplication source

To disable replication on the target database, run the command:
$   dsrutil <dbname> -C disablesitereplication target
Workaround
Notes
Keyword Phrase
Last Modified Date: 12/21/2021 2:56 PM