OpenEdge Replication: source database does not restart due to errors (3773) and (5350) on AI file

Information

 
Title: OpenEdge Replication: source database does not restart due to errors (3773) and (5350) on AI file
URL Name: P124628
Article Number: 000128037
Environment:
Product: OpenEdge
Version: 10.x, 11.x, 12.x
OS: All supported platforms
Question/Problem Description
Source replication database goes down with errors (10601), (3779), (3773) and (5350) on the current after-image file.
PROBKUP fails with errors (3775) and (3776).
RFUTIL -C aimage empty reports no FULL AI extents.
Database fails to restart with errors (3773) and (5350).
Clarifying Information
file-name in errors refers to an after-image file: dbname.an
The Target Replication Agent has shut down previously.
Source database does not use -aistall
Error Message
Can't switch to after-image extent it is full. (3775)
Backup ai extent and mark it as empty. (3776)

There are no Full extents (3687)

SYSTEM ERROR: Attempted to exceed maximum size on file dbname.an (10601)
Can't extend ai extent dbname.an (3779)
Can't switch to after-image extent dbname.an+1 it is full. (3773)
Database Server shutting down as a result of after-image extent switch failure. (5350)
Failed to switch to next after-image extent. (3784)
Cause
The current AI file fails to switch when:
  • The extent is the only ai extent.
  • The current variable-length after-image extent cannot be extended, because it has reached the file size or user limits or the disk has run out of space, and the next ai extent in the ai sequence is not a free "EMPTY" ai extent. In other words, the remaining ai extents are "LOCKED" or "FULL" and therefore not available.
  • The current BUSY FIXED ai extent needs to switch to the next ai extent in sequence, whose status is "LOCKED" or "FULL".
Under the OpenEdge Replication model:
  • When the Replication Agent (RPLA) of the target database terminates and/or the target database server goes down, the replication server (RPLS) on the source database will also terminate after the connect-timeout has expired.
  • At this stage, the source database is still running and so is after-imaging.
  • AI files continue to fill up during this time recording the database activity in AI transaction notes.
  • As they switch to the next AI extent, their status changes from "BUSY" to "LOCKED" under the replication model.
  • The "LOCKED" status can only change once the target database (and therefore the RPLA) has been restarted and "dsrutil source -C restart server" has been run on the source database, so that the RPLS can connect to the RPLA, synchronize, and resume applying the ai notes at block level from where it last left off.
In other words: there is no way to change the LOCKED status to anything else while Replication is enabled.

Whenever a filled ai file has not been applied to the target database, it stays in the "LOCKED" status until it has been fully applied to the target database. Once it has been applied, its status changes to "FULL", and it can then be made available again with "RFUTIL source -C aimage empty". This is how the model works. It is therefore imperative to monitor after-image extent availability, particularly while the RPLS and RPLA have lost connection, and to take proactive measures.
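The monitoring described above could be scripted roughly as follows. This is a minimal sketch, not a supported tool: it assumes the output of "rfutil source -C aimage list" reports each extent with a line of the form "Status:  <State>", and the function names and the threshold of 2 EMPTY extents are hypothetical.

```shell
# Count ai extents in a given state.
# Reads `rfutil <db> -C aimage list` output on stdin.
# Assumption: each extent is reported with a line "Status:  <State>".
count_ai_status() {
  awk -v want="$1" '
    $1 == "Status:" && tolower($2) == tolower(want) { n++ }
    END { print n + 0 }
  '
}

# Warn when fewer than $1 extents (default 2, an arbitrary choice) are EMPTY.
check_ai_headroom() {
  min_empty=${1:-2}
  empty=$(count_ai_status Empty)
  if [ "$empty" -lt "$min_empty" ]; then
    echo "WARNING: only $empty EMPTY ai extent(s) left"
    return 1
  fi
  echo "OK: $empty EMPTY ai extent(s)"
}
```

Possible usage from a scheduled job: rfutil source -C aimage list | check_ai_headroom 2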
Resolution
There is no need to disable replication on the source database; that option is listed below as a last resort. There are other ways to recover from this scenario, depending on the current status of the AI extents and the conditions of the replicated environment at the time.

These recovery methods essentially involve making existing or new ai extents available so that the source database can continue operations while the RPLS reconnects with the RPLA. Once the target database is synchronised with the source database, ai notes continue to be applied, eventually bringing the target in line with the source. This releases ai extents from the LOCKED status to FULL, at which point they can be archived and emptied to become available again.

After applying the recovery method to the current scenario, once the target database is synchronized with the source, the ai notes will be processed against the target database while activity is allowed to continue on the source database. As soon as each LOCKED ai extent has finished being processed, it is marked FULL, and it becomes available again once it is marked EMPTY with rfutil source -C aimage empty or by the AI Management daemon. The progress of this activity can be monitored with:
 DSRUTIL target -C monitor > Option A: Replication Agent 

The key factor in recovering this scenario is the availability of ai files during times when replication has ended and normal processing continues against the source database. In order to maximise the ai extent space available, it is worth stopping ai switch batch/cron jobs during this recovery operation. If AIMGT is enabled, change the timed interval to on demand with:
$   rfutil source -C aiarchiver setinterval 0

Before proceeding, review the current status of the ai extents which can be queried with: 
$   RFUTIL source -C aimage list
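As a rough triage of that listing against the scenarios in this article, something like the following could be used. It is only a sketch: it assumes each extent is reported with a line "Status:  <State>", the function name is hypothetical, and it cannot see disk space or file size limits, so the result is a starting point rather than a decision.

```shell
# Suggest which recovery scenario to look at first.
# Reads `rfutil <db> -C aimage list` output on stdin.
# Assumption: each extent is reported with a line "Status:  <State>".
suggest_scenario() {
  awk '
    $1 == "Status:" { count[tolower($2)]++ }
    END {
      if (count["full"] > 0)
        print "A: archive and empty the FULL extents"
      else if (count["empty"] > 0)
        print "B: EMPTY extents exist, check disk space and file size limits"
      else
        print "C: no FULL or EMPTY extents, add ai extents"
    }
  '
}
```

Possible usage: rfutil source -C aimage list | suggest_scenario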

Scenario A -  IF there are any FULL ai extents, empty them:

First archive all FULL AI extents with OS copy utilities, then manually mark them as empty using the following command:
$   rfutil source -C aimage empty

If AIMGT is enabled, full ai extents need to be manually emptied differently, refer to Article:
If -aistall had been in place in the source database startup parameters, the source database would still have been running but no transaction activity allowed until ai extents became available.  By making these available, the -aistall will immediately lift and normal processing resumes. It is only necessary to restart the Replication Server process (RPLS) against the source database and the RPLA on the target for replication to resume:

$   dsrutil source -C restart server
$   dsrutil target -C restart agent {Since OpenEdge 11.6}

Refer to Article:  How to restart RPLS, RPLA and target database when the source database is running.

Otherwise restart the target and source databases.

Scenario B - IF there are still available EMPTY variable ai extents, but no disk space available:
  1. If -aistall is in use, shut the source database down (otherwise the source database will already be down): proshut source -by
  2. Move the ai extents that were available (EMPTY) but had no diskspace and the current BUSY ai extent to another disk
  3. Run: prostrct list source source.st
  4. Edit source.st to reflect the new absolute file location of the moved ai files
  5. Run: prostrct repair source source.st
  6. Run: prostrct list source source.st again and verify the resulting source.st output, to confirm that the Control Area of the source database records the new location of the moved ai files
  7. Start the source database
  8. Start the target database
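The edit in step 4 can be done by hand; for several extents a small filter may be less error-prone. The sketch below assumes ai extent lines in source.st start with "a" followed by the absolute path, Unix-style paths without spaces or backslashes, and uses hypothetical directory names. Note that awk rebuilds the edited line with single spaces between fields.

```shell
# Rewrite the path of one ai extent in a structure (.st) file.
# Usage: relocate_ai OLD_PATH NEW_PATH < source.st > source.st.new
# Assumption: ai extent lines look like "a <path> [f <size>]".
relocate_ai() {
  awk -v old="$1" -v new="$2" '
    $1 == "a" && $2 == old { $2 = new }   # only the matching ai extent line changes
    { print }
  '
}
```

Possible usage: relocate_ai /db/ai/source.a2 /disk2/ai/source.a2 < source.st > source.st.new, then review source.st.new before replacing source.st and running prostrct repair.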

Scenario C -  IF there are no FULL or EMPTY ai extents

In other words all ai extents are marked LOCKED (except the current BUSY ai file).

IMPORTANT NOTE: This Option is only available in Progress 9.1E or OpenEdge 10.0B and later (with the exception of OpenEdge 10.1B, 10.1B01 and 10.1B02). Refer to Article 
1. If -aistall is in use shut the source database down, otherwise the source database will already be down
$   proshut source -by
2.  Add more ai extents where addai.st defines where the new ai files will be placed. These new ai extents can be added anywhere there is disk space available.
$   prostrct add source addai.st
3.  If necessary, reorder the ai extents so that the new EMPTY ai extents immediately follow the current BUSY ai extent. This is an offline utility.
$   prostrct reorder ai source 
4.  Start the source database
5.  Start the target database.
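For step 2, addai.st only needs one line per new ai extent. A minimal sketch for generating it, assuming the structure file accepts "a <directory>" for a variable-length ai extent and lets prostrct name the files; the function name and the directory /disk2/ai are hypothetical:

```shell
# Print COUNT structure-file lines for new variable-length ai extents in DIR.
# Usage: make_addai_st DIR COUNT > addai.st
# Assumption: "a <directory>" defines a variable-length ai extent.
make_addai_st() {
  dir=$1
  count=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    printf 'a %s\n' "$dir"
    i=$((i + 1))
  done
}
```

Possible usage: make_addai_st /disk2/ai 3 > addai.st, followed by prostrct add source addai.st as shown in step 2.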

 
Scenario D - Manually roll forward the LOCKED ai extents onto the target 

Once the LOCKED and BUSY ai extents have been manually applied to the replication target database, the LOCKED files will get cleared down very quickly once replication resumes.

IMPORTANT NOTE: This Option is only valid if the current BUSY extent still has space to write to, in other words, the source database can be started and still has a small amount of ai file space to write to.  Regardless, the following is also a very good technique to get the target in line with the source when replication has been down for some time and there are a lot of ai notes to synchronize. The full details of using DSRUTIL applyextent under different scenarios and OpenEdge versions are described in the following Article : How to manually apply a LOCKED AI extent to the target database?  
 
 [TARGET]

1.) The Agent needs to be in Pre-Transition state. Verify the status of the RPLA either:
$   dsrutil target -C monitor

A.  Replication agent status
 State: Pre Transition 

or, parse the target.lg file for message:
RPLA    5: A TCP/IP failure has occurred.  The Agent's will enter PRE-TRANSITION, waiting for connection from the Replication Server. (11699)
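Parsing the log for that message could look like the sketch below. The function name is hypothetical, and a match only shows that the agent entered Pre-Transition at some point; a later reconnect would not be detected this way, so the dsrutil monitor output remains the better check.

```shell
# Print the most recent (11699) message from the target database log.
# Usage: last_pretransition_msg target.lg
# Returns non-zero when the log contains no such message.
last_pretransition_msg() {
  msg=$(grep -F '(11699)' "$1" | tail -n 1)
  [ -n "$msg" ] && printf '%s\n' "$msg"
}
```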

If the agent is not in Pre-Transition state, force this state. The target database needs to be online.  You cannot trigger transition if the Replication Agent is still connected to the replication server.
$   dsrutil target -C triggertransition agent

2.)   Find out how far the target is behind the source, then roll forward ai files not already applied to the target database.

[source]: Get the current state of each source.an file, only ai files with Status = LOCKED & BUSY are relevant
$   rfutil source -C aimage list

[target]: Find the last ai file that was being applied by the RPLA process
$   dsrutil target -C RECOVERY agent > recagent.out

The following information from 'recagent.out' is relevant to this Example:
  Replication local agent information:
    Last Block:                     Incomplete
    ID of the last TX begin:        1613
    ID of the last TX end:          1642
    Time of last TX end:            Thu Oct 04 18:16:08 YYYY
      After Image File Number:      6
      Completely Applied to Target:  No
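The two relevant values can be pulled out of recagent.out with a small filter. This sketch assumes the layout shown in the example above (a "Replication local agent information" header followed by "After Image File Number" and "Completely Applied to Target" lines); the function name is hypothetical.

```shell
# Print "<ai file number> <Yes|No>" from the agent section of the
# `dsrutil <target> -C recovery agent` output (file argument or stdin).
# Assumption: the output is laid out as in the example above.
last_ai_applied() {
  awk -F': *' '
    /Replication local agent information/ { in_agent = 1 }
    in_agent && /After Image File Number:/ && num == ""          { num = $2 }
    in_agent && /Completely Applied to Target:/ && applied == "" { applied = $2 }
    END {
      gsub(/[ \t]+$/, "", num)        # drop trailing blanks
      gsub(/[ \t]+$/, "", applied)
      print num, applied
    }
  ' "$@"
}
```

Possible usage: last_ai_applied recagent.out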

Roll forward the ai notes, starting with the ai extent listed in the example above and continuing up to the current BUSY extent, e.g.:
$   dsrutil target -C  ApplyExtent source.a6
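When several extents have to be applied, they must go on in sequence, and a failure should stop the run. A minimal sketch of such a loop follows; the function name and the DRY_RUN switch are hypothetical, and the list of extent files, in the right order, is supplied by the caller.

```shell
# Apply ai extents to the target one at a time, in the order given.
# Usage: apply_extents TARGET_DB EXTENT_FILE...
# Set DRY_RUN=1 to print the commands instead of running them.
apply_extents() {
  target=$1
  shift
  for ext in "$@"; do
    ${DRY_RUN:+echo} dsrutil "$target" -C applyextent "$ext" || {
      echo "stopped: could not apply $ext" >&2
      return 1
    }
  done
}
```

Possible usage: DRY_RUN=1 apply_extents target source.a6 source.a7 to review the commands first, then the same call without DRY_RUN. Check target.lg after each run.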

The target.lg file will show similar messages to the following upon successful completion:
RPLA    5: Application of Source database AI Extent source.a6 has begun.
RPLA    5: Retry transaction point located at dbkey 0 note type 13 updctr 0. (6806)
RPLA    5: Retry point located at dbkey 662272 note type 25 updctr 6. (6807)
RPLA    5: Source database AI Extent source.a6 has been applied to this database.

Existing recovery instructions would normally transition the target database at this stage. Here the intention is not to transition the database, merely to continue where replication left off. Do not transition the target database (dsrutil target -C transition agent).

3.)  After all the LOCKED ai extents except the current BUSY extent have been successfully applied, start the source database. Provided there is sufficient AI space to record BI recovery notes, when the target database is started and replication resumes, the LOCKED files will be cleared down very quickly once the two databases are synchronized.

Scenario E - Disable After-Imaging then re-enable Replication

As previously outlined, this should be the option of last resort. It will require re-baselining the target database(s) with a new backup of the source. Every site should have a documented procedure for this task. The following instructions simply outline the first steps to disable after-imaging so that production can continue running without having to accommodate the AI structure.

Online:
$ dsrutil source_dbname -C disablesitereplication source
Delete the *.repl.recovery file
$   rfutil source_dbname -C aimage aioff

Offline:
$  proutil source_dbname -C disablesitereplication source
Delete the *.repl.recovery file
$  rfutil source_dbname -C aimage end

This instruction is further detailed in Article: Errors 3775 3776 while taking an online backup of a replication enabled source database   
Last Modified Date: 9/11/2023 10:40 PM