The bi on a hotspare roll forward target database will grow larger than the bi on the source database. To review a detailed discussion from a DB internals perspective, refer to Article:
In standby-hotspare environments a progressively larger bi file in the target environment can result when source database transaction activity is applied. This Article discusses:
- Why truncating the bi file of the target (hotspare) database is not a supported method of managing the bi size of the target database when subsequent roll forward operations are still required.
- How to reclaim bi filespace used by the target roll forward database
- How to avoid bi growth by aligning roll forward operations with bi cluster re-use
Truncating the bi file of the target (hotspare) database is not a supported method of managing bi growth
One method used to manage a large bi file on the roll forward database is to truncate the bi file when there are zero in-flight transactions at the end of the last roll forward, and then apply the next AI files.
The absence of in-flight transactions is confirmed with RFUTIL aimage scan reports or by parsing the database lg file while applying AI files:
At the end of the .ai file, 0 transactions were still active. (1636)
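The check for message (1636) can be automated. The following is a minimal Python sketch, assuming the message text appears in the lg file exactly as quoted above; the function name and the sample lines are hypothetical. It matches on the message text only, because the position of the message number within a log line may differ between versions.

```python
import re

# Text of message 1636 as quoted in this Article. The count is captured.
# Assumption: the wording is identical in the lg file being parsed.
MSG_1636 = re.compile(
    r"At the end of the \.ai file, (\d+) transactions? were still active"
)

def active_transactions_at_end(log_lines):
    """Return the count from the last matching message, or None if absent."""
    count = None
    for line in log_lines:
        match = MSG_1636.search(line)
        if match:
            count = int(match.group(1))
    return count

# Hypothetical sample lines for illustration only.
sample = [
    "At the end of the .ai file, 2 transactions were still active. (1636)",
    "At the end of the .ai file, 0 transactions were still active. (1636)",
]
print(active_transactions_at_end(sample))
```

A result of 0 only shows that no application transactions were open at the end of the last AI file applied. It does not make bi truncation on the target a supported operation.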
While this method is safe, it should never be perceived as reliable and does not guarantee standby baseline continuity for the next roll forward operations.
This method appears to work, allowing further roll forward operations after bi truncation when there are no in-flight transactions, particularly in test environments and, most of the time, once implemented in production.
This is not enough to depend on it as a standard practice, particularly when the consequence of the next AI file failing to roll forward is having to re-seed the standby hotspare baseline. In an emergency it can be used to reclaim bi space, provided it is understood that if it fails, re-baselining from a more recent backup (preferably one without long outstanding transactions) is the accepted method to reset the bi file size.
The need to recover bi space is most apparent after an unexpected bi growth event on the source database that is eventually resolved without having to disable after-imaging:
- All outstanding transactions are eventually committed by the end of the most recent AI file applied.
- PROUTIL truncate bi reclaims disk space in the standby environment.
- The next set of AI files in sequence fails to roll forward.
- The standby database has to be re-sourced unexpectedly because the roll-forward baseline is broken as a consequence of truncating the hotspare bi file.
This method is not supported because the usage was not anticipated or considered intended behavior when the multiple AI extent feature was designed for the first Progress 7 release. It remains unsupported, not because it is undesirable, but because the feature was not designed to accommodate the complexities of internal transaction notes beyond those related simply to application activity. This is why it is not documented and why no regression tests exist to ensure it will continue to work this way in future releases.
Why does a connection to the hot standby (target) put the database through crash recovery?
When the standby hotspare is opened for access before the final roll forward session required to restore the database has completed, subsequent roll forward redo processing will fail.
When the database is opened, before-image recovery runs and the model presumes that any outstanding transaction (one without a commit) must be aborted before access is granted.
- There are three phases of BI recovery that take place when the database is opened.
- All transactions that have not been committed initiate the Physical and Logical UNDO Phases to effectively abort the transaction.
- When the roll forward database is prematurely accessed by any OpenEdge utility capable of making changes, opened by a single-user client, or started for multi-user access:
  - Subsequent roll forwards fail due to timestamp mismatches between the Master Block and the .bi file.
  - Data is lost when transactions that span AI extents are undone by the Physical and Logical UNDO Phases.
RFUTIL only runs the BI recovery REDO Phase, to allow further transaction 'redo' as AI extents are applied until the target reflects the original database at the time of a crash, disaster or intended roll forward time/transaction. Access can be restricted using the OPLOCK/OPUNLOCK qualifiers to prevent unintended access before the final transaction notes are applied. For further information refer to Article:
To re-set the BI file size on target:
- Stop the live (source) database.
- Depending on site strategy, re-seed the standby hotspare using one of the following:
  - Truncate the (source) bi file, then use the OS backup method, ensuring the database is marked backedup and the AI file is switched.
  - An online PROBKUP can be used when there is no long-running transaction activity that is also causing unexpected bi growth in the source database environment. Otherwise, an offline PROBKUP truncates the bi file as part of the process and switches the AI file.
- Restore the target database for future roll forward operations
Considerations to minimize bi growth on the roll forward target database
Review the following parameters which govern the design of before-image management.
1. Cluster Ageing (-G)
Prior to Progress 9.1E02 and OpenEdge 10.0B02, ensure that the -G (Cluster Ageing) database startup parameter is set no lower than 60 seconds and account for any reason it may need to be set higher. Due to the way these versions handle flushing data, cluster ageing lower than 1 minute introduces a high risk of losing cached data.
Since Progress 9.1E02 and OpenEdge 10.0B02, Cluster Ageing no longer needs to be considered. The default for -G is 0, made possible by design changes that use fdatasync() instead of sync() calls to ensure that data associated with closed BI Clusters is flushed to disk before the clusters are eventually returned for re-use. The only reason Cluster Ageing may need to be increased is for
In any Progress version, take care when using a Cluster Ageing (-G) value above the default value for normal operations.
Example: A site had a 3-minute cluster ageing interval set (-G 180) and AI switches at 5-minute intervals, with AI files averaging 40MB each.
- Roll forward was performed in half-hour batches (6 AI files per batch).
- The associated AI files took less than 3 minutes to complete their roll forward operations. These transaction notes are also recorded in the target database's BI file.
- The associated bi clusters cannot be re-used until the Cluster Ageing time expires.
- Having to add BI Clusters because of the imposed 3-minute cluster ageing resulted in a 150% difference in bi file size between the source and target databases.
- It would be considered unusual to need cluster ageing as part of normal operations unless there are inherent problems with the underlying cache and disk subsystem.
2. Delay of Before-Image Flush (-Mf)
Ensure that "Delay of Before-Image Flush" (-Mf) is at the default of 3 seconds on the source database; otherwise, account for any reason it may need to be set higher.
Example: A site had the -Mf database startup parameter set to 120 seconds and consequently experienced a 40% difference in bi growth on the roll forward database. After dropping it back down to the default of 3 seconds, the bi file sizes were again comparable within acceptable limits.
3. BI Cluster Size and BI Blocksize
The BI Cluster Size and BI Blocksize on both databases should be the same.
Cluster ageing takes place at the close (checkpoint) of a BI cluster. When the BI Cluster Size is changed on the source, re-baseline the target database from a new backup so that the target database has the changed value.
4. AI switch interval
Consider increasing the AI switch interval to allow more transactions to complete within an AI extent.
Alternatively, consider an AI switch strategy based on 'bytes' rather than time switches by using fixed AI extents that switch only when filled.
5. Roll Forward interval
Roll forward the AI notes in 'real time' or in smaller batches to allow cluster ageing to take effect.
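Rolling forward in smaller batches can be scripted. The following is a minimal Python sketch, assuming the standard "rfutil db-name -C roll forward -a ai-file" form and a list of AI files already sorted in sequence order. The function name, database name and AI file names are hypothetical, and the pause between batches is left to the site's own scheduling.

```python
def batched_roll_forward_commands(db_name, ai_files, batch_size):
    """Split ai_files (already in sequence order) into batches and build
    one rfutil roll forward argument list per AI file."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = []
    for start in range(0, len(ai_files), batch_size):
        batch = ai_files[start:start + batch_size]
        batches.append(
            [["rfutil", db_name, "-C", "roll", "forward", "-a", ai_file]
             for ai_file in batch]
        )
    return batches

# Hypothetical names: six AI files applied two at a time instead of all at once.
ai_files = ["ai.001", "ai.002", "ai.003", "ai.004", "ai.005", "ai.006"]
commands = batched_roll_forward_commands("standby", ai_files, batch_size=2)
for batch in commands:
    for argv in batch:
        print(" ".join(argv))
```

Each argument list could be passed to subprocess.run, with the script waiting between batches so that cluster ageing has time to release BI clusters on the target before the next batch is applied.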