The next actions are preventative and corrective, after recovering the database from an unexpected bi growth situation:
Transaction backouts are achieved by reading the Before-image (BI) file - a file that records transaction notes during run-time. What causes the .bi to grow exponentially can be manifold, but from the database perspective it is always due to
long running transaction(s) which lock the bi cluster from re-use at checkpoint time.
Corrective actions are therefore directed towards:
- Restricting bi growth and monitoring the database until the root cause has been addressed
- Identifying which transaction(s) contribute to the .bi growth and addressing the application code responsible.
A. Force the BI file to not grow beyond a specific sizeUse the database startup parameters
-bithold to restrict how large the bi file can grow and
-bistall to allow immediate investigation into likely candidates.
- When bi file approaches or eventually reaches the restricted size, the database engine prevents further growth by stalling the database until manual intervention. The -bithreshold can then be raised to allow forward OLTP processing to continue, while runtime investigation proceeds. Refer to Article, How can -bithold -bistall and the proquiet bithreshold be implemented?
- It is inadvisable not to use -bistall while seeking to solve what is causing bi growth. The database engine prevents further growth by shutting the database down instead. This action no longer presents the opportunity to isolate open transactions and eventually the application code in use at runtime. If the database is enabled for after-imaging, clues can be extracted from the ai file content. But this is no-where near as efficient as being able to collate information while the situation is happening with the database online. Refer to Article, How to use an After Image file to find the user running a long transaction
B. Add more bi file spaceWhen largefiles cannot be enabled on databases running with WorkGroup or Personal (or Development licenses) consider:
- Since Progress 9.1C, the 2GB limit applies to individual bi extents
- For Progress 8 and earlier, the 2GB limit applies to the sum of bi extents. While more bi extents can be added to split the bi extents across discs for example, it will not help in overcoming this limit.
- Even when largefiles have been enabled for the database, consider adding more fixed bi extents while the cause is being isolated. In this way a quick and dirty alert will be when the variable bi file extent starts to grow.
C. Isolate long running transactions at runtimeSince long transaction duration triggers excessive BI growth, these can be found through a PROMON investigation against the running database:
$ promon dbname > R&D > 1. Status display > 4. Processes/Clients > 3. Active Transactions
$ promon dbname > R&D > 4. Administrative Functions > 1. Check Active Transaction StatusThe results presented provide slightly different information, however they both provide the details required to begin isolating potential candidates :
11/22/19 Status: Active Transactions by user number for all tenants
16:36:53
Usr:Ten Name .. Login time Tx start time Trans id .. Trans State
.. ..
6 Ubatch .. 11/22/19 13:12 11/23/19 13:30 34673 .. Active FWD
- A list of users with open active transactions will be displayed, pay special attention to the client's start times.
- Start with the those users that show the longest active transaction start time. It is these Active transactions which are locking the bi cluster their transaction start notes were recorded in, that is preventing them from being re-used at Checkpoint time. The database engine then has to add more bi clusters to the bi chain in order for forward transaction processing to be recorded, thereby extending the bi file size.
- As a general rule, a transaction time of over 10 minutes is considered unusual and should be investigated.
- Once the application code has been identified (see below), it may be that some piece of code is revealed to be poorly scoped for transactions, record scope or is running in a recursive loop for example. On the other hand, transaction or record scope could be locking records needed by other client sessions leading to transaction times in this application code area being longer than needed over-all. Tuning the -lkwtmo can alleviate this situation while the source of the problem is still being isolated.
Transaction information can also be found by interrogating the
_Trans VST table:
Example 1: The longest running transaction will be the one at the top of the results list:
FOR EACH _trans NO-LOCK WHERE _Trans-State <> ? BY _Trans._Trans-Duration DESCENDING:
FIND FIRST _Connect WHERE _Connect._Connect-Usr = _Trans._Trans-Usrnum.
DISP STRING(TIME,"HH:MM:SS") LABEL "TIME NOW"
_Connect._Connect-Name
_Connect._Connect-PID FORMAT 99999
_Trans._Trans-Usrnum
_Trans._Trans-Num
_Trans-Duration
_Trans._Trans-Txtime FORMAT "X(40)".
END.
Example 2: Report on all transactions that have been running longer than x-time
DEFINE VARIABLE iTimeInSeconds AS INTEGER NO-UNDO INITIAL 600.
FOR EACH _Trans WHERE _Trans._Trans-State EQ "Active" AND
_Trans._Trans-Duration GT iTimeInSeconds NO-LOCK:
FIND FIRST _Connect WHERE _Connect._Connect-Usr = _Trans._Trans-UsrNum NO-LOCK NO-ERROR.
DISPLAY _Trans-Num _Trans-UsrNum _Trans-Duration _Connect._Connect-Name.
END.
Once the long running
transaction number can be determined. The easiest way to match that transaction to the application code is to find out from the user what they were running at the time, otherwise further Diagnostics are listed below.
D. Isolate which sessions are making heavy writes to the bi fileWhen unexpected BI growth is currently being observed against a running database, the
I/O Operations by Process screen in PROMON can be interrogated:
$ promon dbname -> R&D -> 3. Other Displays -> 2. I/O Operations by Process
- A list of the user sessions currently connected to the database and the number of reads and writes to the database, BI and AI files for each user.
- By entering 'R' at the "Enter <return>, R, U, P, T, or X (? for help): " prompt, the values will be refreshed,
- By repeatedly refreshing the screen, users that have larger than expected growth in the number of writes to the BI file(s) should become apparent.
In OpenEdge 11.7, this information has been added to the same view to identify long running transactions:
"BI RReads BI RWrites":$ promon dbname > R&D > 1. Status display > 4. Processes/Clients > 3. Active TransactionsThis information can also be obtained by querying the
_UserIO VST table:
FOR EACH _UserIO NO-LOCK WHERE _UserIO-usr >= 1:
DISPLAY _UserIO-Usr _UserIO-Name _UserIO-BiWrite.
END.
E. Isolate the Application code responsible:Once an ABL client process that is causing more BI writes than expected can be isolated:
Dump a protrace:It is possible to determine what ABL program(s) that client process is currently running by executing the
proGetStack command on the same machine where the client process is running since OpenEdge 10.1C:
$ proGetStack <pid>
Client Database-Request Statement Cache:It is possible to see what the application code is doing on the database-side. Use the user number of the user holding the oldest transaction to lookup
$ promon dbname > R&D > 1. Status Displays > 18 Client Database-Request Statement Cache
F. Isolate Application Code Transaction scopeAgain, as it is not necessarily
"a single procedure" that needs identifying, it is one or more transaction(s) that are causing the .bi file to grow. When the code that is responsible for bi growth is isolated, the transaction scope in that code needs to be addressed.
1. The COMPILE with
"Listing File" helps to identify where the transaction blocks start and end. Compile the Application code in the
Application Compiler with the listing file:
Select OPTIONS -> compiler -> "Listing File" = (say) "listing.txt" + check "append".Or instead of using the Application Compiler, simply execute the COMPILE statement with the LISTING option:
COMPILE <program-name> LISTING <program-name>.lst.
From the output of this action, a map of the code will be created. By analysing this map, where the blocks start and end will be apparent and from this detail, investigation of the transaction scoping of the application. Any large transactions that are identified are likely candidates.
At runtime, (in versions since 10.1B) use the 4GLTrans log entry type to log the beginning and end of transactions that may have started or spread dynamically.
For example:
-clientlog <logfilename> -logentrytype 4GLTrans -logginglevel 3
G. Gather Application Runtime MetricsConsider implementing the
"use statistics with cross reference (-yx)" startup parameter for a period of time which will give a fair idea of where the busiest code is in terms of:
- Long execution times and large number of calls, which imply the necessity of revisiting code considering internal procedures and include files
- Re-reads, where increasing the -mmax client memory could help but will not resolve the transaction times.
H. Enable Database Diagnostic Data Collection for BI growth.
For example,
at db startup, -diagEvent bithold:3
https://docs.progress.com/de-DE/bundle/openedge-database-management-122/page/Specifying-an-event.html
or online promon > R&D > option 4 (Administrative Functions) > Diagnostic Data Collection