Transactions and Undos. Novel

Posted by George Potemkin on 26-Mar-2019 12:36

It’s important to know how the transactions and their rollbacks are working in Progress. I think there are the myths and the missed details here. In this topic I’d like accumulate the information about the subject. If you have any information you can share with Community, please, add it here. Comments are very welcomed. 

Chapter one: Transactions in VSTs.

 

_ActSummary._Summary-Commits

_ActSummary._Summary-TransComm

_ActOther._Other-Commit

*-Trans

All these fields are sourced from the same counter stored in database shared memory. The counter is incrementing when a transaction is removed from Transaction Table provide the whole transaction was not undone.

_ActSummary._Summary-Undos

_ActOther._Other-Undo

The counter is incremented by each UNDO statement even if it's an undo of sub-transaction. The same transaction can increment both the Commits and Undos counters. The same transaction can increment Undos many times. Fortunately it’s possible to separate the undos of sub-transactions and the backouts of the whole transactions using lasttask.

 

_MstrBlk._MstrBlk-lasttask

_DbStatus._DbStatus-LastTran

mb_lasttask (field in master block, size: 4 bytes, offset: 44 bytes)

It’s the last transaction number. Since V6.2G01 when it gets to 2147483647 it rolls over and continues from 1. At crash recovery the lasttask will be re-set to the last transaction id recorded in the transaction table note in the before-image log. In other words, the recent values of lasttask can be lost and the lasttask can have the non-unique values even on the short intervals.

Progress uses (or used) at least two different algorithms to increment the lasttask.

Simple but rarely used (or not used at all in the recent Progress versions): the lasttask is continuously incrementing by one. It was working that way in V10.2B on Windows for newly created databases. But when you at least once start a database with the -n 128 or higher the algorithm is persistently changed to the different one: the main algorithm. It looks like Progress uses some flag to choose the algorithm. I’m sure it’s not a flag in Master Block. Most likely it’s not a flag in before-image file - at least ‘proutil -C truncate bi’ will not restore the initial algorithm. Location of the flag is a mystery.

The main algorithm: the lasttask is incrementing by one but only (-n + 100) times. Then it jumps up to the value multiple to the power of two – where the power of two is, of course, larger than (-n + 100). Note that strictly speaking the –n parameter does NOT set the “Maximum Number of Users” as documentation and msg #4260 are saying. Maximum number of database connections (the number of entries in Usrctl Table ) is equal to ( -n + -Mn + 2). The –n parameter defines the number of entries in Transaction Table – the maximum number of transactions that can be opened simultaneously. It looks like Progress uses some pool of entries for future transactions (for example, with the pointers to the free entries in Transaction Table) and the size of this pool is the first power of two larger than (-n + 100). But why the size of the pool is larger than the size of Transaction Table?

Examples:

-n 1: Pool size = 128, lasttask increments continuously 101 times, then it jumps by 27;

-n 20: Pool size = 128, lasttask increments continuously 120 times, then it jumps by 8;

-n 1000: Pool size = 2048, lasttask increments continuously 1100 times, then it jumps by 948.

If the –n equals exactly to a power of two minus 100 then the lasttask will increments continuously.

If the –n equals to a power of two minus 99 then Progress will use only a first half of the pool and the half of TRIDs will be never used. Does it affect the performance?

As the result the increments of the lasttask can be twice higher than the number of transactions allocated in Transaction Table. But we can calculate the real number of transactions by the formula:

 The number of transactions = (lasttask2 – lasttask1)

– (truncate(lasttask2/X, 0) - truncate(lasttask1/X, 0)) * (X- -n – 100)

where X is the first power of two greater or equal to (-n + 100)

provided the –n was not changed between lasttask1 and lasttask2.

 

The difference between the number of transactions and the number of the committed transactions is the undos of the whole transactions. The _Summary-Undos counter includes this number plus the undos of sub-transactions.

Since V10.0A Progress internally supports another two useful counters related to the transaction activity but unfortunately they are not available in promon or VSTs. They are available in Master Block only:

mb_txnSequence (size: 8 bytes, offset: 240 or 236 bytes on Linux)

mb_lastLogOpSeq (size: 8 bytes, offset: 248 or 244 bytes on Linux)

Note that the field offsets in Master Block on Linux is different from the ones on other platforms. By the ways, it’s one (but not the only) reason why we can’t copy Progress databases between Windows and Linux. Also Master Block is flushed on disk only during rather rare events like checkpoints, switching after-image files etc. Usually it happens only once per few minutes. We can read these values from disk but they will not be up-to-date.

 

mb_txnSequence

It counts only active transactions – the transactions that did changed a database. The only similar counter in VSTs is _Connect-NumTrans.

In the “young” databases the value of mb_lasttask is always higher than mb_txnSequence. The ratio is opposite in the “matured” databases where mb_lasttask is rolled over. The mb_txnSequence devided by 2147483648 estimates how many times the mb_lasttask was rolled over in your database.

 

By the ways, some activity screens in promon (for example, Activity: BI Log) have the “Per Tx” columns. The values are based on _Summary-Commits. In other words, the counters of db changes are divided by the number of transactions that includes the transaction in the “BEGIN” status. It makes “Per Tx” statistics meaningless.

 

mb_lastLogOpSeq

It counts the "parentheses" (RL_LOGOP_START and RL_LOGOP_END notes) in BI file that enclose the physical operations that compose a logical operation. They are generated only when the database is the source for replication. Ratio of mb_lastLogOpSeq to mb_txnSequence is a rough estimation of average transaction size. Dbrpr utility will not show non-zero values in mb_lastLogOpSeq because it will request to turn off after-image logging. Instead of dbrpr the viewdbblock script can be used.

  

To complete the list of the “transaction” fields in database blocks:

An extended header of the first block in data cluster stores the transactionId (size: 4 bytes, offset: 64 bytes). I guess it’s filled based on mb_lasttask.

  

_Trans._Trans-Num

_Connect._Connect-TransId (since V8.2)

_Lock._Lock-TransId (introduced in 11.4)

DBTASKID()

They return the transaction ID.

If _Connect is in use but does not open a transaction then _Connect-TransId returns 0.

DBTASKID() returns data stored in the client’s memory rather than in database shared memory.

  

_Connect._Connect-NumTrans (introduced in 11.7.0)

The counter is incremented only when transaction status is changing to “ACTIVE”.

This contradicts to "What's new with the VSTs" presentation:

“# transaction begin requests. Includes commit, rollback, and txns with no changes” 

Promon/ Activity: Summary => Active trans

It’s the number of entries in Transaction Table that are currently in use, the transaction status does not matter: it’s not only ACTIVE transactions. There is no such field in VST. The corresponding value can be calculated by counting the records in _Trans where _Trans-State NE ?

The value can be used for a rough estimation of the average transaction duration as a ratio of “Active trans” to “Commits per sec”. It will work even if transaction durations is a small part of a second.

 

_ActBILog._BiLog-RecRead

_ActBILog._BiLog-RecWriten (both exist since 8.2)

_UserIO._UserIO-BIRecReads

_UserIO._UserIO-BIRecWrites

_Trans._Trans-BIRecReads

_Trans._Trans-BIRecWrites (all introduced in 11.7.0)

 

The number of recovery notes read from/written to BI file.

Before V10.1A that introduced the savepoints the undos of sub-transactions did not read BI file. The -nosavepoint startup option on client processes restores the pre-10.1A local before-image mechanism.

Progress uses so called “the jump note technology” (see Article 000042774) that rules the reads of recovery notes during rollbacks but the detailed are unknown.

In the most cases the transaction rollbacks read and write almost the same number of notes that was created before beginning of undo but there are the scenarios when the behavior is quite different.

There is a feature of the _Trans-BIRecReads and _Trans-BIRecWrites fields: the counters should be zeros when a transaction is in the BEGIN state but you may see the non-zero values. It’s the final values left by the previous transaction of the same user. _Trans-BIRecWrites will count even the RL_TEND note created by the previous transaction. The counters will be zeroed when the current transaction becomes “ACTIVE” and the _Connect-NumTrans is incremented. I’m not sure if it’s an intended behavior but I’d not call it a bug either. We’ll get the correct values for the ACTIVE transaction.

I have some statistics that makes me think that those counters in _Trans table can be zeroed under some other (unknown) circumstances.

 

_Trans._Trans-State (since V8.2)

_Lock-Trans-State (introduced in 11.4)

 

Before 8.2: DEAD, ALLOCATED, ACTIVE, PREPARING, PREPARED, COMMITTING

Since 11.4: NONE, BEGIN, ACTIVE, PREPARING. PREPARED, COMMITTING

Promon: None, Begin, Active, Prep, Phase 1, Phase 2

 

NONE = DEAD

It means that the transaction table entry previously held is now free and available for reuse by a new transaction. This state can be seen only for a very short period of time.

BEGIN = ALLOCATED

The entry in transaction table is created but the transaction did not yet update a database.

PREPARING, PREPARED, COMMITTING

They are intended for use with Two Phase commit protocol but for a short period of time these states can be seen even if the protocol is not enabled.

 

Example from ATM test running 150 sessions on the top hardware (OpenEdge Release 11.7.2):

01/22/18        Activity: Summary
08:32:19        01/22/18 08:32 to 01/22/18 08:32 (10 sec)

Event                  Total  Per Sec |Event                  Total  Per Sec

Commits               37574    3757.4 |DB Reads             156775   15677.5
Undos                     0       0.0 |DB Writes             53437    5343.7
Record Reads         112596   11259.6 |BI Reads                  0       0.0
Record Updates       112094   11209.4 |BI Writes              1730     173.0
Record Creates        37346    3734.6 |AI Writes                 0       0.0
Record Deletes            0       0.0 |Checkpoints               0       0.0
Record Locks         187442   18744.2 |Flushed at chkpt          0       0.0
Record Waits             15       1.5 |Active trans            150

0 Servers, 151 Users (151 Local, 0 Remote, 151 Batch), 8 Apws

08:32:19       01/22/18 08:32 to 01/22/18 08:32 (10 sec)

 

Typical counts of transactions per their status on Status: Active Transactions screen:

121 Phase 2
 16 Active
 12 Begin
  1 None

There were no transactions in the PREPARING or PREPARED (Phase 1) states.

In other words, when the transactions are short (in this case the average duration was 40 msec = 150 / 3757.4) then the ACTIVE state of transaction can be much shorter than the COMMITTING state.

 

_Trans._Trans-Flags (introduced in 11.0)

_Lock-Trans-Flags (introduced in 11.4)

 

Values since 11.4: FWD UNDO

They return the UNDO flag only when a session was externally interrupted (by signal). Undo of transactions (or sub-transactions) initiated by ABL code is reported with the “FWD” flag even if _Trans-BIRecReads has non-zero value.

 

_Connect._Connect-Resync

It returns 1 when a session is externally interrupted.

If a session updated a few databases inside the same transaction then _Connect-Resync will be set only for one database at time. Session will write to a log of the corresponding database:

(2252) Begin transaction backout.

(2253) Transaction backout completed.

Transaction rollbacks will be done per database level: the last connected – the first undone.

 

_Connect. _Connect-Interrupt

It’s not used.

 

To be continued…

All Replies

Posted by gus bjorklund on 27-Mar-2019 17:20

Hi george. lots of good stuff in you article.

i have a few random things to add.

0) the transaction id is a 32-bit signed integer that increases monotonically. when it becomes negative because it has rolled over, it is reset to 1. not all possible transaction id values can be used. transaction id’s have to be positive because we used the negative of the transaction id as a lock in deleted unique indexes and for deleted records to lock the rowid until transaction end.

the mb_txnSequence counter is there to enable construction of unique 64-bit transaction id’s from it and the transaction id.

1) allocated or active transactions have two numbers that are related but different: the transaction id and the transaction table entry or slot number. the entry number can be derived from the transaction id.

2) the size of the transaction table is derived from -n + -Mn + 2. (one entry per server, one for the primary broker, and one reserved for proshut). that number is the size of the connection table and the maximum allowed number of concurrent transactions. the transaction table size is rounded up to the next larger power of two but the extra entries are not used.

(when crash recovery or warmstart is completed, the table size changes from the one used for crash recovery to the one that will be used for forward processing).

3) when a new transaction id is to be generated, we start with the last transaction number in the master block and add one. a transaction table table entry number is then obtained by taking the low order bits of the transaction id (anded with table size -1). if the entry is in use by another transaction, then we try the next larger value until we get a slot number larger than the maximum number of concurrent transactions, after which we jump forward to the next transaction id that matches the first transaction table entry. lather, rinse, repeat as needed.

4) when two-phase commit is in effect, there are two transaction tables, the regular one and another for transaction that have been prepared.

5) the “per tx” numbers in promon are somewhat inaccurate but i would not characterize them as meaningless. i have found them useful at times. same as when the speedometer in your car says 150 kph but we know it is not true because they all read a higher than the true value.

6) the “jump notes” are used during rollback to skip over bi blocks that do not contain notes written by the transaction being rolled back.

7) you are right that dbtaskid() returns data from the client’s memory, except the first time it is called in a new transaction. in that case, the client asks the database for it.

Posted by George Potemkin on 27-Mar-2019 17:53

Gus,

> the “per tx” numbers in promon are somewhat inaccurate but i would not characterize them as meaningless.

Our customers are splitting the table space used by their application into a few databases (for example, one of the customers stores data in 16 databases). When the code inisde transaction block reads data in one of these databases Progress opens a transaction here even if the code will not update the database. With data speaded between many databases it's highly likely that the most tranactions will never become ACTIVE. IMHO, the 'Commits' counter does not have too much sense when it by 90% consists of transactions in BEGIN status. Of course, when all data are stored in one database then the ratio between BEGIN and ACTIVE transactions can be opposite. But yes, I said about the meaningless of the “per tx” numbers stronger than I should. Sorry.

Posted by George Potemkin on 28-Mar-2019 10:18

> 6) the “jump notes” are used during rollback to skip over bi blocks that do not contain notes written by the transaction being rolled back.

I understand that recovery notes of the same transaction are backward linked: each note knows the location in BI file for its predecessor. And, of course, sometimes it’s not enough to undo a transaction.

Are the “jump notes” related to the following scenario?

10:29:31: Transaction created new record and its three index keys (really it created two records but I cut the information because it does not matter):

Trid: 1002166296 Fri Feb 22 10:29:31 2019. (2598)
Trid: 1002166296 User Id: XXXXXXX (12531)
Trid: 1002166296 code = RL_TBGN version = 1 (12528)
Trid: 1002166296 dbkey = 0  update counter = 0 (12530)
Trid: 1002166296 code = RL_LOGOP_START version = 1 (12528)
Trid: 1002166296 dbkey = 0  update counter = 0 (12530)
Trid: 1002166296 code = RL_RMCR version = 2 (12528)
Trid: 1002166296 area = 7   dbkey = 42707187456   update counter = 32 (12529)
Trid: 1002166296 code = RL_LOGOP_END version = 1 (12528)
Trid: 1002166296 dbkey = 0  update counter = 0 (12530)
Trid: 1002166296 code = RL_CXINS version = 2 (12528)
Trid: 1002166296 area = 8   dbkey = 342214230   update counter = 1039 (12529)
Trid: 1002166296 code = RL_CXINS version = 2 (12528)
Trid: 1002166296 area = 8   dbkey = 289733725   update counter = 8530 (12529)
Trid: 1002166296 code = RL_CXINS version = 2 (12528)
Trid: 1002166296 area = 8   dbkey = 44460085   update counter = 550 (12529)
Trid: 1002166296 code = RL_CXINS version = 2 (12528)
Trid: 1002166296 area = 8   dbkey = 342214129   update counter = 8798 (12529)

14:29:28: Session received HANGUP signal.

16:09:16: Transaction was INSTANTLY undone. Why the transaction was undone only since one hour 40 minutes after HANGUP is the different story.

[2019/02/22@16:09:16.787+0300] P-39454604   T-1     I ABL   939: (2252)  Begin transaction backout.
[2019/02/22@16:09:16.847+0300] P-39454604   T-1     I ABL   939: (2253)  Transaction backout completed.
Trid: 1002166296 code = RL_CXREM version = 2 (12528)
Trid: 1002166296 area = 8   dbkey = 342214285   update counter = 11500 (12529)
Trid: 1002166296 code = RL_CXREM version = 2 (12528)
Trid: 1002166296 area = 8   dbkey = 44460085   update counter = 551 (12529)
Trid: 1002166296 code = RL_CXREM version = 2 (12528)
Trid: 1002166296 area = 8   dbkey = 289733725   update counter = 8537 (12529)
Trid: 1002166296 code = RL_CXREM version = 2 (12528)
Trid: 1002166296 area = 8   dbkey = 342214261   update counter = 1995 (12529)
Trid: 1002166296 code = RL_LOGOP_START version = 1 (12528)
Trid: 1002166296 dbkey = 0  update counter = 0 (12530)
Trid: 1002166296 code = RL_RMDEL version = 2 (12528)
Trid: 1002166296 area = 7   dbkey = 42707187456   update counter = 65 (12529)
Trid: 1002166296 code = RL_RMCR version = 2 (12528)
Trid: 1002166296 area = 7   dbkey = 42707187456   update counter = 66 (12529)
Trid: 1002166296 code = RL_BKBBOT version = 1 (12528)
Trid: 1002166296 area = 7   dbkey = 42707187456   update counter = 67 (12529)
Trid: 1002166296 code = RL_BKMBOT version = 2 (12528)
Trid: 1002166296 area = 7   dbkey = 32768   update counter = -2009448691 (12529)
Trid: 1002166296 code = RL_BK2EB version = 2 (12528)
Trid: 1002166296 area = 7   dbkey = 42720444352   update counter = 73 (12529)
Trid: 1002166296 code = RL_LOGOP_END version = 1 (12528)
Trid: 1002166296 dbkey = 0  update counter = 0 (12530)
Trid: 1002166296 Fri Feb 22 16:09:16 2019. (2598)
Trid: 1002166296 code = RL_TEND version = 1 (12528)
Trid: 1002166296 dbkey = 0  update counter = 0 (12530)

During rollback the recovery notes are used in the reverse order. So the first note at the UNDO phase removed the last index key created at the FWD phase. But last RL_CXINS note created the index key in dbkey = 342214129 while RL_CXREM note removed the key from dbkey = 342214285. The reason is the index block split that happened at 10:29:59 – 28 seconds later after FWD phases of the transaction and 5 hours 40 minutes before the beginning of its backout:

Trid: 1002168778 code = RL_CXSP1 version = 1 (12528)
Trid: 1002168778 area = 8   dbkey = 342214129   update counter = 14398 (12529)
Trid: 1002168778 code = RL_CXSP2 version = 1 (12528)
Trid: 1002168778 area = 8   dbkey = 342214285   update counter = 2 (12529)

Other transactions wrote 10.7 GB (approximately 97 million notes) to BI file between the index block split and the beginning of transaction backout.

Of course, the notes of Trid 1002166296 did not know the dbkey of new index block where the index key was moved. What is the magic trick was used to find RL_CXSP2 note among almost 100 million notes just in 60 milliseconds (16:09:16.787 - 16:09:16.847) ? Is it “the jump note technology”?

Posted by George Potemkin on 28-Mar-2019 11:55

> 1) allocated or active transactions have two numbers that are related but different: the transaction id and the transaction table entry or slot number. the entry number can be derived from the transaction id.

For example, _Connect-Id always equals _Connect-Usr + 1 but the correlation between _Trans-Id and _Trans-Num is not obvious. Does _Trans-Id have any meaning that can be used to read the _Trans VST faster?

Posted by Dapeng Wu on 28-Mar-2019 14:16

George,

The "jump note" may or may not used in this case.

Jump note technology is only used when a transaction's current note is in a different BI block with its previous note. When this happens, the current note will record the location of the previous one in itself, somewhere that is not visible from this note scan. So I can't tell if any of these 8 notes have jump information or not. But given this transaction only started 8 notes, it's more likely they were all written in one BI block, which means no jump.

The reason that during roll back, CXINS of dbkey 342214129  becomes CXREM of 342214285  is due to the split, as you have mentioned. What happened is the transaction manager reads CXINS, it will back out the action logically, which means it will not go to the physical location of the index entry (dbkey 342214129) and remove the entry over there. But instead, it will search the entry in the B tree and remove the entry logically (the entry had been moved to dbkey 342214285). So I'll give the credit of this fast roll back to the index manager instead of jump note.

Jump note tech can be very helpful during roll back when there are a lot of concurrent transactions. For example, a transaction wrote a note in BI block 1, when it's about to write another note, it found out that BI block 1 or even 2.. are already filled out, it the had to use BI block 3. Without the jump note technology, rolling back of this transaction will have to scan all three blocks instead of just jumping back to block 1.

BTW, you have a very nice post here. I'll forward the link to our team members and I believe they will learn a lot too.

Posted by George Potemkin on 28-Mar-2019 14:29

> But instead, it will search the entry in the B tree and remove the entry logically (the entry had been moved to dbkey 342214285).

What's the elegant and simple solution!

Posted by George Potemkin on 29-Mar-2019 08:27

One more question about the jump note technologies: what is the expected ratio between the note reads and the note writes in case of transaction’s undo?

V10.2B08 fixed the defect PSC00260992 related to the jump notes:
Performance degradation with 100% busy MTX latch
knowledgebase.progress.com/.../000042774

Before the fix sometimes (a few times per day) we were getting the following activity in BI file:

06/07/13        Activity: BI Log
12:53:50        06/07/13 12:53 to 06/07/13 12:53 (2 sec)
                                    Total         Per Min          Per Sec          Per Tx
Total BI writes                         1              30             0.50            1.00
BIW BI writes                           1              30             0.50            1.00
Records written                        30             900            15.00           30.00
Bytes written                        3998          119940          1999.00         3998.00
Total BI Reads                        287            8610           143.50          287.00
Records read                        36483         1094490         18241.50        36483.00
Bytes read                        4636911          135847K      2318455.50      4636911.00
Clusters closed                         0               0             0.00            0.00
Busy buffer waits                       0               0             0.00            0.00
Empty buffer waits                      0               0             0.00            0.00
Log force waits                         0               0             0.00            0.00
Log force writes                        0               0             0.00            0.00
Partial writes                          0               0             0.00            0.00
Input buffer hits                      11             330             5.50           11.00
Output buffer hits                      0               0             0.00            0.00
Mod buffer hits                         0               0             0.00            0.00
BO buffer hits                         57            1710            28.50           57.00

Note reads were a thousand times larger than note writes if/when transaction's undo was affected by the defect.

Example of undo of large transaction with the recent Progress version (OpenEdge Release 11.7.2):

Time       BI RReads   BI RWrites  State  Flags  RReads/sec RWrites/sec
11:54:48           0            0  TBGN    FWD   
12:01:02           0    1,420,738  Active  FWD        0.0    3,798.8
12:31:05           0    8,782,310  Active  FWD        0.0    4,087.5
13:01:07           0   13,076,270  Active  FWD        0.0    2,384.2
13:31:09           0   21,415,904  Active  FWD        0.0    4,630.6
14:01:11           0   35,583,963  Active  FWD        0.0    7,866.8
14:29:28 HANGUP signal received.
14:31:13   2,607,605   44,419,628  Active  UNDO   1,447.9    4,906.0
15:01:14  11,204,695   60,778,434  Active  UNDO   4,773.5    9,083.2
15:31:16  17,365,329   72,351,940  Active  UNDO   3,420.7    6,426.2
16:01:17  33,414,993   94,891,840  Active  UNDO   8,911.5   12,515.2
16:31:19  46,070,338  123,329,803  Active  UNDO   7,026.8   15,790.1
17:01:20  51,805,489  137,095,839  Active  UNDO   3,184.4    7,643.6
17:31:22  64,467,997  158,531,609  Active  UNDO   7,030.8   11,902.1
18:01:24  71,393,141  170,409,133  Active  UNDO   3,845.2    6,595.0
18:31:26  80,826,268  187,387,152  Active  UNDO   5,237.7    9,427.0
19:01:27  86,351,644  197,161,683  Active  UNDO   3,067.9    5,427.3
19:31:28  97,928,534  213,323,807  Active  UNDO   6,428.0    8,974.0
20:01:30 102,544,477  219,782,879  Active  UNDO   2,563.0    3,586.4
20:11:07           ?  222,159,114  TEND    UNDO         ?    4,118.3

During UNDO phase the note writes were 1.5-2 times higher than the reads.

Total number of writes during UNDO phase is 4 times higher than the total number of writes during FWD phase.
Can it be explained by the jump note technologies?

When I run the tests without the concurrent transactions (without the jump notes) the results were quite expected:
Note Reads (during UNDO) = Note Writes during UNDO = Note Writes during FWD

The formula means that Note Reads = Total Writes / 2.
It seems to be true for the example above. Is it just a coincidence?

Duration of FWD phase: 9280 sec = 02:34:40
Duration of UNDO phase: 20499 sec = 05:41:39

Is it possible to predict the duration of UNDO phase / the activity during UNDO phase?

UPDATE: I understand that the "mirrored" notes (the notes that generated during undo phase after the reading a note from the FWD phase: RL_RMCR => RL_RMDEL+RL_RMCR, RL_BKREPL => RL_CXREM+RL_CXINS etc) are not in 1:1 ratio but the myth that BI size can double during undo has the reasons for existence. Though not in the example above.

Posted by gus bjorklund on 30-Mar-2019 16:30

answers to your questions

0) the way logical undo works during normal processing in multi-user mode (crash recovery is a bit different. so is single user mode) is as follows:

we start from the location of the last bi note written by the transaction. this information is in the connection table entry. the location of the first note (the transaction begin) is in the transaction table. from the lat note, we read backwards, undoing each logical operation and skipping over notes produced by other transactions and physical operations like index block splits. as we do this, we may find jump notes. these notes are produced during normal processing when the last note written is in an earlier block than the next note generated. so the jump notes link together /bi blocks/ produced by the transaction. the individual notes are not linked together. as we go backward undoing, when we eventually come upon the transaction begin note, we are done.

1) the ratio of bi record reads and bi record writes during undo can vary quite a bit and is complicated, depending on the operation being undone and what happened when it was originally performed. consider creating a record. when that happens, we allocate space from the rm chain. this could produce quite a few notes if many blocks are removed from the chain or moved to the back. each of those operations requires a bi note. once we find an appropriate block with space, we might delete obsolete rowid placeholders before we put the new record there. while all this is going on, other transactions may be doing stuff so the individual notes for our transaction may be far apart.

to undo this create operation, we delete the created record, which requires one note (unless it is fragmented) but we do not undo all the space allocation operations. however, we might now put the block on the rm chain which produces at least two more notes.

similarly, when we insert a new index entry, we might need to do a block split, some space allocation operations, move some existing entries, insert a new entry in the parent block and so on. to undo, we delete the new index entry and sometimes insert a "unique entry lock" to reserve the spot until transaction commit. but we do not undo the block split if one happened.

2) it is probable that the bi record read counter includes the bi notes generated by other transactions even though these are skipped. after all, we have to look at them to see if the transaction id matches.

3) i suppose you could make a rough estimate of projected transaction undo time based on the number of notes the transaction generated during forward processing if you had enough sample history for the specific system and application but it would be muchly affected by what the other transactions were doing during the rollback.

4) i do not think there is any way to use the transaction id to read the transaction table faster. the transaction table is a pretty simple one-dimensional array of entries and has no index. the low-order bits of the transaction id give the entry number.

5) yes, the "magic trick" used to find the RL_CXSP2 note is indeed the jump note. these were put in at the same time as the before-image clustering mechanism (i.e. version 4.2A). but undoing the insert requires a search of the index much as is done during normal processing index lookups. this is because the index entry may not be in the same block into which it was inserted. furthermore, the block split may have been performed by another transaction which has been committed.

6) you bring up a good point about the transaction being started in databases that are not updated when a c transaction is started in one of several connected databases. this is an unintentional design flaw that was introduced in version 6.2. none of us noticed it until much later. i first noticed it during version 9 development. the worst part about it is that it can cause unnecessary two-phase commit processing. it is probably not simple to fix either.

-gus

Posted by Dapeng Wu on 30-Mar-2019 22:28

Gus,

As to 5): the "magic" trick to find RL_CXSP2, I don't think it's due to the jump note. Jump notes are transaction based, which means when a transaction rolls back, it only cares about the notes written by itself. In this case, RL_CXSP2 was written by trid 1002168778, but the transaction rolled back was 1002166296 which only had 8 notes and that's all that transaction needs to take care of. The reason why trid 1002166296 can find the new location of an index entry is done by an index B-tree lookup.

In addition, RL_CXSP2 is a physical note, which will be just ignored during online transaction roll back. So even if trid 1002168778 rolls back, you won't find any compensation note of a RL_CXSP2 in its transaction undo note trace.

Posted by George Potemkin on 31-Mar-2019 08:25

> 1) the ratio of bi record reads and bi record writes during undo can vary quite a bit and is complicated, depending on the operation being undone and what happened when it was originally performed. consider creating a record. when that happens, we allocate space from the rm chain.

We need to estimate an undo time of the large transactions. They will often create a few records in the same blocks (about RPB). Hence a block will be removed from RM chain only once per many RMCR notes.

While creating a record Progress will also create its index keys – most likely for more than one index. In the example above dbkey 342214129 was split when the update counter was 14398, in other words the index split happened after the thousands of CXINS notes.

Yes, there are the extra notes that are generated apart of the “mainstream” notes (RMCR and CXINS notes in case of creating the bunch of records) and it’s not possible to predict those extra notes but the events when they are generated are relatively rare.

> 3) i suppose you could make a rough estimate of projected transaction undo time based on the number of notes the transaction generated during forward processing if you had enough sample history for the specific system and application but it would be muchly affected by what the other transactions were doing during the rollback.

I got the "Status: Active Transactions" screens from the customer:

9 different branches;
18 separate databases per branch;
4 days of monitoring
30,644 screens in total

Then I selected the transactions in BEGIN status and with non-zero BI RReads.

Like the following one:

  Usr:Ten   Name      Domain     Type       Login time     Tx start time  Trans id  BI RReads BI RWrites Trans State
 4516       XXXXXXXX      -4     REMC/ABL   02/27/19 13:28 -              819303242      3748       9246 Begin   FWD

As I wrote in the first post the values of RReads/RWrites are left by previous transaction of the same user. I got 4192 such transactions. 1021 of them were rather large (RWrites ge 100).

Ratio of RWrites to RReads:

RReads  RWrites Writes /Reads
3748    9246    2.47
1888    4917    2.60
1934    4383    2.27
1660    4331    2.61
1490    4014    2.69
1654    3770    2.28
...
  40     104    2.55
  41     102    2.44

Statistics for the ratio:

Min:    2.06
Max:   38.36
Average 2.48

The exceptions seem to be an undo of sub-transactions:

RReads  RWrites Reads/Writes
25      961     38.36
57      576     10.07
18      523     28.94
31      293      9.39

So I would use the following rules to estimate a transaction undo time:
Note reads during UNDO = Note writes during FWD and UNDO / 2.5
Note reads during UNDO = Note writes during FWD

As result:
Note writes during UNDO = 1.5 * Note writes during FWD

I don’t need an accurate estimation of transaction undo time. 50% would be enough. Anyways we can’t do anything with undo but only wait.

Posted by George Potemkin on 31-Mar-2019 14:59

Chapter 2: Types of UNDOs

Posted by Gus Bjorklund on 15 May 2018
community.progress.com/.../119631
<quote>
Remember that transactions are an /error handling/ mechanism. There are really four variants of undo in the client (plus two in the database, but that is another topic). they are: 0) undo triggered by an error condition detected in application logic and execution of an UNDO statement, 1) undo triggered by some error condition like a duplicate key, 2) undo triggered by a stop condition, 3) undo and termination triggered by an error detected by a signal handler.

The LBI file is used to record client-side history so that some or all actions can be undone. In all cases, there may or may not be LBI reading to accompany the undo action. The LBI file is composed of one or more nested “AR units” or “action-recovery units”. An AR unit matches execution of a block in the 4GL. One AR unit may align with a block that has a database transaction and that does not have to be the outermost AR unit. Only one because database transactions cannot be nested in the 4GL. Subsequent AR units will be sub-transactions and their database actions may be recorded in the LBI file or not if database savepoints are being used by the client. Also, a nested AR unit may involve the start of a transaction on another database. When that happens, the second database transaction “joins” the outer one in a sense. While actions on the second database can be undone, that transaction can’t be committed until the outer database transaction commits.

Along with block nesting, the LBI file is used to record values of local variables, temp table actions, database actions, buffer connections and disconnections and a few other items related to client side state information.

During forward processing, the LBI file is buffered in memory and written sequentially when the buffer fills. When an undo happens, the LBI is read backwards from eof and a new eof established when the undo is finished. Depending on how much is being undone, the new eof may be at the beginning of the LBI or at the start of some block’s AR unit.

When an undo happens, the undo may go back to any active AR unit. The target AR unit may be the same one that matches a database transaction, higher, or lower. In the first four cases mentioned above, the 4GL session will or can continue, depending on the application logic. The undo processing is very similar for all of them, except that a stop condition may be triggered by a lost database connection. In such a case, database actions in sub-transactions cannot be undone by the client. undo must include the AR unit for the outermost database transaction but may go farther than that but not does necessarily stop the session.

In the last case, the client session is going to be terminated due to a fatal error. Much of the undo processing is the same as above, except that it goes all the way to the outermost AR-unit and then the client will exit. In some cases, depending on the type of error (e.g. an access violation), undo processing consists only of rolling back the database transaction followed by database disconnect and process exit.

I’m sure I have left out something important but I cannot remember what it is.
</quote>

If a session has opened the transactions in a few databases then these transactions will be undone one by one, in other words, the transactions will be temporary "frozen" in all databases except one.

Signals can trigger an undo only of the whole transaction. ABL events can trigger an undo of the sub-transactions.

In case of remote client an undo is performed by remote server: client session sends one short message to its server. When undo is completed the server sends one message back to the client. The exception is the undo of sub-transaction initiated by a client’s session started with the -nosavepoint. In this case the session will send one message per each recovery note generated by server.

Article P114934: "Before the introduction of savepoints in OpenEdge 10.1A, the undoing of a subtransaction was handled and optimized via the local before-image file, which led to the net result that only just about as many notes would be written during transaction rollback as they were written during forward processing."

Undo of the whole transaction initiated by session with the -nosavepoint is performed the same ways as without the -nosavepoint - server will receive only two messages from a client and will send one message back.

When a client’s session is not using the -nosavepoint the RL_TMSAVE notes are generated also during undo phase.

Undo of transaction or sub-transaction can’t be interrupted. If a session will get a signal during UNDO phase the action will be postponed until UNDO is completed.

The signals-terminators (SIGINT, SIGQUIT, SIGPIPE, SIGTERM) received after the SIGHUP signal will be completely ignored because SIGHUP will be moved from SigCgt (the caught signals) to SigIgn (the ignored signals). On Linux, for example, you can see the signal masks in /proc/$PID/status file.
Other types of the signals should be processed even after SIGHUP.

If an undo of sub-transaction is successfully completed then its recovery notes will be ignored during undo of the whole transaction – as if they do not exist at all in BI file. Probably Progress writes a jump note at the end of sub-transaction’s undo. I guess it was the original aim of the jump notes.

The UNDO triggered by an event inside ABL session or by SIGINT, SIGPIPE signals (“stoppers”):
Transaction backouts will be done in the order in which a session has been connected to the databases: the first connected database will be the first one backed out.
Duration of undo will closely match the duration of FWD phase.

The UNDO triggered by the signals-terminators (SIGHUP, SIGQUIT, SIGTERM and proshut -C disconnect):
Session sets the "usertodie" flag in Usrctl Table (_Connect-Disconnect).
Transaction backouts will be done in the reverse order: the last connected database will be the first one backed out.
Session sets the "resyncing" flag (_Connect-Resync) only in database where undo is currently running.
When the "resyncing" flag is set ON/OFF the backout messages is written to a database log:
(2252) Begin transaction backout.
(2253) Transaction backout completed.
Duration of undo will be 25-50% of transaction duration.
Session will keep running for a while after logout from all databases. There is a case when it becomes a problem: if session got a fatal error during an undo the watchdog will not release the resources locked by session until the process still exists in the system.

Posted by gus bjorklund on 01-Apr-2019 16:02

> On Mar 30, 2019, at 6:30 PM, Dapeng Wu wrote:

>

> the "magic" trick to find RL_CXSP2, I don't think it's due to the jump note.

dapeng, you are correct. sorry for my sloppy and insufficient explanation.

the first transaction inserts a new index entry in index block 342214129. later, the second transaction does a block split, moving the inserted entry into index block 342214285, and also making a lot more other changes. then, still later, the first transaction makes more database changes, generating more notes. or maybe no other changes at all, and then is rolled back.

when the first transaction is rolled back, the rollback starts at the last note generated, reading backward and skipping over purely physical operations. the start point could be the insert of the new index entry before the split or at some later operation done by the first transaction. in the case of a later operation, there would be a jump note that skips over stuff done by the second transaction (assuming many bi blocks of other stuff intervenes as george describes), but maybe the jump would not go all the way back to the original index entry insertion.

anyhow, once the rollback gets back to the index entry insertion note, because it is a logical rollback, instead of undoing the operation on block 342214129 which is now the wrong one, it extracts the key from the note and does a regular index lookup to find the entry in the correct block, 342214285. note that during normal rollback, this lookup could run into conflicting db buffer locks, bi and ai buffer locks locks, and so on, and be delayed by the other transactions. furthermore, there could be more block splits, causing the entry to be moved still further.

so, while the jump note may not lead from the block split to the original index entry insert, in this scenario, there would certainly be one or more of them to enable the first transaction’s rollback to skip over the notes generated by the second. thus i suppose one could say the magic is a combination of jumpnote(s) and key lookup.

Posted by gus bjorklund on 01-Apr-2019 16:02

> On Mar 31, 2019, at 11:01 AM, George Potemkin wrote:

>

> if session got a fatal error during an undo the watchdog will not release the resources locked by session until the process still exists in the system.

i guess you mean until the process no longer exists. watchdog does nothing if process still exists.

Posted by George Potemkin on 01-Apr-2019 17:24

> i guess you mean until the process no longer exists. 

Yes, you're right.

Example:

[2018/11/16@13:02:29.393+0700] P-19354      T-139681520473920 I ABL   299: (2252)  Begin transaction backout. 
[2018/11/16@13:02:59.209+0700] P-19354      T-139681520473920 F ABL   299: (819)   SYSTEM ERROR: Error in undo of record delete 
[2018/11/16@13:02:59.209+0700] P-19354      T-139681520473920 I ABL   299: (439)   ** Save file named core for analysis by Progress Software Corporation. 
...
[2018/11/16@13:15:54.923+0700] P-15132      T-139805434283776 I WDOG   10: (2527)  Disconnecting dead user 299.

Watchdog tries to disconnect the user only 775.714 sec (= 00:12:55.714) after fatal error because user's process was alive during this time..

Posted by George Potemkin on 04-Apr-2019 12:34

I have investigated a bit father the large transaction above. As usually the answers raise new questions. Undo phase is now looking even more “mystical”.

First of all, the program that updated the database was run inside a block with transaction property. So in fact it was a sub-transaction from the session’s point of view. It’s an important note!

The transaction was very large (the size of its scan is 40 GB). So I traced the history only for 3 blocks updated by transaction:
A. the block updated at the beginning of the transaction. Really it’s a second block updated by transaction;
B. the block updated in the middle of the transaction. Really it’s a block updated just before the session received the HANGUP signal. It does not turn out to matter;
C. the block updated at the end of FWD phase;
O. The changes of object block will be mentioned as a part of the changes of RM chain.

During the FWD phase of transaction the blocks are updated in the following order: A -> B -> C.
During the UNDO of phase of (sub-) transaction the expected order: C -> B -> A.

The history of the block updates:

11:54:48   code = RL_TBGN =================================================================================

11:54:49 A code = RL_RMCR   area = 11   dbkey = 9986286912   update counter = 1 to 64 (64 updates)
11:54:49 O code = RL_BKFRM  area = 11   dbkey = 131072       update counter = 61007108
11:54:49 A code = RL_BKFRB  area = 11   dbkey = 9986286912   update counter = 65

12:01:36 B code = RL_RMCR   area = 11   dbkey = 9987806848   update counter = 1 to 64 (64 updates)
12:01:36 O code = RL_BKFRM  area = 11   dbkey = 131072       update counter = 61014341
12:01:36 B code = RL_BKFRB  area = 11   dbkey = 9987806848   update counter = 65

13:26:46 C code = RL_RMCR   area = 11   dbkey = 9976954368   update counter = 1 to 21 (21 updates)

13:26:46   Sub-transaction was interrupted (by END-KEY?) ==================================================

13:26:46 C code = RL_RMDEL  area = 11   dbkey = 9976954368   update counter = 22 to 62 by 2 (21 updates)
13:26:46 C code = RL_RMCR   area = 11   dbkey = 9976954368   update counter = 23 to 63 by 2 (21 updates)

14:11:11 B code = RL_RMDEL  area = 11   dbkey = 9987806848   update counter = 66, 69 to 193 by 2 (64 updates)
14:11:11 B code = RL_RMCR   area = 11   dbkey = 9987806848   update counter = 67, 70 to 194 by 2 (64 updates)
14:11:11 B code = RL_BKBBOT area = 11   dbkey = 9987806848   update counter = 68
14:11:11 Z code = RL_BKBBOT area = 11   dbkey = 9987806784   update counter = 68
14:11:11 O code = RL_BKMBOT area = 11   dbkey = 131072       update counter = 61070135
14:11:11 B code = RL_BK2EB  area = 11   dbkey = 9987806848   update counter = 195

14:22:05 A code = RL_RMDEL  area = 11   dbkey = 9986286912   update counter = 66, 69 to 189 by 2 (62 updates)
14:22:05 A code = RL_RMCR   area = 11   dbkey = 9986286912   update counter = 67, 70 to 190 by 2 (62 updates)
14:22:05 A code = RL_BKBBOT area = 11   dbkey = 9986286912   update counter = 68
14:22:05 A code = RL_BK2EB  area = 11   dbkey = 9986286912   update counter = 191

14:27:02 A code = RL_RMDEL  area = 11   dbkey = 9986286912   update counter = 192 to 314 by 2 (62 updates)
14:27:02 A code = RL_RMCR   area = 11   dbkey = 9986286912   update counter = 193 to 315 by 2 (62 updates)

14:29:27 B code = RL_RMDEL  area = 11   dbkey = 9987806848   update counter = 196 to 322 by 2 (64 updates)
14:29:27 B code = RL_RMCR   area = 11   dbkey = 9987806848   update counter = 197 to 323 by 2 (64 updates)

14:29:28   HANGUP signal received (562) ===================================================================

15:33:20 C code = RL_RMDEL  area = 11   dbkey = 9976954368   update counter = 64 to 104 by 2 (21 updates)
15:33:20 C code = RL_RMCR   area = 11   dbkey = 9976954368   update counter = 65 to 105 by 2 (21 updates)

15:33:20 C code = RL_RMDEL  area = 11   dbkey = 9976954368   update counter = 106 to 146 by 2 (21 updates)
15:33:20 C code = RL_RMCR   area = 11   dbkey = 9976954368   update counter = 107 to 147 by 2 (21 updates)

16:02:48 B code = RL_RMDEL  area = 11   dbkey = 9987806848   update counter = 324 to 450 by 2 (64 updates)
16:02:48 B code = RL_RMCR   area = 11   dbkey = 9987806848   update counter = 325 to 451 by 2 (64 updates)

16:09:16 A code = RL_RMDEL  area = 11   dbkey = 9986286912   update counter = 316 to 438 by 2 (62 updates)
16:09:16 A code = RL_RMCR   area = 11   dbkey = 9986286912   update counter = 317 to 439 by 2 (62 updates)

16:09:16   Begin transaction backout. (2252) ==============================================================

16:09:16 A code = RL_RMDEL  area = 11   dbkey = 9986286912   update counter = 440 to 562 by 2 (62 updates)
16:09:16 A code = RL_RMCR   area = 11   dbkey = 9986286912   update counter = 441 to 563 by 2 (62 updates)

16:10:46 B code = RL_RMDEL  area = 11   dbkey = 9987806848   update counter = 452 to 578 by 2 (64 updates)
16:10:46 B code = RL_RMCR   area = 11   dbkey = 9987806848   update counter = 453 to 579 by 2 (64 updates)

17:09:39 C code = RL_RMDEL  area = 11   dbkey = 9976954368   update counter = 256 to 296 by 2 (21 updates)
17:09:39 C code = RL_RMCR   area = 11   dbkey = 9976954368   update counter = 257 to 297 by 2 (21 updates)

17:56:39 A code = RL_RMCHG  area = 11   dbkey = 9986286912   update counter = 564 to 625 by 2 (62 updates)

17:59:41 B code = RL_RMCHG  area = 11   dbkey = 9987806848   update counter = 580 to 643 (64 updates)

19:02:58 C code = RL_RMCHG  area = 11   dbkey = 9976954368   update counter = 298 to 318 by 2 (21 updates)

19:02:58 C code = RL_RMDEL  area = 11   dbkey = 9976954368   update counter = 319 to 359 by 2 (21 updates)
19:02:58 C code = RL_RMCR   area = 11   dbkey = 9976954368   update counter = 320 to 360 by 2 (21 updates)

20:02:18 B code = RL_RMDEL  area = 11   dbkey = 9987806848   update counter = 644 to 770 by 2 (64 updates)
20:02:18 B code = RL_RMCR   area = 11   dbkey = 9987806848   update counter = 645 to 771 by 2 (64 updates)

20:11:06 A code = RL_RMDEL  area = 11   dbkey = 9986286912   update counter = 626 to 748 by 2 (62 updates)
20:11:06 A code = RL_RMCR   area = 11   dbkey = 9986286912   update counter = 627 to 749 by 2 (62 updates)

20:11:07   code = RL_TEND =================================================================================

Initial interruption of sub-transaction was initiated by ABL code. As expected the transaction’s state stayed unchanged: FWD.

UNDO of sub-transaction was handled via the local before-image file: the session did not read BI file at this phase. It’s not expected because the session did not use the -nosavepoint parameter (transaction used the RL_TMSAVE notes).
Since 34:25 after interruption promon shown:

02/22/19        Status: Active Transactions by user number for all tenants
14:01:11
  Usr:Ten   Name      Domain     Type       Login time     Tx start time  Trans id  BI RReads BI RWrites Trans State
  950       XXXXXXXX       0     SELF/ABL   02/21/19 12:59 02/22/19 11:54 715548686         0   35583963 Active  FWD

What ABL events can initiate such type of UNDO?

During UNDO of sub-transaction the blocks were updated in the following order:
1. C -> B -> A -> 2. A -> B -> C -> 3. C -> B -> A
In other words it was:
1. UNDO of FWD phase
2. UNDO of UNDO # 1
3. UNDO of UNDO # 2
Why we got 3 phases of UNDO?

Transaction has created 64 records in the block A. During UNDO phases of sub-transaction as well as during UNDO of the whole transaction only 62 slots were processed in this block: the records were deleted and replaced by placeholders and vice versa. Why two slots were missed?

As expected the reaction on HANGUP signal was postponed until the end of sub-transaction UNDO.

UNDO of the whole transaction did not ignore the changes done during UNDO phase but it seems to be the result of using LBI file during UNDO phase. Otherwise the transaction would be completed immediately after the end of sub-transaction UNDO.

More over the order of changes are not mirroring the changes on the previous phases:
UNDO of the whole transaction updated the blocks in the order: A -> B -> C

Then unexpected we got the RL_RMCHG notes for each slot in the blocks in the order: A -> B -> C. Note that there were no RL_RMCHG notes on all preceding phases. What are they doing? And why the order of these changes is not reverse to the previous notes?

And finally the records were deleted and replaced by the placeholder in the expected order: C -> B -> A

Posted by gus bjorklund on 04-Apr-2019 13:30

simple.

when the 4GL undoes a subtransaction using the LBI file, the database knows nothing about it. to the datbase, it just looks like more forward processing.

block were updated a->b->c, followed by subtransaction undo of c->b->a so now the order of forward processing changes seen by the database is

a->b->c->c->b->a

then those changes are undone in reverse order

a->b->c->c->b->a

Posted by George Potemkin on 04-Apr-2019 13:37

> when the 4GL undoes a subtransaction using the LBI file

But UNDO uses the LBI file only when a session was started with the -nosavepoint option. At least my tests did not find other scenarios.

Posted by gus bjorklund on 04-Apr-2019 17:08

you are correct.

when the database rolls back a savepoint, it then forgets about it so it has the same effect as the lbi driven rollback. it could (in theory and in practice) use another kind of jump note to skip over all the do and undo operations in the savepoint but currently i don’t believe it does that.

Posted by George Potemkin on 04-Apr-2019 17:37

In my tests I used the UNDO statement to roll back a sub-transaction. UNDO of transaction read the notes left by the changes before and after the sub-transaction but totally skipped all notes related to the undone sub-transaction. It was the same way even in client-server mode.

Gus, as you wrote earlier there are many types of UNDOs. I guess in this case we got a type of undo that ignores the save points. Could it be possible? I don't know what was an event that triggered the undo.

This thread is closed