RHEL 5.9OE 64-bit 11.6.3.017I'm investigating a recent issue at a client site. Their BI grew until it hit the threshold, which at the time was 2 GB. I am analyzing aimage scan verbose output using George's code (thank you George!) and am uncertain how to interpret what I see.
I have identified three very long-running transactions (4328, 7665, and 2179 seconds) so I know why the BI grew. The next step is to understand what certain other users were doing in that time frame. I don't know whether it is normal activity or not. I'll be analyzing more AI files from the week prior to try to figure that out. Several of them were not doing the typical record/index CUD activity I would expect; they just had a ton of save points:
Trid Userid AiNum BegTime EndTime Len Notes SubTrans Notes/Sub NoteList
32560737 alice 73134 Fri Nov 17 09:25:26 2017 Fri Nov 17 10:37:34 2017 4328 6 2 3 (RL_TMSAVE,RL_RMCHG:7)*2
34745864 alice 73145 Fri Nov 17 12:07:12 2017 Fri Nov 17 12:12:38 2017 326 1728970 1728968 1 (RL_TMSAVE)*1728968
34756650 bob 73145 Fri Nov 17 12:11:07 2017 Fri Nov 17 12:15:18 2017 251 1172022 1172020 1 (RL_TMSAVE)*1172020
34766381 bob 73146 Fri Nov 17 12:32:41 2017 Fri Nov 17 12:36:11 2017 210 1086047 1086044 1 (RL_TMSAVE)*1086044,RL_LSTMOD:6
34769079 bob 73147 Fri Nov 17 12:38:17 2017 Fri Nov 17 14:46:02 2017 7665 20010227 20010222 1 (RL_TMSAVE)*20010222,RL_LSTMOD:6*3
34780858 carol 73148 Fri Nov 17 12:57:06 2017 Fri Nov 17 13:33:25 2017 2179 9239104 9239100 1 (RL_TMSAVE)*9239100,RL_LSTMOD:6*2
34838161 carol 73156 Fri Nov 17 16:17:33 2017 ? 743 4448476 4448475 1 (RL_TMSAVE)*4448475
Can someone help me to understand what these remote ABL users might have been doing to cause all these TMSAVE notes? Thanks.
The following program will generate only TMSAVE notes:
DO /* LARGE */ TRANSACTION:
DO vLoop = 1 TO vRepeats ON ERROR UNDO, LEAVE:
Interesting. Thanks George.
BTW, _TableStat/_UserTableStat will show the creates/deletes but _IndexStat/_UserIndexStat will not. In other words, you can use these statistics to identify a table name used in the create/delete statements => xref => the programs run by the clients.
> BTW, _TableStat/_UserTableStat will show the creates/deletes but _IndexStat/_UserIndexStat will not. In other words, you can use these statistics to identify a table name => the programs that wre run by clients.
Thanks, I'll file that away for next time. Unfortunately I was investigating after a shutdown and restart, so no VST data to look at.
Wow, based on your screenshots it looks like the problem developed over the course of 3.5 hours. Does that sound right? I sure wish OE had some sort of reasonable transaction timeout which could be enforced by the DBMS itself.
This problem happens to us regularly. There are a few factors that are typically involved. It normally stems from the interactive CHUI users who have open transactions that use pessimistic record locking. The CHUI user may go to lunch without remembering to close out of their OE database transaction. The OE dba then steps away for a few hours as well, and nobody gets alerted about BI growth, nor long-running transactions. Then the problem slowly escalates into a system-wide issue as other similar CHUI users, or batch processes start to become backlogged on the same locked records.
This problem is something we were hoping to be able to troubleshoot with OE Explorer's "oldest transaction age" property. I think the transaction age is even more important than the BI usage, since it is the initial indicator that something is going wrong, and you can start sending out alerts long before BI grows to 2 GB. (Assuming transactions shouldn't legitimately last for hours at a time.)
Unfortunately the "oldest transaction age" doesn't seem to be reported properly in OEE. I just asked about this earlier today: community.progress.com/.../36140
> Wow, based on your screenshots it looks like the problem developed over the course of 3.5 hours.
The anomalous activity started around noon and the DB stalled at 13:48. I haven't pinpointed yet exactly when the growth started, which wasn't necessarily when the first long-running transaction started.
> I sure wish OE had some sort of reasonable transaction timeout which could be enforced by the DBMS itself.
The problem with trying to design a generic platform solution is that everyone has their own idea of what is "reasonable".
In this case, the client was not monitoring BI growth or transaction age. I am having that conversation with them now.
Both are possible programmatically, even if OEE isn't working for you as you expect. You can look at _trans._trans-duration and _dbstatus._dbstatus-bisize.
Speaking of enhancements, it would be nice if there was the option of a DB log message each time the BI is extended (logically or physically).
Used to manage a system that had an issue with generating too many Sequences, we noticed it because of BI / AI activity and no matching table / index activity.