SYSTEM ERROR: Attempt to free buffer type 2

Posted by Valeriy Bashkatov on 02-Mar-2016 08:41

Hello,

HP-UX 11.31
10.2B0817

Anyone know what the meaning of this message in the database log? 

(1077)  SYSTEM ERROR: Attempt to free buffer type 2. 

I do not find anything about this error in ProKB.

Regards,
Valeriy

Posted by Dapeng Wu on 03-Mar-2016 08:20

The messages in the .lg file do look very strange or out of order. In the normal case, we should see (562) first, then (2252), then (2253). The mechanism to back out a transaction depends on the transaction entry (trid) in the transaction table in shared memory, but it looks the backout didn't do anything while online. And the real backout happened in the crash recovery phase, where the transaction table will be reestablished from the BI files, and this transaction table seems more reliable than the previous one.

Message (1077) cannot be explained either, it may be picked up from a wrong location.

Once we see (2256), the watch dog will set a FATAL flag in shared memory and the database will start an emergency shutdown process. In most cases, the messages following that are not reliable since we are in an emergency. So although the ABL process was already dead, the broker may still try to clean up something for it.

Put all these together, it looks like a memory stomp to me. And I won't rule out any OS issues either.

Dapeng

All Replies

Posted by George Potemkin on 02-Mar-2016 09:13

Its description says: "This message can occur when running multi-user PROGRESS on BTOS. It indicates that the client/server communications buffers have been corrupted".

I guess that only the second sentence is correct and the message exists since Progress V7.

Posted by Dapeng Wu on 02-Mar-2016 09:43

It means the network message buffer free chain may be corrupted, for example, an infinite loop was detected when trying to free a buffer and put it back on the chain.
Dapeng
 

Posted by George Potemkin on 02-Mar-2016 10:07

It was a self-service client:

[2016/03/02@17:42:23.123+0600] P-18310      T-1     I ABL  1786: (2252)  Begin transaction backout. 
[2016/03/02@17:42:23.123+0600] P-18310      T-2     I ABL  1786: (562)   HANGUP signal received. 
[2016/03/02@17:42:23.123+0600] P-18310      T-1     I ABL  1786: (2253)  Transaction backout completed. 
[2016/03/02@17:42:23.123+0600] P-18310      T-2     F ABL  1786: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2016/03/02@17:42:23.123+0600] P-18310      T-1     F ABL  1786: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2016/03/02@17:42:23.123+0600] P-18310      T-1     F ABL  1786: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2016/03/02@17:42:23.123+0600] P-18310      T-1     F ABL  1786: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2016/03/02@17:42:23.123+0600] P-18310      T-1     F ABL  1786: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2016/03/02@17:42:23.123+0600] P-18310      T-1     F ABL  1786: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2016/03/02@17:42:23.124+0600] P-18310      T-1     F ABL  1786: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2016/03/02@17:42:23.124+0600] P-18310      T-1     F ABL  1786: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2016/03/02@17:42:23.124+0600] P-18310      T-1     F ABL  1786: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
 
[2016/03/02@17:42:56.367+0600] P-25680      T-1     I WDOG 1317: (2256)  SYSTEM ERROR: User 1786 died during microtransaction. 

[2016/03/02@17:42:56.491+0600] P-22516      T-1     I BROKER  0: (-----) Sending signal 7 to user 1786

[2016/03/02@17:42:57.625+0600] P-22516      T-1     I BROKER  0: (2527)  Disconnecting dead user 1786. 

There are other strange things in the log:
1) Transaction backout begun /before/ HANGUP signal;
2)  /After/ transaction backout completed the client died during microtransaction.
3) Watchdog thought that user 1786 was dead but /later/ broker sent a signal to the user.

Posted by Valeriy Bashkatov on 03-Mar-2016 04:51

Thank you!

>>the network message buffer free chain

But it is a self-service client.

The problem is compounded by the fact that the database has been abnormally shutdown by broker after this error and the customer wants to understand why it happened and how to avoid it in the future (

What should I pay attention?

It is possible to say that it was not a failure in the OpenEdge, but it was a failure in the operating system or in the hardware?

Regards,
Valeriy

Posted by George Potemkin on 03-Mar-2016 05:43


The scans of ai files allow us to look at the past. ;-)
Here is what ai scan says in this case:

The error happened after 79 sec since transaction beginning. At this moment the transaction has created 743 notes.
Just before the errors the transaction did not update db during 15 seconds. Most likely an user was enterring some data from keyboard (at least the transaction also paused for 13 and 40 seconds before this moment).
Then the transaction has updated an existent record:

Trid: 1813582918 code = RL_LOGOP_START version = 1 (12528)
Trid: 1813582918 dbkey = 0 update counter = 0 (12530)
Trid: 1813582918 code = RL_RMCHG version = 2 (12528)
Trid: 1813582918 area = 7 dbkey = 5947509888 update counter = 430 (12529)
Trid: 1813582918 code = RL_LOGOP_END version = 1 (12528)


And then the transaction has removed two index keys of the record:

Trid: 1813582918 code = RL_CXREM version = 2 (12528)
Trid: 1813582918 area = 7 dbkey = 5944106112 update counter = 13375 (12529)
Trid: 1813582918 code = RL_CXREM version = 2 (12528)
Trid: 1813582918 area = 27 dbkey = 22753076 update counter = 427 (12529)


But according the previous patterns in the same transaction it should create three RL_CXREM notes.
I'm waiting from the customer for the block dumps to identify the index whose key the transaction failed to update.
Db log says that transaction backout completed immediately after backout's beginning but it's not true. The transaction was rolled back only during database restart.

So the errors happened while the session was trying to delete an index key.

Posted by Dapeng Wu on 03-Mar-2016 08:20

The messages in the .lg file do look very strange or out of order. In the normal case, we should see (562) first, then (2252), then (2253). The mechanism to back out a transaction depends on the transaction entry (trid) in the transaction table in shared memory, but it looks the backout didn't do anything while online. And the real backout happened in the crash recovery phase, where the transaction table will be reestablished from the BI files, and this transaction table seems more reliable than the previous one.

Message (1077) cannot be explained either, it may be picked up from a wrong location.

Once we see (2256), the watch dog will set a FATAL flag in shared memory and the database will start an emergency shutdown process. In most cases, the messages following that are not reliable since we are in an emergency. So although the ABL process was already dead, the broker may still try to clean up something for it.

Put all these together, it looks like a memory stomp to me. And I won't rule out any OS issues either.

Dapeng

Posted by George Potemkin on 03-Mar-2016 08:51

Thanks, Dapeng!

Posted by Dmitri Levin on 04-Mar-2016 16:36

May be messages out of order because (562) is from thread number two (T-2) while all other error messages are from T-1.

Posted by George Potemkin on 10-Mar-2016 06:08

Good eye, Dmitri!

I checked db log. The "T-2" threads were reported only for the following messages:

(562) HANGUP signal received.

(298) KILL signal received.

(4375) Received signal 7; handling as SIGHUP.

with an exeption for only one message (1077)  SYSTEM ERROR: Attempt to free buffer type 2.

Statistics per db log:

T-1: 12600 (90%) "signal received" messages;

T-2: 1334 (10%) "signal received" messages .

The same process can post "T-1" as well as "T-2":

[2016/03/02@09:19:11.296+0600] P-19686      T-2     I ABL  2493: (562)   HANGUP signal received.

[2016/03/02@09:19:11.391+0600] P-19686      T-1     I ABL  2493: (562)   HANGUP signal received.

[2016/03/02@14:53:33.734+0600] P-16203      T-2     I ABL  2477: (298)   KILL signal received.

[2016/03/02@14:53:33.849+0600] P-16203      T-1     I ABL  2477: (298)   KILL signal received.

It seems to be just a "feature".

Posted by George Potemkin on 20-Apr-2017 07:02

One year later. The same customer, the same error, the same environment.

[2017/04/20@10:51:22.171+0600] P-27948      T-1     I ABL  2680: (708)   Userid is now UXXXXX. 
[2017/04/20@11:42:26.341+0600] P-27948      T-2     I ABL  2680: (562)   HANGUP signal received. 
[2017/04/20@11:42:26.341+0600] P-27948      T-1     I ABL  2680: (2253)  Transaction backout completed. 
[2017/04/20@11:42:26.341+0600] P-27948      T-1     I ABL  2680: (2253)  Transaction backout completed. 
[2017/04/20@11:42:26.341+0600] P-27948      T-2     F ABL  2680: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2017/04/20@11:42:26.342+0600] P-27948      T-1     I ABL  2680: (2252)  Begin transaction backout. 
[2017/04/20@11:42:26.342+0600] P-27948      T-1     I ABL  2680: (2253)  Transaction backout completed. 
[2017/04/20@11:42:26.408+0600] P-27948      T-2     I ABL  2680: (453)   Logout by UXXXXX on /dev/pts/2093. 
[2017/04/20@11:42:26.408+0600] P-27948      T-2     I ABL  2680: (5029)  SYSTEM ERROR: Releasing multiplexed latch. latchId: 35247169 
[2017/04/20@11:42:26.408+0600] P-27948      T-2     I ABL  2680: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2017/04/20@11:42:26.409+0600] P-27948      T-2     I ABL  2680: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2017/04/20@11:42:26.409+0600] P-27948      T-2     I ABL  2680: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2017/04/20@11:42:26.409+0600] P-27948      T-2     I ABL  2680: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2017/04/20@11:42:26.409+0600] P-27948      T-2     I ABL  2680: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2017/04/20@11:42:26.409+0600] P-27948      T-2     I ABL  2680: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2017/04/20@11:42:26.409+0600] P-27948      T-2     I ABL  2680: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2017/04/20@11:42:26.409+0600] P-27948      T-2     I ABL  2680: (1077)  SYSTEM ERROR: Attempt to free buffer type 2. 
[2017/04/20@11:42:29.265+0600] P-25767      T-1     I BROKER  0: (-----) Sending signal 7 to user 2680
[2017/04/20@11:42:38.414+0600] P-25767      T-1     I BROKER  0: (2527)  Disconnecting dead user 2680.

The user’s session had an allocated transaction:

04/20/17 Status: Active Transactions
11:42:03
Usr Name Type Login time Tx start time Trans id Trans State
2680 UXXXXX SELF/ABL 04/20/17 10:51 - 1860882829 Begin

AI scan confirmed that the session did not have an active transaction.

latchId 35247169 is LKT latch family. IOW, the session used a lock table /after/ logout message.

Session tried to create protrace file but the file contains only a few words: "PROGRESS stack trace as of ". That’s all.

Any ideas what can cause the error?

Would it be it be a good idea to crash the session with protrace file after first error “Attempt to free buffer type 2” next time when it will happen again?

Posted by George Potemkin on 20-Apr-2018 09:23

The funny feature of this error - it happens once per year, only in March or April. 

2016 (10.2B0817):

[2016/03/02@17:42:23.123+0600] P-18310      T-2     F ABL  1786: (1077)  SYSTEM ERROR: Attempt to free buffer type 2.

2017 (10.2B0817):

[2017/04/20@11:42:26.408+0600] P-27948      T-2     I ABL  2680: (1077)  SYSTEM ERROR: Attempt to free buffer type 2.

2018 (11.7.2):

[2018/04/20@10:18:13.812+0600] P-4049       T-1     F ABL  3146: (1077)  SYSTEM ERROR: Attempt to free buffer type 2.

This thread is closed