This morning we had a significant issue where a number of large data consumers locked a bunch of records and temporarily reached the lock table limit of the server. At the time we got some ABL client failures ("Lock table overflow, increase -L on server").
However, even after the original locks were all eventually released, we continued to see that error. For example, app servers agents kept showing the message and preventing users from using any of the appserver agents (both state-reset and statefree were affected). I found this KB (P12614) and assumed it was an unresolved bug in appserver where the agents become permanently corrupted .
So to fix appservers, we trimmed out all the agents. That finally stopped the bogus "Lock table overflow" messages for appserver clients.
However it is now late in the afternoon and I just went to look at the database log one last time. I am shocked to find that we are *still* getting bogus/illegitimate error messages saying "Lock table overflow". It is not possible that this error is still in effect and I've confirmed it multiple ways. Below is the output we see in the database log.
[2017/01/03@15:58:17.747-0500] P-4763 T-56 I SRV 3: (915) Lock table overflow, increase -L on server
These ongoing error messages are still being shown (in the database log) and are apparently coming from _sqlsrv2 . While the errors are being shown in the database log file, there is apparently no impact on any remote SQL92 clients. I think there must have been a flag that was raised to send this message to the database log, and the flag hasn't been lowered.
Has anyone else ever seen bogus "Lock table overflow" messages for _sqlsrv2? Is there an easy way to fix without bouncing the whole database?
Can you terminate the _sqlsrv2 server(s)?
promon - R&D - Admin Functions - Server Options - Terminate a server.
> I think there must have been a flag that was raised to send this message to the database log, and the flag hasn't been lowered.
The error is rised when free chain in Lock Table is empty.
> It is not possible that this error is still in effect and I've confirmed it multiple ways.
Do the sessions connect more than one database?
OK, that R&D function worked. Thanks for the tip!
It was our first time to ever do that (normally we'd restart the whole database).
I wasn't able to find any other references to the "sticky" error message ("Lock table overflow, increase -L on server") which won't go away (even after the locks are released). Has anyone else encountered it? I suppose if it happened to us once, then it may be consistent and I may be able to repro and send it to Progress support. Although I'm not sure how likely it would be for this to make it into a service pack (for 11.3).
> The error is raised when free chain in Lock Table is empty.
_sqlsrv2 should not have a separate lock table. So why would these processes be the only ones generating the error messages in the database log? The locks in use were only at about 200 (out of 500,000) so I'm quite certain that the message was bogus (maybe a cached counter or something like that).
I did learn that the strategy for handling lock information changed a bit (in 11.4 ) but we are still on 11.3.
> why would these processes be the only ones generating the error
SQL queries tend to easily use many more locks than 4GL queries do.
> The locks in use were only at about 200 (out of 500,000)
Do you have the statistics of lock requests per sample interval? How big the difference in the lock requests between the intervals? I'm next to Gus: a single bad query can easily overflow the lock table.
How often you get "Lock table overflow" error in db log? How often SQL clients are running any specific tasks/queries against the large tables (where the number of records is greater than 500,000)?
> I'm quite certain that the message was bogus (maybe a cached counter or something like that).
I'm pretty sure it's not the case.
I'm quite certain the message was bogus. The error message "stuck" in appserver agents and didn't go away until we trimmed them. It "stuck" in _sqlsrv2 and didn't go away until we trimmed the sql92 broker (promon - R&D - Admin Functions - Server Options - Terminate a server).
The lock table itself did not show that any significant number of locks were *actually* in use (at least not after the initial / brief incident ended).
Obviously we can get some harder data, and even a repro if necessary. But we run 11.3 which is supposedly "retired" and I'm not sure how much value there would be to do a lot more investigation if Progress has already retired my version of OE. Would a bug in this area that applies to 11.3 also apply to 11.6 or can we reasonably hope it will go away? I noticed KB's that say some lock behavior was changed after 11.3 (knowledgebase.progress.com/.../P161995 )
"Lock table overflow" message can be bogus only if promsgs file is corrupted. I don't imagine a "theory" that would explain the error by corruption in lock table. I'm sure that Progress stores the number of lock entries on free chain in shared memory. But it's the same counter for all client's types. Also Progress adds one entry after each real "Lock table overflow" error. You can easily count the number of the errors: take the difference between current size of lock table and its initial value (that equal to -L ‘rounded’ to a multiple of 32). You can compare it with the number of errors in db log. You can monitor _DbStatus-NumLocks: If the current value is greater than the previous one and close (let's say 90%) to _Startup-LockTable than dump the whole _Lock table to find a process that overflows the lock table.
> I noticed KB's that say some lock behavior was changed after 11.3
No, the population of _Lock VST was changed. Algorithms used by lock table itself did not changed.
Thanks for the tips. We have trouble with suggestion about dumping the _lock table, because that process seems pretty slow (at least on HP-UX). But I'm trying to make arguments to upgrade from 11.3 to 11.6 and this may be one advantage to making the upgrade.