Information

Title	Events that could cause a process to terminate abnormally

URL Name	P23047

Article Number	000146880

Environment	Product: Progress
Product: OpenEdge
Version: All supported versions
OS: All supported platforms

Question/Problem Description

What causes an abnormal shutdown of the database?
What triggers "Begin ABNORMAL shutdown code x (2249)"?
Why does the database Broker or WDOG report "Disconnecting dead user (2527)"?
How can users die holding shared memory locks (2522) and trigger an abnormal shutdown?
Why do users die with buffers locked? (2523) (5027) (5028)
Database crashes when a user died with buffer lock 2523 5027 5028
Events that could cause a process to terminate abnormally

Steps to Reproduce

Clarifying Information

Error Message	Begin ABNORMAL shutdown code (2249)
Disconnecting dead user <number>. (2527)
User <num> died holding <num> shared memory locks. (2522)
User <num> died with <num> buffers locked. (2523)
User <num> died with <num> buffers locked. (5027)
SYSTEM ERROR: Releasing regular latch. latchId:<latch-num> (5028)

Defect Number

Enhancement Number

Cause

Resolution

When a user connects to a Progress database in multi-user mode, the user control table in shared memory is updated to add the user. The watchdog process periodically checks that every user listed in the shared memory user control table has a corresponding OS process running. When the watchdog detects a user that is listed in the user control table but has no active OS process, that shared memory user or remote server is disconnected from the database and messages such as the following are written to the log file:

BROKER 0: Disconnecting dead user 41 (2527)
BROKER 0: BROKER detects death of server 6464 (1153)
WDOG 12: Disconnecting client 55 of dead server 1 (2526)
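
The liveness check described above can be illustrated outside the database. The following is a minimal sketch, assuming a POSIX system; it is not how the OpenEdge watchdog is implemented. The `user_table` mapping and the `process_exists` and `find_dead_users` helpers are hypothetical stand-ins for the user control table and the watchdog scan. It relies on signal 0, which tests whether a PID exists without delivering a signal.

```python
import os

def process_exists(pid):
    """Return True if an OS process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)          # signal 0: existence check, nothing is delivered
    except ProcessLookupError:
        return False             # no such process
    except PermissionError:
        return True              # process exists but belongs to another user
    return True

def find_dead_users(user_table):
    """Return user numbers whose recorded PID no longer has a live process.

    user_table is a hypothetical {user_number: pid} mapping standing in for
    the user control table kept in shared memory.
    """
    return sorted(usr for usr, pid in user_table.items()
                  if not process_exists(pid))
```

A user number returned by `find_dead_users` corresponds to the situation the log messages above describe: an entry in the table with no OS process behind it.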

However, if the watchdog process detects that this "dead process" left shared memory or buffer latches in an inconsistent state, the Broker will, by design, shut the database down to protect its integrity. Messages reporting the abnormal termination of a process that still held latches (locks) in shared memory appear in the log before the 2249 message:

SYSTEM ERROR: User 41 died during microtransaction. (2256)
SYSTEM ERROR: Releasing regular latch. latchId:<latch-num> (5028)
User 41 died holding <num> shared memory locks. (2522)
User 41 died with <num> buffers locked. (2523)
User 41 died with <num> buffers locked. (5027)

The database manager will then initiate an emergency shutdown. The following message is written to the log file, after which all users are logged out and the server shuts down:

BROKER 0: Begin ABNORMAL shutdown code 2 (2249)

The code number in the (2249) message above is reserved for future use and currently has no meaning. This message in itself does not indicate database corruption, merely that the broker has taken measures to prevent potential corruption. When the database is restarted, crash recovery takes place, and the database is ready for normal use if no errors are encountered during BI crash recovery. Crash recovery may take some time, especially if there are uncommitted transactions to roll back during the Physical and Logical undo phases. Refer to Article "How to estimate how long BI Recovery will take to complete".

In addition to the above errors, some of the following messages may be written to the .lg file as the abnormal shutdown proceeds and logs out all the users. They are written by the WDOG if it is enabled, or by the primary login BROKER if it is not:

SYSTEM ERROR: redundant lwake user <n> latch <x>
SYSTEM ERROR: bkrlsbuf: cannot release buffer lock, use count 0 is invalid. (1051)

These errors in themselves do not indicate database corruption.
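
When reviewing a log after an abnormal shutdown, it can help to pull out only the messages discussed here and the user number they name. The following is a minimal sketch, assuming log lines worded like the examples in this article; `MESSAGES`, `scan_log` and the two patterns are hypothetical helpers, not part of any Progress utility. The message number is matched wherever it appears in the line.

```python
import re

# Message numbers quoted in this article, with a short label for each.
MESSAGES = {
    2249: "abnormal shutdown begins",
    2256: "user died during microtransaction",
    2522: "user died holding shared memory locks",
    2523: "user died with buffers locked",
    5027: "user died with buffers locked",
    5028: "releasing regular latch",
    2527: "dead user disconnected",
}

MSG_NUM = re.compile(r"\((\d+)\)")                       # e.g. "(2249)"
USER_NUM = re.compile(r"User (\d+) died|dead user (\d+)")

def scan_log(lines):
    """Return (message_number, user_number_or_None, line) for lines of interest."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        known = [int(n) for n in MSG_NUM.findall(line) if int(n) in MESSAGES]
        if not known:
            continue                                     # not a message we track
        u = USER_NUM.search(line)
        user = int(u.group(1) or u.group(2)) if u else None
        hits.append((known[0], user, line))
    return hits
```

Passing an open log file to `scan_log` yields the tracked messages in log order, so the user number on the entries preceding the 2249 entry identifies the process whose death triggered the shutdown.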

Events that could cause a process to terminate abnormally include:

Sending a kill signal other than SIGHUP (such as SIGTERM or SIGKILL) to the process. Refer to Article "Guidelines on the use of UNIX kill command to stop a process".
Shutting off a terminal during an active Progress session when the terminal (tty or PC with terminal emulation) does not send a SIGHUP. Refer to Article "What does HANGUP signal received (562) mean?".
Using SIGUSR1 to produce process C-stack information in OpenEdge versions that have not had async-unsafe signal handling addressed. Exercise caution in these versions. Refer to Article "Can KILL -SIGUSR1 cause a process to crash?".
The operating system killing processes due to a memory shortage. For an example, refer to Article "Would consuming all memory bring other databases down?".
Anti-Virus software running against the database
Port Scanning running against the database environment
Non-Progress utilities such as Volume Shadow Copy or Volume Snapshot Service touching the database files while it is running
The process aborting as a result of a System Error (reported in the log file as error 49, for example). Refer to Article "Memory violation (49), what it means and how to troubleshoot".
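
Several of the events above come down to whether the process gets a chance to clean up before it exits. The following is a minimal sketch, assuming a POSIX system and using a throwaway Python child as a stand-in; it does not model how OpenEdge processes handle signals, and `CHILD` and `run_and_signal` are hypothetical names. The child installs a SIGHUP handler that exits cleanly, whereas SIGKILL can never be caught, so any cleanup is skipped.

```python
import signal
import subprocess
import sys

# Child program: installs a SIGHUP handler, reports readiness, then idles.
CHILD = r"""
import signal, sys, time
def on_hangup(signum, frame):
    sys.exit(0)              # stand-in for releasing resources and logging out
signal.signal(signal.SIGHUP, on_hangup)
print("ready", flush=True)
time.sleep(30)
"""

def run_and_signal(sig):
    """Start the child, wait for its handler to be installed, send sig,
    and return the child's exit status."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()   # blocks until the child prints "ready"
    proc.send_signal(sig)
    status = proc.wait(timeout=10)
    proc.stdout.close()
    return status
```

`run_and_signal(signal.SIGHUP)` returns 0 because the handler ran and exited normally. `run_and_signal(signal.SIGKILL)` returns -9, which is how `subprocess` reports that the child was killed by signal 9 without running any handler.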

For sites where this occurs frequently, one way to prevent an abnormal shutdown is to run all users as "remote" clients (that is, connected over TCP/IP rather than through shared memory), even if they log in directly on the machine that hosts the database. The user process then does not access shared memory directly. Instead, it connects to a database Remote Server process, which accesses shared memory on its behalf. The Server process obtains shared memory and buffer latches for the user process, and when the user process terminates abnormally, the Server process usually remains running to clean up any latches still held for that user's session.

When users repeatedly die holding shared memory or buffer latches (locks), it is important to determine the root cause of the user process dying and to resolve that problem.

Workaround

Notes

Progress Articles:

Why a Database shuts down with the message User died holding shared memory lock?
How the Database Watchdog Works?
Microtransactions and error 2256
What causes DBDOWN errors 3719 5029 or 3718 5028

Keyword Phrase

Last Modified Date	2/9/2022 8:37 PM