Information

Title	Investigating why the database takes so long to shut down

URL Name	P55254

Article Number	000139300

Environment	Product: OpenEdge; Version: All supported versions; OS: All supported platforms; Other: PROSHUT

Question/Problem Description

Investigating why a database takes so long to shut down
Understanding whether PROSHUT is hanging during database shutdown
Troubleshooting database shutdown times

Resolution

What happens when the database is shut down?

During the shutdown process, a complete flush of the dirty buffers takes place and all connected processes are given time to end before the connection is removed from shared memory.
Should the database need to be more aggressively shut down, issue an emergency shutdown after the initial graceful shutdown. During an emergency shutdown, active connections are all marked 'usrtodie' and will be forcefully removed from shared memory. The shutdown will no longer have to wait for these to exit themselves.

$ proshut [dbname] -by -shutdownTimeout immed
$ proshut [dbname] -F -by

It is important to clarify that the database manager never "logs out" users; only user sessions can "Logout". The database manager disconnects connected users from shared memory after cleaning up anything related to their connection context at the time. This is why the "Logout" message is posted in the database lg file before the user control entry is removed from shared memory. It also means that an interactive client session itself remains open, but is no longer connected. For further detail on the differences between a normal and an emergency shutdown, refer to Article:

What happens during PROSHUT -by and PROSHUT -F?

Shutdown improvements in OpenEdge 10.2B introduced the -shutdownTimeout parameter, which limits how long the PROSHUT process waits for database activity to end.

One of the reasons this parameter was introduced relates to other changes to the shutdown routine, where the database manager tries to ensure transactions are closed before shutdown. This improves subsequent database startup times, because it is always faster to process from buffers in shared memory than to read from disk.
If all normal shutdown activity ceases before the timeout value (default 10 minutes), the database is closed.
When activity has not ended by the time the shutdown timeout is reached, the database Broker suspends all further writes to the database and terminates remaining connected processes if they do not disconnect or terminate on their own to complete shutdown.
For further information, refer to Article: Is there a parameter for PROSHUT that will allow a DBA to control how long it takes to shut a database down normally?

Investigating why the database takes so long to shut down

If this scenario is happening often, consider the following the next time a shutdown is needed:

1. Gather the following information while shutdown is hanging, before finally killing the _mprshut shutdown process or the _mprosrv Broker

a) Ideally, users should first be asked to log off before issuing a shutdown. Alternatively, the following Article describes a programmatic method of disconnecting all currently connected users before requesting to shutdown the database:

How to disconnect all self and remote users from a database?

b) Check the system for any rogue processes that haven't logged off and produce a stacktrace against them.

This helps to isolate whether a particular process is blocking the shutdown.
Since 10.1C these can be generated with the proGetStack command: What is proGetStack?
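As a minimal sketch of step b), the filter below keeps only the lines of a process listing that run an OpenEdge executable. The function names and the list of executables (_progres, _mprosrv, _mprshut, _proapsv, _sqlsrv2) are assumptions for illustration; adjust the list to whatever runs against the database on the system in question.

```shell
# Keep only "pid command" lines whose command is an OpenEdge executable.
# The executable list is an assumption; extend it as needed.
filter_openedge_procs() {
    grep -E '(^|[/[:space:]])_(progres|mprosrv|mprshut|proapsv|sqlsrv2)([[:space:]]|$)'
}

# List candidate processes on this machine (UNIX only).
list_openedge_procs() {
    ps -e -o pid= -o args= | filter_openedge_procs
}
```

Running list_openedge_procs after users have been asked to log off shows which sessions are still present. Each remaining PID is a candidate for a stack trace.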

c) If after-imaging is in use, get the current status of all ai extents

$ rfutil dbname -C aimage list > ailist.out

d) Run a PROMON session before PROSHUT, to gather information on current user activity.

In current releases (11.7, 12) a promon gather script is provided in the OpenEdge install directory:

Run it without the perf option under these conditions. The script will then send signals to all OpenEdge executables on the machine so that they dump protrace files for analysis. It is advisable to read the instructions in the Articles below carefully to become familiar with the requirements for using these scripts.

e) Having gathered the above information, shut down the database

$ proshut [dbname] -by -shutdownTimeout 5m

A process trace can also be useful in this situation:

Using TRUSS on AIX/Solaris or ltrace / strace on Linux:

$ truss -aef -o /tmp/shutdown.out $DLC/bin/_mprshut [dbname] -by -shutdownTimeout 5m
$ strace -f -o /tmp/shutdown.out $DLC/bin/_mprshut [dbname] -by -shutdownTimeout 5m

On Windows, the Microsoft Process Monitor (procmon) tool can be started to trace shutdown activity: https://docs.microsoft.com/en-us/sysinternals/downloads/procmon

If database shutdown does not hang:

Information gathered previously may have already highlighted the reason shutdown takes longer than expected or issues with specific processes.

If the shutdown is hanging, produce a stacktrace against the _mprshut process itself:

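UNIX: kill -USR1 <process-id> (SIGUSR1 is signal 16 on Solaris and HP-UX, but 10 on Linux and 30 on AIX, so use the signal name rather than -16)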
Windows: The Microsoft Process Explorer tool (procexp) can be used: https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer

Then issue an emergency shutdown. If necessary generate a stacktrace against the _mprshut process again and check any remaining processes still connected:

$ proshut [dbname] -F -by

Finally, terminate any remaining processes by PID with OS utilities, taking special note of which processes still had to be removed by this last-resort method.

2. Once the database has successfully completed shutdown:

Do not start the database multi-user yet. First truncate the bi file to confirm that crash recovery completes without any error messages:

Errors could indicate file corruption, which would need to be investigated at the system level.

$ proutil [dbname] -C truncate bi -crStatus 10 -crTXDisplay

Additional Suggestions to investigate long or hanging database shutdown

0. Upgrade OpenEdge to save time by ruling out known issues:

1. Why did the database need to be shut down?

Is the shutdown part of site-specific operational procedures? If so, take note of which prescribed actions follow the OpenEdge database shutdown. Often these start prematurely, assuming the database has finished closing before it actually has.
Were there preceding issues with the database, at the system level, or with other applications running on this server that required the OpenEdge environment to be shut down?
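Follow-on steps in an operational script can wait for the shutdown to finish instead of assuming it has. The sketch below polls for the database lock file and gives up after a timeout. It assumes the database's .lk file sits next to the .db file and is removed once shutdown completes; the function name and defaults are hypothetical, with the 600 second default chosen to mirror the default shutdown timeout.

```shell
# Wait until <db>.lk disappears, or fail after <timeout> seconds.
# Assumption: the .lk file exists for as long as the database is open.
#   $1 = database path without extension, $2 = timeout (s), $3 = poll interval (s)
wait_for_db_down() {
    db=$1
    timeout=${2:-600}
    interval=${3:-5}
    waited=0
    while [ -f "${db}.lk" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "still up after ${waited}s: ${db}" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

A backup or copy step would then run only when wait_for_db_down returns 0. A non-zero return is a prompt to gather the diagnostics described earlier rather than to continue.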

2. Take note of differences in who performs the shutdown and how it is performed:
This can highlight a permissions issue, or the DLC environment variable not being set correctly, which leads to a different version of the _mprshut executable being used than the one the database was started with.

Is it run from the same user account / environment every time?
Are there differences between task scheduler / cron job scripts and an interactive shutdown?
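One way to compare environments is to record the same few facts each time a shutdown is issued, whether from cron or interactively. This is a sketch only: the function name is hypothetical, and it assumes the install directory contains a version file at $DLC/version.

```shell
# Print the account and OpenEdge environment a shutdown would run under.
# Assumption: $DLC/version holds the installed release string.
log_shutdown_env() {
    echo "date=$(date '+%Y-%m-%dT%H:%M:%S')"
    echo "user=$(id -un)"
    echo "DLC=${DLC:-<unset>}"
    if [ -n "${DLC:-}" ] && [ -r "$DLC/version" ]; then
        echo "version=$(cat "$DLC/version")"
    fi
    if [ -n "${DLC:-}" ] && [ -x "$DLC/bin/_mprshut" ]; then
        echo "mprshut=$DLC/bin/_mprshut"
    else
        echo "mprshut=<not found under DLC>"
    fi
}
```

Appending this output to a file just before each proshut call makes it possible to diff a scheduled run against an interactive one.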

3. Shutdown times are affected by shared-memory settings (-B, -B1, -shmsegsize) and by the database, bi and ai block sizes.

a. When bi and ai block sizes differ, before and after transaction notes related to active processes exiting need to be written (usually having to write UNDO transaction notes for active transactions)
b. When shared memory buffers are not tuned properly:

For small databases with a small buffer pool, shutdown is delayed when many users are still connected, or when even a few users are still running intensive transaction processing. Try increasing -B to at least 32000 and then tune it until 'Buffer Hits' in PROMON, 5. Activity, exceeds 85% under typical database load, or until the system starts paging.
For larger databases with -B values larger than realistically needed, the shutdown process is slowed down by flushing buffers to disk. Further investigation into improving I/O on the system may be required.
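PROMON reports the buffer hit percentage directly. The sketch below only illustrates the arithmetic commonly used for it, assuming the percentage is the share of logical reads that did not require an OS read. The function name and the sample numbers are hypothetical.

```shell
# Buffer hit percentage from logical reads and OS (physical) reads.
# Assumed formula: (logical - os) / logical * 100
buffer_hit_pct() {
    awk -v logical="$1" -v os="$2" 'BEGIN {
        if (logical <= 0) { print "n/a"; exit }
        printf "%.1f\n", (logical - os) / logical * 100
    }'
}
```

For example, buffer_hit_pct 1000 150 prints 85.0, which is right at the threshold mentioned above.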

4. During shutdown:

Are there scheduled tasks / cron jobs whose timed interval means they try to run while the database is being shut down?
How long is the buffer pool flush allowed to take? Refer to Article: Investigating why Flushing Buffer Pool: buffers remaining during database shutdown
What is the disk wait activity like?
What other activity is running on the system that may be causing resource contention or blocking shutdown? For example, a Windows update is known to cause shutdown to hang. Refer to Article: DB Hung and _mprshut crashed when database was stopped
Are only Progress processes hanging or are other processes hanging as well? If these cannot be explained, consider getting the system checked out by the Operating System vendor.

5. In the database log file, note any additional messages recorded prior to and during the shutdown

++ (298) KILL signal received.
Check for scripts that (periodically) terminate processes connected to shared memory with taskkill / kill -9. This is known to cause shutdown to hang; protrace information will show client processes deadlocked waiting for a latch. Refer to Article:

Database hangs after proshut command when client processes are killed.

++ bkread: missing bkflsx call (611)
This indicates contention for buffers in the -B buffer pool, usually because there are not enough to go around. Increasing the -B parameter will improve matters, provided sufficient RAM (and associated swap / pagefile space) is available. Whenever the force option (proshut dbname -F) is needed, this indicates a bigger problem. Apart from checking the system and peripherals the database runs on, run a full database integrity scan, followed by database repair actions if the report shows data corruption.

What tools / utilities can I run to check the integrity of a database or check for corruption?

++ 1124 errors

When block corruption messages are issued by the database “BROKER” other than during an abnormal shutdown, this indicates a defect, because Brokers do not read blocks during shutdown; they only flush the buffer pool to disk.
When 1124 block corruption messages are evident and subsequent investigation finds no block corruption on disk, the bkioread / bkiowrite errors are due to memory or cache issues at the time. These can also be related to non-Progress utilities accessing the runtime application environment outside of a database quiescence point (PROQUIET).

++ SYSTEM ERROR: Unable to kill parent process, errno= . (1680)

Review the database -S, -minport, -maxport parameters. Progress recommends not using ports < 5000 as these are typically reserved for other processes. Changing remote server ports to any unused range is one known fix that will circumvent system errors concerning being unable to kill parent processes.

All of the above could be direct or indirect contributing factors in delaying completion of the database shutdown.
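To check quickly whether any of the messages listed above appear in a database log, the message numbers can be counted with grep. This is a minimal sketch: the function name is hypothetical, and it assumes each message number appears in parentheses on its log line, as in the examples above.

```shell
# Count occurrences of the message numbers discussed above in a .lg file.
#   $1 = path to the database log file
scan_shutdown_msgs() {
    lg=$1
    for num in 298 611 1124 1680; do
        # grep -c prints 0 but exits non-zero when nothing matches
        count=$(grep -cF "($num)" "$lg" || true)
        echo "$num $count"
    done
}
```

The output is one line per message number followed by its count. A non-zero count points to the corresponding item above as a starting place for investigation.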

Notes

Progress Articles:

Database is not shutting down after issuing an unconditional shutdown request
What is UNIX truss command?
Troubleshooting why PROSHUT does not remove the userid from the PROMON User Control Table
Clients connected to dead remote server process cannot be disconnected
Client process (batch or interactive) crashing after being blocked on record lock when dbnotification is enabled

Last Modified Date	11/9/2021 1:26 PM