
Single user database telnet session network failure causes multi-user shutdown

Title: Single user database telnet session network failure causes multi-user shutdown
URL Name: P128776
Article Number: 000145201
Environment
Product: Progress OpenEdge
Version: All supported versions
OS: Unix / Linux
Question/Problem Description
A user connected through Telnet to a UNIX machine, accessing a database in single-user mode, lost the Telnet session because of a network failure. The database .lk file was then deleted so that the database could be started multi-user. When the OS later timed out the orphaned Telnet process, the multi-user database crashed.

Database does not recover due to timestamp mismatch errors (886), (887), (888).
The before-image file timestamp is that of the single-user session.

PROSTRCT unlock does not report errors and returns "No inconsistencies found in <dbname>. (6952)"
Database fails to start with errors (886), (887), (888) after PROSTRCT unlock is run.
Database fails to start with errors (886), (887), (888) after the Control Area is rebuilt with PROSTRCT builddb.
Steps to Reproduce
1. prodb test sports2000
2. start a single-user session:
$ pro test -zn -p _dict.p -zp
The id of this process is 24424. (1408)
3. Schema > Modify Table > Customer > field editor
4. Unplug the network cable and wait for the message:
Network Error: Software caused connection timeout

$ ps -ef | grep 24424
uname 24424 24395 0 18:08 pts/1 /dlc/bin/_progres -1 test -zn -p _dict.p -zp

5. proserve test
BROKER ** The database test is in use in single-user mode. (263)

6. $ mv test.lk REN_ks.lk

7. proserve test
BROKER 0: (6574) Started using pid: 24564

8. From an mpro client session, do stuff (any transaction processing),
send CTRL+C,
then run the same processing again

9. Wait for TCP/IP KeepAlive to kick in (or set it to a lower value at the OS level beforehand):

20:10:07.261 P-24424 : (562) HANGUP signal received.
20:10:07.261 P-24424 : (2252) Begin transaction backout.
20:10:07.267 P-24424 : (2253) Transaction backout completed.
20:10:07.267 P-24424 : (334) Single-user session end.
20:10:14.674 P-24665 WDOG 6: (4195) test.lk is missing, shutting down...
20:10:14.675 P-24564 BROKER 0: (2249) Begin ABNORMAL shutdown code 2
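
How long step 9 takes depends on the OS keepalive timers. The following is a rough sketch, assuming a Linux host, where the timers are exposed under /proc/sys/net/ipv4 (other UNIX systems use different tunables). The helper name keepalive_total is hypothetical, and keepalive only applies if the Telnet daemon enables it on its socket.

```shell
#!/bin/sh
# Sketch only: estimate how long Linux waits before TCP keepalive drops a dead peer.
# keepalive_total is a hypothetical helper name, not an OpenEdge or OS utility.
keepalive_total() {
    # $1 = idle seconds before the first probe
    # $2 = seconds between probes
    # $3 = number of unanswered probes before the connection is dropped
    echo $(( $1 + $2 * $3 ))
}

# Read the current values (these paths are assumed to exist only on Linux).
if [ -r /proc/sys/net/ipv4/tcp_keepalive_time ]; then
    t=$(cat /proc/sys/net/ipv4/tcp_keepalive_time)
    i=$(cat /proc/sys/net/ipv4/tcp_keepalive_intvl)
    p=$(cat /proc/sys/net/ipv4/tcp_keepalive_probes)
    echo "keepalive gives up after about $(keepalive_total "$t" "$i" "$p") seconds"
fi
```

With the usual Linux defaults (7200, 75 and 9) this comes to 7875 seconds, a little over two hours. On a test machine the idle time can be lowered as root with, for example, sysctl -w net.ipv4.tcp_keepalive_time=600.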
Clarifying Information
Sequence of Events
  1. A Telnet session connects single-user to a database
  2. The Telnet session is terminated by a network failure: Network Error: Software caused connection timeout
  3. The database fails to start multi-user because of the .lk file information: The database <dbname> is in use in single-user mode. (263)
  4. The database .lk file is deleted so that the database can be started
  5. The database starts multi-user without error once the .lk file is deleted
  6. Client sessions connect to the multi-user database and execute transaction activity
  7. The TCP/IP KeepAlive timeout kicks in, the dead Telnet connection is detected, and the orphaned single-user process ends its session:
    HANGUP signal received. (562)
    Begin transaction backout. (2252)
    Transaction backout completed. (2253)
    Single-user session end. (334)
  8. Database shuts down 
    dbname.lk is missing, shutting down (4195)
Error Message
** The database was last used <date/time>. (886)
** The before-image file expected <date/time>. (887)
** Those dates don't match, so you have the wrong copy of one of them. (888)
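
To confirm that a database went through this sequence, the message numbers quoted in this Article can be searched for in the database log. This is a minimal sketch, assuming the log is the usual <dbname>.lg text file; scan_lg is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: pull the message numbers that mark this failure out of a database log.
# scan_lg is a hypothetical helper; pass it the path to the database .lg file.
scan_lg() {
    # (562) HANGUP signal received
    # (334) single-user session end
    # (4195) .lk file missing, shutting down
    # (886) (887) (888) timestamp mismatch on the next startup
    grep -E '\((562|334|4195|886|887|888)\)' "$1"
}

# Example: scan_lg test.lg
```

A (562) and (334) pair from one pid followed shortly by (4195) from the watchdog matches the pattern described above.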
Cause
A single-user database session was cut off by a network failure, but its process kept running and still held the database in single-user mode. Because the database .lk file had to be deleted before the database could be started multi-user, the orphaned process was left unaware of the new multi-user session. When TCP/IP KeepAlive finally caused the single-user process to end, it cleaned up its session and, in doing so, deleted the current database .lk file, which by then belonged to the multi-user session. The watchdog detected the missing .lk file and the multi-user database shut down abnormally. This leaves the physical database pointers and logical database integrity in question.

While this Article describes an inadvertent single-user session failure, the same applies whatever process originated the single-user access: exercise extreme caution before deleting a database .lk file.
Resolution
Future Considerations:

The correct action would have been to remove the single-user pid from the system first, and only then remove the database .lk file manually. For further information refer to Articles:

Instead of connecting to the database in single-user mode, start the database multi-user and restrict connections with the startup parameters:
  1. Make a shared-memory database connection from the telnet session.
    Start the database with "-n 1"
  2. Make a client-server database connection from the telnet session.
    Start the database with "-n 1 -Mn 1 -Ma 1 -S <portnumber>"
Preferably connect client-server, so that if a similar network failure occurs, the remote server handles the backout and cleanup on behalf of the client. In addition, the situation can then be rectified simply by running 'proshut' against the database.
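
The check that should precede any manual .lk file removal can be sketched as follows. This is an illustration only, not a Progress utility: pid_alive and safe_to_remove_lk are hypothetical helper names, and the pid is assumed to be the one reported by message (1408) or found with ps -ef.

```shell
#!/bin/sh
# Sketch only: refuse to touch the .lk file while a given process is still running.
# pid_alive and safe_to_remove_lk are hypothetical helper names.
pid_alive() {
    # kill -0 sends no signal; it only tests whether the pid can be signalled.
    # It fails for another user's process, so fall back to ps -p in that case.
    kill -0 "$1" 2>/dev/null || ps -p "$1" >/dev/null 2>&1
}

safe_to_remove_lk() {
    # $1 = pid of the single-user session
    if pid_alive "$1"; then
        echo "pid $1 is still running; remove it from the system before touching the .lk file"
        return 1
    fi
    echo "pid $1 is no longer running"
    return 0
}

# Example with the pid from the steps above: safe_to_remove_lk 24424
```

A non-zero return means the single-user process is still present, which is exactly the situation in which deleting the .lk file leads to the shutdown described in this Article.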

To recover:


The first and highly recommended option is to restore a database backup taken before the single-user session network failure. The alternative steps below will not necessarily result in a workable database; the outcome depends on what the originating single-user session was processing and on what the multi-user session did afterwards.

Alternatively:

0. Take an OS backup of all the database-related files
1. While the backup is taking place, read the Article "Consequences of using the -F startup option"
2. Force access to the database with: proutil dbname -C truncate bi -F
 
Last Modified Date: 11/20/2020 7:21 AM
