Dead Man Walking

Posted by George Potemkin on 15-Jul-2016 07:34

A session died 3 days ago after memory violation. It logged out from 3 of 4 databases it was connected to. Right now the process does not exist but one database still has a remote connection opened by the dead session. "Usrtodie" flag (a.k.a. _Connect-Disconnect) is on. Session had an active transaction in this database but transaction was rather small (5266 records were created). A remote client server is still doing approximately 30 db accesses per sec (_UserIO-DbAccess) on behalf of the dead session but the _UserTableStat/_UserIndexStat are not changing.


promon/User Control reports dbkey:

Type      Wait  Table               Dbkey     Trans
REMC/ABL   --     29              3700543  17687823

Dbkey belongs to an index that was updated by transaction (though I don't know if exactly this dbkey was updated by transaction - aimage scan is not yet ready). Dbkey is not locked according to promon/Status: Buffer Lock Queue. I guess it's just a last dbkey the dead session was waiting for.

How can I find what the remote client server is doing right now on behalf of the dead session? Note: it also serves to another 4 sessions.


The customer needs to close the transaction. We can try to stop the server of the dead session but I'm not sure if the server will stop. We can kill the server but it might crash a database. Waiting for db restart is also a bad option.


Suggestions?

All Replies

Posted by Paul Koufalis on 15-Jul-2016 08:00

Is the port still open to the _mprosrv -m1? Maybe something prevented the port from being closed?

I have never tried this myself in prod, but if the port is still open, try and close it. Here is an example on Linux on the sports database. I located the port of the remote client and closed it. In my example, both server and remote client are on localhost but you get the idea:

$ lsof -P -p 2796 | grep TCP

_mprosrv 2796 root   23u  IPv4 192852654      0t0      TCP *:1027 (LISTEN)

_mprosrv 2796 root   25u  IPv4 192852667      0t0      TCP localhost:1027->localhost:56056 (ESTABLISHED)

$ fuser -k -n tcp 56056

56056/tcp:            2794

$ lsof -P -p 2796 | grep TCP

_mprosrv 2796 root   23u  IPv4 192852654      0t0      TCP *:1027 (LISTEN)

In my client _progres I saw a message "Killed" and the process terminated.

Posted by George Potemkin on 15-Jul-2016 08:43

Thanks, Paul! I like your idea and I passed it to the customer.

Also I checked other sessions served by the same server. All of them were connected before the death of the session with active transaction. It looks like they are inactive all this time. So it's indeed OK to close the server's port.

This thread is closed