The KeepAlive mechanism does not disconnect idle TCP/IP connections. When an established socket connection is idle, no packets are transmitted, so there is no way to tell whether the connection is still valid without sending some data and checking whether an error is returned.
Firewalls, on the other hand, have a feature that times out inactive connections. For example, when a firewall times out an idle socket connection, a WebSpeed Agent is unaware that its connection to the database(s) has been broken until the next request occurs. That communication attempt then fails, and errors 778 and 735 are reported when the agent is informed (by the TCP/IP stack, through the KeepAlive mechanism) that the connection has been dropped.
KeepAlive detects situations where one side of the connection is no longer listening. It does this by sending low-level probe messages to see whether the other side responds. If the other side does not respond to a certain number of probes within a certain amount of time, the connection is assumed to be dead, and the process using the socket detects this through an error indication.
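To make "a certain number of probes within a certain amount of time" concrete, the sketch below is a simplified model of the probe cycle, not an operating-system implementation. The names `KeepAliveSettings`, `seconds_until_declared_dead` and `simulate_probes` are hypothetical. The example values are the Linux defaults (`net.ipv4.tcp_keepalive_time` = 7200 s, `tcp_keepalive_intvl` = 75 s, `tcp_keepalive_probes` = 9); other operating systems use different parameter names and defaults.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class KeepAliveSettings:
    idle_s: int      # idle time before the first probe is sent
    interval_s: int  # wait between unanswered probes
    max_probes: int  # unanswered probes before the connection is declared dead


def seconds_until_declared_dead(cfg: KeepAliveSettings) -> int:
    """Worst case in this model: the peer never answers any probe."""
    return cfg.idle_s + cfg.max_probes * cfg.interval_s


def simulate_probes(cfg: KeepAliveSettings,
                    peer_responds: Callable[[int], bool]) -> Tuple[bool, int]:
    """Return (alive, elapsed_s) measured from the last packet on the connection.

    peer_responds(n) says whether probe number n (0-based) gets an answer.
    """
    elapsed = cfg.idle_s  # no probes at all until the idle time has passed
    for n in range(cfg.max_probes):
        if peer_responds(n):
            return True, elapsed  # any answer proves the peer is still there
        elapsed += cfg.interval_s  # wait one interval before the next probe
    # Every probe went unanswered: the socket now reports an error to its owner.
    return False, elapsed


if __name__ == "__main__":
    linux_defaults = KeepAliveSettings(idle_s=7200, interval_s=75, max_probes=9)
    print(seconds_until_declared_dead(linux_defaults))             # 7875
    print(simulate_probes(linux_defaults, lambda n: False))        # (False, 7875)
    five_minutes = KeepAliveSettings(idle_s=300, interval_s=75, max_probes=9)
    print(seconds_until_declared_dead(five_minutes))               # 975
```

In this model the idle time dominates the total detection time, which is why lowering the idle parameter is the usual lever.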
The system-wide parameter that controls how long a connection must be idle before probing starts, and how often probes are sent, is TCP_KEEPALIVE. The default idle time before probes begin is two hours (7,200,000 ms). It can probably be lowered to 5 minutes (300,000 ms) without too many unwanted side effects, but be aware that the change affects the whole system.
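Where changing the system-wide value is not acceptable, a program that owns its sockets can request KeepAlive per socket instead. The sketch below assumes Python's standard `socket` module; `enable_keepalive` is a hypothetical helper, and it only affects sockets opened by the program that calls it (it does not change how OpenEdge processes behave). The per-socket idle option is platform-specific: `TCP_KEEPIDLE` on Linux, `TCP_KEEPALIVE` on macOS, both in whole seconds rather than milliseconds.

```python
import socket

DEFAULT_IDLE_MS = 7_200_000   # the two-hour default quoted above
SUGGESTED_IDLE_MS = 300_000   # the 5-minute value suggested above


def enable_keepalive(sock: socket.socket, idle_ms: int = SUGGESTED_IDLE_MS) -> None:
    """Turn on KeepAlive for one socket and set its idle time before probing.

    Hypothetical helper: only this socket changes, not the system-wide default.
    """
    idle_s = idle_ms // 1000  # per-socket idle options take whole seconds
    if idle_s < 1:
        raise ValueError("idle time must be at least 1000 ms")
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):        # Linux and several other platforms
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    elif hasattr(socket, "TCP_KEEPALIVE"):     # macOS name for the same setting
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_s)
    # On other platforms only SO_KEEPALIVE is set and the system default applies.


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_keepalive(s)
        print("SO_KEEPALIVE enabled:",
              s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
```

Setting the option before connecting is harmless; it takes effect once the connection is established and goes idle.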
For example:
- The Database Broker will write an error message to the database log when it is informed (by the TCP/IP stack) that the connection has been dropped.
- In OpenEdge Replication, this is essentially when the error message is written to the database log: the RPLS or RPLA is informed (by the TCP/IP stack) that the connection has been dropped and then, based on the configured parameters, tries to reconnect. It is also why the repl-keep-alive feature was introduced as a logical (application-level) implementation, for cases where the Replication Server or Agent blocks while trying to send a message and the failure is not recognized. This is discussed further in the article "How long does RPLS take to detect network outage to the RPLA?"
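The difference between TCP-level KeepAlive and an application-level keep-alive can be illustrated with a generic heartbeat: one side sends a small message and treats a missing reply within a timeout as a dead connection, instead of waiting for the TCP/IP stack to report an error. This is a minimal sketch, not how OpenEdge implements repl-keep-alive; the `PING`/`PONG` messages, the function names and the timeouts are all hypothetical.

```python
import socket
import threading

PING = b"PING"  # hypothetical heartbeat request
PONG = b"PONG"  # hypothetical heartbeat reply


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or fewer if the peer closes the connection."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:  # orderly close by the peer
            break
        buf += chunk
    return buf


def peer_is_alive(sock: socket.socket, timeout_s: float) -> bool:
    """Send a heartbeat and wait up to timeout_s seconds for the reply."""
    sock.settimeout(timeout_s)
    try:
        sock.sendall(PING)
        return _recv_exact(sock, len(PONG)) == PONG
    except OSError:  # timeouts, broken pipes and resets all derive from OSError
        return False


def answer_pings(sock: socket.socket) -> None:
    """Responder loop for the other side: reply to each heartbeat until closed."""
    while _recv_exact(sock, len(PING)) == PING:
        sock.sendall(PONG)


if __name__ == "__main__":
    a, b = socket.socketpair()
    threading.Thread(target=answer_pings, args=(b,), daemon=True).start()
    print(peer_is_alive(a, 1.0))       # True: the responder answered

    silent, _unused = socket.socketpair()
    print(peer_is_alive(silent, 0.2))  # False: no reply within the timeout
```

The key design point is that the application decides when to give up (the timeout), so a blocked or silently dropped connection is noticed even when the TCP/IP stack has not yet reported anything.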