It is advisable to periodically scan the /var/log/messages or /var/log/syslog file to look for OOM Killer messages.
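As a rough sketch of such a scan, the function below greps one or more log files for the phrases the kernel commonly writes during an OOM kill. The function name find_oom_events is made up for illustration, and the exact message wording differs between kernel versions, so treat the pattern as a starting point rather than a complete list.

```shell
#!/bin/sh
# find_oom_events FILE...
# Print lines that look like OOM killer activity, prefixed with file:line.
# The pattern is an assumption based on common kernel messages
# ("invoked oom-killer", "Out of memory: Kill process ..."); adjust as needed.
find_oom_events() {
    for log in "$@"; do
        # Skip logs that do not exist or are unreadable on this system.
        [ -r "$log" ] || continue
        # -H file name, -n line number, -i ignore case, -E extended regex.
        grep -HniE 'invoked oom-killer|out of memory: kill' "$log" || true
    done
    return 0
}
```

A typical call would be: find_oom_events /var/log/messages /var/log/syslog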
If the log contains any OOM killer messages, monitor the RAM usage of all processes on the system to ensure that none of them is leaking memory or is configured to use more memory than it needs.
Logs alone will not be able to tell the whole story of how the OOM event occurred, only that it happened and which processes were sacrificed.
Monitoring can be done with a script that runs periodically as a cron job and collects the output of ps, top, and similar tools. The goal is to capture statistical data before the OOM event, not after. The output should then be analyzed by the system administrator.
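A minimal sketch of such a collection script is shown below. It assumes a Linux system with the procps version of ps; the function name snapshot_memory and the output file naming are illustrative choices, not part of any Progress or operating system tooling.

```shell
#!/bin/sh
# snapshot_memory OUTDIR
# Write one timestamped file with system-wide and per-process memory usage
# and print the name of the file that was written.
snapshot_memory() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    stamp=$(date +%Y%m%d-%H%M%S)
    outfile="$outdir/mem-$stamp.txt"
    {
        echo "=== $(date) ==="
        # System-wide totals (MemTotal, MemFree, SwapFree, ...).
        cat /proc/meminfo
        echo
        # Header plus the 20 processes with the largest resident set (RSS, KiB).
        ps -eo pid,ppid,user,rss,vsz,comm --sort=-rss | head -n 21
    } > "$outfile"
    echo "$outfile"
}
```

Calling snapshot_memory with a directory that has enough free space (for example, a hypothetical /var/tmp/memstats) from a cron job every few minutes builds a history that can be compared with the timestamp of the OOM message in the log.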
Several articles are listed below on searching for potential memory leaks related to ABL or Progress Application Server (PASOE) code usage.
Keep in mind that no single process needs to be at fault: the combined usage of all running processes can also trigger an OOM event, for example when user activity increases or when newly started processes push the system over the threshold.
If the dstat command is available, the syntax below can be used to determine the top candidates to be killed by OOM killer in case of an Out Of Memory event.
dstat --top-oom
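If dstat is not installed, similar information can be read directly from procfs, since the kernel exposes each process's current badness score in /proc/$PID/oom_score. The sketch below is an assumption-laden substitute, not a reimplementation of dstat: the function name top_oom_candidates is invented here, and it relies on /proc/$PID/comm, which is not present on very old kernels.

```shell
#!/bin/sh
# top_oom_candidates [N]
# List the N processes (default 10) with the highest kernel OOM score.
# Output columns: oom_score, PID, command name.
top_oom_candidates() {
    count=${1:-10}
    for dir in /proc/[0-9]*; do
        # Processes can exit while we iterate, so skip entries we cannot read.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        printf '%s %s %s\n' "$score" "${dir#/proc/}" "$name"
    done | sort -k1,1nr | head -n "$count"
}
```

The process on the first line of the output is the one the kernel would currently consider the most likely victim.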
Alternatively, the OOM killer can be disabled in some Linux versions until the cause is determined or further troubleshooting can be performed.
Red Hat Enterprise Linux 4.2 and newer RHEL 4 releases have the /proc/sys/vm/oom-kill tunable. Set it to 0 to disable the OOM killer.
Red Hat Enterprise Linux 5, 6, 7, 8 and 9 do not have the ability to completely disable the OOM killer. Please refer to the following solution provided by Red Hat for tuning OOM killer operation within RHEL 5, RHEL 6, RHEL 7 and RHEL 8.
https://access.redhat.com/solutions/20985
Telling the OOM killer to ignore a process:
Disabling OOM killer is done on a process by process basis, so you’ll need to know the PID of the running process that you want to protect. This is far from ideal, as process IDs can change frequently, but we can script around it.
As documented by http://linux-mm.org/OOM_Killer: “Any particular process leader may be immunized against the oom killer if the value of its /proc/$pid/oom_adj is set to the constant OOM_DISABLE (currently defined as -17).”
This means we can disable OOM killer on an individual process, if we know its PID, using the command below:
- echo -17 > /proc/$PID/oom_adj
Using pgrep we can run this knowing only the name of the process. For example, let’s ensure that the ssh listener doesn’t get OOM killed:
- pgrep -f "/usr/sbin/sshd" | while read PID; do echo -17 > /proc/$PID/oom_adj; done
Here we used pgrep to search for the full command line (-f) matching “/usr/sbin/sshd” and then echo -17 into the procfs entry for each matching pid.
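On kernels 2.6.36 and later, oom_adj is deprecated in favor of /proc/$PID/oom_score_adj, where -1000 plays the role that -17 plays for oom_adj. The helper below is a sketch that prefers the newer file and falls back to the legacy one. The function name protect_from_oom and its optional second argument (an alternative procfs root, included only so the function can be tried against a scratch directory) are assumptions made for this example.

```shell
#!/bin/sh
# protect_from_oom PID [PROC_ROOT]
# Exempt one process from the OOM killer. Prefers oom_score_adj (-1000)
# and falls back to the legacy oom_adj (-17). PROC_ROOT defaults to /proc.
protect_from_oom() {
    pid=$1
    root=${2:-/proc}
    if [ -w "$root/$pid/oom_score_adj" ]; then
        printf '%s\n' -1000 > "$root/$pid/oom_score_adj"
    elif [ -w "$root/$pid/oom_adj" ]; then
        printf '%s\n' -17 > "$root/$pid/oom_adj"
    else
        echo "cannot adjust OOM settings for PID $pid" >&2
        return 1
    fi
}
```

Run as root, it slots into the same loop as before: pgrep -f "/usr/sbin/sshd" | while read PID; do protect_from_oom "$PID"; done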
In order to automate this, you could run a cron job regularly to update the oom_adj entry. This is a simple way to ensure that sshd is excluded from OOM killer after restarting the daemon or the server.
- */1 * * * * root pgrep -f "/usr/sbin/sshd" | while read PID; do echo -17 > /proc/$PID/oom_adj; done
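The above job will run every minute, updating the oom_adj of every process currently matching /usr/sbin/sshd. Because the entry includes a user field (root), it belongs in /etc/crontab or in a file under /etc/cron.d rather than in a per-user crontab. Of course this could be extended to include any other processes you wish to exclude from OOM killer.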
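To confirm that the job is having the intended effect, the current settings can be read back from procfs. The following is a small sketch; the function name show_oom_settings is hypothetical, and a "-" is printed where a kernel does not provide one of the two files.

```shell
#!/bin/sh
# show_oom_settings PID...
# Print "PID oom_adj oom_score_adj" for each PID that still exists.
show_oom_settings() {
    for pid in "$@"; do
        # Ignore PIDs that have already exited.
        [ -d "/proc/$pid" ] || continue
        adj=$(cat "/proc/$pid/oom_adj" 2>/dev/null) || adj=-
        score_adj=$(cat "/proc/$pid/oom_score_adj" 2>/dev/null) || score_adj=-
        printf '%s %s %s\n' "$pid" "$adj" "$score_adj"
    done
}
```

For example, show_oom_settings $(pgrep -f "/usr/sbin/sshd") should typically report -17 in the oom_adj column for every protected sshd process.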