Jeff, this is very good advice.
I have had many, many hours of deep joy getting to know the OOM killer
and all of his wily ways.
Respect the OOM Killer!
On the cluster I manage, the OOM killer is active; however, there is a
strict policy that if the OOM killer kicks in on a cluster node, that node
is excluded from the batch system and rebooted.
As you say, you can't tell which processes it will go off and kill.
However, there is a very useful sysctl setting for OOM:
vm.oom_kill_allocating_task. Set this to 1 and the system kills the
task which triggered the OOM, rather than scanning the system's task
list to pick a victim.
I find that in an HPC environment this will kill the executable which
is using too much memory.
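For anyone who wants to check or flip this from a script rather than with
"sysctl -w vm.oom_kill_allocating_task=1", here is a minimal sketch in
Python. It relies only on the standard Linux convention that a sysctl key
maps to a file under /proc/sys with dots turned into slashes. The helper
names (sysctl_path, read_sysctl, write_sysctl) are my own, not from any
library. Writing needs root, and the change does not survive a reboot
unless you also put it in /etc/sysctl.conf or a file under /etc/sysctl.d/.

```python
from pathlib import Path

# Standard location of the sysctl tree on Linux.
PROC_SYS = Path("/proc/sys")


def sysctl_path(key: str, root: Path = PROC_SYS) -> Path:
    """Map a simple sysctl key such as 'vm.oom_kill_allocating_task'
    to its file, e.g. /proc/sys/vm/oom_kill_allocating_task.
    (Hypothetical helper; keys whose components contain dots are not handled.)"""
    return root.joinpath(*key.split("."))


def read_sysctl(key: str, root: Path = PROC_SYS) -> int:
    """Read an integer-valued sysctl."""
    return int(sysctl_path(key, root).read_text().strip())


def write_sysctl(key: str, value: int, root: Path = PROC_SYS) -> None:
    """Write an integer-valued sysctl. On a real /proc/sys this needs root
    and only lasts until the next reboot."""
    sysctl_path(key, root).write_text(f"{value}\n")


if __name__ == "__main__":
    key = "vm.oom_kill_allocating_task"
    path = sysctl_path(key)
    if path.exists():
        # 0 = scan the task list to pick a victim; non-zero = kill the allocating task.
        print(f"{key} = {read_sysctl(key)}")
    else:
        print(f"{path} not found (not a Linux host?)")
```

The optional root argument is only there so the helpers can be pointed at a
scratch directory for testing instead of the live /proc/sys.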