Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Tip for HPC cluster admins
From: John Hearns (hearnsj_at_[hidden])
Date: 2012-10-29 12:23:01

Jeff, this is very good advice.

I have had many, many hours of deep joy getting to know the OOM killer
and all of his wily ways.
Respect the OOM Killer!

On the cluster I manage, the OOM killer is enabled; however, there is a
strict policy that if the OOM killer kicks in on a cluster node, that
node is excluded from the batch system and rebooted.
As you say, you can't tell what processes it goes off to kill.

However, there is a very useful sysctl setting for OOM:

vm.oom_kill_allocating_task

Set this to 1 and the system kills the task which triggered the OOM,
rather than scanning the whole task list to pick a victim.
I find that in an HPC environment this will kill the executable which
is using too much memory.
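
For anyone who wants to try this, here is a minimal sketch of how the
setting can be checked and applied. It assumes a Linux node, root
access, and a distribution that reads /etc/sysctl.conf at boot (some
use files under /etc/sysctl.d/ instead, so adjust the path to suit).

```shell
# Show the current value; 0 is the kernel default (scan for a victim).
cat /proc/sys/vm/oom_kill_allocating_task

# Apply at runtime, effective immediately but lost on reboot.
sysctl -w vm.oom_kill_allocating_task=1

# Persist across reboots (assumes /etc/sysctl.conf is used on this node).
echo 'vm.oom_kill_allocating_task = 1' >> /etc/sysctl.conf

# Reload /etc/sysctl.conf to confirm the file parses cleanly.
sysctl -p
```

It is worth testing on a single node first, since the task that
triggers the OOM is not always the one holding the most memory.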