Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] some mpi processes "disappear" on a cluster of servers
From: John Hearns (hearnsj_at_[hidden])
Date: 2012-09-01 03:48:56

Apologies, I have not taken the time to read your comprehensive diagnostics!

As Gus says, this sounds like a memory problem.
My suspicion would be the kernel Out Of Memory (OOM) killer.
Log into those nodes (or ask your systems manager to do this). Look
closely at /var/log/messages where there will be notifications when
the OOM Killer kicks in and - well - kills large memory processes!
Grep case-insensitively (grep -i) for "killed process" in
/var/log/messages, since the kernel writes it as "Killed process".
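
If you would rather collect the PID and process name than read the raw grep output, here is a rough Python sketch of the same search. The function name find_oom_kills and the regular expression are my own, and the pattern assumes the kernel reports the kill as "Killed process <pid> ... (<name>)". The exact wording differs between kernel versions, so check it against a real line from your own logs first.

```python
import re

# Assumed shape of the kernel's OOM-killer report: "Killed process 12345 (name)".
# Case-insensitive because the kernel capitalises "Killed".
OOM_PATTERN = re.compile(r"killed process (\d+)\b.*?\(([^)]+)\)", re.IGNORECASE)


def find_oom_kills(lines):
    """Return (pid, process_name, raw_line) for each OOM-killer report found."""
    hits = []
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match:
            pid = int(match.group(1))
            name = match.group(2)
            hits.append((pid, name, line.rstrip("\n")))
    return hits
```

To use it, open /var/log/messages on each suspect node (this normally needs root) and pass the file object to find_oom_kills. Any hit that names your MPI executable would point to the OOM killer as the reason the processes "disappear".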