Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] slurm and all-srun orterun
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-03-03 20:18:42


Hello

I don't monitor the user list any more, but a friendly elf sent this along
to me.

I'm not entirely sure what problem might be causing the behavior you are
seeing. Neither mpirun nor any orted should be impacted by IB problems as
they aren't MPI processes and thus never interact with IB. Only application
procs touch the IB subsystem - if an application proc fails, the orted
should see that and correctly order the shutdown of the job. So if you are
having IB problems, that wouldn't explain daemons failing.

If a daemon is aborting, that will cause problems in 1.2.x. We have noted
that SLURM (even though the daemons are launched via srun) doesn't always
tell us when this happens, leaving Open MPI vulnerable to "hangs" as it
attempts to clean up and finds it can't. I'm not sure why you would see
a daemon die, though - the fact that an application process failed shouldn't
cause that to happen. Likewise, it would seem strange that the application
process would fail and the daemon not notice - this has nothing to do with
slurm, but is just a standard Linux "waitpid" method.
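
To illustrate the waitpid point, here is a minimal sketch in Python (not
Open MPI code; the name run_and_reap is made up for this example). A parent
standing in for the daemon forks a child standing in for an application
process, then blocks in waitpid until the child changes state:

```python
import os
import signal


def run_and_reap(child_action):
    """Fork a child, run child_action in it, and report how it ended."""
    pid = os.fork()
    if pid == 0:
        # Child: hypothetical stand-in for an application process.
        child_action()
        os._exit(0)
    # Parent: hypothetical stand-in for the daemon.
    # waitpid blocks until the child exits or is killed by a signal.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return ("signaled", int(os.WTERMSIG(status)))
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    return ("other", status)


if __name__ == "__main__":
    # A child that exits with a nonzero code.
    print(run_and_reap(lambda: os._exit(3)))
    # A child that dies from a signal.
    print(run_and_reap(lambda: os.kill(os.getpid(), signal.SIGKILL)))
```

Either way the parent learns of the failure from the kernel, with no help
from the resource manager.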

The most likely reason for the behavior you describe is that an application
process encounters an IB problem which blocks communication - but the
process doesn't actually abort or terminate, it just hangs there. In this
case, the orted doesn't see the process exit, so the system doesn't know it
should take any action.
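
The hung-process case can be sketched the same way (again an illustration
only, with made-up names, and a sleeping child standing in for a process
blocked on communication). A non-blocking waitpid reports nothing for a
child that is stuck but still alive, so the parent cannot tell it apart
from a healthy one:

```python
import os
import signal
import time


def poll_child(pid):
    """Non-blocking check: return None while the child has not exited."""
    reaped, status = os.waitpid(pid, os.WNOHANG)
    if reaped == 0:
        # No state change: a running child and a hung child look the same.
        return None
    return status


def demo_hung_child():
    pid = os.fork()
    if pid == 0:
        # Child: hypothetical stand-in for a process that hangs forever.
        while True:
            time.sleep(60)
    time.sleep(0.2)
    seen_while_hung = poll_child(pid)  # nothing to report yet
    # Only once the child actually dies does waitpid have something to say.
    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    return seen_while_hung, os.WIFSIGNALED(status)


if __name__ == "__main__":
    print(demo_hung_child())
```

Detecting this case needs something beyond waitpid, such as an
application-level timeout or heartbeat.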

That said, we know that 1.2.x has problems with clean shutdown in abnormal
situations. Release 1.3 (when it comes out) addresses these issues and
appears (from our testing, at least) to be much more reliable about cleanup.
You should see a definite improvement in the detection of process failures
and subsequent cleanup.

As for your question, I am working as we speak on two new launch modes for
Open MPI:

1. "direct" - this uses mpirun to directly launch the application processes
without use of the intermediate daemons.

2. "standalone" - this uses the native launch command to simply launch the
application processes, without use of mpirun or the intermediate daemons.

The initial target environments for these capabilities are TM and SLURM. The
latter poses a bit of a challenge as we cannot use their API due to
licensing issues, so it will come a little later. We have a design for
getting around the problem - the ordering is driven more by priorities than
by anything technical.

The direct launch capability -may- be included in 1.3 assuming it can be
completed in time for the release. If not, it will almost certainly be in
1.3.1. I'm expecting to complete the TM version in the next few days, and
perhaps get the SLURM version working sometime this month - but they will
need validation before being included in an official release.

I can keep you posted if you like - once this gets into our repository, you
are certainly welcome to try it out. I would welcome feedback on it.

Hope that helps
Ralph

>> From: "Sacerdoti, Federico" <Federico.Sacerdoti_at_[hidden]>
>> Date: March 3, 2008 12:44:39 PM EST
>> To: "Open MPI Users" <users_at_[hidden]>
>> Subject: [OMPI users] slurm and all-srun orterun
>> Reply-To: Open MPI Users <users_at_[hidden]>
>>
>> Hi,
>>
>> We are migrating to openmpi on our large (~1000 node) cluster, and plan
>> to use it exclusively on a multi-thousand core infiniband cluster in the
>> near future. We had extensive problems with parallel processes not dying
>> after a job crash, which was largely solved by switching to the slurm
>> resource manager.
>>
>> While orterun supports slurm, it only uses the srun facility to launch
>> the "orted" daemons, which then start the actual user processes
>> themselves. In our recent migration to openmpi, we have noticed
>> occasions where orted did not correctly clean up after a parallel job
>> crash. In most cases the crash was due to an infiniband error. Most
>> worryingly, slurm was not able to clean up the orted, and it was left
>> running along with the user processes.
>>
>> At SC07 I was told that there is some talk of using srun to launch both
>> orted and user processes, or alternatively use srun only. Either would
>> solve the cleanup problem, in our experience. Is Rolf Castain on this
>> list?
>>
>> Thanks,
>> Federico
>>
>> P.S.
>> We use proctrack/linuxproc slurm process tracking plugin. As noted in
>> the config man page, this may fail to find certain processes and explain
>> why slurm could not clean up orted effectively.
>>
>> man slurm.conf(5), version 1.2.22:
>> NOTE: "proctrack/linuxproc" and "proctrack/pgid" can fail to identify
>> all processes associated with a job since processes can become a child
>> of the init process (when the parent process terminates) or change their
>> process group. To reliably track all processes, one of the other
>> mechanisms utilizing kernel modifications is preferable.