
Open MPI User's Mailing List Archives


Subject: [OMPI users] Can not submit openmpi jobs with slurm on Centos 6.0
From: USA Linux UAE (usasoftwareengineer_at_[hidden])
Date: 2012-10-10 13:44:52


I am using Open MPI 1.4.3 with Slurm 2.4.2 on CentOS 6.0.

I can run my jobs with mpirun on the nodes in my partition when I list
them with mpirun's "-H" option.

But when I go through Slurm, allocating nodes with

salloc -n 3 sh

and then launching the MPI job with "mpirun <mpibinary>",

I get the following error:

salloc: Granted job allocation 289
sh-4.1$ mpirun mpihello
[v2:29784] [[57331,0],0] ORTE_ERROR_LOG: Not found in file
plm_slurm_module.c at line 350
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
mpirun: clean termination accomplished

Is there a recommended procedure for debugging Open MPI jobs under Slurm?
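A minimal diagnostic sketch of checks that could be run from the shell opened by "salloc -n 3 sh". It assumes that Open MPI's Slurm support depends on the SLURM_* environment variables the allocation exports and that it starts its daemons through srun. The exact variable list, the default binary name "mpihello" (taken from the post), and the verbosity settings (--debug-daemons, plm_base_verbose) are assumptions to confirm against "mpirun --help" and "ompi_info --param plm all" for your build.

```shell
#!/bin/sh
# Diagnostic sketch: run from the shell opened by "salloc -n 3 sh".
# ASSUMPTION: these are the allocation variables worth inspecting;
# adjust the list for your Slurm and Open MPI versions.

check_slurm_env() {
  for var in SLURM_JOB_ID SLURM_JOBID SLURM_NODELIST SLURM_NNODES SLURM_NTASKS SLURM_TASKS_PER_NODE; do
    # Indirect lookup of the variable named in $var (POSIX sh has no ${!var}).
    eval "val=\${$var:-}"
    if [ -n "$val" ]; then
      echo "$var=$val"
    else
      echo "$var is NOT set"
    fi
  done
}

check_slurm_env

# ASSUMPTION: Open MPI's Slurm launcher starts its daemons with srun,
# so srun itself must work inside the allocation.
if command -v srun >/dev/null 2>&1; then
  srun hostname
else
  echo "srun not found in PATH"
fi

# Show whether this Open MPI build lists any Slurm components.
if command -v ompi_info >/dev/null 2>&1; then
  ompi_info | grep -i slurm
else
  echo "ompi_info not found in PATH"
fi

# Re-run the failing job with extra launcher output, only when inside an
# allocation. MPI_BINARY defaults to the hypothetical name "mpihello".
MPI_BINARY=${MPI_BINARY:-mpihello}
if command -v mpirun >/dev/null 2>&1 && [ -n "${SLURM_JOB_ID:-${SLURM_JOBID:-}}" ]; then
  mpirun --debug-daemons --mca plm_base_verbose 5 "$MPI_BINARY"
fi
```

If a SLURM_* variable is missing or "srun hostname" fails on its own, the problem lies in the allocation rather than in mpirun; if both work, the verbose mpirun output narrows down where the daemon launch fails.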