Open MPI Development Mailing List Archives

Subject: [OMPI devel] openmpi-1.7.5a1r30692 and slurm problems
From: Adrian Reber (adrian_at_[hidden])
Date: 2014-02-12 07:42:48


I tried the nightly snapshot (openmpi-1.7.5a1r30692.tar.gz) on a system
with SLURM and Moab. I requested an interactive session using:

msub -I -l nodes=3:ppn=8

and started a simple test case which fails:

$ mpirun -np 2 ./mpi-test 1
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2 slots
that were requested by the application:
  ./mpi-test

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
srun: error: xxxx108: task 1: Exited with exit code 1
srun: Terminating job step 131823.4
srun: error: xxxx107: task 0: Exited with exit code 1
srun: Job step aborted
slurmd[xxxx108]: *** STEP 131823.4 KILLED AT 2014-02-12T13:30:32 WITH SIGNAL 9 ***
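
For reference, mpi-test is essentially a trivial MPI "hello world". It
is roughly equivalent to the sketch below (reconstructed from the
output format, not the actual source; the numeric command line
argument is ignored here and the printed value is simply the rank):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    /* prints e.g. "4.4.7 ... (Red Hat 4.4.7-4):Process 0 on xxxx106 out of 1: 0.000000" */
    printf("%s:Process %d on %s out of %d: %f\n",
           __VERSION__, rank, host, size, (double)rank);

    MPI_Finalize();
    return 0;
}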

Requesting only one core works:

$ mpirun ./mpi-test 1
4.4.7 20120313 (Red Hat 4.4.7-4):Process 0 on xxxx106 out of 1: 0.000000
4.4.7 20120313 (Red Hat 4.4.7-4):Process 0 on xxxx106 out of 1: 0.000000

With openmpi-1.6.5, running on multiple cores works:

$ mpirun -np 24 ./mpi-test 2
4.4.7 20120313 (Red Hat 4.4.7-4):Process 0 on xxxx106 out of 24: 0.000000
4.4.7 20120313 (Red Hat 4.4.7-4):Process 12 on xxxx106 out of 24: 12.000000
4.4.7 20120313 (Red Hat 4.4.7-4):Process 11 on xxxx108 out of 24: 11.000000
4.4.7 20120313 (Red Hat 4.4.7-4):Process 18 on xxxx106 out of 24: 18.000000

$ echo $SLURM_JOB_CPUS_PER_NODE
8(x3)
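
If it helps with debugging, I can also send the output of the 1.7.5
snapshot run with --display-allocation (assuming I have the option
name right), e.g.:

$ mpirun --display-allocation -np 2 ./mpi-test 1

to show what allocation mpirun actually detects from SLURM.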

I have never used SLURM before, so this could also be a user error on
my side. But since 1.6.5 works, it seems something has changed, and I
wanted to let you know in case that was not intentional.

                Adrian