Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] openmpi-1.7.5a1r30692 and slurm problems
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-02-12 09:12:45


Seems rather odd - since this is managed by Moab, you shouldn't be seeing SLURM envars at all. What you should see are PBS_* envars, including a PBS_NODEFILE that actually contains the allocation.
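
One quick way to confirm that from inside the msub session is to dump the resource-manager environment and the node file. This is just a rough sketch, and exactly which variables show up will depend on how Moab/Torque set up the interactive shell:

$ # list whatever PBS_*/SLURM_* variables the session actually exports
$ env | grep -E '^(PBS|SLURM)' | sort
$ # with nodes=3:ppn=8 the PBS node file would normally list each node once per core
$ cat "$PBS_NODEFILE"
$ # mpirun can also report the allocation it detected
$ mpirun --display-allocation -np 2 ./mpi-test 1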

On Feb 12, 2014, at 4:42 AM, Adrian Reber <adrian_at_[hidden]> wrote:

> I tried the nightly snapshot (openmpi-1.7.5a1r30692.tar.gz) on a system
> with slurm and moab. I requested an interactive session using:
>
> msub -I -l nodes=3:ppn=8
>
> and started a simple test case which fails:
>
> $ mpirun -np 2 ./mpi-test 1
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 2 slots
> that were requested by the application:
> ./mpi-test
>
> Either request fewer slots for your application, or make more slots available
> for use.
> --------------------------------------------------------------------------
> srun: error: xxxx108: task 1: Exited with exit code 1
> srun: Terminating job step 131823.4
> srun: error: xxxx107: task 0: Exited with exit code 1
> srun: Job step aborted
> slurmd[xxxx108]: *** STEP 131823.4 KILLED AT 2014-02-12T13:30:32 WITH SIGNAL 9 ***
>
>
> Requesting only one core works:
>
> $ mpirun ./mpi-test 1
> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 0 on xxxx106 out of 1: 0.000000
> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 0 on xxxx106 out of 1: 0.000000
>
>
> Using openmpi-1.6.5 works with multiple cores:
>
> $ mpirun -np 24 ./mpi-test 2
> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 0 on xxxx106 out of 24: 0.000000
> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 12 on xxxx106 out of 24: 12.000000
> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 11 on xxxx108 out of 24: 11.000000
> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 18 on xxxx106 out of 24: 18.000000
>
> $ echo $SLURM_JOB_CPUS_PER_NODE
> 8(x3)
>
> I have never used slurm before, so this could also be a user error on my
> side. But since 1.6.5 works, it seems something has changed, and I wanted
> to let you know in case the change was not intentional.
>
> Adrian