Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-06-21 08:00:36


Ick; I'm surprised that we don't have this info in the FAQ. I'll try
to rectify that shortly.

How are you launching your jobs through SLURM? OMPI currently does
not support the "srun -n X my_mpi_application" model for launching
MPI jobs. You must either use the -A option to srun (i.e., get an
interactive SLURM allocation) or use the -b option (submit a script
that runs on the first node in the allocation). Your script can be
quite short:

#!/bin/sh
mpirun my_mpi_application

Note that OMPI will automatically figure out how many CPUs are in
your SLURM allocation, so you don't need to specify "-np X". Hence,
you can run the same script without modification no matter how many
CPUs/nodes you get from SLURM.
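
As a rough illustration only, a slightly more defensive version of that
script could check that it is really running inside a SLURM allocation
before calling mpirun. This is a dry-run sketch, not an official recipe:
the helper names in_slurm_allocation and launch_command are made up for
this example, and it assumes SLURM exports SLURM_JOBID (or SLURM_JOB_ID)
into the job's environment.

```shell
#!/bin/sh
# Dry-run sketch of a batch script for "srun -b".
# my_mpi_application is the placeholder name used in this message.

# Succeed if we appear to be inside a SLURM allocation.
# Assumption: SLURM sets SLURM_JOBID or SLURM_JOB_ID for the job.
in_slurm_allocation() {
    [ -n "${SLURM_JOBID:-}${SLURM_JOB_ID:-}" ]
}

# Print the launch command. There is deliberately no "-np X":
# mpirun sizes the job from the SLURM allocation.
launch_command() {
    echo "mpirun $1"
}

if in_slurm_allocation; then
    # Replace this echo with the command itself to really launch.
    echo "would run: $(launch_command my_mpi_application)"
else
    echo "not in a SLURM allocation; use srun -A or srun -b first" >&2
fi
```

Because the command line never mentions a process count, the same
script works unchanged for any allocation size.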

It's on the long-term plan to get the "srun -n X my_mpi_application"
model to work; it just hasn't bubbled up high enough in the priority
stack yet... :-\

On Jun 20, 2007, at 1:59 PM, Jeff Pummill wrote:

> Just started working with the OpenMPI / SLURM combo this morning. I can
> successfully launch this job from the command line and it runs to
> completion, but jobs launched through SLURM hang.
>
> They appear to just sit with no load apparent on the compute nodes
> even though SLURM indicates they are running...
>
> [jpummil_at_trillion ~]$ sinfo -l
> Wed Jun 20 12:32:29 2007
> PARTITION AVAIL TIMELIMIT JOB_SIZE   ROOT SHARE GROUPS NODES STATE     NODELIST
> debug*    up    infinite  1-infinite no   no    all    8     allocated compute-1-[1-8]
> debug*    up    infinite  1-infinite no   no    all    1     idle      compute-1-0
>
> [jpummil_at_trillion ~]$ squeue -l
> Wed Jun 20 12:32:20 2007
> JOBID PARTITION NAME   USER    STATE   TIME  TIMELIMIT NODES NODELIST(REASON)
> 79    debug     mpirun jpummil RUNNING 5:27  UNLIMITED 2     compute-1-[1-2]
> 78    debug     mpirun jpummil RUNNING 5:58  UNLIMITED 2     compute-1-[3-4]
> 77    debug     mpirun jpummil RUNNING 7:00  UNLIMITED 2     compute-1-[5-6]
> 74    debug     mpirun jpummil RUNNING 11:39 UNLIMITED 2     compute-1-[7-8]
>
> Are there any known issues of this nature involving OpenMPI and SLURM?
>
> Thanks!
>
> Jeff F. Pummill

-- 
Jeff Squyres
Cisco Systems