
Open MPI User's Mailing List Archives


From: Jeff Pummill (jpummil_at_[hidden])
Date: 2007-06-27 14:21:02


Hey Jeff,

Finally got my test nodes back and was looking at the info you sent. On
the SLURM page, it states the following:

*Open MPI* <http://www.open-mpi.org/> relies upon SLURM to allocate
resources for the job and then mpirun to initiate the tasks. When using
the salloc command, mpirun's -nolocal option is recommended. For example:

$ salloc -n4 sh # allocates 4 processors and spawns shell for job
> mpirun -np 4 -nolocal a.out
> exit # exits shell spawned by initial salloc command

Are you saying that I need to use the SLURM salloc command and then pass
SLURM a script? Or could I just put it all in the script? For example:

#!/bin/sh
salloc -n4
mpirun my_mpi_application

Then, run with srun -b myscript.sh
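
Or should the salloc come out of the script entirely, since srun -b
presumably makes the allocation on its own? I.e., something like this
(just a sketch; the script name is a placeholder):

#!/bin/sh
# myscript.sh - runs on the first node of the allocation;
# mpirun picks up the slot count from SLURM's environment
mpirun my_mpi_application

...run with something like "srun -b -n4 myscript.sh" so SLURM knows how
many tasks to allocate.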

Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas
Fayetteville, Arkansas 72701
(479) 575 - 4590
http://hpc.uark.edu

"A supercomputer is a device for turning compute-bound
problems into I/O-bound problems." -Seymour Cray

Jeff Squyres wrote:
> Ick; I'm surprised that we don't have this info on the FAQ. I'll try
> to rectify that shortly.
>
> How are you launching your jobs through SLURM? OMPI currently does
> not support the "srun -n X my_mpi_application" model for launching
> MPI jobs. You must either use the -A option to srun (i.e., get an
> interactive SLURM allocation) or use the -b option (submit a script
> that runs on the first node in the allocation). Your script can be
> quite short:
>
> #!/bin/sh
> mpirun my_mpi_application
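
(And for the interactive route with the -A option you mention, I'm
guessing it looks about like the salloc example from the SLURM docs, with
srun spawning a shell on the allocation:

$ srun -n4 -A                 # allocate 4 processors and spawn a shell
$ mpirun my_mpi_application   # OMPI sizes itself from the allocation
$ exit                        # release the allocation

...correct me if the syntax is off.)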
>
> Note that OMPI will automatically figure out how many CPUs are in
> your SLURM allocation, so you don't need to specify "-np X". Hence,
> you can run the same script without modification no matter how many
> CPUs/nodes you get from SLURM.
>
> It's on the long-term plan to get the "srun -n X my_mpi_application"
> model working; it just hasn't bubbled up high enough in the priority
> stack yet... :-\
>
>
> On Jun 20, 2007, at 1:59 PM, Jeff Pummill wrote:
>
>
>> Just started working with the OpenMPI / SLURM combo this morning. I
>> can successfully launch jobs from the command line and they run to
>> completion, but when launched through SLURM they hang.
>>
>> They appear to just sit with no load apparent on the compute nodes
>> even though SLURM indicates they are running...
>>
>> [jpummil_at_trillion ~]$ sinfo -l
>> Wed Jun 20 12:32:29 2007
>> PARTITION AVAIL TIMELIMIT   JOB_SIZE ROOT SHARE GROUPS NODES     STATE NODELIST
>> debug*    up    infinite  1-infinite   no    no    all     8 allocated compute-1-[1-8]
>> debug*    up    infinite  1-infinite   no    no    all     1      idle compute-1-0
>>
>> [jpummil_at_trillion ~]$ squeue -l
>> Wed Jun 20 12:32:20 2007
>> JOBID PARTITION   NAME    USER   STATE  TIME TIMELIMIT NODES NODELIST(REASON)
>>    79     debug mpirun jpummil RUNNING  5:27 UNLIMITED     2 compute-1-[1-2]
>>    78     debug mpirun jpummil RUNNING  5:58 UNLIMITED     2 compute-1-[3-4]
>>    77     debug mpirun jpummil RUNNING  7:00 UNLIMITED     2 compute-1-[5-6]
>>    74     debug mpirun jpummil RUNNING 11:39 UNLIMITED     2 compute-1-[7-8]
>>
>> Are there any known issues of this nature involving OpenMPI and SLURM?
>>
>> Thanks!
>>
>> Jeff F. Pummill
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>