Open MPI User's Mailing List Archives

From: Jeff Pummill (jpummil_at_[hidden])
Date: 2007-06-27 15:32:29


Thanks for the info Tim. That worked perfectly.

And I now have the Open MPI FAQ page bookmarked ;-)

Jeff F. Pummill

Tim Prins wrote:
> Hi Jeff,
>
> If you submit a batch script, there is no need to do a salloc.
>
> See the Open MPI FAQ for details on how to run on SLURM:
> http://www.open-mpi.org/faq/?category=slurm
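>
> In other words (a minimal sketch), drop the salloc line so your script
> contains only the mpirun invocation:
>
> #!/bin/sh
> mpirun my_mpi_application
>
> and submit it as before with "srun -b myscript.sh".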
>
> Hope this helps.
>
> Tim
>
> On Wednesday 27 June 2007 14:21, Jeff Pummill wrote:
>
>> Hey Jeff,
>>
>> Finally got my test nodes back and was looking at the info you sent. On
>> the SLURM page, it states the following:
>>
>> *Open MPI* <http://www.open-mpi.org/> relies upon SLURM to allocate
>> resources for the job and then mpirun to initiate the tasks. When using
>> the salloc command, mpirun's -nolocal option is recommended. For example:
>>
>> $ salloc -n4 sh               # allocates 4 processors and spawns shell for job
>> > mpirun -np 4 -nolocal a.out
>> > exit                        # exits shell spawned by initial salloc command
>>
>> Are you saying that I need to use SLURM's salloc and then pass SLURM a
>> script? Or could I just put it all into the script? For example:
>>
>> #!/bin/sh
>> salloc -n4
>> mpirun my_mpi_application
>>
>> Then, run with srun -b myscript.sh
>>
>>
>> Jeff F. Pummill
>> Senior Linux Cluster Administrator
>> University of Arkansas
>> Fayetteville, Arkansas 72701
>> (479) 575 - 4590
>> http://hpc.uark.edu
>>
>> "A supercomputer is a device for turning compute-bound
>> problems into I/O-bound problems." -Seymour Cray
>>
>> Jeff Squyres wrote:
>>
>>> Ick; I'm surprised that we don't have this info on the FAQ. I'll try
>>> to rectify that shortly.
>>>
>>> How are you launching your jobs through SLURM? OMPI currently does
>>> not support the "srun -n X my_mpi_application" model for launching
>>> MPI jobs. You must either use the -A option to srun (i.e., get an
>>> interactive SLURM allocation) or use the -b option (submit a script
>>> that runs on the first node in the allocation). Your script can be
>>> quite short:
>>>
>>> #!/bin/sh
>>> mpirun my_mpi_application
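>>>
>>> If you go the interactive route instead, a rough sketch (the -A
>>> allocate mode spawns a shell, much like the salloc example earlier
>>> in this thread; the process count of 4 is arbitrary):
>>>
>>> $ srun -n 4 -A    # get an interactive 4-process allocation
>>> > mpirun a.out    # mpirun detects the allocation size
>>> > exit            # release the allocation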
>>>
>>> Note that OMPI will automatically figure out how many CPUs are in
>>> your SLURM allocation, so you don't need to specify "-np X". Hence,
>>> you can run the same script without modification no matter how many
>>> CPUs/nodes you get from SLURM.
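>>>
>>> For instance (a sketch, assuming your srun accepts -n alongside -b
>>> and that the script above is saved as myscript.sh), both of these
>>> run the same script unmodified:
>>>
>>> srun -b -n 4 myscript.sh
>>> srun -b -n 16 myscript.sh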
>>>
>>> It's on the long-term plan to get the "srun -n X my_mpi_application"
>>> model to work; it just hasn't bubbled up high enough in the priority
>>> stack yet... :-\
>>>
>>> On Jun 20, 2007, at 1:59 PM, Jeff Pummill wrote:
>>>
>>>> I just started working with the Open MPI / SLURM combo this morning.
>>>> I can successfully launch jobs from the command line and they run to
>>>> completion, but when launched through SLURM they hang.
>>>>
>>>> They appear to just sit with no load apparent on the compute nodes
>>>> even though SLURM indicates they are running...
>>>>
>>>> [jpummil_at_trillion ~]$ sinfo -l
>>>> Wed Jun 20 12:32:29 2007
>>>> PARTITION AVAIL TIMELIMIT JOB_SIZE   ROOT SHARE GROUPS NODES STATE     NODELIST
>>>> debug*    up    infinite  1-infinite no   no    all    8     allocated compute-1-[1-8]
>>>> debug*    up    infinite  1-infinite no   no    all    1     idle      compute-1-0
>>>>
>>>> [jpummil_at_trillion ~]$ squeue -l
>>>> Wed Jun 20 12:32:20 2007
>>>> JOBID PARTITION NAME   USER    STATE   TIME  TIMELIMIT NODES NODELIST(REASON)
>>>>    79 debug     mpirun jpummil RUNNING 5:27  UNLIMITED 2     compute-1-[1-2]
>>>>    78 debug     mpirun jpummil RUNNING 5:58  UNLIMITED 2     compute-1-[3-4]
>>>>    77 debug     mpirun jpummil RUNNING 7:00  UNLIMITED 2     compute-1-[5-6]
>>>>    74 debug     mpirun jpummil RUNNING 11:39 UNLIMITED 2     compute-1-[7-8]
>>>>
>>>> Are there any known issues of this nature involving Open MPI and SLURM?
>>>>
>>>> Thanks!
>>>>
>>>> Jeff F. Pummill
>>>>