Open MPI User's Mailing List Archives

From: Jeff Pummill (jpummil_at_[hidden])
Date: 2007-06-21 12:33:03


Thanks for the info Jeff! All of my "test" nodes are temporarily busy,
but I should be able to play with this some more tomorrow.

I'll update the post if I have more questions or find any additional
tips ;-)

Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas
Fayetteville, Arkansas 72701
(479) 575 - 4590
http://hpc.uark.edu

"A supercomputer is a device for turning compute-bound
problems into I/O-bound problems." -Seymour Cray

Jeff Squyres wrote:
> Ick; I'm surprised that we don't have this info on the FAQ. I'll try
> to rectify that shortly.
>
> How are you launching your jobs through SLURM? OMPI currently does
> not support the "srun -n X my_mpi_application" model for launching
> MPI jobs. You must either use the -A option to srun (i.e., get an
> interactive SLURM allocation) or use the -b option (submit a script
> that runs on the first node in the allocation). Your script can be
> quite short:
>
> #!/bin/sh
> mpirun my_mpi_application
>
> Note that OMPI will automatically figure out how many CPUs are in
> your SLURM allocation, so you don't need to specify "-np X". Hence,
> you can run the same script without modification no matter how many
> CPUs/nodes you get from SLURM.
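>
> For example, a minimal sketch of the two supported models (the script
> name run_job.sh and the node count are illustrative assumptions, not
> from the original commands):
>
> # batch model: submit the wrapper script; it runs on the first
> # allocated node, and mpirun launches across the whole allocation
> srun -N 4 -b ./run_job.sh
>
> # interactive model: get an allocation, then launch from inside it
> srun -N 4 -A
> mpirun my_mpi_application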
>
> It's on the long-term plan to get the "srun -n X my_mpi_application"
> model to work; it just hasn't bubbled up high enough in the priority
> stack yet... :-\
>
>
> On Jun 20, 2007, at 1:59 PM, Jeff Pummill wrote:
>
>
>> Just started working with the OpenMPI / SLURM combo this morning. I
>> can successfully launch jobs from the command line and they run to
>> completion, but when launched through SLURM they hang.
>>
>> They appear to just sit there, with no apparent load on the compute
>> nodes, even though SLURM indicates they are running...
>>
>> [jpummil_at_trillion ~]$ sinfo -l
>> Wed Jun 20 12:32:29 2007
>> PARTITION AVAIL TIMELIMIT JOB_SIZE   ROOT SHARE GROUPS NODES STATE     NODELIST
>> debug*    up    infinite  1-infinite no   no    all    8     allocated compute-1-[1-8]
>> debug*    up    infinite  1-infinite no   no    all    1     idle      compute-1-0
>>
>> [jpummil_at_trillion ~]$ squeue -l
>> Wed Jun 20 12:32:20 2007
>> JOBID PARTITION NAME   USER    STATE   TIME  TIMELIMIT NODES NODELIST(REASON)
>>    79 debug     mpirun jpummil RUNNING 5:27  UNLIMITED 2     compute-1-[1-2]
>>    78 debug     mpirun jpummil RUNNING 5:58  UNLIMITED 2     compute-1-[3-4]
>>    77 debug     mpirun jpummil RUNNING 7:00  UNLIMITED 2     compute-1-[5-6]
>>    74 debug     mpirun jpummil RUNNING 11:39 UNLIMITED 2     compute-1-[7-8]
>>
>> Are there any known issues of this nature involving OpenMPI and SLURM?
>>
>> Thanks!
>>
>> Jeff F. Pummill
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>