Thanks for the info Tim. That worked perfectly.

And I now have the OpenMPI FAQ page bookmarked ;-)


Jeff F. Pummill



Tim Prins wrote:
Hi Jeff,

If you submit a batch script, there is no need to do a salloc. 

See the Open MPI FAQ for details on how to run on SLURM:
http://www.open-mpi.org/faq/?category=slurm
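
Applied to the script from the question quoted below, that would look roughly like the following. This is only a sketch: myscript.sh and my_mpi_application are the placeholder names from that message, and the submission command is the one discussed in this thread, shown as a comment rather than run.

```shell
#!/bin/sh
# Sketch: write the batch script without any salloc line. The batch
# submission itself provides the allocation, so the script only has
# to call mpirun.
cat > myscript.sh <<'EOF'
#!/bin/sh
mpirun my_mpi_application
EOF
chmod +x myscript.sh

# Submission as discussed in this thread (not executed here):
#   srun -b myscript.sh
```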

Hope this helps.

Tim

On Wednesday 27 June 2007 14:21, Jeff Pummill wrote:
  
Hey Jeff,

Finally got my test nodes back and was looking at the info you sent. On
the SLURM page, it states the following:

*Open MPI* <http://www.open-mpi.org/> relies upon SLURM to allocate
resources for the job and then mpirun to initiate the tasks. When using
salloc command, mpirun's -nolocal option is recommended. For example:

$ salloc -n4 sh    # allocates 4 processors and spawns shell for job
mpirun -np 4 -nolocal a.out
exit               # exits shell spawned by initial salloc command
      
Are you saying that I need to use the SLURM salloc command and then pass
SLURM a script? Or could I just add it all into the script? For example:

#!/bin/sh
salloc -n4
mpirun my_mpi_application

Then, run with srun -b myscript.sh


Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas
Fayetteville, Arkansas 72701
(479) 575 - 4590
http://hpc.uark.edu

"A supercomputer is a device for turning compute-bound
problems into I/O-bound problems." -Seymour Cray

Jeff Squyres wrote:
    
Ick; I'm surprised that we don't have this info on the FAQ.  I'll try
to rectify that shortly.

How are you launching your jobs through SLURM?  OMPI currently does
not support the "srun -n X my_mpi_application" model for launching
MPI jobs.  You must either use the -A option to srun (i.e., get an
interactive SLURM allocation) or use the -b option (submit a script
that runs on the first node in the allocation).  Your script can be
quite short:

#!/bin/sh
mpirun my_mpi_application

Note that OMPI will automatically figure out how many CPUs are in
your SLURM allocation, so you don't need to specify "-np X".  Hence,
you can run the same script without modification no matter how many
CPUs/nodes you get from SLURM.
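
To make the "no -np" point concrete, here is a small dry-run sketch that only prints the command it would use. It rests on assumptions not stated in this thread: that SLURM exports SLURM_JOB_ID (or the older SLURM_JOBID) inside an allocation, and NP is a hypothetical variable for the fallback case outside SLURM.

```shell
#!/bin/sh
# Dry-run sketch: print the launch command instead of executing it.
# Assumption: SLURM sets SLURM_JOB_ID (or SLURM_JOBID) in an allocation.
launch_cmd() {
    app="$1"
    job_id="${SLURM_JOB_ID:-${SLURM_JOBID:-}}"
    if [ -n "$job_id" ]; then
        # Inside a SLURM allocation: no -np, mpirun sizes the job itself.
        echo "mpirun $app"
    else
        # Outside SLURM: pick a process count explicitly.
        # NP is a hypothetical variable, defaulting to 1.
        echo "mpirun -np ${NP:-1} $app"
    fi
}

launch_cmd ./my_mpi_application
```

Inside an allocation the printed command is the same regardless of how many nodes SLURM granted, which is what lets the batch script stay unchanged.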

It's on the long-term plan to get the "srun -n X my_mpi_application"
model to work; it just hasn't bubbled up high enough in the priority
stack yet... :-\

On Jun 20, 2007, at 1:59 PM, Jeff Pummill wrote:
      
Just started working with the OpenMPI / SLURM combo this morning. I can
successfully launch jobs from the command line and they run to
completion, but when launched from SLURM they hang.

They appear to just sit with no load apparent on the compute nodes
even though SLURM indicates they are running...

[jpummil@trillion ~]$ sinfo -l
Wed Jun 20 12:32:29 2007
PARTITION AVAIL  TIMELIMIT   JOB_SIZE ROOT SHARE     GROUPS NODES       STATE NODELIST
debug*       up   infinite 1-infinite   no    no        all     8   allocated compute-1-[1-8]
debug*       up   infinite 1-infinite   no    no        all     1        idle compute-1-0

[jpummil@trillion ~]$ squeue -l
Wed Jun 20 12:32:20 2007
  JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
     79     debug   mpirun  jpummil  RUNNING       5:27 UNLIMITED      2 compute-1-[1-2]
     78     debug   mpirun  jpummil  RUNNING       5:58 UNLIMITED      2 compute-1-[3-4]
     77     debug   mpirun  jpummil  RUNNING       7:00 UNLIMITED      2 compute-1-[5-6]
     74     debug   mpirun  jpummil  RUNNING      11:39 UNLIMITED      2 compute-1-[7-8]

Are there any known issues of this nature involving OpenMPI and SLURM?

Thanks!

Jeff F. Pummill

_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
        