
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine
From: Reuti (reuti_at_[hidden])
Date: 2012-01-29 17:44:44


On 27.01.2012 at 23:19, Tom Bryan wrote:

> I am in the process of setting up a grid engine (SGE) cluster for running
> Open MPI applications. I'll detail the set up below, but my current problem
> is that this call to Spawn_multiple never seems to return.
>
> // Spawn all of the children processes.
> _intercomm = MPI::COMM_WORLD.Spawn_multiple( _nProc,
> const_cast<const char **>(_command),
> const_cast<const char ***>(_arg),
> _maxProc, _info, 0, errCode );
>
> I'm new to both SGE and MPI, which is making this problem difficult for me
> to troubleshoot.
>
> I can schedule simple (non-MPI) jobs on the SGE grid with qsub.
>
> I can use qsub to schedule multiple copies of a simple Hello World type of
> application using mpirun to spawn the processes in a script like this:
> #!/bin/sh
> #
> #$ -S /bin/sh
> #$ -V
> #$ -pe orte 4
> #$ -cwd
> #$ -j yes
> export LD_LIBRARY_PATH=/${VXR_STATIC}/openmpi-1.5.4/lib
> mpirun -np 4 ./mpihello $*
>
> That seems to work. The processes report the hostname where they were run,
> and they appear to be scheduled on different machines in my SGE grid.

If the processes ran on the nodes granted by SGE, which you can check with:

$ qstat -g t

then I assume you compiled Open MPI --with-sge, and this part is working fine.

> The problem is with a program, mpitest, that tries to use Spawn_multiple to
> launch multiple child processes. The script that I submit to the SGE grid
> looks like this:
> #!/bin/sh
> #
> #$ -S /bin/sh
> #$ -V
> #$ -pe orte 1-

This number should match the number of processes you want to start, plus one for the master. Otherwise SGE might refuse to start a process on a remote node if you have set up a tight integration.

Suppose you want to start 4 additional tasks: you would then need 5 slots in total from SGE.

> #$ -cwd
> #$ -j yes
> export LD_LIBRARY_PATH=/${VXR_STATIC}/openmpi-1.5.4/lib
> ./mpitest $*
>
> The mpitest program is the one that calls Spawn_multiple. In this case, it
> just tries to run multiple copies of itself. If I restrict my SGE

I have never used Spawn_multiple, but isn't it necessary to start the program with mpiexec too, and to call MPI_Init?

$ mpiexec -np 1 ./mpitest

The -np 1 overrides the slot count detected through the tight integration into SGE. Without mpiexec, the program might run only as a serial one. The additional 4 spawned processes can then be added inside your application.
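
As a rough sketch of the slot arithmetic above (children to spawn plus one for the master), a job script could guard the launch with a small check. The helper names `required_slots` and `slots_ok` are hypothetical and not part of SGE or Open MPI; `NSLOTS` is the variable SGE sets to the number of granted slots in a parallel job.

```sh
#!/bin/sh
# Hypothetical helper: slots to request from SGE for a job that spawns
# $1 children, i.e. the children plus one slot for the master process.
required_slots() {
    echo $(( $1 + 1 ))
}

# Hypothetical helper: succeeds only if the granted slots cover the job.
# $1 = number of children to spawn, $2 = granted slots (NSLOTS under SGE).
slots_ok() {
    [ "$2" -ge "$(required_slots "$1")" ]
}
```

In a job script submitted with `-pe orte 5`, this could be used as `slots_ok 4 "$NSLOTS" && mpiexec -np 1 ./mpitest`, assuming the program spawns 4 children.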

The line to initialize MPI:

if( MPI::Init( MPI::THREAD_MULTIPLE ) != MPI::THREAD_MULTIPLE )
...

I replaced the complete if... with a plain MPI::Init(); and got suitable output (see attached; submitted with qsub -pe openmpi 4 and with _nProc changed to 3) in a tight integration into SGE.

NB: What is MPI::Init( MPI::THREAD_MULTIPLE ) supposed to do: report which features the MPI library supports?

> configuration so that the orte parallel environment has to run all jobs on a
> single host, then mpitest runs to completion, spawning 4 "child" processes
> that are scheduled via SGE to run on the same host as the root process. The
> processes Send and Recv some messages, and the program exits.

Is this for an actual application where you need this feature? The MPI documentation discourages starting processes this way for performance reasons.

> If I permit SGE to schedule jobs on multiple hosts, then the child processes
> appear to be scheduled and launched. (That is, I can see them as children
> of the sge_execd and sge_shepherd processes on various machines.) But the
> original call to Spawn_multiple doesn't appear to return in the root
> mpitest. I assume that there's some problem setting up the communications
> channel among the different processes, but it's possible that my mpitest
> code is just buggy. I already tried disabling the firewall on all of the
> machines. I'm not sure how else to get useful debug information at this
> stage of the troubleshooting.

Anyway:

Do you see, on the master node of the parallel job, in the output of:

$ ps -e f --cols=500

(f without a leading -) the `qrsh -inherit` startups, like:

 2861 ?  Sl  10:17 /usr/sge/bin/lx24-x86/sge_execd
22294 ?  S    0:00  \_ sge_shepherd-3770 -bg
22296 ?  Ss   0:00      \_ /bin/sh /var/spool/sge/pc15381/job_scripts/3770
22297 ?  S    0:00          \_ mpiexec -np 1 ./Mpitest
22298 ?  R    0:07              \_ ./Mpitest
22299 ?  Sl   0:00              \_ /usr/sge/bin/lx24-x86/qrsh -inherit -nostdin -V pc15370 orted -mca ess env -mca orte_ess_jobid 1491402752 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "1491402752.0;tcp://192.168.151.101:41663"
22302 ?  S    0:00              \_ /home/reuti/mpitest/Mpitest --child

and on the other side:

27422 ?  Sl   3:48 /usr/sge/bin/lx24-x86/sge_execd
 9900 ?  Sl   0:00  \_ sge_shepherd-3770 -bg
 9901 ?  Ss   0:00      \_ /usr/sge/utilbin/lx24-x86/qrsh_starter /var/spool/sge/pc15370/active_jobs/3770.1/1.pc15370
 9908 ?  S    0:00          \_ orted -mca ess env -mca orte_ess_jobid 1491402752 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri 1491402752.0;tcp://192.168.151.101:41663
 9909 ?  R    0:02              \_ /home/reuti/mpitest/Mpitest --child
 9910 ?  R    0:02              \_ /home/reuti/mpitest/Mpitest --child
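
To pick the relevant lines out of a long process tree, the `ps` output can be filtered for the `qrsh -inherit` startups. This is only a sketch: `qrsh_inherit_lines` and `count_qrsh_inherit` are hypothetical helper names, and they assume nothing beyond the `qrsh -inherit` command line appearing as in the listing above.

```sh
#!/bin/sh
# Hypothetical helper: read `ps -e f` output on stdin and print only the
# lines that show a `qrsh -inherit` startup.
qrsh_inherit_lines() {
    grep -- 'qrsh -inherit'
}

# Hypothetical helper: count those lines. A count of 0 means no remote
# startup through SGE is visible in the given process list.
# `|| true` keeps the exit status at 0 when grep finds no match.
count_qrsh_inherit() {
    grep -c -- 'qrsh -inherit' || true
}
```

On the master node this could be run as `ps -e f --cols=500 | qrsh_inherit_lines`.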

-- Reuti

> It would be great if someone could look at the attached code and just let me
> know whether what I'm doing is horribly incorrect. If it should work, then
> I can focus on systems and SGE configuration issues. If the code is broken
> and really shouldn't work, then I'd like to fix that first, of course.
>
> Thanks,
> ---Tom
>
>
> <mpitest.tgz>



  • application/octet-stream attachment: output