Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine
From: Reuti (reuti_at_[hidden])
Date: 2012-01-31 09:26:33


On 31.01.2012, at 05:33, Tom Bryan wrote:

>> Suppose you want to start 4 additional tasks, you would need 5 in total from
>> SGE.
>
> OK, thanks. I'll try other values.

BTW: there is a setting in the PE definition to allow one additional task:

$ qconf -sp openmpi
...
job_is_first_task FALSE

This is useful in case the master task only collects the results and doesn't put any load on the machine. For conventional MPI applications it's set to "TRUE" though.
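As a sketch, a full PE definition with this setting might look like the following (the slot count and allocation rule are just example values, not the poster's actual configuration):

```shell
# Example output of "qconf -sp openmpi" (hypothetical values).
# With job_is_first_task FALSE, the master task does not consume a
# slot, so the granted slots are all available for spawned workers.
$ qconf -sp openmpi
pe_name            openmpi
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
```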

>>> #$ -cwd
>>> #$ -j yes
>>> export LD_LIBRARY_PATH=/${VXR_STATIC}/openmpi-1.5.4/lib
>>> ./mpitest $*
>>>
>>> The mpitest program is the one that calls Spawn_multiple. In this case, it
>>> just tries to run multiple copies of itself. If I restrict my SGE
>>
>> I never used spawn_multiple, but isn't it necessary to start it with mpiexec
>> too and call MPI_Init?
>>
>> $ mpiexec -np 1 ./mpitest
>
> I don't think so.

In the book "Using MPI-2" by William Gropp et al., it's used this way in chapter 7.2.2/page 235, although the MPI-2.2 standard indeed states on page 329 that a singleton MPI environment is created if the application can find the necessary information (i.e. it wasn't started by mpiexec).

Maybe it's a side effect of the tight integration that it starts on the correct nodes (but I see an incorrect allocation of slots and an error message at the end if it's started without mpiexec), since in this case there is no command-line option for a hostfile. How would it get the requested nodes if started from the command line?

Maybe someone from the Open MPI team can clarify the intended behavior in this case.
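For reference, a minimal self-spawning test along the lines discussed above might look like this (a sketch only, using the C++ bindings from this thread; the child count of 3 and the use of plain Spawn instead of Spawn_multiple are my assumptions, not the poster's actual mpitest):

```cpp
// Minimal self-spawning MPI test (sketch). The parent spawns 3 copies
// of its own executable; under a tight SGE integration the children
// should be placed on the slots granted by the scheduler.
#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[]) {
    MPI::Init();

    MPI::Intercomm parent = MPI::Comm::Get_parent();
    if (parent != MPI::COMM_NULL) {
        // We were spawned: report in and disconnect from the parent.
        std::cout << "child rank " << MPI::COMM_WORLD.Get_rank()
                  << " reporting" << std::endl;
        parent.Disconnect();
    } else {
        // Parent: spawn 3 copies of ourselves. The PE must grant
        // enough slots for these processes (see the discussion above).
        MPI::Intercomm children =
            MPI::COMM_SELF.Spawn(argv[0], MPI::ARGV_NULL, 3,
                                 MPI::INFO_NULL, 0);
        children.Disconnect();
    }

    MPI::Finalize();
    return 0;
}
```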

> In any case, when I restrict the SGE grid to run all of
> my orte parallel environment jobs on one machine, the application runs fine.
> I only have problems if one or more of the spawned children gets scheduled
> to another node.
>
>> to override the detected slots by the tight integration into SGE. Otherwise it
>> might be running only as a serial one. The additional 4 spawned processes can
>> then be added inside your application.
>>
>> The line to initialize MPI:
>>
>> if( MPI::Init( MPI::THREAD_MULTIPLE ) != MPI::THREAD_MULTIPLE )
>> ...
>>
>> I replaced the complete if... by a plain MPI::Init(); and get a suitable
>> output (see attached, qsub -pe openmpi 4 and changed _nProc to 3) in a tight
>> integration into SGE.
>>
>> NB: What is MPI::Init( MPI::THREAD_MULTIPLE ) supposed to do, output a feature

Okay, a typo - the _thread is missing (it should read MPI::Init_thread).

>> of MPI?
>
> Well, I'm new to MPI, so I'm not sure. The program was actually written by
> a co-worker. I think that it's supposed to set up a bunch of things and
> also verify that our build has the requested level of thread support.

Threads have nothing to do with comm_spawn. Thread support is necessary to combine MPI with OpenMP or any other thread library. I couldn't use it initially as I hadn't compiled Open MPI with --enable-mpi-threads. A plain MPI::Init(); is sufficient here (thread support won't hurt though).
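For completeness, a sketch of the call with the missing _thread: in the C++ bindings, MPI::Init_thread takes the requested level and returns the level actually provided, which may be lower than requested if the library wasn't built with thread support.

```cpp
// Sketch of the presumably intended initialization: request full
// thread support and check what the library actually provides.
#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[]) {
    // The return value may be lower than MPI::THREAD_MULTIPLE if
    // Open MPI was built without --enable-mpi-threads.
    int provided = MPI::Init_thread(MPI::THREAD_MULTIPLE);
    if (provided < MPI::THREAD_MULTIPLE)
        std::cout << "only thread level " << provided
                  << " is available" << std::endl;

    MPI::Finalize();
    return 0;
}
```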

> My co-worker clarified today that he actually had this exact code working
> last year on a test cluster that we set up. Now we're trying to put
> together a production cluster with the latest version of Open MPI and SGE
> (Son of Grid Engine), but mpitest is now hanging as described in my first
> e-mail.

For me it's not hanging. Did you try the alternative startup using mpiexec?

Aha - BTW: I use 1.4.4

-- Reuti

>> Is it for an actual application where you need this feature? In the MPI
>> documentation it's discouraged to start it this way for performance reasons.
>
> For our use, yes, spawn_multiple makes sense. We won't be spawning lots and
> lots of jobs in quick succession. We're using MPI as a robust way to get
> IPC as we spawn multiple child processes while using SGE to help us with
> load balancing our compute nodes.
>
>> Anyway:
>> do you see on the master node of the parallel job in:
>
> Yes, I should have included that kind of output. I'll have to run it again
> with the cols option, but I used pstree to see that I have mpitest --child
> processes as children of orted by way of sge_shepherd and sge_execd.
>
> Thanks,
> ---Tom
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users