
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-01-31 14:38:15


Not sure I fully grok this thread, but will try to provide an answer.

When you start a singleton, it spawns off a daemon that is the equivalent of "mpirun". This daemon is created for the express purpose of allowing the singleton to use MPI dynamics like comm_spawn - without it, the singleton would be unable to execute those functions.
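As an illustration (not code from this thread), here is a minimal sketch of what such a singleton might do. The child executable name "worker" is hypothetical; the point is that MPI_Comm_spawn is the dynamic call the singleton's daemon exists to support:

```c
/* Minimal sketch, assuming a working Open MPI installation:
 * a parent started as a singleton (i.e., "./parent" with no mpirun)
 * spawns children via MPI_Comm_spawn. The daemon described above is
 * what makes this possible. "worker" is a hypothetical executable. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm children;
    int errcodes[2];

    MPI_Init(&argc, &argv);

    /* Spawn 2 copies of "worker"; where they land depends on the
     * allocation the singleton's daemon read (hostfile, SGE, ...). */
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &children, errcodes);

    /* ... communicate over the intercommunicator "children" ... */

    MPI_Comm_disconnect(&children);
    MPI_Finalize();
    return 0;
}
```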

The first thing the daemon does is read the local allocation, using the same methods as used by mpirun. So whatever allocation is present that mpirun would have read, the daemon will get. This includes hostfiles and SGE allocations.

The exception to this is when the singleton gets started in an altered environment - e.g., if SGE changes the environment variables when launching the singleton process. We see this in some resource managers: you can get an allocation of N nodes, but when you launch a job, the envar inside that job only reflects the number of nodes actually running processes in that job. In such a situation, the daemon will see the altered value as its "allocation", potentially causing confusion.

For this reason, I generally recommend running dynamic applications with mpirun when operating in RM-managed environments, to avoid confusion. Or at least use "printenv" to check that the envars are going to be right before trying to start from a singleton.
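For example, under SGE such a sanity check might look like the following (PE_HOSTFILE and NSLOTS are SGE's usual tight-integration variables; adjust for your resource manager):

```shell
# Sanity check, assuming SGE's standard variables: inspect the allocation
# the singleton's daemon would inherit before starting without mpirun.
echo "NSLOTS=${NSLOTS:-unset}"
echo "PE_HOSTFILE=${PE_HOSTFILE:-unset}"
# PE_HOSTFILE lists one node per line: host slots queue processor-range
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    cat "$PE_HOSTFILE"
fi
```

If the slot counts or hosts printed here differ from what SGE granted, the singleton's daemon will see the same wrong picture.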

HTH
Ralph

On Jan 31, 2012, at 12:19 PM, Reuti wrote:

> Am 31.01.2012 um 20:12 schrieb Jeff Squyres:
>
>> I only noticed after the fact that Tom is also here at Cisco (it's a big company, after all :-) ).
>>
>> I've contacted him using our proprietary super-secret Cisco handshake (i.e., the internal phone network); I'll see if I can figure out the issues off-list.
>
> But I would be interested in a statement about a hostlist for singleton startups. Or whether it's honoring the tight-integration nodes more by accident than by design. And as said: I see a wrong allocation, as the initial ./Mpitest doesn't count as a process. I get a 3+1 allocation instead of the 2+2 granted by SGE. If started with "mpiexec -np 1 ./Mpitest" all is fine.
>
> -- Reuti
>
>
>> On Jan 31, 2012, at 1:08 PM, Dave Love wrote:
>>
>>> Reuti <reuti_at_[hidden]> writes:
>>>
>>>> Maybe it's a side effect of a tight integration that it would start on
>>>> the correct nodes (but I face an incorrect allocation of slots and an
>>>> error message at the end if started without mpiexec), as in this case
>>>> it has no command line option for the hostfile. How to get the
>>>> requested nodes if started from the command line?
>>>
>>> Yes, I wouldn't expect it to work without mpirun/mpiexec and, of course,
>>> I basically agree with Reuti about the rest.
>>>
>>> If there is an actual SGE problem or need for an enhancement, though,
>>> please file it per https://arc.liv.ac.uk/trac/SGE#mail
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/