
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-01-31 15:25:47


On Jan 31, 2012, at 12:58 PM, Reuti wrote:

>
> Am 31.01.2012 um 20:38 schrieb Ralph Castain:
>
>> Not sure I fully grok this thread, but will try to provide an answer.
>>
>> When you start a singleton, it spawns off a daemon that is the equivalent of "mpirun". This daemon is created for the express purpose of allowing the singleton to use MPI dynamics like comm_spawn - without it, the singleton would be unable to execute those functions.
>>
>> The first thing the daemon does is read the local allocation, using the same methods as used by mpirun. So whatever allocation is present that mpirun would have read, the daemon will get. This includes hostfiles and SGE allocations.
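(For context, here is a minimal sketch of the kind of program being discussed - a parent that spawns its workers with MPI_Comm_spawn, as a singleton-started Mpitest would. The actual Mpitest source isn't posted in this thread, so the "--child" handling, the count of three children, and the use of MPI_Comm_spawn rather than MPI_Comm_spawn_multiple are assumptions for illustration only.)

/* Hypothetical Mpitest-like program: spawn three children unless we were
 * started as one of them ("--child"). */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    if (argc > 1 && strcmp(argv[1], "--child") == 0) {
        /* Spawned child: connect back to the parent, then disconnect. */
        MPI_Comm parent;
        int rank;
        MPI_Comm_get_parent(&parent);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("child %d started\n", rank);
        MPI_Comm_disconnect(&parent);
    } else {
        /* Parent: whether started as a singleton ("./Mpitest") or via
         * "mpiexec -np 1 ./Mpitest", spawn three children. In the singleton
         * case it is the orted launched behind the scenes that reads the
         * allocation and maps the children onto nodes. */
        char *child_argv[] = { "--child", NULL };
        MPI_Comm children;
        MPI_Comm_spawn("./Mpitest", child_argv, 3, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&children);
    }

    MPI_Finalize();
    return 0;
}

Started as plain "./Mpitest", that MPI_Comm_spawn call is what the orted in the process listings below is there to service; started under "mpiexec -np 1", mpirun itself plays that role.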
>
> So it should also honor Open MPI's default hostfile when running outside of SGE, i.e. from the command line?

Yes

>
>
>> The exception to this is when the singleton gets started in an altered environment - e.g., if SGE changes the environment variables when launching the singleton process. We see this in some resource managers - you can get an allocation of N nodes, but when you launch a job, the envar in that job only indicates the number of nodes actually running processes in that job. In such a situation, the daemon will see the altered value as its "allocation", potentially causing confusion.
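(A quick way to sanity-check this before starting the singleton - in the same spirit as the "printenv" suggestion further down - is to print the SGE allocation variables that Open MPI's gridengine support typically consults. This is just an illustrative sketch; the variable names are the standard SGE ones and are not quoted from this thread.)

/* Print the SGE parallel-environment variables the spawned orted would see. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *vars[] = { "JOB_ID", "NSLOTS", "NHOSTS", "PE_HOSTFILE" };
    for (unsigned i = 0; i < sizeof(vars) / sizeof(vars[0]); ++i) {
        const char *val = getenv(vars[i]);
        printf("%s=%s\n", vars[i], val ? val : "(unset)");
    }
    return 0;
}

If these show the full allocation in both the mpiexec case and the singleton case, the environment itself is not what is altering the daemon's view.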
>
> Not sure whether I get it right. When I launch the same application with:
>
> "mpiexec -np1 ./Mpitest" (and get an allocation of 2+2 on the two machines):
>
> 27422 ? Sl 4:12 /usr/sge/bin/lx24-x86/sge_execd
> 9504 ? S 0:00 \_ sge_shepherd-3791 -bg
> 9506 ? Ss 0:00 \_ /bin/sh /var/spool/sge/pc15370/job_scripts/3791
> 9507 ? S 0:00 \_ mpiexec -np 1 ./Mpitest
> 9508 ? R 0:07 \_ ./Mpitest
> 9509 ? Sl 0:00 \_ /usr/sge/bin/lx24-x86/qrsh -inherit -nostdin -V pc15381 orted -mca
> 9513 ? S 0:00 \_ /home/reuti/mpitest/Mpitest --child
>
> 2861 ? Sl 10:47 /usr/sge/bin/lx24-x86/sge_execd
> 25434 ? Sl 0:00 \_ sge_shepherd-3791 -bg
> 25436 ? Ss 0:00 \_ /usr/sge/utilbin/lx24-x86/qrsh_starter /var/spool/sge/pc15381/active_jobs/3791.1/1.pc15381
> 25444 ? S 0:00 \_ orted -mca ess env -mca orte_ess_jobid 821952512 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri
> 25447 ? S 0:01 \_ /home/reuti/mpitest/Mpitest --child
> 25448 ? S 0:01 \_ /home/reuti/mpitest/Mpitest --child
>
> This is what I expect (main + 1 child, other node gets 2 children). Now I launch the singleton instead (nothing changed besides this, still 2+2 granted):
>
> "./Mpitest" and get:
>
> 27422 ? Sl 4:12 /usr/sge/bin/lx24-x86/sge_execd
> 9546 ? S 0:00 \_ sge_shepherd-3793 -bg
> 9548 ? Ss 0:00 \_ /bin/sh /var/spool/sge/pc15370/job_scripts/3793
> 9549 ? R 0:00 \_ ./Mpitest
> 9550 ? Ss 0:00 \_ orted --hnp --set-sid --report-uri 6 --singleton-died-pipe 7
> 9551 ? Sl 0:00 \_ /usr/sge/bin/lx24-x86/qrsh -inherit -nostdin -V pc15381 orted
> 9554 ? S 0:00 \_ /home/reuti/mpitest/Mpitest --child
> 9555 ? S 0:00 \_ /home/reuti/mpitest/Mpitest --child
>
> 2861 ? Sl 10:47 /usr/sge/bin/lx24-x86/sge_execd
> 25494 ? Sl 0:00 \_ sge_shepherd-3793 -bg
> 25495 ? Ss 0:00 \_ /usr/sge/utilbin/lx24-x86/qrsh_starter /var/spool/sge/pc15381/active_jobs/3793.1/1.pc15381
> 25502 ? S 0:00 \_ orted -mca ess env -mca orte_ess_jobid 814940160 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri
> 25503 ? S 0:00 \_ /home/reuti/mpitest/Mpitest --child
>
> Only one child goes to the other node, so the singleton's node ends up with three processes. The environment is the same in both cases. Is this the correct behavior?

We probably aren't correctly marking the original singleton on that node, so the mapper thinks there are still two slots available on the original node. With your 2+2 allocation that would produce exactly the 3+1 placement you see: two children get mapped next to the singleton and only one goes to the remote node.

>
> -- Reuti
>
>
>> For this reason, I generally recommend that you run dynamic applications using mpirun when operating in RM-managed environments to avoid confusion. Or at least use "printenv" to check that the envars are going to be right before trying to start from a singleton.
>>
>> HTH
>> Ralph
>>
>> On Jan 31, 2012, at 12:19 PM, Reuti wrote:
>>
>>> Am 31.01.2012 um 20:12 schrieb Jeff Squyres:
>>>
>>>> I only noticed after the fact that Tom is also here at Cisco (it's a big company, after all :-) ).
>>>>
>>>> I've contacted him using our proprietary super-secret Cisco handshake (i.e., the internal phone network); I'll see if I can figure out the issues off-list.
>>>
>>> But I would be interested in a statement about a host list for singleton startups, and whether it honors the tight-integration nodes more by accident than by design. And as said: I see a wrong allocation, as the initial ./Mpitest doesn't count as a process. I get a 3+1 allocation instead of the 2+2 granted by SGE. If started with "mpiexec -np 1 ./Mpitest" everything is fine.
>>>
>>> -- Reuti
>>>
>>>
>>>> On Jan 31, 2012, at 1:08 PM, Dave Love wrote:
>>>>
>>>>> Reuti <reuti_at_[hidden]> writes:
>>>>>
>>>>>> Maybe it's a side effect of the tight integration that it starts on
>>>>>> the correct nodes (but I see an incorrect allocation of slots and an
>>>>>> error message at the end if started without mpiexec), since in this
>>>>>> case there is no command-line option for the hostfile. How do you get
>>>>>> the requested nodes when starting from the command line?
>>>>>
>>>>> Yes, I wouldn't expect it to work without mpirun/mpiexec and, of course,
>>>>> I basically agree with Reuti about the rest.
>>>>>
>>>>> If there is an actual SGE problem or need for an enhancement, though,
>>>>> please file it per https://arc.liv.ac.uk/trac/SGE#mail
>>>>>
>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquyres_at_[hidden]
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>>
>>>
>>>
>>
>>
>
>