
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-02-05 11:01:54


On Feb 5, 2012, at 6:51 AM, Reuti wrote:

> Hi,
>
>>> Not sure whether I get it right. When I launch the same application with:
>>>
>>> "mpiexec -np 1 ./Mpitest" (and get an allocation of 2+2 on the two machines):
>>>
>>> 27422 ? Sl 4:12 /usr/sge/bin/lx24-x86/sge_execd
>>> 9504 ? S 0:00 \_ sge_shepherd-3791 -bg
>>> 9506 ? Ss 0:00 \_ /bin/sh /var/spool/sge/pc15370/job_scripts/3791
>>> 9507 ? S 0:00 \_ mpiexec -np 1 ./Mpitest
>>> 9508 ? R 0:07 \_ ./Mpitest
>>> 9509 ? Sl 0:00 \_ /usr/sge/bin/lx24-x86/qrsh -inherit -nostdin -V pc15381 orted -mca
>>> 9513 ? S 0:00 \_ /home/reuti/mpitest/Mpitest --child
>>>
>>> 2861 ? Sl 10:47 /usr/sge/bin/lx24-x86/sge_execd
>>> 25434 ? Sl 0:00 \_ sge_shepherd-3791 -bg
>>> 25436 ? Ss 0:00 \_ /usr/sge/utilbin/lx24-x86/qrsh_starter /var/spool/sge/pc15381/active_jobs/3791.1/1.pc15381
>>> 25444 ? S 0:00 \_ orted -mca ess env -mca orte_ess_jobid 821952512 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri
>>> 25447 ? S 0:01 \_ /home/reuti/mpitest/Mpitest --child
>>> 25448 ? S 0:01 \_ /home/reuti/mpitest/Mpitest --child
>>>
>>> This is what I expect (main + 1 child, other node gets 2 children). Now I launch the singleton instead (nothing changed besides this, still 2+2 granted):
>>>
>>> "./Mpitest" and get:
>>>
>>> 27422 ? Sl 4:12 /usr/sge/bin/lx24-x86/sge_execd
>>> 9546 ? S 0:00 \_ sge_shepherd-3793 -bg
>>> 9548 ? Ss 0:00 \_ /bin/sh /var/spool/sge/pc15370/job_scripts/3793
>>> 9549 ? R 0:00 \_ ./Mpitest
>>> 9550 ? Ss 0:00 \_ orted --hnp --set-sid --report-uri 6 --singleton-died-pipe 7
>>> 9551 ? Sl 0:00 \_ /usr/sge/bin/lx24-x86/qrsh -inherit -nostdin -V pc15381 orted
>>> 9554 ? S 0:00 \_ /home/reuti/mpitest/Mpitest --child
>>> 9555 ? S 0:00 \_ /home/reuti/mpitest/Mpitest --child
>>>
>>> 2861 ? Sl 10:47 /usr/sge/bin/lx24-x86/sge_execd
>>> 25494 ? Sl 0:00 \_ sge_shepherd-3793 -bg
>>> 25495 ? Ss 0:00 \_ /usr/sge/utilbin/lx24-x86/qrsh_starter /var/spool/sge/pc15381/active_jobs/3793.1/1.pc15381
>>> 25502 ? S 0:00 \_ orted -mca ess env -mca orte_ess_jobid 814940160 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri
>>> 25503 ? S 0:00 \_ /home/reuti/mpitest/Mpitest --child
>>>
>>> Only one child is going to the other node. The environment is the same in both cases. Is this the correct behavior?
>>
>>
>> We probably aren't correctly marking the original singleton on that node, and so the mapper thinks there are still two slots available on the original node.
>
> Was there any further discussion about the different slot allocations between the two startup methods off-list?

Not really - it isn't much of a priority for us, to be honest.

>
> One could even argue that the current behavior is intended:
>
>
> - you have an MPI style application (rank0 is doing work) => use mpiexec
>
> Corresponding SGE setting: "job_is_first_task true" in the PE
>
>
> - you have a true master/slave application and the master is not doing any work => start it as a singleton
>
> Corresponding SGE setting: "job_is_first_task false" in the PE
>
>
> This would then be worth noting somewhere in the FAQ.
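>
> For context, "job_is_first_task" lives in the SGE parallel environment definition (shown with "qconf -sp"). A sketch of such a PE, with the name "mpi_pe" and the slot count assumed, and the line under discussion set for the master/slave case:

```
pe_name            mpi_pe
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
```

With "job_is_first_task FALSE", SGE grants one extra task beyond the job script itself, which matches a master that only coordinates; set it to TRUE when rank 0 does real work.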

You make a good point, and I suppose it could be argued both ways. However, the intended behavior was to count the singleton against the allocation since it is an MPI proc, and we can't know if it will be doing work or not. Clearly, we aren't doing that correctly.
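To make the mapping difference concrete, here is a toy model (purely an assumed illustration, not Open MPI's actual mapper code) of byslot placement over a 2+2 allocation. Whether or not the master is counted against the first node's slots reproduces the two process trees shown above:

```python
# Toy byslot mapper: place n_children onto nodes with the given free slots.
def map_children(slots, used_on_first, n_children):
    """slots: free slots per node; used_on_first: slots already consumed
    on node 0 by the master process (1 if the singleton/mpiexec proc is
    counted against the allocation, 0 if it is missed)."""
    free = slots[:]
    free[0] -= used_on_first
    placement = [0] * len(slots)
    node = 0
    for _ in range(n_children):
        while free[node] == 0:            # skip nodes that are full
            node = (node + 1) % len(slots)
        placement[node] += 1
        free[node] -= 1
    return placement

# mpiexec case: master counted on node 0 -> 1 child there, 2 on node 1
print(map_children([2, 2], 1, 3))  # [1, 2]
# singleton case: master not counted -> 2 children on node 0, 1 on node 1
print(map_children([2, 2], 0, 3))  # [2, 1]
```

The second call reproduces the "only one child on the other node" tree from the singleton run, which is consistent with the mapper believing two slots are still free on the singleton's node.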

Like I said, though, it isn't a priority, so I doubt I'll get around to resolving it any time soon.

>
> (I couldn't compile Mpitest with MPICH2 to check its behavior; it chokes on some overloaded operators.)
>
> -- Reuti
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users