Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts
From: Terry Dontje (terry.dontje_at_[hidden])
Date: 2010-11-16 05:55:55


On 11/16/2010 04:26 AM, Chris Jewell wrote:
> Hi all,
>
>> On 11/15/2010 02:11 PM, Reuti wrote:
>>> Just to give my understanding of the problem:
>>>>>> Sorry, I am still trying to grok from your email what problem you
>>>>>> are trying to solve. So is the issue trying to have two jobs with
>>>>>> processes on the same node bind their processes to different
>>>>>> resources? Like core 1 for the first job and cores 2 and 3 for the 2nd job?
>>>>>>
>>>>>> --td
>> You can't get 2 slots on a machine, as it's limited by the core count to one here, so such a slot allocation shouldn't occur at all.
> So to clarify, the current -binding <binding_strategy>:<binding_amount> allocates binding_amount cores to each sge_shepherd process associated with a job_id. There appears to be only one sge_shepherd process per job_id per execution node, so all child processes run on these allocated cores. This is irrespective of the number of slots allocated to the node.
I believe the above is correct.
> I agree with Reuti that the binding_amount parameter should be a maximum number of bound cores per node, with the actual number determined by the number of slots allocated per node. FWIW, an alternative approach might be to have another binding_type ('slot', say) that automatically allocated one core per slot.
That might be correct; I've put in a question to someone who should know.
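
To make the three policies discussed above concrete, here is a minimal sketch in Python. It is only a model of the arithmetic, not SGE code: the function names and the node names are hypothetical, and the "current" rule simply encodes the behaviour as described in this thread (binding_amount cores per job per node, regardless of slots).

```python
from typing import Dict, Tuple


def cores_current(binding_amount: int, slots: int) -> int:
    # Behaviour as described in this thread: the single sge_shepherd for a
    # job on a node gets binding_amount cores, whatever the slot count is.
    return binding_amount


def cores_capped(binding_amount: int, slots: int) -> int:
    # Proposed rule: binding_amount is only an upper bound; the number of
    # cores actually bound follows the slots granted on the node.
    return min(binding_amount, slots)


def cores_per_slot(slots: int) -> int:
    # Alternative proposal (a hypothetical 'slot' binding_type): one core
    # per slot, with no binding_amount needed.
    return slots


def compare(binding_amount: int,
            slots_per_node: Dict[str, int]) -> Dict[str, Tuple[int, int, int]]:
    # For each node, return (current, capped, per-slot) bound core counts.
    return {
        node: (
            cores_current(binding_amount, slots),
            cores_capped(binding_amount, slots),
            cores_per_slot(slots),
        )
        for node, slots in slots_per_node.items()
    }


if __name__ == "__main__":
    # Hypothetical allocation: 1 slot on node1, 4 slots on node2.
    for node, counts in compare(2, {"node1": 1, "node2": 4}).items():
        print(node, counts)
```

With a binding_amount of 2, the current rule binds 2 cores on both nodes, so node1 holds 2 cores for 1 process while the 4 processes on node2 share 2 cores. The capped rule gives 1 and 2 cores, and the per-slot rule gives 1 and 4.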
> Of course, a complex situation might arise if a user submits a combined MPI/multithreaded job, but then I guess we're into the realm of setting allocation_rule.
Yes, that would get ugly.
> Is it going to be worth looking at creating a patch for this? I don't know much of the internals of SGE -- would it be hard work to do? I've not that much time to dedicate towards it, but I could put some effort in if necessary...
>
Is the patch you're wanting for a "slot" binding_type?

-- 
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email: terry.dontje_at_[hidden]
