
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts
From: Chris Jewell (chris.jewell_at_[hidden])
Date: 2010-11-16 04:26:19


Hi all,

> On 11/15/2010 02:11 PM, Reuti wrote:
>> Just to give my understanding of the problem:
>>>
>>>>> Sorry, I am still trying to grok all your email as to what problem you
>>>>> are trying to solve. So is the issue trying to have two jobs with
>>>>> processes on the same node be able to bind their processes to different
>>>>> resources? Like core 1 for the first job and cores 2 and 3 for the 2nd job?
>>>>>
>>>>> --td
> You can't get 2 slots on a machine, as the slot count is limited to one by the core count here, so such a slot allocation shouldn't occur at all.

So to clarify: the current -binding <binding_strategy>:<binding_amount> allocates binding_amount cores to each sge_shepherd process associated with a job_id. There appears to be only one sge_shepherd process per job_id per execution node, so all child processes run on those allocated cores, irrespective of the number of slots allocated to the node.
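For reference, the binding described above is requested at submission time. A sketch of what such a submission might look like (the exact option forms vary between Grid Engine releases, so treat the syntax here as an assumption rather than a verified invocation):

```shell
# Request a parallel job with 8 slots, and ask the execution daemon to bind
# each node's single sge_shepherd (and hence all its children) to 2 cores.
# Note the mismatch discussed above: the "2" is per shepherd, not per slot.
qsub -pe mpi 8 -binding linear:2 job.sh
```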

I agree with Reuti that the binding_amount parameter should be a maximum number of bound cores per node, with the actual number determined by the number of slots allocated on that node. FWIW, an alternative approach might be to have another binding_type ('slot', say) that automatically allocates one core per slot.
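To make the difference concrete, here is a small sketch (hypothetical code, not anything from the SGE source) contrasting the current behaviour with the two proposed semantics:

```python
def cores_bound_current(binding_amount, slots_on_node):
    # Today: the shepherd is bound to binding_amount cores on every
    # execution node, regardless of how many slots were granted there.
    return binding_amount

def cores_bound_proposed(binding_amount, slots_on_node):
    # Proposal: binding_amount becomes a per-node maximum; the actual
    # number of bound cores follows the slot allocation.
    return min(binding_amount, slots_on_node)

def cores_bound_slot_type(slots_on_node):
    # Alternative 'slot' binding_type: exactly one core per granted slot.
    return slots_on_node

# A node granted 1 slot, submitted with binding_amount = 4:
print(cores_bound_current(4, 1))   # 4 cores bound, 3 of them idle
print(cores_bound_proposed(4, 1))  # 1 core bound, matching the slot count
print(cores_bound_slot_type(1))    # 1 core bound
```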

Of course, a complex situation might arise if a user submits a combined MPI/multithreaded job, but then I guess we're into the realm of setting allocation_rule.

Is it going to be worth looking at creating a patch for this? I don't know much about the internals of SGE -- would it be hard work to do? I don't have much time to dedicate to it, but I could put in some effort if necessary...

Chris
 

--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778