
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts
From: Terry Dontje (terry.dontje_at_[hidden])
Date: 2010-11-15 14:23:22


On 11/15/2010 02:11 PM, Reuti wrote:
> Just to give my understanding of the problem:
>
> Am 15.11.2010 um 19:57 schrieb Terry Dontje:
>
>> On 11/15/2010 11:08 AM, Chris Jewell wrote:
>>>> Sorry, I am still trying to grok your email as to what problem you
>>>> are trying to solve. So is the issue trying to have two jobs with
>>>> processes on the same node each bind their processes to different
>>>> resources? Like core 1 for the first job and cores 2 and 3 for the 2nd job?
>>>>
>>>> --td
>>>>
>>> That's exactly it. Each MPI process needs to be bound to 1 processor in a way that reflects GE's slot allocation scheme.
>>>
>>>
>> I actually don't think I got it. So you gave two cases:
>>
>> Case 1:
>> $ qsub -pe mpi 8 -binding pe linear:1 myScript.com
>>
>> and my pe_hostfile looks like:
>>
>> exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1
> Shouldn't two cores be reserved here for exec6, as it got two slots?
>
>
That's what I was wondering.
>> exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1
>> exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1
>> exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1
>> exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1
>> exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1
>> exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1
>>
>>
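For context, each entry in the pe_hostfile above follows SGE's four-field layout: hostname, slot count, queue instance, and a colon-separated list of socket,core binding pairs. A minimal Python parsing sketch, assuming that layout (the function name and dict keys are illustrative, not from this thread):

# Sketch: parse SGE pe_hostfile entries of the form
#   <hostname> <slots> <queue>@<hostname> <socket>,<core>[:<socket>,<core>...]
def parse_pe_hostfile(path):
    entries = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 4:
                continue  # skip blank or malformed lines
            host, slots, queue, binding = (fields[0], int(fields[1]),
                                           fields[2], fields[3])
            # "0,1:0,2" -> [(0, 1), (0, 2)] as (socket, core) pairs
            cores = [tuple(map(int, p.split(","))) for p in binding.split(":")]
            entries.append({"host": host, "slots": slots,
                            "queue": queue, "cores": cores})
    return entries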
>> Case 2:
>> Notice that, because I have specified -binding pe linear:1, each execution node binds the job's processes to one core. If I specify -binding pe linear:2, I get:
>>
>> exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1:0,2
>> exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1:0,2
>> exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1:0,2
>> exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1:0,2
>> exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1:0,2
>> exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2
>> exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2
>>
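One way to act on such bindings (a sketch of my own, not something the thread's participants propose) is to translate each host's slot allocation into an Open MPI rankfile, where slot=socket:core pins a rank to a specific core:

# Sketch: emit an Open MPI rankfile ("rank N=<host> slot=<socket>:<core>"),
# giving each slot one bound core; wraps around if a host reports fewer
# cores than slots (the exec6 question above).
def hostfile_to_rankfile(entries):
    lines, rank = [], 0
    for e in entries:
        for i in range(e["slots"]):
            sock, core = e["cores"][i % len(e["cores"])]
            lines.append(f"rank {rank}={e['host']} slot={sock}:{core}")
            rank += 1
    return "\n".join(lines)

# Example with the case 2 values for exec6 and exec1:
entries = [
    {"host": "exec6.cluster.stats.local", "slots": 2, "cores": [(0, 1), (0, 2)]},
    {"host": "exec1.cluster.stats.local", "slots": 1, "cores": [(0, 1), (0, 2)]},
]
print(hostfile_to_rankfile(entries))
# prints:
#   rank 0=exec6.cluster.stats.local slot=0:1
#   rank 1=exec6.cluster.stats.local slot=0:2
#   rank 2=exec1.cluster.stats.local slot=0:1

mpirun's --rankfile option would then apply such a file; whether SGE's slot counts or its binding list should drive the mapping is exactly what is in question here.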
>> Is your complaint really the fact that exec6 has been allocated two slots, but there seems to be only one slot's worth of resources allocated
> All are wrong except exec6. They should only get one core assigned.
>
Huh? I would have thought exec6 would get 4 cores and the rest are correct.

--td

> -- Reuti
>
>
>> to it (i.e. in case 1 exec6 only has 1 core and in case 2 it has two, where maybe you'd expect 2 and 4 cores allocated respectively)?
>>
>> --
>> Terry D. Dontje | Principal Software Engineer
>> Developer Tools Engineering | +1.781.442.2631
>> Oracle - Performance Technologies
>> 95 Network Drive, Burlington, MA 01803
>> Email terry.dontje_at_[hidden]
>>
>>
>>

-- 
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden]


