Subject: Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts
From: Terry Dontje (terry.dontje_at_[hidden])
Date: 2010-11-15 14:23:22


On 11/15/2010 02:11 PM, Reuti wrote:
> Just to give my understanding of the problem:
>
> Am 15.11.2010 um 19:57 schrieb Terry Dontje:
>
>> On 11/15/2010 11:08 AM, Chris Jewell wrote:
>>>> Sorry, I am still trying to grok your email and understand what problem you
>>>> are trying to solve. Is the issue that you want two jobs that have
>>>> processes on the same node to be able to bind their processes to different
>>>> resources, like core 1 for the first job and cores 2 and 3 for the 2nd job?
>>>>
>>>> --td
>>>>
>>> That's exactly it. Each MPI process needs to be bound to 1 processor in a way that reflects GE's slot allocation scheme.
>>>
>>>
>> I actually don't think that I got it. So you give two cases:
>>
>> Case 1:
>> $ qsub -pe mpi 8 -binding pe linear:1 myScript.com
>>
>> and my pe_hostfile looks like:
>>
>> exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1
> Shouldn't two cores be reserved here for exec6, since it got two slots?
>
>
That's what I was wondering.
>> exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1
>> exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1
>> exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1
>> exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1
>> exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1
>> exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1
>>
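For reference, each pe_hostfile line above has four whitespace-separated fields: the execution host, the number of slots granted on it, the queue instance, and (because of the -binding request) the granted cores as <socket>,<core> pairs separated by ":". A minimal sketch, assuming that layout and SGE's $PE_HOSTFILE environment variable, which just echoes what each host was granted:

  # Sketch only: assumes the layout shown above, i.e.
  # "<host> <slots> <queue@host> <socket>,<core>[:<socket>,<core>...]"
  while read -r host slots queue binding; do
      echo "$host: $slots slot(s), bound cores: $binding"
  done < "$PE_HOSTFILE"

For Case 1 this would print, e.g., "exec6.cluster.stats.local: 2 slot(s), bound cores: 0,1", i.e. two slots but only one <socket>,<core> pair.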
>>
>> Case 2:
>> Notice that, because I have specified the -binding pe linear:1, each execution node binds processes for the job_id to one core. If I have -binding pe linear:2, I get:
>>
>> exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1:0,2
>> exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1:0,2
>> exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1:0,2
>> exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1:0,2
>> exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1:0,2
>> exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2
>> exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2
>>
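Reading the Case 2 binding column, "0,1:0,2" would be two <socket>,<core> pairs, i.e. socket 0 / core 1 plus socket 0 / core 2 on that host. Purely as an illustration of turning that column into something mpirun can consume, here is a sketch that writes an Open MPI rankfile to a placeholder file my_rankfile, assuming one rank per granted pair and the "rank N=<host> slot=<socket>:<core>" rankfile form (the result would then be passed to mpirun via its rankfile option):

  # Sketch only: one rankfile line per <socket>,<core> pair in $PE_HOSTFILE;
  # it simply mirrors whatever binding GE granted, including the mismatches
  # discussed here (e.g. Case 1 gives exec6 two slots but only one pair).
  rank=0
  while read -r host slots queue binding; do
      for pair in $(echo "$binding" | tr ':' ' '); do
          socket=${pair%,*}
          core=${pair#*,}
          echo "rank $rank=$host slot=$socket:$core"
          rank=$((rank + 1))
      done
  done < "$PE_HOSTFILE" > my_rankfile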
>> Is your complaint really the fact that exec6 has been allocated two slots but there seems to be only one slot's worth of resources allocated
> All are wrong except exec6. They should only get one core assigned.
>
Huh? I would have thought exec6 would get 4 cores and the rest are correct.

--td

> -- Reuti
>
>
>> to it (i.e., in Case 1 exec6 only has 1 core and in Case 2 it has two, where maybe you'd expect 2 and 4 cores allocated, respectively)?
>>

-- 
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden]


