
Subject: Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts
From: Reuti (reuti_at_[hidden])
Date: 2010-11-15 09:42:05


On 15.11.2010, at 15:29, Chris Jewell wrote:

> Hi,
>
> >>> If, indeed, it is currently not possible to implement this type of core binding in a tightly integrated OpenMPI/GE setup, then a solution might lie in a custom script run from the parallel environment's start_proc_args. This script would have to find out which slots are allocated where on the cluster, and write an OpenMPI rankfile.
>>
>> Exactly this should work.
>>
> >> If you use "binding_instance" "pe" and reformat the information in the $PE_HOSTFILE into a "rankfile", you should get the desired allocation. Maybe you can share the script with this list once you get it working.
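
For reference, a rough sketch of such a start_proc_args helper (a sketch only; it assumes the fourth field of each $PE_HOSTFILE line carries the binding as colon-separated socket,core pairs like "0,1:0,2", and that $TMPDIR points at the job's scratch directory):

  #!/bin/sh
  # Sketch of a start_proc_args helper (names are illustrative):
  # turn the $PE_HOSTFILE binding info into an OpenMPI rankfile.
  # Each hostfile line looks like:
  #   <host> <slots> <queue> <socket,core>[:<socket,core>...]
  RANKFILE="$TMPDIR/rankfile"
  rank=0
  while read host slots queue binding; do
      # split the binding field on ':' into socket,core pairs
      for pair in $(echo "$binding" | tr ':' ' '); do
          socket=${pair%,*}
          core=${pair#*,}
          # OpenMPI rankfile syntax: rank N=host slot=socket:core
          echo "rank $rank=$host slot=$socket:$core" >> "$RANKFILE"
          rank=$((rank + 1))
      done
  done < "$PE_HOSTFILE"

The job script would then launch with something like "mpirun -rf $TMPDIR/rankfile ...". Though, as you show below, the binding field lists the bound cores once per node rather than once per slot, so on a node that was granted more slots than bound cores the rank count comes up short.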
>
>
> As far as I can see, that's not going to work. Exactly as with "binding_instance" "set", -binding pe linear:n binds n cores per node, regardless of how many slots the node provides. This is easily verified by submitting a long-running job and examining the pe_hostfile. For example, I submit a job with:
>
> $ qsub -pe mpi 8 -binding pe linear:1 myScript.com
>
> and my pe_hostfile looks like:
>
> exec6.cluster.stats.local 2 batch.q_at_exec6.cluster.stats.local 0,1
> exec1.cluster.stats.local 1 batch.q_at_exec1.cluster.stats.local 0,1
> exec7.cluster.stats.local 1 batch.q_at_exec7.cluster.stats.local 0,1
> exec5.cluster.stats.local 1 batch.q_at_exec5.cluster.stats.local 0,1
> exec4.cluster.stats.local 1 batch.q_at_exec4.cluster.stats.local 0,1
> exec3.cluster.stats.local 1 batch.q_at_exec3.cluster.stats.local 0,1
> exec2.cluster.stats.local 1 batch.q_at_exec2.cluster.stats.local 0,1
>
> Notice that, because I specified -binding pe linear:1, each execution node binds the job's processes to one core, even though exec6 was allocated two slots. With -binding pe linear:2, I get:
>
> exec6.cluster.stats.local 2 batch.q_at_exec6.cluster.stats.local 0,1:0,2

So cores 1 and 2 on socket 0 aren't free?

-- Reuti

> exec1.cluster.stats.local 1 batch.q_at_exec1.cluster.stats.local 0,1:0,2
> exec7.cluster.stats.local 1 batch.q_at_exec7.cluster.stats.local 0,1:0,2
> exec4.cluster.stats.local 1 batch.q_at_exec4.cluster.stats.local 0,1:0,2
> exec3.cluster.stats.local 1 batch.q_at_exec3.cluster.stats.local 0,1:0,2
> exec2.cluster.stats.local 1 batch.q_at_exec2.cluster.stats.local 0,1:0,2
> exec5.cluster.stats.local 1 batch.q_at_exec5.cluster.stats.local 0,1:0,2
>
> So the pe_hostfile still doesn't give an accurate representation of the binding allocation for use by OpenMPI. Question: is there a system file or command that I could use to check which processors are "occupied"?
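
On Linux you could inspect the affinity masks of the processes that are already running, e.g. of each job's sge_shepherd (a sketch; it assumes pgrep and taskset are installed, and that the shepherd or its children actually carry the binding, which depends on the SGE setup):

  # Print the affinity list of every SGE shepherd on this node,
  # i.e. which cores are already claimed by running jobs.
  for pid in $(pgrep sge_shepherd); do
      taskset -pc "$pid"   # e.g. "pid 1234's current affinity list: 0,1"
  done

The same information is also available in /proc/<pid>/status as Cpus_allowed_list.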
>
> Chris
>
> --
> Dr Chris Jewell
> Department of Statistics
> University of Warwick
> Coventry
> CV4 7AL
> UK
> Tel: +44 (0)24 7615 0778