
Subject: Re: [OMPI users] Problem with mpiexec --cpus-per-proc in multiple nodes in OMPI 1.6.4
From: Gus Correa (gus_at_[hidden])
Date: 2013-03-29 10:53:18


Thank you, Ralph!
Gus Correa

On 03/29/2013 09:33 AM, Ralph Castain wrote:
> Just an update: I have this fixed in the OMPI trunk. It didn't make 1.7.0, but will be in 1.7.1 and beyond.
>
>
> On Mar 21, 2013, at 2:09 PM, Gus Correa <gus_at_[hidden]> wrote:
>
>> Thank you, Ralph.
>>
>> I will try to use a rankfile.
>>
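>> Just to make the bookkeeping concrete, here is a minimal sketch of how
>> such a rankfile could be generated from Torque's $PBS_NODEFILE
>> (the file name my_rankfile is a placeholder; the
>> "rank N=host slot=socket:core_list" syntax follows the OMPI 1.6 mpirun
>> man page, and whether the core indices are logical or physical would
>> still need checking on these nodes; the executable is omitted, as in
>> the mpiexec lines further down):
>>
>> rank=0
>> for host in $(sort -u $PBS_NODEFILE); do   # each node appears 32x (ppn=32)
>>   for socket in 0 1; do
>>     for pair in 0,1 2,3 4,5 6,7 8,9 10,11 12,13 14,15; do
>>       echo "rank $rank=$host slot=$socket:$pair"
>>       rank=$((rank+1))
>>     done
>>   done
>> done > my_rankfile
>>
>> mpiexec -np 128 -rf my_rankfile \
>> --report-bindings \
>> --tag-output \
>>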
>> In any case, the --cpus-per-proc option is a very useful feature:
>> for hybrid MPI+OpenMP programs, for these processors with one FPU
>> shared by two cores, etc.
>> If it gets fixed in a later release of OMPI, that would be great.
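>>
>> (As an aside on the hybrid MPI+OpenMP case: a typical invocation
>> sketch would pair --cpus-per-proc with OMP_NUM_THREADS, e.g.
>>
>> mpiexec -np 16 \
>> --cpus-per-proc 4 \
>> --bind-to-core \
>> -x OMP_NUM_THREADS=4 \
>> --report-bindings \
>> ./hybrid_app
>>
>> i.e. one rank per four cores, each rank running four OpenMP threads;
>> ./hybrid_app and the thread count are just placeholders.)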
>>
>> Thank you,
>> Gus Correa
>>
>>
>> On 03/21/2013 04:03 PM, Ralph Castain wrote:
>>> I've heard this from a couple of other sources - it looks like
>>> there is a problem on the daemons when they compute the location
>>> for -cpus-per-proc. I'm not entirely sure why that would be as the
>>> code is supposed to be common with mpirun, but there are a few
>>> differences.
>>>
>>> I will take a look at it - I don't know of any workaround,
>>> I'm afraid.
>>> On Mar 21, 2013, at 12:01 PM, Gus Correa <gus_at_[hidden]> wrote:
>>>
>>>> Dear Open MPI Pros
>>>>
>>>> I am having trouble using mpiexec with --cpus-per-proc
>>>> on multiple nodes in OMPI 1.6.4.
>>>>
>>>> I know there is an ongoing thread on similar runtime issues
>>>> of OMPI 1.7.
>>>> By no means am I trying to hijack T. Mishima's questions.
>>>> My question is genuine, though, and perhaps related to his.
>>>>
>>>> I am testing a new cluster remotely, with monster
>>>> dual socket 16-core AMD Bulldozer processors (32 cores per node).
>>>> I am using OMPI 1.6.4 built with Torque 4.2.1 support.
>>>>
>>>> I read that on these processors each pair of cores shares an FPU.
>>>> Hence, I am trying to run *one MPI process* on each
>>>> *pair of successive cores*.
>>>> This trick seems to yield better performance
>>>> (at least for HPL/Linpack) than using all cores.
>>>> I.e., the goal is to use "every other core", or perhaps
>>>> to allow each process to wobble across two successive cores only,
>>>> hence granting exclusive use of one FPU per process.
>>>> [BTW, this is *not* an attempt to do hybrid MPI+OpenMP.
>>>> The code is HPL with MPI+BLAS/Lapack and NO OpenMP.]
>>>>
>>>> To achieve this, I am using the mpiexec --cpus-per-proc option.
>>>> It works on one node, which is great.
>>>> However, unless I made a silly syntax or arithmetic mistake,
>>>> it doesn't seem to work on more than one node.
>>>>
>>>> For instance, this works:
>>>>
>>>> #PBS -l nodes=1:ppn=32
>>>> ...
>>>> mpiexec -np 16 \
>>>> --cpus-per-proc 2 \
>>>> --bind-to-core \
>>>> --report-bindings \
>>>> --tag-output \
>>>>
>>>> I get a pretty nice process-to-core distribution, with 16 processes
>>>> and each process bound to a pair of successive cores,
>>>> as expected:
>>>>
>>>> [1,7]<stderr>:[node33:04744] MCW rank 7 bound to socket 0[core 14-15]: [. . . . . . . . . . . . . . B B][. . . . . . . . . . . . . . . .]
>>>> [1,8]<stderr>:[node33:04744] MCW rank 8 bound to socket 1[core 0-1]: [. . . . . . . . . . . . . . . .][B B . . . . . . . . . . . . . .]
>>>> [1,9]<stderr>:[node33:04744] MCW rank 9 bound to socket 1[core 2-3]: [. . . . . . . . . . . . . . . .][. . B B . . . . . . . . . . . .]
>>>> [1,10]<stderr>:[node33:04744] MCW rank 10 bound to socket 1[core 4-5]: [. . . . . . . . . . . . . . . .][. . . . B B . . . . . . . . . .]
>>>> [1,11]<stderr>:[node33:04744] MCW rank 11 bound to socket 1[core 6-7]: [. . . . . . . . . . . . . . . .][. . . . . . B B . . . . . . . .]
>>>> [1,12]<stderr>:[node33:04744] MCW rank 12 bound to socket 1[core 8-9]: [. . . . . . . . . . . . . . . .][. . . . . . . . B B . . . . . .]
>>>> [1,13]<stderr>:[node33:04744] MCW rank 13 bound to socket 1[core 10-11]: [. . . . . . . . . . . . . . . .][. . . . . . . . . . B B . . . .]
>>>> [1,14]<stderr>:[node33:04744] MCW rank 14 bound to socket 1[core 12-13]: [. . . . . . . . . . . . . . . .][. . . . . . . . . . . . B B . .]
>>>> [1,15]<stderr>:[node33:04744] MCW rank 15 bound to socket 1[core 14-15]: [. . . . . . . . . . . . . . . .][. . . . . . . . . . . . . . B B]
>>>> [1,0]<stderr>:[node33:04744] MCW rank 0 bound to socket 0[core 0-1]: [B B . . . . . . . . . . . . . .][. . . . . . . . . . . . . . . .]
>>>> [1,1]<stderr>:[node33:04744] MCW rank 1 bound to socket 0[core 2-3]: [. . B B . . . . . . . . . . . .][. . . . . . . . . . . . . . . .]
>>>> [1,2]<stderr>:[node33:04744] MCW rank 2 bound to socket 0[core 4-5]: [. . . . B B . . . . . . . . . .][. . . . . . . . . . . . . . . .]
>>>> [1,3]<stderr>:[node33:04744] MCW rank 3 bound to socket 0[core 6-7]: [. . . . . . B B . . . . . . . .][. . . . . . . . . . . . . . . .]
>>>> [1,4]<stderr>:[node33:04744] MCW rank 4 bound to socket 0[core 8-9]: [. . . . . . . . B B . . . . . .][. . . . . . . . . . . . . . . .]
>>>> [1,5]<stderr>:[node33:04744] MCW rank 5 bound to socket 0[core 10-11]: [. . . . . . . . . . B B . . . .][. . . . . . . . . . . . . . . .]
>>>> [1,6]<stderr>:[node33:04744] MCW rank 6 bound to socket 0[core 12-13]: [. . . . . . . . . . . . B B . .][. . . . . . . . . . . . . . . .]
>>>>
>>>>
>>>> ***************
>>>>
>>>> However, when I try to use eight nodes,
>>>> the job fails and I get the error message below (repeatedly from
>>>> several nodes):
>>>>
>>>> #PBS -l nodes=8:ppn=32
>>>> ...
>>>> mpiexec -np 128 \
>>>> --cpus-per-proc 2 \
>>>> --bind-to-core \
>>>> --report-bindings \
>>>> --tag-output \
>>>>
>>>>
>>>> Error message:
>>>>
>>>> --------------------------------------------------------------------------
>>>> An invalid physical processor ID was returned when attempting to bind
>>>> an MPI process to a unique processor on node:
>>>>
>>>> Node: node18
>>>>
>>>> This usually means that you requested binding to more processors than
>>>> exist (e.g., trying to bind N MPI processes to M processors, where
>>>> N > M), or that the node has an unexpectedly different topology.
>>>>
>>>> Double check that you have enough unique processors for all the
>>>> MPI processes that you are launching on this host, and that all nodes
>>>> have identical topologies.
>>>>
>>>> Your job will now abort.
>>>> --------------------------------------------------------------------------
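>>>>
>>>> (Since the message suggests checking that all nodes have identical
>>>> topologies, one quick sketch for doing that from the job's mother
>>>> node, assuming hwloc's lstopo-no-graphics is installed on the
>>>> compute nodes and ssh works inside the job, is:
>>>>
>>>> for h in $(sort -u $PBS_NODEFILE); do
>>>>   echo "== $h =="
>>>>   ssh $h lstopo-no-graphics
>>>> done
>>>>
>>>> and then compare the printed topologies by eye.)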
>>>>
>>>> Oddly enough, the binding map *is* shown on STDERR,
>>>> and it looks *correct*, pretty much the same binding map
>>>> as the one above that I get for a single node.
>>>>
>>>> *****************
>>>>
>>>> Finally, replacing "--cpus-per-proc 2" with "--npernode 16"
>>>> works to some extent, but doesn't reach my goal.
>>>> I.e., the job doesn't fail, and each node does get 16 MPI
>>>> processes.
>>>> However, it doesn't bind the processes the way I want.
>>>> Regardless of whether I continue to use "--bind-to-core"
>>>> or replace it with "--bind-to-socket",
>>>> all 16 processes on each node always bind to socket 0,
>>>> and nothing goes to socket 1.
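>>>>
>>>> For clarity, that variant was essentially (with --bind-to-core
>>>> optionally replaced by --bind-to-socket, and the executable again
>>>> omitted):
>>>>
>>>> mpiexec -np 128 \
>>>> --npernode 16 \
>>>> --bind-to-core \
>>>> --report-bindings \
>>>> --tag-output \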
>>>>
>>>> ************
>>>>
>>>> Is there any simple workaround for this
>>>> (other than using a rankfile)
>>>> to make --cpus-per-proc work across multiple nodes,
>>>> using "every other core"?
>>>>
>>>> [Only if it is a simple workaround. I must finish this
>>>> remote test soon. Otherwise I can revisit this issue later.]
>>>>
>>>> Thank you,
>>>> Gus Correa