Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate this archive, but no new mails have been added to it since July 2016.

Subject: Re: [OMPI users] wrong core binding by openmpi-1.5.5
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-04-11 05:19:15


Interesting. Jeff and I discussed that very problem not long ago, and I could swear he fixed it, but I don't see the CMR for that code. He's on vacation this week, so I'll wait for his return to look at it.

Thanks!
Ralph

On Apr 11, 2012, at 2:36 AM, Brice Goglin wrote:

> A quick look at the code seems to confirm my suspicion. get/set_module()
> callbacks manipulate arrays of logical indexes, and they do not convert
> them back to physical indexes before binding.
>
> Here's a quick patch that may help. Only compile tested...
>
> Brice
>
>
>
> On 11/04/2012 09:49, Brice Goglin wrote:
>> On 11/04/2012 09:06, tmishima_at_[hidden] wrote:
>>> Hi, Brice.
>>>
>>> I installed the latest hwloc-1.4.1.
>>> Here is the output of lstopo -p.
>>>
>>> [root_at_node03 bin]# ./lstopo -p
>>> Machine (126GB)
>>>   Socket P#0 (32GB)
>>>     NUMANode P#0 (16GB) + L3 (5118KB)
>>>       L2 (512KB) + L1 (64KB) + Core P#0 + PU P#0
>>>       L2 (512KB) + L1 (64KB) + Core P#1 + PU P#4
>>>       L2 (512KB) + L1 (64KB) + Core P#2 + PU P#8
>>>       L2 (512KB) + L1 (64KB) + Core P#3 + PU P#12
>> OK, then the cpuset of this NUMA node is 1111 (hex: bits 0, 4, 8 and 12,
>> one per PU P#0, P#4, P#8 and P#12).
>>
>>>> [node03.cluster:21706] [[55518,0],0] odls:default:fork binding child [[55518,1],0] to cpus 1111
>> So Open MPI 1.5.4 is correct.
>>
>>>> [node03.cluster:04706] [[40566,0],0] odls:default:fork binding child [[40566,1],0] to cpus 000f
>> And Open MPI 1.5.5 is indeed wrong.
>>
>> Random guess: 000f is the bitmask built from hwloc *logical* indexes
>> (PUs L#0 to L#3). hwloc cpusets (used for binding) are internally built
>> from hwloc *physical* indexes (1111 here).
>>
>> Jeff, Ralph:
>> How does OMPI 1.5.5 build hwloc cpusets for binding? Are you doing
>> bitmap operations on hwloc object cpusets?
>> If yes, I don't know what's going wrong here.
>> If no, are you building hwloc cpusets manually by setting individual
>> bits from object indexes? If yes, you must use *physical* indexes to do so.
>>
>> Brice
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> <try.patch>