A quick look at the code seems to confirm my guess. The get/set_module()
callbacks manipulate arrays of logical indexes, and they do not convert
them back to physical indexes before binding.
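
To make the mismatch concrete, here is a small standalone Python sketch (an
illustration only: it is not the patch and it is not OMPI or hwloc code). It
builds a bitmask both ways for the four PUs of NUMANode P#0 from the lstopo
output quoted below. The physical indexes come from that output; the
assumption that their logical indexes are 0..3, and the helper name
mask_from_indexes, are mine.

```python
# Physical (OS) indexes of the four PUs under NUMANode P#0, listed in
# logical order, as shown by "lstopo -p" in the quoted message.
# Assumption: their hwloc logical indexes are 0, 1, 2, 3.
PU_PHYSICAL_INDEXES = [0, 4, 8, 12]


def mask_from_indexes(indexes):
    """Return an integer bitmask with one bit set per index (hypothetical helper)."""
    mask = 0
    for index in indexes:
        mask |= 1 << index
    return mask


# Wrong: use the logical indexes directly as bit positions.
logical_indexes = range(len(PU_PHYSICAL_INDEXES))
logical_mask = mask_from_indexes(logical_indexes)

# Right: convert each logical index to its physical index first.
physical_mask = mask_from_indexes(PU_PHYSICAL_INDEXES[i] for i in logical_indexes)

print(f"logical:  {logical_mask:04x}")   # 000f, what 1.5.5 reports
print(f"physical: {physical_mask:04x}")  # 1111, what 1.5.4 reports
```

The two masks match the "000f" and "1111" values in the odls logs below,
which is why I suspect the missing logical-to-physical conversion.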
Here's a quick patch that may help. It has only been compile-tested...
On 11/04/2012 09:49, Brice Goglin wrote:
> On 11/04/2012 09:06, tmishima_at_[hidden] wrote:
>> Hi, Brice.
>> I installed the latest hwloc-1.4.1.
>> Here is the output of lstopo -p.
>> [root_at_node03 bin]# ./lstopo -p
>> Machine (126GB)
>>   Socket P#0 (32GB)
>>     NUMANode P#0 (16GB) + L3 (5118KB)
>>       L2 (512KB) + L1 (64KB) + Core P#0 + PU P#0
>>       L2 (512KB) + L1 (64KB) + Core P#1 + PU P#4
>>       L2 (512KB) + L1 (64KB) + Core P#2 + PU P#8
>>       L2 (512KB) + L1 (64KB) + Core P#3 + PU P#12
> OK, then the cpuset of this NUMA node is 1111 in hex (bits 0, 4, 8 and 12 set).
>>> [node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
>>> [[55518,1],0] to cpus 1111
> So openmpi 1.5.4 is correct.
>>> [node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
>>> [[40566,1],0] to cpus 000f
> And openmpi 1.5.5 is indeed wrong.
> Random guess: 000f is the bitmask made of hwloc *logical* indexes. hwloc
> cpusets (used for binding) are internally made of hwloc *physical*
> indexes (1111 here).
> Jeff, Ralph:
> How does OMPI 1.5.5 build hwloc cpusets for binding? Are you doing
> bitmap operations on hwloc object cpusets?
> If yes, I don't know what's going wrong here.
> If no, are you building hwloc cpusets manually by setting individual
> bits from object indexes? If yes, you must use *physical* indexes to do so.