Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] wrong core binding by openmpi-1.5.5
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2012-04-11 03:49:07


On 11/04/2012 09:06, tmishima_at_[hidden] wrote:
> Hi, Brice.
>
> I installed the latest hwloc-1.4.1.
> Here is the output of lstopo -p.
>
> [root_at_node03 bin]# ./lstopo -p
> Machine (126GB)
> Socket P#0 (32GB)
> NUMANode P#0 (16GB) + L3 (5118KB)
> L2 (512KB) + L1 (64KB) + Core P#0 + PU P#0
> L2 (512KB) + L1 (64KB) + Core P#1 + PU P#4
> L2 (512KB) + L1 (64KB) + Core P#2 + PU P#8
> L2 (512KB) + L1 (64KB) + Core P#3 + PU P#12

OK, so the cpuset of this NUMA node is 1111 (hex): one bit set for each
of PUs P#0, P#4, P#8 and P#12.
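
For anyone who wants to check that value by hand, here is a minimal sketch
(plain Python, not using hwloc itself) that pulls the PU physical indexes out
of the four lstopo lines quoted above and sets one bit per index. The variable
names are illustrative only.

```python
import re

# The four core lines of NUMANode P#0, copied from the lstopo -p output above.
lstopo_lines = """\
L2 (512KB) + L1 (64KB) + Core P#0 + PU P#0
L2 (512KB) + L1 (64KB) + Core P#1 + PU P#4
L2 (512KB) + L1 (64KB) + Core P#2 + PU P#8
L2 (512KB) + L1 (64KB) + Core P#3 + PU P#12
"""

# Collect the physical (OS) index of each PU; "Core P#n" is not matched.
pu_physical = [int(n) for n in re.findall(r"PU P#(\d+)", lstopo_lines)]

# Set one bit per physical PU index.
mask = 0
for idx in pu_physical:
    mask |= 1 << idx

print(pu_physical)          # [0, 4, 8, 12]
print(format(mask, "04x"))  # 1111
```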

>> [node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
>> [[55518,1],0] to cpus 1111

So Open MPI 1.5.4 is correct.

>> [node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
>> [[40566,1],0] to cpus 000f

And Open MPI 1.5.5 is indeed wrong.

Random guess: 000f is the bitmask you get from hwloc *logical* indexes
(0 to 3), whereas hwloc cpusets (used for binding) are internally made of
hwloc *physical* indexes (1111 here).
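
To illustrate that guess, the sketch below builds a mask twice for the same
four PUs: once from logical indexes and once from physical ones. It is plain
Python rather than hwloc code, the helper `bitmask` is hypothetical, and it
assumes these four PUs are the first four in hwloc's logical order, as the
guess implies.

```python
# Physical (OS) indexes of the four PUs, listed in logical order 0..3.
# Assumption: these PUs have logical indexes 0, 1, 2, 3.
physical_of_logical = [0, 4, 8, 12]


def bitmask(indexes):
    """Return an integer with one bit set per index (hypothetical helper)."""
    m = 0
    for i in indexes:
        m |= 1 << i
    return m


# Setting bits from logical indexes 0..3 gives the 1.5.5 value.
logical_mask = bitmask(range(len(physical_of_logical)))

# Setting bits from physical indexes gives the value a cpuset should hold.
physical_mask = bitmask(physical_of_logical)

print(format(logical_mask, "04x"))   # 000f
print(format(physical_mask, "04x"))  # 1111
```

The two results match the two binding messages quoted above, which is
consistent with the guess but does not prove where the 1.5.5 value comes from.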

Jeff, Ralph:
How does OMPI 1.5.5 build hwloc cpusets for binding? Are you doing
bitmap operations on hwloc object cpusets?
If yes, I don't know what's going wrong here.
If no, are you building hwloc cpusets manually by setting individual
bits from object indexes? If yes, you must use *physical* indexes to do so.

Brice