Ouch - finally figured out what happened. Jeff and I did indeed address this problem a few weeks ago. There were some changes required in a couple of places to make it all work, so we did the work in a Mercurial branch Jeff set up.

Unfortunately, I think he got distracted by the MPI Forum shortly thereafter, and then got engulfed by other things. The work appears complete, but I can't find a record of it actually being committed to the 1.5 branch. Could be he intended it for 1.6.

I'll have to bug him when he gets back next week and see what happened, and his plans. Sorry for the mixup.
Ralph

On Apr 11, 2012, at 3:15 AM, Brice Goglin wrote:

Here's a better patch. Still only compile-tested :)
Brice


On 11/04/2012 10:36, Brice Goglin wrote:
A quick look at the code seems to confirm my feeling. get/set_module()
callbacks manipulate arrays of logical indexes, and they do not convert
them back to physical indexes before binding.

Here's a quick patch that may help. Only compile-tested...

Brice



On 11/04/2012 09:49, Brice Goglin wrote:
On 11/04/2012 09:06, tmishima@jcity.maeda.co.jp wrote:
Hi, Brice.

I installed the latest hwloc-1.4.1.
Here is the output of lstopo -p.

[root@node03 bin]# ./lstopo -p
Machine (126GB)
  Socket P#0 (32GB)
    NUMANode P#0 (16GB) + L3 (5118KB)
      L2 (512KB) + L1 (64KB) + Core P#0 + PU P#0
      L2 (512KB) + L1 (64KB) + Core P#1 + PU P#4
      L2 (512KB) + L1 (64KB) + Core P#2 + PU P#8
      L2 (512KB) + L1 (64KB) + Core P#3 + PU P#12
OK, then the cpuset of this NUMA node is 1111 (hex: one bit each for PU P#0, P#4, P#8 and P#12).

[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],0] to cpus 1111
So openmpi 1.5.4 is correct.

[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],0] to cpus 000f
And openmpi 1.5.5 is indeed wrong.

Random guess: 000f is the bitmask built from hwloc *logical* indexes
(PU L#0 to L#3). hwloc cpusets (used for binding) are internally built
from hwloc *physical* indexes (1111 here).

Jeff, Ralph:
How does OMPI 1.5.5 build hwloc cpusets for binding? Are you doing
bitmap operations on hwloc object cpusets?
If yes, I don't know what's going wrong here.
If no, are you building hwloc cpusets manually by setting individual
bits from object indexes? If yes, you must use *physical* indexes to do so.

Brice

_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

      
_______________________________________________ users mailing list users@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users

<try2.patch>