On 13/02/2011 04:54, Siew Yin Chan wrote:
Good day,

I'm studying the impact of MPI process binding on communication costs in my project, and would like to use hwloc-bind to achieve fine-grained mapping control. I installed hwloc 1.1.1 on a 2-socket machine with 4 cores per socket (2 dual-core dies in each socket), and ran hwloc-ps to verify the binding:

$ mpirun -V
mpirun (Open MPI) 1.5.1
$ mpirun -np 4 hwloc-bind socket:0.core:0-3 ./test

hwloc-ps shows the following output:

$ hwloc-ps -p
1497    Socket:0                ./test
1498    Socket:0                ./test
1499    Socket:0                ./test
1500    Socket:0                ./test
$ hwloc-ps -l
1497    Socket:0                ./test
1498    Socket:0                ./test
1499    Socket:0                ./test
1500    Socket:0                ./test
$ hwloc-ps -c
1497    0x00000055              ./test
1498    0x00000055              ./test
1499    0x00000055              ./test
1500    0x00000055              ./test

1. Does hwloc-bind map the processes *sequentially* on *successive* cores of the socket?


No. None of the hwloc-bind instances started by the mpirun above knows that other hwloc-bind instances are running on the same machine. Each of them binds its process to all four cores of the first socket, which is why hwloc-ps reports Socket:0 and the same mask for every process.
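
To see which processors the shared mask 0x00000055 covers, the bits can be expanded by hand. The following is a minimal sketch in plain shell arithmetic; mask_to_pus is a hypothetical helper name, not an hwloc tool, and it only handles masks that fit in a single shell integer.

```shell
#!/bin/sh
# mask_to_pus MASK
# Print the indexes of the bits set in a cpuset mask such as 0x00000055.
# Hypothetical helper for illustration; not part of hwloc.
mask_to_pus() {
    mask=$(($1))        # arithmetic expansion accepts the 0x prefix
    bit=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="${out:+$out }$bit"   # append, space-separated
        fi
        mask=$((mask >> 1))
        bit=$((bit + 1))
    done
    echo "$out"
}

mask_to_pus 0x00000055   # prints: 0 2 4 6
```

All four processes carry the same four bits, so each of them may run on any of those four processors rather than on one core each.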

2. How could hwloc-ps help verify this binding, i.e.,

process 1497 (rank 0) on socket.0:core.0
process 1498 (rank 1) on socket.0:core.1
process 1499 (rank 2) on socket.0:core.2
process 1500 (rank 3) on socket.0:core.3

(let's assume your mpirun command did what you want)

You would get something like this from hwloc-ps:

1497    Core:0    ./test
1498    Core:1    ./test
1499    Core:2    ./test
1500    Core:3    ./test

These core numbers are logical indexes counted across the entire machine, not within a socket. hwloc-ps can't easily show a hierarchical location such as socket.core because there are many possible combinations, especially once caches are involved.

Actually, you might get L1Cache instead of Core above, since hwloc-ps reports the first object that exactly matches the process binding (and on your machine the L1 cache sits above Core in the hierarchy while covering exactly the same processors).

If you want a different output format, I suggest you use hwloc-calc to convert the hwloc-ps output.
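
For a machine as regular as this one, the conversion from a machine-wide logical core index to a socket.core pair is simple arithmetic. The sketch below is plain shell, not hwloc-calc; logical_core_to_location is a hypothetical name, and it assumes every socket has the same number of cores and that logical core indexes are numbered socket by socket.

```shell
#!/bin/sh
# logical_core_to_location CORE CORES_PER_SOCKET
# Convert a machine-wide logical core index into socket:X.core:Y.
# Hypothetical helper; assumes the same number of cores in every socket.
logical_core_to_location() {
    core=$1
    per_socket=$2
    echo "socket:$((core / per_socket)).core:$((core % per_socket))"
}

logical_core_to_location 3 4   # prints: socket:0.core:3
logical_core_to_location 5 4   # prints: socket:1.core:1
```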

Equivalently, does the binding of `socket:0.core:0-1 socket:1.core:0-1' with hwloc-ps showing

$ hwloc-ps -l
1315    L2Cache:0 L2Cache:2             ./test
1316    L2Cache:0 L2Cache:2             ./test
1317    L2Cache:0 L2Cache:2             ./test
1318    L2Cache:0 L2Cache:2             ./test

indicate the following? I.e.,

process 1315 (rank 0) on socket.0:core.0
process 1316 (rank 1) on socket.0:core.1
process 1317 (rank 2) on socket.1:core.0
process 1318 (rank 3) on socket.1:core.1

No. Again, every process is bound to the same set of four cores (two in each socket), so hwloc-ps shows the largest objects that cover exactly those cores, here L2Cache:0 and L2Cache:2.

In the end, you want an MPI launcher that takes care of the binding itself, instead of having to bind manually on the command line. Most MPI launchers should be able to do this nowadays. Once that works, hwloc-ps will report the exact core each process is bound to. You might still need to play with hwloc-calc to rewrite the hwloc-ps output the way you want.
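
If the launcher's own binding support is not an option, a small wrapper script can give each process a different location. This is only a sketch under assumptions: the file name bind_by_rank.sh and the helper rank_location are hypothetical, the script assumes the launcher exports the node-local rank in OMPI_COMM_WORLD_LOCAL_RANK (check your MPI's documentation for the variable it actually sets), and CORES_PER_SOCKET is hard-coded for the machine described above.

```shell
#!/bin/sh
# bind_by_rank.sh (hypothetical name): bind each MPI process to its own core.
# Assumption: the launcher exports the node-local rank in
# OMPI_COMM_WORLD_LOCAL_RANK; other launchers use other variable names.

CORES_PER_SOCKET=4   # 2 sockets x 4 cores on the machine discussed here

# rank_location RANK: map local rank N to socket:X.core:Y, filling socket 0 first.
rank_location() {
    rank=$1
    echo "socket:$((rank / CORES_PER_SOCKET)).core:$((rank % CORES_PER_SOCKET))"
}

# Only bind and exec when a command was given on the command line.
if [ $# -gt 0 ]; then
    exec hwloc-bind "$(rank_location "${OMPI_COMM_WORLD_LOCAL_RANK:-0}")" "$@"
fi
```

It would be started as `mpirun -np 4 ./bind_by_rank.sh ./test`, after which hwloc-ps should list a different core (or L1 cache) for each process.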

I am thinking of adding an option to hwloc-calc that would help rewrite an arbitrary location string as socket:X.core:Y or something like that.