Good day,
I'm studying the impact of MPI process binding on communication costs in my project, and would like to use hwloc-bind to achieve fine-grained mapping control. I installed hwloc 1.1.1 on a 2-socket machine with 4 cores per socket (2 dual-core dies in each socket), launched my test program as follows, and then ran hwloc-ps to verify the binding:
$ mpirun -V
mpirun (Open MPI) 1.5.1
$ mpirun -np 4 hwloc-bind socket:0.core:0-3 ./test
hwloc-ps shows the following output:
$ hwloc-ps -p
1497 Socket:0 ./test
1498 Socket:0 ./test
1499 Socket:0 ./test
1500 Socket:0 ./test
$ hwloc-ps -l
1497 Socket:0 ./test
1498 Socket:0 ./test
1499 Socket:0 ./test
1500 Socket:0 ./test
$ hwloc-ps -c
1497 0x00000055 ./test
1498 0x00000055 ./test
1499 0x00000055 ./test
1500 0x00000055 ./test
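To make sure I am reading the -c output correctly, here is a minimal sketch that lists which bits are set in a cpuset mask such as 0x00000055. The decode_mask function is my own hypothetical helper, not part of hwloc, and I am assuming each set bit corresponds to one processing unit index.

```shell
# Hypothetical helper (not an hwloc tool): print the positions of the
# bits that are set in a hexadecimal cpuset mask, lowest bit first.
decode_mask() {
    mask=$(( $1 ))      # shell arithmetic accepts 0x-prefixed hex
    bit=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            # append the bit index, space-separated
            out="$out${out:+ }$bit"
        fi
        mask=$(( mask >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "$out"
}

decode_mask 0x00000055   # prints: 0 2 4 6
```

If I decode it right, all four processes report the same mask with four bits set, rather than one distinct bit per process.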
Questions:
1. Does hwloc-bind map the processes *sequentially* on *successive* cores of the socket?
2. How could hwloc-ps help verify this binding, i.e.,
process 1497 (rank 0) on socket.0:core.0
process 1498 (rank 1) on socket.0:core.1
process 1499 (rank 2) on socket.0:core.2
process 1500 (rank 3) on socket.0:core.3
Equivalently, does the binding of `socket:0.core:0-1 socket:1.core:0-1' with hwloc-ps showing
$ hwloc-ps -l
1315 L2Cache:0 L2Cache:2 ./test
1316 L2Cache:0 L2Cache:2 ./test
1317 L2Cache:0 L2Cache:2 ./test
1318 L2Cache:0 L2Cache:2 ./test
indicate the following? I.e.,
process 1315 (rank 0) on socket.0:core.0
process 1316 (rank 1) on socket.0:core.1
process 1317 (rank 2) on socket.1:core.0
process 1318 (rank 3) on socket.1:core.1