
Hardware Locality Users' Mailing List Archives


Subject: Re: [hwloc-users] hwloc-ps output - how to verify process binding on the core level?
From: Siew Yin Chan (sychan131_at_[hidden])
Date: 2011-02-14 09:35:07

1. I tried Open MPI 1.5.1 before turning to hwloc-bind. Yes, Open MPI 1.5.1 does provide the --bycore and --bind-to-core options, but on my machine these options seem to bind processes to cores according to the *physical* indexes:

[user_at_compute-0-8 ~]$ lstopo --physical
Machine (16GB)
  Socket P#0
    L2 (4096KB)
      L1 (32KB) + Core P#0 + PU P#0
      L1 (32KB) + Core P#1 + PU P#2
    L2 (4096KB)
      L1 (32KB) + Core P#2 + PU P#4
      L1 (32KB) + Core P#3 + PU P#6
  Socket P#1
    L2 (4096KB)
      L1 (32KB) + Core P#0 + PU P#1
      L1 (32KB) + Core P#1 + PU P#3
    L2 (4096KB)
      L1 (32KB) + Core P#2 + PU P#5
      L1 (32KB) + Core P#3 + PU P#7

Rank 0 --> PU#0 = socket.0:core.0
Rank 1 --> PU#1 = socket.1:core.0
Rank 2 --> PU#2 = socket.0:core.2
Rank 3 --> PU#3 = socket.1:core.2
Rank 4 --> PU#4 = socket.0:core.1
Rank 5 --> PU#5 = socket.1:core.1
Rank 6 --> PU#6 = socket.0:core.3
Rank 7 --> PU#7 = socket.1:core.3

What I intend to achieve (and verify) is to bind processes following the *logical* indexes, i.e.,

Rank 0 --> PU#0 = socket.0:core.0
Rank 1 --> PU#4 = socket.0:core.1
Rank 2 --> PU#2 = socket.0:core.2
Rank 3 --> PU#6 = socket.0:core.3
Rank 4 --> PU#1 = socket.1:core.0
Rank 5 --> PU#5 = socket.1:core.1
Rank 6 --> PU#3 = socket.1:core.2
Rank 7 --> PU#7 = socket.1:core.3
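To see why the two orderings differ, here is a small sketch that takes the (socket, core, PU) triples from the `lstopo --physical` listing above, in the order lstopo printed them, and computes which socket each rank lands on. It assumes that binding "by physical index" means rank i goes to PU P#i, and that lstopo's printed order is the logical order. The table is transcribed by hand and is not queried from hwloc.

```python
# Sketch: (socket P#, core P#, PU P#) triples transcribed from the
# `lstopo --physical` listing above, in the order lstopo printed them
# (assumed here to be the logical order).
TOPOLOGY = [
    (0, 0, 0), (0, 1, 2), (0, 2, 4), (0, 3, 6),
    (1, 0, 1), (1, 1, 3), (1, 2, 5), (1, 3, 7),
]


def socket_of_pu(pu):
    """Return the physical socket index that contains physical PU `pu`."""
    for socket, _core, p in TOPOLOGY:
        if p == pu:
            return socket
    raise ValueError(f"unknown PU P#{pu}")


def sockets_by_physical(nranks):
    """Rank i bound to PU P#i: physical numbering alternates sockets."""
    return [socket_of_pu(rank) for rank in range(nranks)]


def sockets_by_logical(nranks):
    """Rank i bound to the i-th PU in logical order: socket 0 fills first."""
    return [TOPOLOGY[rank][0] for rank in range(nranks)]


if __name__ == "__main__":
    print("physical:", sockets_by_physical(8))
    print("logical: ", sockets_by_logical(8))
```

With physical indexes, consecutive ranks alternate sockets (0, 1, 0, 1, ...). With logical indexes, ranks 0-3 share socket 0 and ranks 4-7 share socket 1.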

This specific configuration can be achieved with a rankfile via the -rf option in OMPI, but the rankfile does not seem to work in a multiple-instruction, multiple-data (MIMD) setup. That complication is what led me to try hwloc-bind.

FYI, my testing environment and application impose these requirements for optimum performance:

i. Different binaries optimized for heterogeneous machines. This necessitates MIMD, which can be done in OMPI with the -app option (providing an application context file).
ii. The application is communication-sensitive. Thus, fine-grained process mapping onto *machines* and onto *cores* is required to minimize inter-machine and inter-socket communication costs, which arise on the network and on the system bus respectively. Specifically, processes should be mapped onto successive cores of one socket before the next socket is considered, i.e., socket.0:core0-3, then socket.1:core0-3. In this case, communication among neighboring ranks 0-3 stays within socket 0 without going through the system bus, and likewise for ranks 4-7 on socket 1. The order of the cores should therefore follow the *logical* indexes.
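A rough way to quantify point ii: count how many neighboring rank pairs (i, i+1) end up on different sockets under each placement. This is a sketch assuming a simple 1-D nearest-neighbor pattern where rank i talks mainly to rank i+1; the real application's pattern may differ. The socket lists are read off the two rank mappings above.

```python
def cross_socket_pairs(socket_of_rank):
    """Count neighboring rank pairs (i, i+1) placed on different sockets.

    Assumes (hypothetically) a 1-D nearest-neighbor communication pattern.
    """
    return sum(
        1
        for i in range(len(socket_of_rank) - 1)
        if socket_of_rank[i] != socket_of_rank[i + 1]
    )


# Socket of ranks 0-7 under each placement.
PHYSICAL = [0, 1, 0, 1, 0, 1, 0, 1]  # what --bind-to-core produced
LOGICAL = [0, 0, 0, 0, 1, 1, 1, 1]   # the intended placement

if __name__ == "__main__":
    print("physical:", cross_socket_pairs(PHYSICAL))
    print("logical: ", cross_socket_pairs(LOGICAL))
```

Under the physical placement every one of the 7 neighbor pairs crosses the system bus. Under the logical placement only the pair (3, 4) does.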

Initially, I tried combining the features of rankfile and appfile, e.g.,

$ cat rankfile8np4
rank 0=compute-0-8 slot=0:0
rank 1=compute-0-8 slot=0:1
rank 2=compute-0-8 slot=0:2
rank 3=compute-0-8 slot=0:3
$ cat rankfile9np4
rank 0=compute-0-9 slot=0:0
rank 1=compute-0-9 slot=0:1
rank 2=compute-0-9 slot=0:2
rank 3=compute-0-9 slot=0:3
$ cat my_appfile_rankfile
--host compute-0-8 -rf rankfile8np4 -np 4 ./test1
--host compute-0-9 -rf rankfile9np4 -np 4 ./test2
$ mpirun -app my_appfile_rankfile

but found that only the rankfile on the first line took effect; the second was ignored completely. After some googling and trial and error, I decided to try an external binder, and that led me to hwloc-bind.

Maybe I should bring the issue of rankfile + appfile to the OMPI mailing list.
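One untested alternative is a single rankfile covering all eight MPI_COMM_WORLD ranks, passed once to mpirun rather than once per appfile line. Whether Open MPI 1.5.1 honors -rf together with -app this way is an assumption, not something established here. The sketch below only generates such a file in the same `rank N=host slot=socket:core` format as the rankfiles above, filling socket 0 before socket 1 on each host. `rankfile_lines` is a hypothetical helper.

```python
def rankfile_lines(hosts, ranks_per_host=4, cores_per_socket=4, sockets=2):
    """Build rankfile lines with global ranks, filling each socket in turn.

    Hypothetical helper: host names and per-host counts are assumptions.
    """
    if ranks_per_host > cores_per_socket * sockets:
        raise ValueError("more ranks per host than cores")
    lines = []
    rank = 0
    for host in hosts:
        for local in range(ranks_per_host):
            # Local ranks 0..3 go to socket 0, 4..7 to socket 1, and so on.
            socket, core = divmod(local, cores_per_socket)
            lines.append(f"rank {rank}={host} slot={socket}:{core}")
            rank += 1
    return lines


if __name__ == "__main__":
    print("\n".join(rankfile_lines(["compute-0-8", "compute-0-9"])))
```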

2. I thought of invoking a script too, but was not sure how to start. Thanks for the info. I will get back to you if I need further help.
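A minimal sketch of such a wrapper script, assuming Open MPI exports OMPI_COMM_WORLD_LOCAL_RANK to each process and that hwloc-bind reads `core:N` as a logical index (its default unless --physical is given). The script name `bind_wrapper.py` is hypothetical. Each process binds itself to the logical core matching its local rank, then replaces itself with the real program.

```python
import os
import sys


def hwloc_bind_argv(local_rank, program):
    """Build an hwloc-bind command binding `program` to logical core `local_rank`."""
    # "core:N" is interpreted with logical indexes by hwloc-bind by default.
    return ["hwloc-bind", f"core:{local_rank}", "--", *program]


if __name__ == "__main__":
    # Assumed to be set by mpirun for every launched process.
    local_rank = os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK")
    if local_rank is None or len(sys.argv) < 2:
        print("usage (under mpirun): bind_wrapper.py PROGRAM [ARGS...]",
              file=sys.stderr)
    else:
        argv = hwloc_bind_argv(int(local_rank), sys.argv[1:])
        os.execvp(argv[0], argv)  # replaces this process with hwloc-bind
```

Each appfile line would then launch the wrapper instead of the binary, e.g. `--host compute-0-8 -np 4 ./bind_wrapper.py ./test1`. Local ranks 0-3 land on logical cores 0-3 of socket 0 on every host, independent of the other appfile lines.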


--- On Mon, 2/14/11, Jeff Squyres <jsquyres_at_[hidden]> wrote:

From: Jeff Squyres <jsquyres_at_[hidden]>
Subject: Re: [hwloc-users] hwloc-ps output - how to verify process binding on the core level?
To: "Hardware locality user list" <hwloc-users_at_[hidden]>
Date: Monday, February 14, 2011, 7:26 AM