Hardware Locality Users' Mailing List Archives

Subject: Re: [hwloc-users] How to combine hwloc-bind and mpirun
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2011-11-10 07:57:41


On 10/11/2011 13:13, Rafael R. Pappalardo wrote:
> I am trying to send a MPI job to selected cores on a 64 cores machine. With
> taskset I use:
>
> mpirun -np 8 taskset -c 1,3,5,7,9,11,13,15 program
>
> but if I substitute taskset by hwloc-bind doing
>
> mpirun -np 8 hwloc-bind core:1 core:3 core:5 core:7 core:9 core:11 core:13
> core:15 program
>
> it does not work.

What do you mean by "does not work"? Failure? No binding? Wrong binding?

Note that taskset numbers are very likely different from hwloc-bind core
numbers: taskset takes OS processor numbers, while hwloc-bind uses hwloc's
logical indexes by default. If you want to bind to the 8 cores of the
second socket, it may be
    mpirun -np 8 hwloc-bind core:8-15 program
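
To see how the two numberings relate on a given machine, something like the
following sketch may help. It assumes hwloc-calc is installed and supports
the --physical-output and --intersect options; core_to_os_pus is a
hypothetical helper name, not something shipped with hwloc.

```shell
#!/bin/sh
# Hypothetical helper: print the OS (physical) processor numbers behind a
# logical hwloc core index. These are the numbers that taskset -c expects.
core_to_os_pus() {
    if ! command -v hwloc-calc >/dev/null 2>&1; then
        echo "hwloc-calc not found in PATH" >&2
        return 1
    fi
    # --intersect PU lists the processing units contained in the object;
    # --physical-output reports them with OS numbering instead of logical.
    hwloc-calc --physical-output --intersect PU "core:$1"
}

# Example call (output depends on the machine's topology):
#   core_to_os_pus 8
```

Running hwloc-bind --get from inside a bound process should also report the
binding that was actually applied, which helps tell "no binding" apart from
"wrong binding".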

> "Each hwloc-bind command in the mpirun above doesn't know that there
> are other hwloc-bind instances on the same machine. All of them bind
> their process to all cores in the first socket. "

This sentence also applies to taskset.

> Is there something wrong if I do:
>
> hwloc-bind core:1 core:3 core:5 core:7 core:9 core:11 core:13 core:15
> mpirun -np 8 program

If you don't run the mpirun command on the machine where the final MPI
processes run, it won't work at all.

Otherwise, I would say that it depends on the implementation of mpirun.
And even if the binding is passed on to the final MPI processes, it won't
be better than above: they would all share the same set of cores.

If you want to bind each individual process to a single, separate
core, you can:
* use an mpirun that can do that
* use a more complex mpiexec line if your MPI implementation supports
it, for instance by binding each process individually:
    mpiexec -np 1 hwloc-bind core:8 program : \
            -np 1 hwloc-bind core:9 program : \
            -np 1 hwloc-bind core:10 program : \
            -np 1 hwloc-bind core:11 program : \
            -np 1 hwloc-bind core:12 program : \
            -np 1 hwloc-bind core:13 program : \
            -np 1 hwloc-bind core:14 program : \
            -np 1 hwloc-bind core:15 program
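
Typing such a line by hand is error-prone, so it can be generated. The
sketch below only builds and prints the command so it can be checked before
running it. build_mpiexec_line is a hypothetical helper name, and it assumes
your mpiexec accepts the colon-separated form shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of hwloc or any MPI implementation): print
# an mpiexec command line that starts one process per core, each wrapped
# in its own hwloc-bind call.
# Usage: build_mpiexec_line FIRST_CORE LAST_CORE PROGRAM [ARGS...]
build_mpiexec_line() {
    first=$1
    last=$2
    shift 2
    line="mpiexec"
    sep=""
    core=$first
    while [ "$core" -le "$last" ]; do
        # One "-np 1 hwloc-bind core:N program" segment per core,
        # joined with " :" between segments.
        line="$line$sep -np 1 hwloc-bind core:$core $*"
        sep=" :"
        core=$((core + 1))
    done
    printf '%s\n' "$line"
}

# Print the command for logical cores 8 to 15 for review.
build_mpiexec_line 8 15 program
```

The core numbers here are hwloc logical indexes, as in the rest of this
message, not the numbers used with taskset.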

Brice