Hardware Locality Users' Mailing List Archives

Subject: Re: [hwloc-users] distributing across cores with hwloc-distrib
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2014-03-30 03:28:58


Hello,

This is the main corner case of hwloc-distrib. It can only return single
objects, not groups of objects. The distrib algorithm is:
1) start at the root, which has M children, with N processes to
distribute
2) if there are no children, or if N is 1, return the entire object
3) split N into M integer pieces Ni (N = sum of Ni), based on each
child's weight (the number of PUs under it)
   If N>=M, all Ni can be > 0, so all children get some process
   If N<M, you can't split N into M non-zero integer pieces, so some Ni
will be 0 and those objects won't get any process
4) go back to (2) and recurse into each child object with Ni instead of N

Your case is step 3 with N=2 and M=4. It basically means that we
distribute across cores without "assembling groups of cores if needed".
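
To make steps 1-4 concrete, here is a minimal Python sketch of the
recursion, not the hwloc source. The Obj class and the synthetic() helper
are hypothetical stand-ins for a real topology. The rounding rule (ceiling
of the proportional share, taken child by child) is an assumption, chosen
because it reproduces the output quoted in this thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Obj:
    """Hypothetical topology object: the PU indexes below it, plus children."""
    pus: List[int]
    children: List["Obj"] = field(default_factory=list)

    def mask(self) -> int:
        m = 0
        for p in self.pus:
            m |= 1 << p
        return m


def synthetic(cores: int, pus_per_core: int) -> Obj:
    """Build a machine of `cores` cores with `pus_per_core` PUs each."""
    kids = []
    for c in range(cores):
        idx = list(range(c * pus_per_core, (c + 1) * pus_per_core))
        kids.append(Obj(idx, [Obj([p]) for p in idx]))
    return Obj(list(range(cores * pus_per_core)), kids)


def distrib(obj: Obj, n: int) -> List[int]:
    """Return n bitmasks, one per process."""
    if n == 0:
        return []
    # Step 2: no children, or a single process: return the entire object.
    if not obj.children or n == 1:
        return [obj.mask()] * n
    # Step 3: split n among the children according to their PU count.
    out: List[int] = []
    remaining_n = n
    remaining_weight = sum(len(c.pus) for c in obj.children)
    for child in obj.children:
        if remaining_weight == 0:
            break
        weight = len(child.pus)
        # Assumed rounding: ceiling of this child's proportional share.
        ni = (remaining_n * weight + remaining_weight - 1) // remaining_weight
        # Step 4: recurse into the child with Ni instead of N.
        out.extend(distrib(child, ni))
        remaining_n -= ni
        remaining_weight -= weight
    return out


print([hex(m) for m in distrib(synthetic(4, 4), 2)])  # ['0xf', '0xf0']
```

With N=2 and M=4 the first two cores each get Ni=1 and the last two get
Ni=0, which is why only 8 of the 16 PUs appear in the result.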

In your case, even if you bound each task to 2 cores of 4 PUs each, a
single-threaded task would only use one PU in the end, so 1 core and 3
PUs would be ignored anyway. They *may* be used, but the operating system
scheduler is free to ignore them. So binding to 2 cores, binding to 1
core and binding to 1 PU are almost equivalent. At least the latter is
included in the former. And most people pass --single to get a single PU
anyway.
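
The "included in" relation can be checked directly on the bitmasks. This
is a small Python sketch. The helper names are hypothetical, and
singlify() keeping the lowest set bit is only an assumption about which PU
--single picks.

```python
from typing import List


def pus(mask: int) -> List[int]:
    """List the PU indexes set in a bitmask such as 0x000000f0."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def included(inner: int, outer: int) -> bool:
    """True if every PU of `inner` is also in `outer`."""
    return inner & ~outer == 0


def singlify(mask: int) -> int:
    """Keep only the lowest set bit (assumed behaviour, for illustration)."""
    return mask & -mask


two_cores = 0x000000FF   # binding you expected for the first process
one_core = 0x0000000F    # binding hwloc-distrib actually returned
one_pu = singlify(one_core)

print(pus(two_cores), pus(one_core), pus(one_pu))
print(included(one_pu, one_core), included(one_core, two_cores))
```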

The case where it's not equivalent is when you bind multithreaded
processes. If you have 8 threads, it's better to use 2 cores than a
single one. If this case matters to you, I will look into fixing this
corner case.
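
For illustration only, here is one way "assembling groups of cores" could
look when N<M. This is a hypothetical Python sketch, not what hwloc-distrib
does today, and group_masks() is a made-up name. It merges contiguous
children into N groups of nearly equal size.

```python
from typing import List


def group_masks(child_masks: List[int], n: int) -> List[int]:
    """Merge M child bitmasks into n contiguous groups (requires 1 <= n <= M)."""
    m = len(child_masks)
    if not 1 <= n <= m:
        raise ValueError("expected 1 <= n <= number of children")
    out: List[int] = []
    start = 0
    for i in range(n):
        # The first (m % n) groups take one extra child.
        size = m // n + (1 if i < m % n else 0)
        mask = 0
        for cm in child_masks[start:start + size]:
            mask |= cm
        out.append(mask)
        start += size
    return out


cores = [0x000F, 0x00F0, 0x0F00, 0xF000]
print([hex(x) for x in group_masks(cores, 2)])  # ['0xff', '0xff00']
```

For 2 processes on 4 cores this gives the 0x000000ff / 0x0000ff00 masks
from your message.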

Brice

On 30/03/2014 07:56, Tim Creech wrote:
> Hello,
> I would like to use hwloc_distrib for a project, but I'm having some
> trouble understanding how it distributes. Specifically, it seems to
> avoid distributing multiple processes across cores, and I'm not sure
> why.
>
> As an example, consider the actual output of:
>
> $ hwloc-distrib -i "4 4" 2
> 0x0000000f
> 0x000000f0
>
> I'm expecting hwloc-distrib to tell me how to distribute 2 processes
> across the 16 PUs (4 cores by 4 PUs), but the answer only involves 8
> PUs, leaving the other 8 unused. If there were more cores on the
> machine, then potentially the vast majority of them would be unused.
>
> In other words, I might expect the output to use all of the PUs across
> cores, for example:
>
> $ hwloc-distrib -i "4 4" 2
> 0x000000ff
> 0x0000ff00
>
> Why does hwloc-distrib leave PUs unused? I'm using hwloc-1.9. Any help
> in understanding where I'm going wrong is greatly appreciated!
>
> Thanks,
> Tim