Hardware Locality Users' Mailing List Archives

Subject: Re: [hwloc-users] distributing across cores with hwloc-distrib
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2014-03-30 11:32:38

Don't worry, binding multithreaded processes is not a corner case. I was
rather referring to the general case of "distributing fewer processes than
there are objects and returning cpusets as large as possible".

The attached patch should help. Please let me know.


On 30/03/2014 17:08, Tim Creech wrote:
> Hi Brice,
> First, my apologies if this email starts a new thread. For some reason I
> never received your response through Mailman and can only see it through the
> web archive interface, so I'm constructing this response without headers like
> "In-Reply-To".
> Thank you for your very helpful response. I'll use your explanation of the
> algorithm and try to understand the implementation. I was indeed expecting
> hwloc-distrib to help me bind multithreaded processes, although I can
> certainly understand that this is considered a corner case. Could you
> please consider fixing this?
> Thanks,
> Tim
> Brice Goglin wrote:
>> Hello,
>> This is the main corner case of hwloc-distrib. It can only return single
>> objects, not groups of objects. The distrib algorithm is:
>> 1) start at the root, where there are M children, and you have to
>> distribute N processes
>> 2) if there are no children, or if N is 1, return the entire object
>> 3) split N into M pieces Ni (N = sum of Ni) based on each child's
>> weight (the number of PUs under it)
>> If N>=M, all Ni can be > 0, so all children will get some processes
>> If N<M, you can't split N into M nonzero integer pieces; some Ni will be 0,
>> and these objects won't get any process
>> 4) go back to (2) and recurse into each child object with Ni instead of N
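The four steps above can be modelled in a few lines. This is a minimal Python sketch, not hwloc's C implementation: the `Obj` tree, `synthetic()`, `split()` and `distrib()` are hypothetical helpers, and the rounding used in `split()` (each child gets the ceiling of its proportional share of whatever is still undistributed) is an assumption, chosen because it reproduces the 0x0000000f / 0x000000f0 output quoted further down.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Hypothetical topology node; leaves are PUs identified by `pu`."""
    children: list = field(default_factory=list)
    pu: int = -1

    def cpuset(self) -> int:
        # Bitmask of all PUs below this object (bit i set = PU i).
        if not self.children:
            return 1 << self.pu
        mask = 0
        for child in self.children:
            mask |= child.cpuset()
        return mask

    def weight(self) -> int:
        # Step 3 weight: number of PUs below this object.
        return bin(self.cpuset()).count("1")


def synthetic(arities):
    """Build a tree like the synthetic topology "4 4" -> synthetic([4, 4])."""
    next_pu = [0]

    def build(level):
        if level == len(arities):
            leaf = Obj(pu=next_pu[0])
            next_pu[0] += 1
            return leaf
        return Obj(children=[build(level + 1) for _ in range(arities[level])])

    return build(0)


def split(n, weights):
    """Step 3: split n into len(weights) integers proportional to weights.
    Assumed rounding: ceiling of the share of what is still left."""
    pieces, left_n, left_w = [], n, sum(weights)
    for w in weights:
        chunk = -(-left_n * w // left_w) if left_w else 0  # ceiling division
        pieces.append(chunk)
        left_n -= chunk
        left_w -= w
    return pieces


def distrib(obj, n):
    # Step 2: a leaf, or a single process, gets the whole object.
    if not obj.children or n == 1:
        return [obj.cpuset()] * n
    # Step 3: children whose share is 0 get no process at all.
    pieces = split(n, [c.weight() for c in obj.children])
    result = []
    for child, ni in zip(obj.children, pieces):
        if ni:
            result.extend(distrib(child, ni))  # Step 4: recurse with Ni
    return result


for mask in distrib(synthetic([4, 4]), 2):
    print(f"0x{mask:08x}")
```

With N=2 on "4 4", `split()` yields [1, 1, 0, 0]: cores 0 and 1 each receive one process and their full 4-PU cpuset, while cores 2 and 3 are skipped. With N=16, every PU gets exactly one process.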
>> Your case is step 3 with N=2 and M=4. It basically means that we
>> distribute across cores without "assembling groups of cores if needed".
>> In your case, when you bind a single-threaded task to 2 cores of 4 PUs
>> each, it only uses one PU in the end; the other core and the remaining
>> 3 PUs are ignored anyway. They *may* be used, but the operating system
>> scheduler is free to ignore them. So binding to 2 cores, to 1 core, or
>> to 1 PU is almost equivalent. At least the latter is included in the
>> former. And most people pass --single to get a single PU anyway.
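To make the N=2, M=4 case concrete: whatever integer rounding is used, two of the four per-core shares must be 0. The sketch below compares two hypothetical rounding rules (neither is claimed to be hwloc's exact code; both function names are made up for illustration). Exact `Fraction` arithmetic avoids floating-point surprises. The two rules agree that two cores get nothing and differ only in *which* cores are skipped.

```python
import math
from fractions import Fraction


def split_greedy(n, weights):
    """Ceiling of each child's share of what is still undistributed."""
    pieces, left_n, left_w = [], n, sum(weights)
    for w in weights:
        chunk = math.ceil(Fraction(left_n * w, left_w)) if left_w else 0
        pieces.append(chunk)
        left_n -= chunk
        left_w -= w
    return pieces


def split_cumulative(n, weights):
    """Difference of rounded-up cumulative shares; always sums to n."""
    total, seen, pieces = sum(weights), 0, []
    for w in weights:
        before = math.ceil(Fraction(seen * n, total))
        seen += w
        pieces.append(math.ceil(Fraction(seen * n, total)) - before)
    return pieces


cores = [4, 4, 4, 4]  # "4 4": 4 cores of 4 PUs each
print(split_greedy(2, cores))      # [1, 1, 0, 0]
print(split_cumulative(2, cores))  # [1, 0, 1, 0]
```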
>> The case where it's not equivalent is when you bind multithreaded
>> processes. If you have 8 threads, it's better to use 2 cores than a
>> single one. If this case matters to you, I will look into fixing this
>> corner case.
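One hypothetical way to return "cpusets as large as possible" is to stop ignoring children whose share is 0 and instead OR their cpuset into the cpuset handed out just before them. The sketch below is only an illustration of that idea, not the contents of the attached patch (which the archive does not show). It models the topology as nested lists of PU indices and uses the cumulative rounding rule, an assumption: with the greedy rule both skipped cores would be merged into core 1, giving unequal cpusets of 4 and 12 PUs.

```python
import math
from fractions import Fraction


def cpuset(node):
    """Bitmask of the PUs under `node` (an int PU index or a list of nodes)."""
    if isinstance(node, int):
        return 1 << node
    mask = 0
    for child in node:
        mask |= cpuset(child)
    return mask


def weight(node):
    return bin(cpuset(node)).count("1")


def distrib_merge(node, n):
    """Steps 1-4, except that a child whose share is 0 is merged into the
    previous cpuset instead of being ignored."""
    if n == 0:
        return []
    if isinstance(node, int) or n == 1:
        return [cpuset(node)] * n
    total = sum(weight(c) for c in node)
    result, seen = [], 0
    for child in node:
        w = weight(child)
        before = math.ceil(Fraction(seen * n, total))
        seen += w
        chunk = math.ceil(Fraction(seen * n, total)) - before
        if chunk:
            result.extend(distrib_merge(child, chunk))
        else:
            # With n >= 1 and every child holding at least one PU, the first
            # child's chunk is >= 1, so result is non-empty here.
            result[-1] |= cpuset(child)
    return result


# Synthetic "4 4": 4 cores, each holding 4 consecutive PUs.
topo = [[4 * c + p for p in range(4)] for c in range(4)]
for mask in distrib_merge(topo, 2):
    print(f"0x{mask:08x}")
```

This prints 0x000000ff and 0x0000ff00. Each cpuset now spans 2 cores (8 PUs), so an 8-thread process can run one thread per PU instead of two threads per PU on a single core.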
>> Brice
>> On 30/03/2014 07:56, Tim Creech wrote:
>>> Hello,
>>> I would like to use hwloc_distrib for a project, but I'm having some
>>> trouble understanding how it distributes. Specifically, it seems to
>>> avoid distributing multiple processes across cores, and I'm not sure
>>> why.
>>> As an example, consider the actual output of:
>>> $ hwloc-distrib -i "4 4" 2
>>> 0x0000000f
>>> 0x000000f0
>>> I'm expecting hwloc-distrib to tell me how to distribute 2 processes
>>> across the 16 PUs (4 cores of 4 PUs each), but the answer only involves 8
>>> PUs, leaving the other 8 unused. If there were more cores on the
>>> machine, then potentially the vast majority of them would be unused.
>>> In other words, I might expect the output to use all of the PUs across
>>> cores, for example:
>>> $ hwloc-distrib -i "4 4" 2
>>> 0x000000ff
>>> 0x0000ff00
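The masks are cpusets printed in hexadecimal, with bit i set meaning PU i. A small decoding sketch (the helper names are hypothetical, and `pu // 4` assumes the synthetic "4 4" numbering in which core c owns PUs 4c to 4c+3) shows which cores each output uses and how many PUs stay unused.

```python
def pus(mask):
    """PU indices whose bit is set in a cpuset mask."""
    return [i for i in range(mask.bit_length()) if mask >> i & 1]


def unused(masks, n_pus):
    """Number of PUs not covered by any of the masks."""
    covered = 0
    for m in masks:
        covered |= m
    return n_pus - bin(covered).count("1")


observed = [0x0000000F, 0x000000F0]  # hwloc-1.9 output quoted above
expected = [0x000000FF, 0x0000FF00]  # output Tim expected
for name, masks in (("observed", observed), ("expected", expected)):
    cores = [sorted({pu // 4 for pu in pus(m)}) for m in masks]  # 4 PUs/core
    print(name, cores, "unused PUs:", unused(masks, 16))
```

The observed output uses only cores 0 and 1 and leaves 8 of the 16 PUs unused; the expected output gives each process two cores and covers all 16 PUs.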
>>> Why does hwloc-distrib leave PUs unused? I'm using hwloc-1.9. Any help
>>> in understanding where I'm going wrong is greatly appreciated!
>>> Thanks,
>>> Tim
>>> _______________________________________________
>>> hwloc-users mailing list
>>> hwloc-users_at_[hidden]