
Hardware Locality Users' Mailing List Archives


Subject: Re: [hwloc-users] distributing across cores with hwloc-distrib
From: Tim Creech (tcreech_at_[hidden])
Date: 2014-03-30 11:08:13


Hi Brice,
  First, my apologies if this email starts a new thread. For some reason I
never received your response through Mailman and can only see it through the
web archive interface, so I'm constructing this response without headers like
"In-Reply-To".

Thank you for your very helpful response. I'll use your explanation of the
algorithm and try to understand the implementation. I was indeed expecting
hwloc-distrib to help me bind multithreaded processes, although I certainly
can understand that this is considered a corner case. Could you please
consider fixing this?
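
In case it's useful, this is roughly how my project consumes hwloc-distrib's
output: each line of output is parsed back into a bitmap and the process
(and therefore all of its threads) is bound to it. A simplified, untested
sketch, not my actual code:

    #include <hwloc.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        /* argv[1] is one line of hwloc-distrib output, e.g. "0x000000f0" */
        hwloc_bitmap_t set = hwloc_bitmap_alloc();
        hwloc_bitmap_sscanf(set, argc > 1 ? argv[1] : "0x0000000f");

        /* bind the whole process (and thus all of its threads) to it */
        if (hwloc_set_cpubind(topo, set, HWLOC_CPUBIND_PROCESS))
            perror("hwloc_set_cpubind");

        hwloc_bitmap_free(set);
        hwloc_topology_destroy(topo);
        return 0;
    }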

Thanks,
Tim

Brice Goglin wrote:
> Hello,
>
> This is the main corner case of hwloc-distrib: it can only return
> individual objects, not groups of objects. The distrib algorithm is:
> 1) start at the root, where there are M children, and you have to
> distribute N processes
> 2) if there are no children, or if N is 1, return the entire object
> 3) split N into M pieces Ni (N = sum of the Ni) based on each child's
> weight (the number of PUs under it)
> If N>=M, every Ni can be > 0, so all children get some process
> If N<M, you can't split N into M nonzero integer pieces, so some Ni
> will be 0 and those children won't get any process
> 4) go back to (2) and recurse into each child object with Ni instead of N
>
> Your case is step 3 with N=2 and M=4. It basically means that we
> distribute across cores without assembling groups of cores when needed.
>
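
To check my understanding: the same algorithm is reachable from C through
the hwloc_distrib() helper in hwloc/helper.h (available in 1.9), and on the
synthetic topology from my example it should reproduce the tool's output.
A sketch under that assumption, not the tool's actual code:

    #include <hwloc.h>
    #include <stdio.h>
    #include <limits.h>

    int main(void) {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);
        hwloc_topology_set_synthetic(topo, "4 4"); /* 4 cores x 4 PUs */
        hwloc_topology_load(topo);

        unsigned n = 2;  /* number of processes to distribute */
        hwloc_bitmap_t sets[2] = { hwloc_bitmap_alloc(), hwloc_bitmap_alloc() };
        hwloc_obj_t root = hwloc_get_root_obj(topo);

        /* distribute n processes below the root, recursing to any depth */
        hwloc_distrib(topo, &root, 1, sets, n, INT_MAX, 0);

        char buf[64];
        for (unsigned i = 0; i < n; i++) {
            hwloc_bitmap_snprintf(buf, sizeof buf, sets[i]);
            printf("%s\n", buf);  /* expect 0x0000000f and 0x000000f0 */
            /* hwloc_bitmap_singlify(sets[i]) here would mimic --single */
            hwloc_bitmap_free(sets[i]);
        }
        hwloc_topology_destroy(topo);
        return 0;
    }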
> In your case, when you bind a single task to 2 cores of 4 PUs each, that
> task only runs on one PU at a time in the end; the other core and the
> other 3 PUs *may* be used, but the operating system scheduler is free to
> ignore them. So binding to 2 cores, to 1 core, or to 1 PU is almost
> equivalent; at least the latter is included in the former. And most
> people pass --single to get a single PU anyway.
>
> The case where it's not equivalent is when you bind multithreaded
> processes: if you have 8 threads, it's better to use 2 cores than a
> single one. If this case matters to you, I will look into fixing this
> corner case.
>
> Brice
>
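In the meantime, a workaround along the lines Brice describes is to assemble
the group of cores by hand: take the union of two cores' cpusets and bind
the multithreaded process to it. A sketch (the core indices are just an
example):

    #include <hwloc.h>

    int main(void) {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        /* pick two cores; indices 0 and 1 are illustrative */
        hwloc_obj_t core0 = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, 0);
        hwloc_obj_t core1 = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, 1);
        if (!core0 || !core1)
            return 1;

        /* union of both cores' PUs, e.g. 0x000000ff on a "4 4" topology */
        hwloc_bitmap_t set = hwloc_bitmap_alloc();
        hwloc_bitmap_or(set, core0->cpuset, core1->cpuset);

        /* an 8-thread process bound here can use both cores */
        hwloc_set_cpubind(topo, set, HWLOC_CPUBIND_PROCESS);

        hwloc_bitmap_free(set);
        hwloc_topology_destroy(topo);
        return 0;
    }
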
> On 30/03/2014 07:56, Tim Creech wrote:
> > Hello,
> > I would like to use hwloc_distrib for a project, but I'm having some
> > trouble understanding how it distributes. Specifically, it seems to
> > avoid distributing multiple processes across cores, and I'm not sure
> > why.
> >
> > As an example, consider the actual output of:
> >
> > $ hwloc-distrib -i "4 4" 2
> > 0x0000000f
> > 0x000000f0
> >
> > I'm expecting hwloc-distrib to tell me how to distribute 2 processes
> > across the 16 PUs (4 cores by 4 PUs), but the answer only involves 8
> > PUs, leaving the other 8 unused. If there were more cores on the
> > machine, then potentially the vast majority of them would be unused.
> >
> > In other words, I might expect the output to use all of the PUs across
> > cores, for example:
> >
> > $ hwloc-distrib -i "4 4" 2
> > 0x000000ff
> > 0x0000ff00
> >
> > Why does hwloc-distrib leave PUs unused? I'm using hwloc-1.9. Any help
> > in understanding where I'm going wrong is greatly appreciated!
> >
> > Thanks,
> > Tim
> >
> > _______________________________________________
> > hwloc-users mailing list
> > hwloc-users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users