Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] hwloc-distrib --among
From: Samuel Thibault (samuel.thibault_at_[hidden])
Date: 2010-11-16 16:18:26


Jirka Hladky, on Tue 16 Nov 2010 21:37:01 +0100, wrote:
> There was some discussion about hwloc-distrib --among
>
> If I understand it correctly, --among accepts one of
> {pu,core,socket,node,machine}

I actually didn't know about the --among option. It is hard to
understand without reading the source code... Actually, I'm not even
sure in which cases it is useful. As I understand the English word,
"among" designates the set of objects to pick from, i.e. a horizontal
selection in the hwloc tree, while in hwloc-distrib it is currently a
vertical selection: the level to start distributing from. That's
probably why you don't understand the results:

> $ hwloc-calc --po --proclist $(hwloc-distrib --single --among core 4)
> 0,2,4,6
>
> Among Socket:1 ??

Yes, because here "--among core" requests distributing 4 elements among
the cores, which simply selects the first 4 cores hwloc finds.

> $ hwloc-calc --po --proclist $(hwloc-distrib --single --among pu 4)
> 0,8,2,10
>
> Among Core:0 and Core:1 ??

Same here: distributing 4 elements among the PUs simply selects the
first 4 PUs.

I'd say that --among should indeed be the horizontal portion of the
machine to distribute over, the existing --among option should be
renamed to --from, and a --to option could be added to stop the
hierarchical distribution at a given object type.

> I have tried to use various --among and --ignore options to distribute 8
> parallel jobs on a box so that each job is running on one socket.

That would then be

hwloc-distrib --to socket

to distribute the jobs from the top-level machine object down to the
sockets, but no further. --among (as I described above) would be useful
if you wanted to restrict the distribution to some part of the machine.

> I was not able to achieve this.

Yes, currently the distribution function will always continue
distributing recursively as long as there are more elements to
distribute than architecture elements in the machine.

Also note that currently the hwloc_distribute() function doesn't take
into account, e.g., the number of PUs below each object when splitting
elements over the hierarchy. It was meant more as a demonstration
example than as something to be used as is. We can extend it, though;
we just need to know what is desired.

Samuel