Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] structure assumptions, duplication
From: Fawzi Mohamed (fawzi_at_[hidden])
Date: 2009-09-29 11:39:17


Thanks for the quick answers!

On 29-Sep-09, at 16:59, Samuel Thibault wrote:

> Fawzi Mohamed, on Tue 29 Sep 2009 16:39:47 +0200, wrote:
>> Maybe I worry too much, but with machines with 1'000s of processors
>> coming, and maybe wanting local restricted copies to know the topology
>> of the whole machine (to communicate with others), I also worry about
>> a few pointers here and there.
>
> Actually the size of all the pointers is not so much compared to the
> size of the cpuset (by default 1024/8 = 128 bytes, worth 16 pointers).

Yes, and so the question is whether the cpuset should instead be
returned by a function that initializes its argument (or returned by
value), so that in the future one could avoid storing it, at least in
the deepest levels, where it is easy and relatively cheap to generate
(and where one would have the largest savings).
I would say that for most operations (cpuset, next_sibling, ...) the
way to go is functions that take a hwloc_obj_t (and, if needed, also a
topology) and return what was requested.
Basically OOP in C: the actual implementation is hidden, and if you
change the implementation the user just has to recompile.
If the function is simple and gets inlined, the speed hit should be
basically zero, and anyway I suppose that most of these operations are
not performance critical.
This way you will be freer to make aggressive changes in the future.
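
To make the idea concrete, here is a minimal sketch of such accessors. All names (sketch_obj, sketch_obj_cpuset, sketch_cpuset_t, ...) are made up for illustration and are not hwloc functions; I also assume, only for the example, that each leaf stands for exactly one CPU, so that no object needs to store a cpuset at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical; none of this is the hwloc API. */
#define SKETCH_CPUSET_BITS 1024

/* Fixed-size bitmask: 1024 bits = 128 bytes. */
typedef struct {
    uint64_t bits[SKETCH_CPUSET_BITS / 64];
} sketch_cpuset_t;

/* Users should treat this layout as private and only go through the
 * functions below, so that it can change with a simple recompile. */
struct sketch_obj {
    struct sketch_obj *parent;
    struct sketch_obj *first_child;
    struct sketch_obj *next_sibling;
    unsigned cpu_index; /* meaningful for leaves only */
};

/* Trivial accessor: cheap enough to be inlined. */
static inline struct sketch_obj *
sketch_obj_next_sibling(const struct sketch_obj *obj)
{
    assert(obj != NULL);
    return obj->next_sibling;
}

/* OR the CPUs found below obj into *out (which must be initialized). */
static void
sketch_accumulate(const struct sketch_obj *obj, sketch_cpuset_t *out)
{
    if (obj->first_child == NULL) {
        assert(obj->cpu_index < SKETCH_CPUSET_BITS);
        out->bits[obj->cpu_index / 64] |= (uint64_t)1 << (obj->cpu_index % 64);
        return;
    }
    for (const struct sketch_obj *c = obj->first_child; c != NULL;
         c = c->next_sibling)
        sketch_accumulate(c, out);
}

/* Initializes *out with the cpuset of obj, generated on demand
 * instead of being stored in every object. */
void
sketch_obj_cpuset(const struct sketch_obj *obj, sketch_cpuset_t *out)
{
    assert(obj != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    sketch_accumulate(obj, out);
}

/* Returns 1 if cpu is in the set, 0 otherwise. */
int
sketch_cpuset_isset(const sketch_cpuset_t *set, unsigned cpu)
{
    assert(set != NULL && cpu < SKETCH_CPUSET_BITS);
    return (int)((set->bits[cpu / 64] >> (cpu % 64)) & 1u);
}
```

A caller would declare a sketch_cpuset_t on its stack and pass its address to sketch_obj_cpuset(). Whether the set is stored, cached, or recomputed then becomes an implementation detail that can differ per level.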

>> 2) assumption on the structure
>>
>> Also a ring like topology cannot be cleanly represented with a
>> partition if one wants to have objects for groups with uniform
>> latency.
>
> Our plan was to not only provide a hierarchical view but also the precise
> graph. For instance, that means that for an Altix machine with a
> 2D-mesh of 3*4 NUMA nodes, the hierarchy would be a system containing
> 12 nodes, themselves containing a socket etc. And the fact that the
> 12 nodes are organized as a 2D-mesh would be expressed by a graph
> structure, independently of the hierarchy.

ok I see

[... (thanks for the answers) ...]

> The idea is that some programmers will only want to cope with a
> hierarchical machine, while others will want to fine-tune according to
> the precise topology (much harder, not polynomial at least). And thus we
> should provide both. For the first kind of programmers, to tackle the
> 3*4 2D-mesh case above, we could provide a flag "make it hierarchical",
> which would heuristically group nodes recursively according to locality.
> For now we only do such grouping from the NUMA distances when there are
> clear NUMA subgroups.

OK. I was wondering whether you already provide, or would like to
provide in the future, something akin to what OpenSolaris does with
locality groups (lgroups):
http://opensolaris.org/os/community/performance/mpo_overview.pdf

In fact, what I "need" (or at least think I need ;) is just the next
neighbors: basically I go up the hierarchy and look at which new
neighbors I have at each level. So a hierarchy like the lgroups is
close to what I need, and simpler to handle than the full graph.
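
Roughly, the traversal I have in mind looks like the sketch below. The tree type and function names are hypothetical (this is neither hwloc nor the lgroup API), and I assume a plain tree with parent, first_child and next_sibling links.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node; not an hwloc or lgroup type. */
struct sketch_node {
    struct sketch_node *parent;
    struct sketch_node *first_child;
    struct sketch_node *next_sibling;
    int id;
};

/* Called once per newly reached leaf; level is the number of steps
 * up the hierarchy that were needed to reach it. */
typedef void (*sketch_visit_fn)(const struct sketch_node *leaf,
                                unsigned level, void *ctx);

/* Visit every leaf in the subtree rooted at n. */
static void
sketch_visit_leaves(const struct sketch_node *n, unsigned level,
                    sketch_visit_fn fn, void *ctx)
{
    if (n->first_child == NULL) {
        fn(n, level, ctx);
        return;
    }
    for (const struct sketch_node *c = n->first_child; c != NULL;
         c = c->next_sibling)
        sketch_visit_leaves(c, level, fn, ctx);
}

/* Go up from start; at each ancestor, report the leaves that were not
 * reachable one level below (the "new" neighbors). */
void
sketch_foreach_new_neighbor(const struct sketch_node *start,
                            sketch_visit_fn fn, void *ctx)
{
    assert(start != NULL && fn != NULL);
    const struct sketch_node *came_from = start;
    unsigned level = 1;
    for (const struct sketch_node *up = start->parent; up != NULL;
         came_from = up, up = up->parent, level++) {
        for (const struct sketch_node *c = up->first_child; c != NULL;
             c = c->next_sibling) {
            if (c != came_from) /* skip the subtree already covered */
                sketch_visit_leaves(c, level, fn, ctx);
        }
    }
}

/* Example callback: records the visited ids and levels in order. */
struct sketch_trace {
    int ids[16];
    unsigned levels[16];
    size_t count;
};

void
sketch_record(const struct sketch_node *leaf, unsigned level, void *ctx)
{
    struct sketch_trace *t = ctx;
    if (t->count < 16) {
        t->ids[t->count] = leaf->id;
        t->levels[t->count] = level;
        t->count++;
    }
}
```

Each leaf is reported exactly once, at the first level where it becomes reachable, which is all I need to build neighbor lists ordered by distance.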

>> So I wanted to know how you cope with those things, and also if
>> something will probably change in the future, as some assumptions will
>> inevitably creep into my code... and I would prefer to make the right
>> ones :)
>
> We should probably (agree on and) state what we want to always provide
> indeed.

Yes, that would be great.

ciao
Fawzi