Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] structure assumptions, duplication
From: Fawzi Mohamed (fawzi_at_[hidden])
Date: 2009-09-29 12:55:27


Hi Samuel,

On 29-Sep-09, at 18:14, Samuel Thibault wrote:

> Fawzi Mohamed, on Tue 29 Sep 2009 17:39:17 +0200, wrote:
>> so that in the future one could avoid storing it at least in the
>> deepest levels where it is easy and relatively cheap to generate (and
>> where one would have the largest savings).
>
> Even the deepest levels would have a L1 cache level on top of maybe
> just at most 4 threads. Here we only save the "children" pointers,
> which is not so many, compared to the siblings & cousins pointers, I'm
> not sure it is really worth the pain of defining a long series of
> functions.

OK, those were two separate things. I was thinking:

cpuset -> cpuset_ptr (or just a flag that says whether the structure has
it, and thus two structures, a long one with it and a short one
without, differing only in the tail, if you really want to be hacky).
The cpuset is then generated on the fly for the deepest levels (say,
fewer than 4-8 procs -> lots of memory savings on large machines).
(cost: 1 function, plus copying or building the cpuset)
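
To make the cpuset idea concrete, here is a minimal sketch of the "short
and long structure" variant. Everything in it is hypothetical and not
hwloc's actual layout or API: the names struct obj, struct
obj_with_cpuset and obj_cpuset are made up, a 64-bit mask stands in for a
real cpuset, and it assumes the processors below a deep object are
numbered contiguously so that the set can be rebuilt from a start index
and a count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a real cpuset: one bit per processor, at most 64. */
typedef uint64_t cpuset_t;

/* Short form, used for the deepest levels: no cpuset is stored. */
struct obj {
    unsigned first_cpu;   /* lowest processor index below this object */
    unsigned ncpus;       /* number of contiguous processors below it */
    int has_cpuset;       /* nonzero if this is really the long form */
};

/* Long form: same head, with the cpuset stored in the tail. */
struct obj_with_cpuset {
    struct obj head;
    cpuset_t cpuset;
};

/* Return the cpuset of an object: read it from the tail when the long
 * form is in use, otherwise rebuild it from first_cpu and ncpus. */
static cpuset_t obj_cpuset(const struct obj *o)
{
    assert(o != NULL);
    if (o->has_cpuset)
        return ((const struct obj_with_cpuset *)o)->cpuset;

    assert(o->ncpus >= 1 && o->first_cpu + o->ncpus <= 64);
    cpuset_t low = (o->ncpus == 64) ? ~(cpuset_t)0
                                    : (((cpuset_t)1 << o->ncpus) - 1);
    return low << o->first_cpu;
}
```

Callers only ever see obj_cpuset(), so which levels use the short form
could be changed later without touching them.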

sibling/cousin -> only cousins (you can make them loop first over the
siblings, then over the others, if it really is a partition)
children -> only one representation (arity/children or first/last)
(cost: many functions)
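
A rough sketch of what this could look like, again with made-up names
(struct node, node_next_sibling, node_arity) rather than hwloc's real
structure: only the cousin chain and a first/last child pair are stored,
and siblings and arity are derived from them. It relies on the
"partition" condition above, i.e. the children of one father are
contiguous in the cousin chain of their level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: no sibling pointers, no stored arity. */
struct node {
    struct node *father;
    struct node *next_cousin;   /* next object at the same depth */
    struct node *first_child;
    struct node *last_child;
};

/* The next sibling is the next cousin, provided it has the same father. */
static struct node *node_next_sibling(const struct node *n)
{
    assert(n != NULL);
    struct node *c = n->next_cousin;
    return (c != NULL && c->father == n->father) ? c : NULL;
}

/* Arity is recomputed by walking the children instead of being stored. */
static unsigned node_arity(const struct node *n)
{
    assert(n != NULL);
    unsigned count = 0;
    for (const struct node *c = n->first_child; c != NULL;
         c = node_next_sibling(c))
        count++;
    return count;
}
```

This shows the trade-off: fewer pointers per object, but each derived
relation needs its own function, and node_arity() becomes linear in the
number of children.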

the main point is that these changes/optimizations can be done even
later without breaking anything if you use functions.

>> I would say that for most operations (cpuset, next_sibling,...) using
>> functions that get a hwloc_obj_t (and if needed also a topology) and
>> return what requested is the way to go.
>
> That means a long series of functions, I'm not sure it's really
> clearer for the user. obj->father looks to me easier to read than
> hwloc_obj_father(obj), particularly in complex expressions.

OK, I can see that, so I guess you will have to evaluate whether the
abstraction cost is worth the potential savings. Maybe for cpuset it
is; for sibling, ... you might be right that it isn't; for father it
surely isn't.

>> I suppose that most of these operations are not performance critical.
>
> I wouldn't suppose this actually. Detection time is probably not
> performance critical, but it could be useful to make browsing the
> topology very efficient.
>
>> ok, I was thinking that maybe you did/would like to provide in the
>> future something akin to what opensolaris does with locality groups
>> http://opensolaris.org/os/community/performance/mpo_overview.pdf
>
> Yes, we intend to provide something similar.
>
>> In fact what I "need" (or at least I think I need ;) is just the next
>> neighbors, basically I go up the hierarchy, and look which new
>> neighbors I have, so some hierarchy like the lgroups is close to what
>> I need, and simpler to handle than the full graph.
>
> That's what future heuristics would build for you, yes.

That's great, I am really looking forward to it.
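
To illustrate the "go up the hierarchy and look which new neighbors I
have" traversal quoted above, here is a small sketch. It only assumes a
tree with father pointers and a per-node processor set; struct tnode and
new_neighbors are hypothetical names, a 64-bit mask stands in for a real
cpuset, and this is not how hwloc or the lgroup interface does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tree node; the mask holds the processors below it. */
struct tnode {
    struct tnode *father;
    uint64_t cpuset;
};

/* Walk from start towards the root. out[i] receives the processors that
 * are first reached at the i-th ancestor, i.e. those not already seen at
 * a lower level. Returns the number of levels written (at most
 * max_levels). */
static size_t new_neighbors(const struct tnode *start,
                            uint64_t *out, size_t max_levels)
{
    assert(start != NULL && out != NULL);
    size_t n = 0;
    uint64_t seen = start->cpuset;
    for (const struct tnode *up = start->father;
         up != NULL && n < max_levels; up = up->father) {
        out[n++] = up->cpuset & ~seen;   /* neighbors new at this level */
        seen |= up->cpuset;
    }
    return n;
}
```

For a processor under a 2-processor core, inside a 4-processor package,
inside an 8-processor machine, this yields three levels: the other
processor of the core, then the rest of the package, then the rest of
the machine.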

And sorry if I seem to be criticizing a lot; I am mainly a user of
hwloc, not a developer, but I hope it is constructive and maybe
helps make hwloc better...

ciao
Fawzi