Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] Bug report: topology strange on SGI UltraViolet
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2010-07-28 15:14:07


On 28/07/2010 20:59, Bernd Kallies wrote:
> So it seems to me that you basically get a distance matrix of PU objects
>

NUMA node objects actually. That's what Linux and Solaris report.

> from the system (the machine vendor), and probably you do agglomerative
> average linkage cluster analysis on it to determine the number and
> hierarchy of HWLOC_OBJ_GROUP objects (beyond what can be named by some
> hardware building block like core or cache etc). Is this right?
> I'm wondering if this is the right approach. Did you try other distance
> functions (e.g. single linkage)?
>

In 1.0.x we look at "complete graphs with minimal distances" and then at
"transitive graphs with minimal distances". One problem with this old
code is that it finds that Group0#0 and #1 have the minimal distance
between them (22), but it ignores the fact that Group0#2 is also at the
same distance from #1, and so on.

This code actually gives completely invalid groups on some strange HP
machines. In trunk, the code was reworked/cleaned to only look for
transitive graphs. Given your distance matrix, everybody is transitively
connected to everybody through one or several minimal distance links, so
everybody is grouped together in the end.
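
To illustrate the transitive grouping described above, here is a minimal
Python sketch. It is not the hwloc code: the function name, the union-find
approach, and both distance matrices are made up for illustration (the
second one is not the UV matrix from this thread). It only shows why a
chain of minimal-distance links merges everything into a single group.

```python
def group_by_min_distance(dist):
    """Group nodes linked, directly or transitively, by minimal-distance links.

    Illustrative only: an assumed reading of "transitive graphs with
    minimal distances", not hwloc's actual implementation.
    """
    n = len(dist)
    # Smallest off-diagonal distance in the matrix.
    dmin = min(dist[i][j] for i in range(n) for j in range(n) if i != j)

    # Union-find: every node starts in its own set.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the sets of every pair joined by a minimal-distance link.
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] == dmin and dist[j][i] == dmin:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical ring: each node is at minimal distance (22) from two neighbours.
ring = [
    [10, 22, 30, 22],
    [22, 10, 22, 30],
    [30, 22, 10, 22],
    [22, 30, 22, 10],
]

# Hypothetical machine with two clearly separated pairs of nodes.
pairs = [
    [10, 22, 30, 30],
    [22, 10, 30, 30],
    [30, 30, 10, 22],
    [30, 30, 22, 10],
]

print(group_by_min_distance(ring))   # one group containing all four nodes
print(group_by_min_distance(pairs))  # two groups of two nodes each
```

In the ring case no node is at minimal distance from all the others, yet
every node is reachable from every other through 22-links, so the result is
one big group, which is the behaviour described above.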

> Besides that, and from the viewpoint of a tree representation of the
> result of clustering, I would expect that every pair of two objects of
> same type have common ancestors of the same type. For the given UV
> topology I would not expect that there are two Group3 that have a Group4
> ancestor, while the 3rd Group3 is direct child of Machine. I would
> expect EITHER that the 3rd Group3 is also child of a Group4 (maybe a
> second one), OR that there is no Group4.
>

Right, I'll see if I can fix this without changing too many things in the
1.0 branch.

Brice