Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] hwloc problem
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2011-11-02 08:42:04


Doh! I forgot to add hwloc-devel before hitting send.

Brice / Samuel - see below.

Sent from my phone. No type good.

On Nov 2, 2011, at 8:40 AM, "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]> wrote:

> Chris -
>
> I totally missed this email; sorry! I'm cc-ing hwloc-devel to see if Brice/Samuel can fix it.
>
> I'm assuming we'll need this fix in the 1.2 hwloc branch as well. (I'm also assuming that the trunk referred to here is the OMPI trunk, not the hwloc trunk).
>
> Sent from my phone. No type good.
>
> On Oct 26, 2011, at 6:15 AM, "Christopher Yeoh" <cyeoh_at_[hidden]> wrote:
>
> > Hi Jeff,
> >
> > Brad mentioned you might be able to help me with an OMPI hwloc issue
> > I'm having.
> >
> > It's occurring on a Power 5 RHEL 6.0 machine and is related to the XML
> > representation of the topology. I've attached the XML to this email.
> > The problem only occurs on the trunk code.
> >
> > The part which appears to be the problem is this:
> >
> > <distances nbobjs="4" relative_depth="0" latency_base="10.000000">
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> >   <latency value="1.000000"/>
> > </distances>
> >
> > specifically, relative_depth has a value of 0 but the element still has
> > latency children. In hwloc__xml_import_distances in topology-xml.c
> > there's a check that skips the latency children in that case, as if
> > there were no latency information.
> >
> > Around line 634 in topology-xml.c:
> >
> >   if (nbobjs && reldepth && latbase) {
> >     /* ... process latency xml nodes ... */
> >   }
> >
> >   return hwloc__xml_import_close_tag(state);
> >
> > The hwloc__xml_import_close_tag function returns a failure because the
> > latency nodes have not been processed yet.
> >
> > I had a look in orted, where the XML is created, and it does look like
> > the XML is being assembled correctly from the topology information it
> > has retrieved (though I don't know if that itself is correct). The
> > hwloc__xml_export_object function will quite happily create distance
> > information if the relative depth is 0, even though
> > hwloc__xml_import_distances will not be able to parse it.
> >
> > So there is at least a problem that the topology code will create XML
> > that it can't parse, but I don't know enough about the hwloc library to
> > know if relative depth should always be positive. I suspect it's the
> > former that is the problem, not the latter, but I don't know for sure...
> >
> > If it helps, this is the output of lstopo on the machine:
> >
> > cyeoh_at_p5-40-P4-E0:~$ /home/OpenHPC/hwloc/build/bin/lstopo
> > Machine (2048MB)
> >   NUMANode L#0 (P#0 512MB)
> >     Socket L#0 + L1 L#0 (32KB) + Core L#0
> >       PU L#0 (P#0)
> >       PU L#1 (P#1)
> >     Socket L#1 + L1 L#1 (32KB) + Core L#1
> >       PU L#2 (P#2)
> >       PU L#3 (P#3)
> >   NUMANode L#1 (P#1 640MB)
> >   NUMANode L#2 (P#2 512MB)
> >   NUMANode L#3 (P#3 384MB)
> >
> > Regards,
> >
> > Chris
> > --
> > cyeoh_at_[hidden]
> > <fandango_hwloc_xml.txt>
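
The failure Chris describes can be reproduced in isolation. The sketch below is not hwloc code: the struct, the has_reldepth flag, and all function names are hypothetical, made up for illustration. It only shows why a guard of the form "nbobjs && reldepth && latbase" skips the latency children when relative_depth is a legitimate 0, and how a presence flag (one possible approach, not necessarily the fix hwloc should adopt) would tell "attribute is 0" apart from "attribute is missing".

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the attributes read from a <distances> tag;
 * this is NOT hwloc's internal representation. */
struct distances_attrs {
    unsigned long nbobjs;   /* nbobjs="4" */
    unsigned long reldepth; /* relative_depth="0" */
    float latbase;          /* latency_base="10.000000" */
    int has_reldepth;       /* assumption: set when the attribute was present */
};

/* Same shape as the guard quoted in the report: a legitimate
 * relative_depth of 0 is indistinguishable from a missing attribute. */
int guard_by_value(const struct distances_attrs *a)
{
    return a->nbobjs && a->reldepth && a->latbase;
}

/* Hypothetical alternative: test for presence, not for a nonzero value. */
int guard_by_presence(const struct distances_attrs *a)
{
    return a->nbobjs && a->has_reldepth && a->latbase;
}

/* Number of <latency> children still unread when the close tag is
 * expected: nbobjs * nbobjs if the guard skipped them, 0 otherwise. */
unsigned long unread_latency_children(const struct distances_attrs *a,
                                      int (*guard)(const struct distances_attrs *))
{
    return guard(a) ? 0UL : a->nbobjs * a->nbobjs;
}

/* Prints the outcome of both guards for the attribute values in the report. */
void print_reported_case(void)
{
    struct distances_attrs a = { 4UL, 0UL, 10.0f, 1 };

    printf("value guard:    %lu latency children unread\n",
           unread_latency_children(&a, guard_by_value));
    printf("presence guard: %lu latency children unread\n",
           unread_latency_children(&a, guard_by_presence));
}
```

Calling print_reported_case() from a main() shows the two guards side by side. With the value-based guard, all nbobjs * nbobjs latency children are left unread, which matches the close-tag failure described above. Whether relative_depth="0" should be exported at all is a separate question this sketch does not answer.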