Doh!  I forgot to add hwloc-devel before hitting send. 

Brice / Samuel - see below. 

Sent from my phone. No type good. 

On Nov 2, 2011, at 8:40 AM, "Jeff Squyres (jsquyres)" <jsquyres@cisco.com> wrote:

Re: hwloc problem

Chris -

I totally missed this email; sorry! I'm cc-ing hwloc-devel to see if Brice/Samuel can fix it.

I'm assuming we'll need this fix in the 1.2 hwloc branch as well. (I'm also assuming that the trunk referred to here is the OMPI trunk, not the hwloc trunk.)

On Oct 26, 2011, at 6:15 AM, "Christopher Yeoh" <cyeoh@au1.ibm.com> wrote:

> Hi Jeff,
>
> Brad mentioned you might be able to help me with an OMPI hwloc issue
> I'm having.
>
> It's occurring on a Power 5 RHEL 6.0 machine and is related to the XML
> representation of the topology. I've attached the XML to this email.
> The problem only occurs with the trunk code.
>
> The part which appears to be the problem is this:
>
>      <distances nbobjs="4" relative_depth="0" latency_base="10.000000">
>        <latency value="1.000000"/>
>        <latency value="1.000000"/>
>        <latency value="1.000000"/>
>        <latency value="1.000000"/>
>        <latency value="1.000000"/>
>        <latency value="1.000000"/>
>        <latency value="1.000000"/>
>        <latency value="1.000000"/>
>        <latency value="1.000000"/>
>        <latency value="1.000000"/>
>        <latency value="1.000000"/>
>        <latency value="1.000000"/>
>        <latency value="1.000000"/>
>        <latency value="1.000000"/>
>        <latency value="1.000000"/>
>        <latency value="1.000000"/>
>      </distances>
>
> specifically, relative_depth has a value of 0 but the element still has
> <latency> children. In hwloc__xml_import_distances in
> topology-xml.c there's a check that assumes there is no latency
> information in that case.
>
> Around line 634 in topology-xml.c:
>
> if (nbobjs && reldepth && latbase) {
>    /* ... process latency XML nodes ... */
> }
>
> return hwloc__xml_import_close_tag(state);
>
> The hwloc__xml_import_close_tag function returns a failure because the
> latency nodes have not been processed yet.
>
> I had a look in orted, where the XML is created, and it does look like
> the XML is being assembled correctly according to the topology information it
> has retrieved (though I don't know if that itself is correct). The
> hwloc__xml_export_object function will happily create distance
> information if the relative depth is 0, even though
> hwloc__xml_import_distances will not be able to parse it.
>
> So there is at least one problem: the topology code will create XML
> that it can't parse. But I don't know enough about the hwloc library to
> know whether the relative depth should always be positive. I suspect it's the
> former that is the problem, not the latter, but I don't know for sure...
>
> If it helps, this is the output of lstopo on the machine:
>
> cyeoh@p5-40-P4-E0:~$ /home/OpenHPC/hwloc/build/bin/lstopo
> Machine (2048MB)
>  NUMANode L#0 (P#0 512MB)
>    Socket L#0 + L1 L#0 (32KB) + Core L#0
>      PU L#0 (P#0)
>      PU L#1 (P#1)
>    Socket L#1 + L1 L#1 (32KB) + Core L#1
>      PU L#2 (P#2)
>      PU L#3 (P#3)
>  NUMANode L#1 (P#1 640MB)
>  NUMANode L#2 (P#2 512MB)
>  NUMANode L#3 (P#3 384MB)
>
> Regards,
>
> Chris
> --
> cyeoh@au.ibm.com
> <fandango_hwloc_xml.txt>