Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] hwloc error in topology.c in OMPI 1.6.5
From: Gus Correa (gus_at_[hidden])
Date: 2014-03-04 16:06:24

On 03/03/2014 05:06 PM, Brice Goglin wrote:
> On 03/03/2014 23:02, Gus Correa wrote:
>> I rebooted the node and ran hwloc-gather-topology again.
>> This time it didn't throw any errors on the terminal window,
>> which may be a good sign.
>> [root_at_node14 ~]# hwloc-gather-topology /tmp/`date +"%Y%m%d%H%M"`.$(uname -n)
>> Hierarchy gathered in /tmp/201403031639.node14.tar.bz2 and kept in
>> /tmp/tmp.FM97IQCCKc/201403031639.node14/
>> Expected topology output stored in /tmp/201403031639.node14.output
>> I attach the diagnostic files.
>> Was the problem fixed with the processor re-seating, or is it still
>> there?
> Everything looks good now. Looks like the problem is gone. Something bad
> happened somewhere before you replugged the processor, we'll never know
> exactly what :)
> Brice

Hi Brice

Reporting back to you that I ran the OMPI connectivity_c.c example on
node14, binding to core, and everything worked fine.
So, I am moving node14 back to production.

When I removed one of node14's processors from the socket,
I saw a sub-millimeter sized bit of dust, which I then blew away.
I am not sure if it was there already, or made it in when I
took the processor out.
In any case, that tiny bit of dust is the only suspect
I have for causing the problem.
Computer rooms need to be vacuum cleaned. Occasionally at least. :)

Many thanks for your help.
This no-man's-land between HW and SW is always a slippery road,
and I am glad that you guided me to a solution.

Gus Correa