On 03/03/2014 05:06 PM, Brice Goglin wrote:
> On 03/03/2014 23:02, Gus Correa wrote:
>> I rebooted the node and ran hwloc-gather-topology again.
>> This time it didn't throw any errors in the terminal window,
>> which may be a good sign.
>> [root@node14 ~]# hwloc-gather-topology /tmp/`date +"%Y%m%d%H%M"`.$(uname -n)
>> Hierarchy gathered in /tmp/201403031639.node14.tar.bz2 and kept in
>> Expected topology output stored in /tmp/201403031639.node14.output
>> I attach the diagnostic files.
>> Was the problem fixed with the processor re-seating, or is it still
> Everything looks good now. Looks like the problem is gone. Something bad
> happened somewhere before you replugged the processor, we'll never know
> exactly what :)
Just reporting back: I ran the Open MPI connectivity_c.c example on
node14, binding to cores, and everything worked fine.
So, I am moving node14 back to production.
When I removed one of node14's processors from the socket,
I saw a sub-millimeter-sized speck of dust, which I then blew away.
I am not sure if it was there already, or made it in when I
took the processor out.
In any case, I can't be sure, but dust is the only suspect
I have for causing the problem.
Computer rooms need to be vacuumed, occasionally at least. :)
Many thanks for your help.
This nowhere land between HW and SW is always a slippery road,
and I am glad that you guided me to a solution.