Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Bug in openmpi 1.5.4 in paffinity
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2011-09-05 15:29:05


On 04/09/2011 23:30, Brice Goglin wrote:
> On 04/09/2011 22:35, Ake Sandgren wrote:
>> On Sun, 2011-09-04 at 22:13 +0200, Brice Goglin wrote:
>>> Hello,
>>>
>>> Could you log again on this node (with same cgroups enabled), run
>>> hwloc-gather-topology <name>
>>> and send the resulting <name>.output and <name>.tar.bz2?
>>>
>>> Send them to the hwloc-devel list or open a ticket on
>>> https://svn.open-mpi.org/trac/hwloc (or send them to me in private if
>>> you don't want to subscribe).
>> Since it's a bit late here, I'm lazy and sending to you directly.
>>
>> Output from both nodes involved in the batch job
>> slurm -N 2 --ntasks-per-node=1 ... was what I was using.
>>
>> Hope it helps. If not, let me know if there is anything else I can do.
>>
>> /Åke S.
> Thanks, I understand the problem but it's not easy to fix. To work around
> the crash until I come up with a real fix, you can comment out
> hwloc_topology__set_distance_matrix()
> at the end of look_sysfsnode() in topology-linux.c

Dear Ake,
Could you try the attached patch? It's not optimized, but it's probably
going in the right direction.
(and don't forget to undo the comment-out above if you tried it).
Thanks
Brice