Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] v1.5 r25914 DOA
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2012-02-22 10:55:08

On 22/02/2012 07:36, Eugene Loh wrote:
> On 2/21/2012 5:40 PM, Paul H. Hargrove wrote:
>> Here are the first of the results of the testing I promised.
>> I am not 100% sure how to reach the code that Eugene reported as
>> problematic,
> I don't think you're going to see it. Somehow, hwloc on the config in
> question thinks there is no socket level and returns num_sockets==0.
> If you can run something successfully, your platform won't show the
> issue.

(Eugene sent hwloc info offlist)

This is an "interesting" case. Last time I used a RHEL4 2.6.9 kernel, it
had no sysfs topology info, but there was some "physical package" info
in /proc/cpuinfo. Yours has nothing. Maybe because it's an AMD and/or
single-core-processor based system. sysfs still has NUMA topology info
(this was added to the kernel around 2.5 iirc) so we get 2 NUMA nodes
with one core each but no socket at all. We could assume there is one
socket per NUMA node but that's a risky hack.

Anyway, we have seen other systems (mostly non-Linux) where lstopo
reports nothing interesting (only one machine object with multiple PU
children). So num_sockets==0 isn't really uncommon. Replacing 0 with 1
will likely work for your computations. Just make sure the code isn't going
to use the first hwloc socket object later: it would get NULL.
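
A minimal sketch of the two precautions above, written as plain C without
linking against hwloc. Everything named here (struct topo_obj,
effective_socket_count, cores_per_socket, first_socket_or_machine) is
hypothetical and only for illustration. In real code the count and the
object would presumably come from hwloc_get_nbobjs_by_type() and
hwloc_get_obj_by_type(), and this is not the actual Open MPI fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a topology object; NOT hwloc's hwloc_obj_t. */
struct topo_obj {
    const char *type_name;
};

/* Treat "no socket level" (count 0, or any non-positive count) as a
 * single socket so that per-socket arithmetic never divides by zero. */
int effective_socket_count(int reported_sockets)
{
    return reported_sockets > 0 ? reported_sockets : 1;
}

/* Example computation that would break with reported_sockets == 0. */
int cores_per_socket(int num_cores, int reported_sockets)
{
    return num_cores / effective_socket_count(reported_sockets);
}

/* Replacing 0 with 1 does not create a socket object: a lookup of the
 * first socket still yields NULL, so fall back to the machine object. */
const struct topo_obj *first_socket_or_machine(const struct topo_obj *first_socket,
                                               const struct topo_obj *machine)
{
    assert(machine != NULL); /* the machine (root) object is assumed to exist */
    return first_socket != NULL ? first_socket : machine;
}
```

With the configuration described above (2 cores, no socket level),
cores_per_socket(2, 0) treats the machine as one socket holding both cores,
and first_socket_or_machine(NULL, &machine) returns the machine object
instead of dereferencing NULL.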