Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] v1.5 r25914 DOA
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-02-22 11:42:45

Much simpler solution: on that platform, you should add "orte_num_sockets=1" to your default mca param file. Problem solved. That is why the param exists; we added it specifically at Terry's request for an earlier, similar problem.

On Feb 22, 2012, at 8:55 AM, Brice Goglin wrote:

> On 22/02/2012 07:36, Eugene Loh wrote:
>> On 2/21/2012 5:40 PM, Paul H. Hargrove wrote:
>>> Here are the first of the results of the testing I promised.
>>> I am not 100% sure how to reach the code that Eugene reported as
>>> problematic,
>> I don't think you're going to see it. Somehow, hwloc on the config in
>> question thinks there is no socket level and returns num_sockets==0.
>> If you can run something successfully, your platform won't show the
>> issue.
> (Eugene sent hwloc info offlist)
> This is an "interesting" case. The last time I used a RHEL4 2.6.9 kernel, it
> had no sysfs topology info, but there was some "physical package" info
> in /proc/cpuinfo. Yours has nothing, maybe because it's an AMD and/or
> single-core-processor-based system. sysfs still has NUMA topology info
> (this was added to the kernel around 2.5, IIRC), so we get 2 NUMA nodes
> with one core each but no socket at all. We could assume there is one
> socket per NUMA node, but that's a risky hack.
> Anyway, we have seen other systems (mostly non-Linux) where lstopo
> reports nothing interesting (only one machine object with multiple PU
> children). So numsockets==0 isn't really uncommon. Replacing 0 with 1
> will likely work for your computations. Make sure the code isn't going
> to use the first hwloc socket object later; it would obviously get NULL.
> Brice
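Brice's two points (use 1 in place of a zero socket count for arithmetic, but never dereference a socket object that does not exist) can be illustrated with a minimal C sketch. All names below (`struct socket_obj`, `effective_num_sockets`, `cores_per_socket`, `first_socket`, `first_socket_index`) are hypothetical and are not Open MPI or hwloc API. In real code the reported count would come from a call such as hwloc's `hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_SOCKET)`, and the object from `hwloc_get_obj_by_type()`, which returns NULL when no such object exists. The sketch also assumes that any non-positive count should be handled like zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket object. In real code this would be the
 * hwloc_obj_t returned by hwloc_get_obj_by_type(). */
struct socket_obj {
    unsigned os_index;
};

/* Socket count to use in computations: a topology with no socket level
 * (reported count of zero, or any non-positive value) counts as one socket. */
static unsigned effective_num_sockets(int reported)
{
    return reported > 0 ? (unsigned)reported : 1u;
}

/* Example computation that would divide by zero without the clamp. */
static unsigned cores_per_socket(unsigned num_cores, int reported_sockets)
{
    return num_cores / effective_num_sockets(reported_sockets);
}

/* Hypothetical accessor for the first socket object. Returns NULL when the
 * topology has no socket level, so callers must check before dereferencing. */
static const struct socket_obj *first_socket(const struct socket_obj *sockets,
                                             int reported)
{
    assert(reported <= 0 || sockets != NULL);
    return reported > 0 ? &sockets[0] : NULL;
}

/* Example consumer: only touches the socket object when one actually exists.
 * Returns -1 when there is no socket object. */
static long first_socket_index(const struct socket_obj *sockets, int reported)
{
    const struct socket_obj *s = first_socket(sockets, reported);
    if (s == NULL) {
        return -1; /* no socket object: do not dereference */
    }
    return (long)s->os_index;
}
```

For a topology like Eugene's (two NUMA nodes with one core each and no socket level), `cores_per_socket(2, 0)` yields 2 instead of dividing by zero, while `first_socket_index(NULL, 0)` reports -1 rather than dereferencing NULL.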