Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] core binding failure on Interlagos (and possibly Magny-Cours)
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2012-01-31 08:49:18


On 31/01/2012 14:24, Jeff Squyres wrote:
> On Jan 31, 2012, at 6:18 AM, Dave Love wrote:
>
>> Core binding is broken on Interlagos with open-mpi 1.5.4. I guess it
>> also bites on Magny-Cours, but all our systems are currently busy and I
>> can't check.
>>
>> It does work, at least basically, in 1.5.5rc1, but the release notes for
>> that don't give any indication. Perhaps someone could mention
>> Interlagos in the notes, and any other hardware that might be affected
>> (presumably Magny-Cours and some Power if it's confusion introduced by
>> the extra NUMA level).
> I think there was some weirdness in how AMD chips were represented to the Linux kernel (they present differently than Intel chips). I believe the issues have been worked out by hwloc.

Right, the kernel reports AMD "dual-core modules" almost exactly as "a
single hyperthreaded core". We had to tweak hwloc to detect two separate
cores in each module. So you get 32 cores and 32 PUs (hwloc >= 1.2.1)
instead of 16 cores and 32 PUs (hwloc < 1.2.1).
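
To illustrate the counting difference, here is a minimal Python sketch. It
is not hwloc's detection code: the `siblings` layout is a made-up model of a
32-PU machine where the kernel lists the two cores of each module as thread
siblings, and `count_cores` is a hypothetical helper.

```python
def count_cores(siblings, split_modules):
    """Count cores from each PU's kernel-reported thread-sibling group.

    siblings: dict mapping PU index -> frozenset of sibling PUs
              (hypothetical stand-in for what the kernel exposes).
    split_modules: True treats every PU in a group as its own core
                   (the hwloc >= 1.2.1 view of AMD dual-core modules);
                   False treats each group as one hyperthreaded core.
    """
    if split_modules:
        return len(siblings)
    return len(set(siblings.values()))


# Assumed layout: PUs (0,1), (2,3), ... each share one module.
siblings = {pu: frozenset({pu - pu % 2, pu - pu % 2 + 1}) for pu in range(32)}

old_cores = count_cores(siblings, split_modules=False)
new_cores = count_cores(siblings, split_modules=True)
print(old_cores, new_cores, len(siblings))  # 16 32 32
```

The sketch applies the split blindly; on a genuinely hyperthreaded machine
the sibling groups really are single cores, so a real tool has to tell the
two cases apart.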

If you don't have this hwloc change, I guess binding to core breaks
because hwloc sees only 16 cores for 32 processes. I don't know if
there's an easy way to tell OMPI 1.5.4 to bind to PUs instead of cores,
but binding to PUs should work as expected.
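
The arithmetic behind that guess can be sketched as follows. This is only a
model of "one process per unit"; I have not checked how OMPI 1.5.4 actually
fails, and `bind_one_per_unit` is a hypothetical name.

```python
def bind_one_per_unit(n_procs, n_units):
    """Map each rank to its own unit (core or PU); hypothetical model only."""
    if n_procs > n_units:
        raise ValueError(
            f"cannot bind {n_procs} processes to {n_units} units exclusively"
        )
    return {rank: rank for rank in range(n_procs)}


# 32 ranks against 32 PUs: every rank gets its own PU.
pu_binding = bind_one_per_unit(32, 32)

# 32 ranks against the 16 cores seen by hwloc < 1.2.1: no exclusive mapping.
try:
    bind_one_per_unit(32, 16)
    core_binding_ok = True
except ValueError:
    core_binding_ok = False

print(len(pu_binding), core_binding_ok)  # 32 False
```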

Unless I am mistaken, OMPI 1.5.4 ships hwloc 1.2, while 1.5.5 will ship
1.2.2 or even 1.3.1. So don't use core binding on Interlagos with
OMPI <= 1.5.4.
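
As a rough sketch of the version cutoff, assuming plain dotted version
strings (suffixes such as "rc1" are not handled) and hypothetical helper
names:

```python
def parse_version(text):
    """Turn a dotted version string such as "1.2.1" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def splits_dual_core_modules(hwloc_version):
    """True if this hwloc reports the two cores of an AMD module separately."""
    return parse_version(hwloc_version) >= (1, 2, 1)


for version in ("1.2", "1.2.1", "1.2.2", "1.3.1"):
    print(version, splits_dual_core_modules(version))
```

Python compares tuples element by element, so (1, 2) sorts before
(1, 2, 1) and hwloc 1.2 falls on the "16 cores" side of the cutoff.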

Note that Magny-Cours processors are OK: cores are "normal" there.

FWIW, the Linux kernel (at least up to 3.2) still reports wrong L2 and
L1i cache information on AMD Bulldozer. The kernel bug is reported at
https://bugzilla.kernel.org/show_bug.cgi?id=42607

Brice