
Subject: Re: [OMPI users] now 1.9 [was: I have still a problem with rankfiles in openmpi-1.6.4rc3]
From: Eugene Loh (eugene.loh_at_[hidden])
Date: 2013-02-10 10:45:43


On 2/10/2013 1:14 AM, Siegmar Gross wrote:
>> I don't think the problem is related to Solaris. I think it's also on Linux.
>> E.g., I can reproduce the problem with 1.9a1r28035 on Linux using GCC compilers.
>>
>> Siegmar: can you confirm this is a problem also on Linux? E.g.,
>> with OMPI 1.9, on one of your Linux nodes (linpc0?) try
>>
>> % cat myrankfile
>> rank 0=linpc0 slot=0:1
>> % mpirun --report-bindings --rankfile myrankfile numactl --show
>>
>> For me, the binding I get is not 0:1 but 0,1.
> I get the following outputs for openmpi-1.6.4rc4 (without your patch)

Okay, thanks, but 1.6 is not the issue here; something very different is
going on in 1.9/trunk. The 1.6 output is appreciated, but it isn't what
we need for this problem.

> and openmpi-1.9 (both compiled with Sun C 5.12).

Thanks for the confirmation. Your output, too, shows the problem on
Linux, so the bindings look wrong in 1.9. Ralph says he's taking a
look. The rankfile says "0:1", but you're getting "0,1".

> linpc1 rankfiles 96 mpirun --report-bindings --rankfile rf_1_linux numactl --show
> [linpc1:16061] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
> physcpubind: 0 1
> linpc1 rankfiles 97 ompi_info | grep "MPI:"
> Open MPI: 1.9a1r28035