Open MPI User's Mailing List Archives

From: Scott Atchley (atchley_at_[hidden])
Date: 2006-12-06 15:09:25


On Dec 6, 2006, at 2:29 PM, Brock Palen wrote:

>>
>> I wonder if we can narrow this down a bit to perhaps a PML protocol
>> issue.
>> Start by disabling RDMA by using:
>> -mca btl_gm_flags 1
>
> On the other hand, with OB1, using btl_gm_flags 1 fixed the error
> problem with OMPI! That is a great first step.
>
> mpirun -np 4 --mca btl_gm_flags 1 ./xhpl
>
> Allowed HPL to run with no errors. I verified that the performance
> was better than when run without GM
>
> (added --mca btl ^gm )
>
> So there is still a problem with DR, which I don't need, but I'm
> willing to help test it.
>
> Scott,
>
> Can we look into why leaving RDMA on is causing a problem?
>
> Brock

Brock and Galen,

We are willing to assist. Our best guess is that OMPI is using the
GM code differently than MPICH-GM does. One of our other developers,
who is more comfortable with the GM API, is looking into it.
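
For anyone unfamiliar with the knob Galen suggested: as I understand
it, btl_gm_flags is a bitmask of the BTL's capabilities, so setting
it to 1 restricts GM to plain send/receive and disables the RDMA
put/get paths. From memory of ompi/mca/btl/btl.h (check your tree
for the exact definitions):

#define MCA_BTL_FLAGS_SEND 0x0001  /* send/receive supported */
#define MCA_BTL_FLAGS_PUT  0x0002  /* RDMA put supported     */
#define MCA_BTL_FLAGS_GET  0x0004  /* RDMA get supported     */

That btl_gm_flags 1 runs cleanly while the default (send plus RDMA)
corrupts data points at the GM put/get path rather than the send path.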

Testing with HPCC, in addition to the failed HPL residuals, I am
also seeing these messages:

[3]: ERROR: from right: expected 2 and 3 as first and last byte, but
got 2 and 5 instead
[3]: ERROR: from right: expected 3 and 4 as first and last byte, but
got 3 and 7 instead
[1]: ERROR: from right: expected 4 and 5 as first and last byte, but
got 4 and 3 instead
[1]: ERROR: from right: expected 7 and 8 as first and last byte, but
got 7 and 5 instead

which is from $HPCC/src/bench_lat_bw_1.5.2.c.
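
For reference, here is a minimal sketch (not the actual HPCC source,
which does considerably more) of the kind of first/last-byte check
that produces those messages. Each rank stamps message i with i in
the first byte and i+1 in the last byte, passes it around a ring,
and verifies the stamps on arrival:

/* Sketch of a ring exchange with first/last-byte validation. */
#include <stdio.h>
#include <string.h>
#include <mpi.h>

#define MSG_LEN  (1 << 20)   /* large enough to take the RDMA path */
#define NUM_MSGS 8

static unsigned char buf[MSG_LEN];

int main(int argc, char **argv)
{
    int rank, size, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 0; i < NUM_MSGS; i++) {
        memset(buf, 0, MSG_LEN);
        buf[0]           = (unsigned char)i;       /* first-byte stamp */
        buf[MSG_LEN - 1] = (unsigned char)(i + 1); /* last-byte stamp  */

        /* send to the left neighbor, receive from the right */
        MPI_Sendrecv_replace(buf, MSG_LEN, MPI_BYTE,
                             (rank + size - 1) % size, 0,
                             (rank + 1) % size, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        if (buf[0] != i || buf[MSG_LEN - 1] != i + 1)
            printf("[%d]: ERROR: from right: expected %d and %d as "
                   "first and last byte, but got %d and %d instead\n",
                   rank, i, i + 1, buf[0], buf[MSG_LEN - 1]);
    }

    MPI_Finalize();
    return 0;
}

Note that in the failures above the first byte arrives intact while
the last byte is wrong, which would be consistent with the tail of
the message landing from a stale or mismatched RDMA buffer.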

Scott