Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] scif btl side effects
From: Hjelm, Nathan T (hjelmn_at_[hidden])
Date: 2014-05-12 09:13:03

Hah. Thanks for catching that. I will commit your patch later today.

From: devel [devel-bounces_at_[hidden]] on behalf of Gilles Gouaillardet [gilles.gouaillardet_at_[hidden]]
Sent: Monday, May 12, 2014 4:42 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] scif btl side effects

i wrote this too early ...

the attached program produces incorrect results when run with
--mca btl scif,vader,self

once the most up-to-date patch of #4610 has been applied, (at least) one
bug remains, and it is in the scif btl

the attached patch fixes it.


On 2014/05/12 16:17, Gilles Gouaillardet wrote:
> Nathan,
> On 2014/05/08 4:21, Hjelm, Nathan T wrote:
>> c) that being said, that should work so there is a bug
>> d) there is a regression in v1.8 and a bug that might always have been here
>> This is probably not a regression. The SCIF btl has been part of the 1.7 series for some time. The nightly MTTs are probably missing one of the cases that causes this problem. Hopefully we can get this fixed before 1.8.2.
> as explained in #4610 (
> the root cause is in the way data are unpacked.
> The scif btl is ok :-)
> when using --mca btl scif,self fragments can be received out of order,
> and that can trigger a bug introduced by r31496
> that being said, --mca btl scif,vader,self does not work even with r31496
> reverted.
> the root cause is another bug in the way data are unpacked, it also
> happens when fragments are received out of order
> *and* fragments contain a subpart of a predefined datatype.
> in this case, the vader btl received a fragment of size 1325 *and*
> received it out of order, and that caused the bug.
> Gilles
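
For readers following the thread: the failure condition described above (fragments arriving out of order, with a fragment boundary falling inside a predefined datatype) can be illustrated with a small standalone sketch. This is not the Open MPI convertor or btl code, and the `struct fragment`, `unpack_fragment` and `demo_out_of_order` names are hypothetical. It only assumes that each fragment carries its own byte offset, and shows that a 1325-byte fragment size (not a multiple of sizeof(double)) splits elements across fragments, so each fragment has to be placed by byte offset rather than by arrival order or whole-element count.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fragment descriptor, for illustration only (not an Open MPI type). */
struct fragment {
    size_t offset;              /* byte offset of the payload in the packed stream */
    size_t length;              /* number of payload bytes */
    const unsigned char *data;  /* payload */
};

/* Copy one fragment to its byte offset in dst.
 * Returns 0 on success, -1 if the fragment would overrun dst. */
static int unpack_fragment(unsigned char *dst, size_t dst_size,
                           const struct fragment *frag)
{
    if (frag->offset > dst_size || frag->length > dst_size - frag->offset) {
        return -1;
    }
    memcpy(dst + frag->offset, frag->data, frag->length);
    return 0;
}

/* Split 512 doubles (4096 bytes) into fragments of at most 1325 bytes,
 * deliver them in reverse order, and compare the result with the source.
 * Returns 1 if the reassembled buffer matches, 0 otherwise. */
static int demo_out_of_order(void)
{
    enum { COUNT = 512, FRAG_SIZE = 1325, MAX_FRAGS = 4 };
    double src[COUNT], dst[COUNT];
    struct fragment frags[MAX_FRAGS];
    size_t total = sizeof(src), nfrags = 0;

    for (size_t i = 0; i < COUNT; i++) {
        src[i] = (double)i * 0.5;
    }
    memset(dst, 0, sizeof(dst));

    /* 1325 is not a multiple of 8, so every boundary splits a double. */
    for (size_t off = 0; off < total; off += FRAG_SIZE) {
        frags[nfrags].offset = off;
        frags[nfrags].length = (total - off < FRAG_SIZE) ? total - off : FRAG_SIZE;
        frags[nfrags].data = (const unsigned char *)src + off;
        nfrags++;
    }

    /* Deliver the last fragment first to mimic out-of-order arrival. */
    for (size_t i = nfrags; i-- > 0; ) {
        if (unpack_fragment((unsigned char *)dst, total, &frags[i]) != 0) {
            return 0;
        }
    }
    return memcmp(src, dst, total) == 0;
}
```

Calling `demo_out_of_order()` from a `main` gives a quick check that placement by byte offset is insensitive to arrival order. An unpacker that instead tracks a running position, or rounds to whole elements, is the kind of code that can go wrong under the conditions described in the thread.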