Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] btl_openib_rnr_retry MCA param
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-02-13 09:05:24

Actually, we should then also print out a different error message when
RNR occurs in PP QPs, too. It should be something along the lines of
"flow control problem occurred; this shouldn't happen..." (right now
it says RNR happened, and goes into detail about what that means -- but
that's not the real problem).

I'll do that as well.
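
A minimal sketch of the message selection described above, assuming the
error path already knows whether the failing QP is per-peer. The function
name and the SRQ wording are hypothetical placeholders, not the actual
openib BTL code; only the PP wording is taken from the suggestion above.

```c
#include <stdbool.h>

/* Hypothetical helper: pick the text to show when an RNR error is
 * reported on a QP. On a per-peer (PP) QP, an RNR points at a bug in
 * the software flow control rather than at a slow receiver. */
static const char *rnr_error_message(bool qp_is_per_peer)
{
    if (qp_is_per_peer) {
        /* Wording suggested in this thread. */
        return "flow control problem occurred; this shouldn't happen";
    }
    /* Placeholder for the existing, more detailed RNR explanation. */
    return "receiver not ready (RNR) retry count exceeded";
}
```

Calling rnr_error_message(true) returns the flow-control text, and
rnr_error_message(false) returns the placeholder RNR text.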

On Feb 13, 2008, at 12:59 AM, Gleb Natapov wrote:

> On Tue, Feb 12, 2008 at 05:41:13PM -0500, Jeff Squyres wrote:
>> I see that in the OOB CPC for the openib BTL, when setting up the send
>> side of the QP, we set the rnr_retry value depending on whether the
>> remote receive queue is per-peer (PP) or SRQ:
>> - SRQ: btl_openib_rnr_retry MCA param value
>> - PP: 0
>> The rationale given in a comment is that setting the RNR to 0 is a
>> good way to find bugs in our flow control.
>> Do we really want this in production builds? Or do we want 0 for
>> developer builds and the same btl_openib_rnr_retry value for PP
>> queues?
> The comment is mine and IMO it should stay that way for production
> builds. SW flow control either works or it doesn't, and if it doesn't I
> prefer to know about it immediately. Setting PP to some value greater
> than 0 just delays the manifestation of the problem, and in the case of
> iWarp such a possibility doesn't even exist.
> --
> Gleb.
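
The rule discussed in this thread can be sketched as follows. This is an
illustration under stated assumptions, not the OOB CPC source: the enum,
the function name, and the parameter name are hypothetical, and
mca_rnr_retry stands in for the btl_openib_rnr_retry MCA param value.

```c
#include <stdint.h>

/* Hypothetical tag for the kind of receive queue on the remote side. */
typedef enum {
    RECV_QUEUE_PER_PEER,  /* PP: software flow control should prevent RNR */
    RECV_QUEUE_SRQ        /* SRQ: shared queue, RNR can legitimately occur */
} recv_queue_type_t;

/* Choose the rnr_retry value for the send side of a QP.
 * SRQ uses the configured MCA value; PP uses 0, so a flow control bug
 * shows up immediately instead of being hidden by retries. */
static uint8_t choose_rnr_retry(recv_queue_type_t remote_type,
                                uint8_t mca_rnr_retry)
{
    if (RECV_QUEUE_SRQ == remote_type) {
        return mca_rnr_retry;
    }
    return 0;
}
```

With this sketch, a PP queue always gets 0 regardless of the configured
value, which is the behavior Gleb argues should stay in production builds.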

Jeff Squyres
Cisco Systems