On Wed, Feb 13, 2008 at 09:05:24AM -0500, Jeff Squyres wrote:
> Actually, we should then also print a different error message when
> RNR occurs on PP QPs. It should be something along the lines of
> "flow control problem occurred; this shouldn't happen..." (right now
> it says RNR happened, and goes into detail about what that means --
> but that's not the real problem).
> I'll do that as well.
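The error-message split proposed above could look roughly like the following sketch. It assumes the BTL already knows whether the failed send targeted a PP queue or an SRQ. The names `rq_type_t`, `RQ_PER_PEER`, `RQ_SRQ`, and `rnr_error_message` are hypothetical and are not taken from the openib BTL. In real code the trigger would be a work completion whose status is `IBV_WC_RNR_RETRY_EXC_ERR` (libibverbs).

```c
#include <assert.h>

/* Hypothetical receive-queue classification (not the openib BTL's real type). */
typedef enum {
    RQ_PER_PEER, /* per-peer (PP) receive queue, protected by SW flow control */
    RQ_SRQ       /* shared receive queue (SRQ) */
} rq_type_t;

/* Hypothetical helper: choose the text to report when a send completes with
 * an "RNR retry count exceeded" status (IBV_WC_RNR_RETRY_EXC_ERR). */
static const char *rnr_error_message(rq_type_t remote_rq)
{
    assert(remote_rq == RQ_PER_PEER || remote_rq == RQ_SRQ);

    if (remote_rq == RQ_PER_PEER) {
        /* PP QPs use rnr_retry = 0, so an RNR here means the SW flow
         * control let the sender overrun the receiver's posted buffers. */
        return "flow control problem occurred; this shouldn't happen";
    }
    /* SRQ: the receiver repeatedly had no posted receive buffer. */
    return "RNR retry count exceeded: the receiver had no posted receive buffer";
}
```

Keeping the PP wording short points the reader at a flow-control bug rather than at RNR tuning, which is the distinction made in the message above.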
> On Feb 13, 2008, at 12:59 AM, Gleb Natapov wrote:
> > On Tue, Feb 12, 2008 at 05:41:13PM -0500, Jeff Squyres wrote:
> >> I see that in the OOB CPC for the openib BTL, when setting up the
> >> send side of the QP, we set the rnr_retry value depending on
> >> whether the remote receive queue is per-peer (PP) or an SRQ:
> >> - SRQ: btl_openib_rnr_retry MCA param value
> >> - PP: 0
> >> The rationale given in a comment is that setting the RNR retry
> >> count to 0 is a good way to find bugs in our flow control.
> >> Do we really want this in production builds? Or do we want 0 for
> >> developer builds and the same btl_openib_rnr_retry value (as used
> >> for SRQ) for PP queues in production builds?
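The policy described in the quoted message can be sketched as follows. This is a minimal illustration assuming a two-way PP/SRQ distinction; `rq_type_t`, `RQ_PER_PEER`, `RQ_SRQ`, `RNR_RETRY_INFINITE`, and `choose_rnr_retry` are hypothetical names, not the CPC's actual code. In libibverbs the chosen value would go into the `rnr_retry` field of `struct ibv_qp_attr` when the QP is moved to RTS with the `IBV_QP_RNR_RETRY` attribute mask.

```c
#include <assert.h>

/* Hypothetical receive-queue classification (not the openib BTL's real type). */
typedef enum {
    RQ_PER_PEER, /* per-peer (PP) receive queue, protected by SW flow control */
    RQ_SRQ       /* shared receive queue (SRQ) */
} rq_type_t;

/* The RNR retry count is a 3-bit field; the value 7 means "retry forever". */
#define RNR_RETRY_INFINITE 7

/* Hypothetical helper: pick the send-side rnr_retry value for a QP.
 *   - SRQ: use the btl_openib_rnr_retry MCA parameter value.
 *   - PP:  use 0, so any RNR NAK fails immediately and exposes a
 *          flow-control bug instead of being silently retried. */
static int choose_rnr_retry(rq_type_t remote_rq, int mca_rnr_retry)
{
    /* Assumption for this sketch: the MCA value already fits the 3-bit field. */
    assert(mca_rnr_retry >= 0 && mca_rnr_retry <= RNR_RETRY_INFINITE);

    if (remote_rq == RQ_PER_PEER) {
        return 0;
    }
    return mca_rnr_retry;
}
```

With this sketch, `choose_rnr_retry(RQ_PER_PEER, 7)` yields 0 while `choose_rnr_retry(RQ_SRQ, 7)` yields 7 (retry forever). Any nonzero value for PP would only postpone the failure, which is the argument made in the reply below.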
> > The comment is mine, and IMO it should stay that way for production
> > builds. SW flow control either works or it doesn't, and if it doesn't I
> > prefer to know about it immediately. Setting PP to some value greater
> > than 0 just delays the manifestation of the problem, and in the case of
> > iWARP such a possibility doesn't even exist.
> > --
> > Gleb.
> > _______________________________________________
> > devel mailing list
> > devel_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Jeff Squyres
> Cisco Systems