Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] sm BTL flow management
From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2009-06-25 16:45:30


On Thu, 25 Jun 2009, Eugene Loh wrote:

> I spoke with Brian and Jeff about this earlier today. Presumably, up through
> 1.2, mca_btl_component_progress would poll and if it received a message
> fragment would return. Then, presumably in 1.3.0, behavior was changed to
> keep polling until the FIFO was empty. Brian said this was based on Terry's
> desire to keep latency as low as possible in benchmarks. Namely, reaching
> down into a progress call was a long code path. It would be better to pick
> up multiple messages, if available on the FIFO, and queue extras up in the
> unexpected queue. Then, a subsequent call could more efficiently find the
> anticipated message fragment.
>
> I don't see how the behavior would impact short-message pingpongs (the
> typical way to measure latency) one way or the other.
>
> I asked Terry, who struggled to remember the issue and pointed me at this
> thread: http://www.open-mpi.org/community/lists/devel/2008/06/4158.php .
> But that is related to an issue that's solved if one keeps polling as long as
> one gets ACKs (but returns as soon as a real message fragment is found).
>
> Can anyone shed some light on the history here? Why keep polling even when a
> message fragment has been found? The downside of polling too aggressively is
> that the unexpected queue can grow (without bounds).
>
> Brian's proposal is to set some variable that determines how many message
> fragments a single mca_btl_sm_component_progress call can drain from the FIFO
> before returning.
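
(As a rough illustration of the difference being discussed, here is a compilable toy sketch. All of the names and helpers below are hypothetical stand-ins; the real sm BTL code is structured quite differently.)

#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these names do not match
 * the real sm BTL source. */
typedef enum { FRAG_ACK, FRAG_DATA } frag_type_t;
typedef struct { frag_type_t type; int seq; } frag_t;

/* Tiny mock of the per-peer receive FIFO. */
static frag_t fifo[8];
static size_t fifo_head = 0, fifo_tail = 0;

static void fifo_push(frag_type_t type, int seq)
{
    fifo[fifo_tail % 8] = (frag_t){ type, seq };
    fifo_tail++;
}

static frag_t *fifo_pop(void)
{
    if (fifo_head == fifo_tail) return NULL;   /* FIFO is empty */
    return &fifo[fifo_head++ % 8];
}

static void handle_ack(frag_t *f)     { printf("handled ack  %d\n", f->seq); }
static void deliver_to_pml(frag_t *f) { printf("handled data %d\n", f->seq); }

/* Pre-1.3 style: handle at most one fragment, then return. */
static int progress_return_on_first(void)
{
    frag_t *f = fifo_pop();
    if (NULL == f) return 0;
    if (FRAG_ACK == f->type) handle_ack(f); else deliver_to_pml(f);
    return 1;
}

/* 1.3.x style: drain the FIFO completely before returning; data fragments
 * beyond the one being waited for end up on the unexpected queue. */
static int progress_drain_all(void)
{
    int count = 0;
    frag_t *f;
    while (NULL != (f = fifo_pop())) {
        if (FRAG_ACK == f->type) handle_ack(f); else deliver_to_pml(f);
        count++;
    }
    return count;
}

int main(void)
{
    fifo_push(FRAG_ACK, 1);
    fifo_push(FRAG_DATA, 2);
    fifo_push(FRAG_DATA, 3);

    progress_return_on_first();   /* handles only the ack */
    progress_drain_all();         /* drains the two remaining data fragments */
    return 0;
}

The drain-all loop is roughly what 1.3.x does today; the return-on-first loop is roughly the pre-1.3 behavior Eugene describes above.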

I checked, and 1.3.2 definitely drains all messages until the FIFO is
empty. If we were to switch to draining until we receive a data message,
and that fixes Terry's issue, that seems like a reasonable change and
would not require the fix I suggested. My assumption had been that we
needed to drain more than one data message per call to component_progress
in order to work around Terry's issue. If not, then let's go with the
simple fix and only drain one data message per entrance to
component_progress (but drain multiple ACKs if we have a bunch of ACKs and
then a data message in the queue).
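
Roughly what I have in mind, reusing the hypothetical helpers from the toy sketch above (again, illustrative only, not the actual sm BTL code):

/* Proposed loop: always drain ACKs, but stop after 'limit' data fragments.
 * limit == 1 is the simple fix described above. */
static int progress_drain_limited(int limit)
{
    int data_handled = 0;
    frag_t *f;

    while (NULL != (f = fifo_pop())) {
        if (FRAG_ACK == f->type) {
            handle_ack(f);           /* ACKs never count toward the limit */
            continue;
        }
        deliver_to_pml(f);
        if (++data_handled >= limit) break;
    }
    return data_handled;
}

With the limit fixed at 1 we get the simple behavior; if it turns out we really do need to drain more data fragments per call, the limit could be exposed as a tunable instead.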

Unfortunately I have no more history than what Terry proposed, but it
looks like the changes were made around that time.

Brian