On Jun 23, 2009, at 11:04, Eugene Loh wrote:
> The sm BTL used to have two mechanisms for dealing with congested
> FIFOs. One was to grow the FIFOs. Another was to queue pending
> sends locally (on the sender's side). I think the grow-FIFO
> mechanism was typically invoked and the pending-send mechanism used
> only under extreme circumstances (no more memory).
> With the sm makeover of 1.3.2, we dropped the ability to grow
> FIFOs. The code added complexity and there seemed to be no need to
> have two mechanisms to deal with congested FIFOs. In ticket 1944,
> however, we see that repeated collectives can produce hangs, and
> this seems to be due to the pending-send code not adequately dealing
> with congested FIFOs.
> Today, when a process tries to write to a remote FIFO and fails, it
> queues the write as a pending send. The only condition under which
> it retries pending sends is when it gets a fragment back from a
> remote process.
> I think the logic must have been that the FIFO got congested because
> we issued too many sends. Getting a fragment back indicates that
> the remote process has made progress digesting those sends. In
> ticket 1944, we see that a FIFO can also get congested from too many
> returning fragments. Further, with shared FIFOs, a FIFO could
> become congested due to the activity of a third-party process.
> In sum, getting a fragment back from a remote process is a poor
> indicator that it's time to retry pending sends.
> Maybe the real way to know when to retry pending sends is just to
> check if there's room on the FIFO.
Why is this different from "getting a fragment back"? As far as I
remember the code, when we get a fragment back we add it back to the
LIFO, and therefore it becomes the next available fragment for a send.
> So, I'll try modifying MCA_BTL_SM_FIFO_WRITE. It'll start by
> checking if there are pending sends. If so, it'll retry them before
> performing the requested write. This should also help preserve
> ordering a little better. I'm guessing this will not hurt our
> message latency in any meaningful way, but I'll check this out.
> Meanwhile, I wanted to check in with y'all for any guidance you
> might have.
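
To make the proposal quoted above concrete, here is a minimal sketch of a
write path that retries pending sends before attempting the requested
write. It is not the actual sm BTL code or the MCA_BTL_SM_FIFO_WRITE
macro: the types fifo_t and pending_t, the functions fifo_try_write,
fifo_read, pending_push, retry_pending and fifo_write, and the sizes
FIFO_SLOTS and MAX_PENDING are all hypothetical names chosen for
illustration, and fragments are reduced to plain ints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_SLOTS  4   /* hypothetical fixed FIFO size (no growing) */
#define MAX_PENDING 64  /* hypothetical cap on locally queued sends */

/* Fixed-size circular FIFO standing in for a remote process's FIFO. */
typedef struct {
    int slots[FIFO_SLOTS];
    size_t head, count;
} fifo_t;

/* Sender-side queue of writes that could not be posted yet. */
typedef struct {
    int items[MAX_PENDING];
    size_t head, count;
} pending_t;

/* Post one fragment; fails (returns false) when the FIFO is full. */
bool fifo_try_write(fifo_t *f, int frag) {
    if (f->count == FIFO_SLOTS) return false;
    f->slots[(f->head + f->count) % FIFO_SLOTS] = frag;
    f->count++;
    return true;
}

/* Receiver side: consume the oldest fragment, if any. */
bool fifo_read(fifo_t *f, int *frag) {
    if (f->count == 0) return false;
    *frag = f->slots[f->head];
    f->head = (f->head + 1) % FIFO_SLOTS;
    f->count--;
    return true;
}

/* Queue a write locally; this sketch simply asserts on overflow. */
void pending_push(pending_t *p, int frag) {
    assert(p->count < MAX_PENDING);
    p->items[(p->head + p->count) % MAX_PENDING] = frag;
    p->count++;
}

/* Repost pending sends, oldest first, while the FIFO has room.
 * Returns true only if the pending queue is empty afterwards. */
bool retry_pending(fifo_t *f, pending_t *p) {
    while (p->count > 0) {
        if (!fifo_try_write(f, p->items[p->head])) return false;
        p->head = (p->head + 1) % MAX_PENDING;
        p->count--;
    }
    return true;
}

/* Proposed write path: drain pending sends first, then try the new
 * write. Anything that does not fit is queued behind the older
 * pending sends, so ordering is preserved.
 * Returns true if frag went straight onto the FIFO. */
bool fifo_write(fifo_t *f, pending_t *p, int frag) {
    if (retry_pending(f, p) && fifo_try_write(f, frag)) return true;
    pending_push(p, frag);
    return false;
}
```

The point of the sketch is that the retry is triggered by the next write
attempt and succeeds whenever the FIFO has room, whatever the reason the
room appeared, rather than being tied to a fragment coming back. The
extra cost on the uncongested path is one test of the pending count.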