Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Threaded progress for CPCs
From: Jon Mason (jon_at_[hidden])
Date: 2008-05-19 15:40:46


On Mon, May 19, 2008 at 10:12:19PM +0300, Gleb Natapov wrote:
> On Mon, May 19, 2008 at 01:52:22PM -0500, Jon Mason wrote:
> > On Mon, May 19, 2008 at 05:17:57PM +0300, Gleb Natapov wrote:
> > > On Mon, May 19, 2008 at 05:08:17PM +0300, Pavel Shamis (Pasha) wrote:
> > > > >> 5. ...?
> > > > >>
> > > > > What about moving the posting of receive buffers into the main thread? With
> > > > > SRQ it is easy: don't post anything in the CPC thread. The main thread will
> > > > > prepost buffers automatically after the first fragment is received on the
> > > > > endpoint (in btl_openib_handle_incoming()).
> > > > It still doesn't guarantee that we will not see RNR (as I understand it, we
> > > > are trying to resolve this problem for iWARP?!)
> > > >
> > > I don't think that iwarp has SRQ at all. And if it has then it should
> >
> > While Chelsio does not currently have an adapter that has SRQs, there are
> > some other iWARP vendors that do have them.
> >
> > > have HW flow control for it too. I don't see what advantage SRQ without
> > > flow control can provide over PPRQ.
> >
> > Technically, this is not flow control, it is a retransmit. iWARP can use
> > the HW TCP stack to retransmit, but it will not have the "retransmit
> > forever" ability that setting rnr_retry to 7 has for IB.
> For how long will it try to retransmit before dropping the connection?
>
> >
> > > > So this solution will cost 1 buffer on each SRQ ... sounds acceptable
> > > > to me. But I don't see much
> > > > difference compared to #1; as I understand it, we will need the
> > > > pipe for communication with the main thread anyway,
> > > > so why not use #1?
> > > What communication? No communication at all. Just don't prepost buffers
> > > to SRQ during connection establishment. Problem solved (only for SRQ, of
> > > course).
> >
> > iWARP needs preposted recv buffers (or it will drop the connection). So
> > this isn't a good option.
> I was talking about SRQ only. You said above that iwarp does retransmit for SRQ.
> openib BTL relies on HW retransmit when using SRQ, so if iwarp doesn't do it
> reliably enough it can not be used with SRQ anyway.

How iWARP adapters behave with respect to SRQ retransmit is 100% HW dependent.
The HW can queue some of the receives internally or rely on the HW TCP stack to
retransmit them. Of course, depending on that behavior is a BAD thing to do. The
SRQ "low-water mark" event is the best way to handle these cases.
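
To make the low-water mark idea concrete, here is a minimal sketch in plain C.
It is only a model, not openib BTL code and not the verbs API: struct srq_model
and the srq_* helpers are hypothetical names made up for illustration. In
libibverbs the limit is armed with ibv_modify_srq() using IBV_SRQ_LIMIT, and
the notification arrives as an IBV_EVENT_SRQ_LIMIT_REACHED async event. The
model assumes the event is one-shot, i.e. the limit has to be re-armed after
each event.

```c
#include <assert.h>

/* Hypothetical model of a shared receive queue; not a libibverbs type. */
struct srq_model {
    unsigned posted;  /* receive buffers currently posted */
    unsigned limit;   /* armed low-water mark; 0 means disarmed */
    unsigned events;  /* limit events raised so far */
};

/* Arm the low-water mark (models setting the SRQ limit). */
static void srq_arm(struct srq_model *q, unsigned limit)
{
    q->limit = limit;
}

/* Post n more receive buffers. */
static void srq_post(struct srq_model *q, unsigned n)
{
    q->posted += n;
}

/*
 * Consume one buffer for an incoming message.
 * Returns 0 on success, -1 if the queue is empty, which is the case where
 * real hardware would have to buffer, retransmit, or drop.
 */
static int srq_consume(struct srq_model *q)
{
    if (q->posted == 0)
        return -1;
    q->posted--;
    if (q->limit != 0 && q->posted < q->limit) {
        q->events++;   /* models the limit-reached event */
        q->limit = 0;  /* one-shot: stays disarmed until re-armed */
    }
    return 0;
}

/* Event handler: refill up to target buffers, then re-arm the limit. */
static void srq_on_limit(struct srq_model *q, unsigned target, unsigned limit)
{
    assert(limit <= target);
    if (q->posted < target)
        srq_post(q, target - q->posted);
    srq_arm(q, limit);
}
```

The point of the model is that the refill is triggered while buffers are still
available, so the receiver never has to lean on whatever the adapter does when
the SRQ runs dry.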

Thanks,
Jon

>
> --
> Gleb.