Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Threaded progress for CPCs
From: Steve Wise (swise_at_[hidden])
Date: 2008-05-19 16:44:04


Jeff Squyres wrote:
> On May 19, 2008, at 3:40 PM, Jon Mason wrote:
>
>
>>>> iWARP needs preposted recv buffers (or it will drop the
>>>> connection). So this isn't a good option.
>>>>
>>> I was talking about SRQ only. You said above that iwarp does
>>> retransmit for SRQ. openib BTL relies on HW retransmit when using
>>> SRQ, so if iwarp doesn't do it reliably enough it can not be used
>>> with SRQ anyway.
>>>
>> How iWARP adapters behave with respect to SRQ retransmit is 100% HW
>> dependent.
>>
>
> It was my understanding that it's at least the same as how TCP handles
> a dropped packet. The HW may do better than that.
>
>
>> The HW can queue some of the receives internally or use the HW TCP
>> stack to have it retransmit. Of course, this is a BAD thing to do.
>> The SRQ "low-water marker" event is the best way to handle these cases.
>>
>
>
> I disagree. I even think that the IB-retry-forever approach is bad.
> Here's why:
>
> 1. Posting more at low watermark can lead to DoS-like behavior when
> you have a fast sender and a slow receiver. This is exactly the
> resource-exhaustion kind of behavior that a high quality MPI
> implementation is supposed to avoid -- we really should throttle
> the sender somehow.
>
> 2. Resending ad infinitum simply eats up more bandwidth and takes away
> network resources (e.g., switch resources) from other, legitimate
> traffic. Particularly if the receiver doesn't dip into the MPI layer
> for many hours. So yes, it *works*, but it's definitely sub-optimal.
>
>
The SRQ low water mark is simply an API method that lets applications
try to never hit the "we're totally out of recv bufs" problem. That's a
tool that I think SRQ users need no matter what flow control
method you use to try and avoid Jeff's #1 item above.

And if you don't like the RNR retry/TCP retransmit approach, which is bad
for reason #2 (and because TCP will eventually give up and reset the
connection), then I think there needs to be some OMPI-layer protocol to
stop senders that are abusing the SRQ pool for whatever reason (too fast
a sender, a sleeping-beauty receiver never entering the OMPI layer, whatever).

my 1/2 cent...

Steve.