
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Threaded progress for CPCs
From: Pavel Shamis (Pasha) (pasha_at_[hidden])
Date: 2008-05-20 06:36:10


>> Is it possible to have sane SRQ implementation without HW flow
>> control?
>>
>
> It seems pretty unlikely if the only available HW flow control is to
> terminate the connection. ;-)
>
>
>>> Even if we can get the iWARP semantics to work, this feels kinda
>>> icky. Perhaps I'm overreacting and this isn't a problem that needs
>>> to
>>> be fixed -- after all, this situation is no different than what
>>> happens after the initial connection, but it still feels icky.
>>>
>> What is so icky about it? Sender is faster than a receiver so flow
>> control
>> kicks in.
>>
>
> My point is that we have no real flow control for SRQ.
>
>
>>> 2. The CM progress thread posts its own receive buffers when creating
>>> a QP (which is a necessary step in both CMs). However, this is
>>> problematic in two cases:
>>>
>>>
>> [skip]
>>
>> I don't like 1,2 and 3. :(
>>
>>
>>> 4. Have a separate mpool for drawing initial receive buffers for the
>>> CM-posted RQs. We'd probably want this mpool to be always empty (or
>>> close to empty) -- it's ok to be slow to allocate / register more
>>> memory when a new connection request arrives. The memory obtained
>>> from this mpool should be able to be returned to the "main" mpool
>>> after it is consumed.
>>>
>> This is slightly better, but still...
>>
>
> Agreed; my reactions were pretty much the same as yours.
>
>
>>> 5. ...?
>>>
>> What about moving posting of receive buffers into main thread. With
>> SRQ it is easy: don't post anything in CPC thread. Main thread will
>> prepost buffers automatically after first fragment received on the
>> endpoint (in btl_openib_handle_incoming()). With PPRQ it's more
>> complicated. What if we'll prepost dummy buffers (not from free list)
>> during IBCM connection stage and will run another three way handshake
>> protocol using those buffers, but from the main thread. We will need
>> to
>> prepost one buffer on the active side and two buffers on the passive
>> side.
>>
>
>
> This is probably the most viable alternative -- it would be easiest if
> we did this for all CPC's, not just for IBCM:
>
> - for PPRQ: CPCs only post a small number of receive buffers, suitable
> for another handshake that will run in the upper-level openib BTL
> - for SRQ: CPCs don't post anything (because the SRQ already "belongs"
> to the upper level openib BTL)
>
Currently iWARP does not have SRQ at all, and IMHO SRQ is not possible
without HW flow control.
So let's resolve the problem only for PPRQ?

> Do we have a BSRQ restriction that there *must* be at least one PPRQ?
>
No, there is no such restriction.
> If so, we could always run the upper-level openib BTL really-post-the-
> buffers handshake over the smallest buffer size BSRQ RC PPRQ (i.e.,
> have the CPC post a single receive on this QP -- see below), which
> would make things much easier. If we don't already have this
> restriction, would we mind adding it? We have one PPRQ in our default
> receive_queues value, anyway.
>
I don't see a reason to add such a restriction, at least for IB.
We may add it for iWARP only (actually, we already have it for iWARP).