Ok, I think we're mostly converged on a solution. This might not get
implemented immediately (got some other pending v1.3 stuff to bug fix,
etc.), but it'll happen for v1.3.
- endpoint creation will mpool alloc/register a small buffer for the CTS
- the CPC does not need to call _post_recvs(); instead, it can just post
the single small buffer on each BSRQ QP (from the small buffer on the
endpoint)
- cpc will call _connected() (in the main thread, not the CPC progress
thread) when all BSRQ QPs are connected
- if _post_recvs() was previously called, do the normal "finish
setting up" stuff and declare the endpoint CONNECTED
- if _post_recvs() was not previously called, then:
- call _post_recvs()
- send a short CTS message on the 1st BSRQ QP
- wait for CTS from peer
- when both CTS from peer has arrived *and* we have sent our CTS,
declare endpoint CONNECTED
Doing it this way adds no overhead to OOB/XOOB (who don't need this
extra handshake). I think the code can be factored nicely to make
this not too complicated.
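The state machine above could be factored roughly like the following sketch. All of the names here (endpoint_t, endpoint_cpc_connected(), post_recvs(), etc.) are invented for illustration; they are not the real openib BTL symbols, and the helpers are no-op stand-ins for the actual BTL internals:

```c
/* Hypothetical sketch of the wireup flow described above; every name
 * is illustrative, not the actual openib BTL API. */
#include <stdbool.h>

typedef enum { EP_CONNECTING, EP_CONNECTED } ep_state_t;

typedef struct {
    ep_state_t state;
    bool posted_real_recvs;   /* have the "real" buffers been posted? */
    bool cts_sent;            /* have we sent our CTS? */
    bool cts_received;        /* has the peer's CTS arrived? */
} endpoint_t;

/* Stand-ins for the real BTL internals (trivial here). */
static void post_recvs(endpoint_t *ep) { ep->posted_real_recvs = true; }
static void send_cts(endpoint_t *ep)   { ep->cts_sent = true; }
static void declare_connected(endpoint_t *ep) { ep->state = EP_CONNECTED; }

/* Invoked by the CPC, in the main thread, once all BSRQ QPs are up. */
void endpoint_cpc_connected(endpoint_t *ep)
{
    if (ep->posted_real_recvs) {
        /* OOB/XOOB case: buffers already posted, no extra handshake. */
        declare_connected(ep);
        return;
    }
    /* Otherwise: post the real buffers now, then exchange CTS. */
    post_recvs(ep);
    send_cts(ep);
    if (ep->cts_received) {   /* the peer's CTS may have arrived first */
        declare_connected(ep);
    }
}

/* Invoked when the peer's CTS control message arrives. */
void endpoint_cts_received(endpoint_t *ep)
{
    ep->cts_received = true;
    if (ep->cts_sent) {
        declare_connected(ep);
    }
}
```

The point of the two entry points is that CONNECTED is only declared once both conditions hold (our CTS sent *and* the peer's CTS received), regardless of which event happens last.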
I'll work on this once I figure out the memory corruption I'm seeing
in the receive_queues patch...
Note that this addresses the wireup multi-threading issues -- not
iWarp SRQ issues. We'll tackle those separately, and possibly not for
the initial v1.3.0 release.
On May 20, 2008, at 6:02 AM, Gleb Natapov wrote:
> On Mon, May 19, 2008 at 01:38:53PM -0400, Jeff Squyres wrote:
>>>> 5. ...?
>>> What about moving the posting of receive buffers into the main
>>> thread? With SRQ it is easy: don't post anything in the CPC thread.
>>> The main thread will prepost buffers automatically after the first
>>> fragment is received on the endpoint (in
>>> btl_openib_handle_incoming()). With PPRQ it's more complicated. What
>>> if we prepost dummy buffers (not from a free list) during the IBCM
>>> connection stage and run another three-way protocol using those
>>> buffers, but from the main thread? We will need to prepost one
>>> buffer on the active side and two buffers on the passive side.
>> This is probably the most viable alternative -- it would be easiest
>> if we did this for all CPCs, not just for IBCM:
>> - for PPRQ: CPCs only post a small number of receive buffers, enough
>> for another handshake that will run in the upper-level openib BTL
>> - for SRQ: CPCs don't post anything (because the SRQ already belongs
>> to the upper-level openib BTL)
>> Do we have a BSRQ restriction that there *must* be at least one PPRQ?
> No. We don't have such restriction and I wouldn't want to add it.
>> If so, we could always run the upper-level openib BTL really-post-
>> buffers handshake over the smallest buffer size BSRQ RC PPRQ (i.e.,
>> have the CPC post a single receive on this QP -- see below), which
>> would make things much easier. If we don't already have this
>> restriction, would we mind adding it? We have one PPRQ in our
>> receive_queues value, anyway.
> If there is no PPRQ then we can rely on the RNR/retransmit logic in
> case there are not enough buffers in the SRQ. We do that anyway in the
> openib BTL.
>> With this rationale, once the CPC says "ok, all BSRQ QP's are
>> connected", then _endpoint.c can run a CTS handshake to post the
>> "real" buffers, where each side does the following:
>> - CPC calls _endpoint_connected() to tell the upper level BTL that it
>> is fully connected (the function is invoked in the main thread)
>> - _endpoint_connected() posts all the "real" buffers to all the BSRQ
>> QP's on the endpoint
>> - _endpoint_connected() then sends a CTS control message to remote
>> peer via smallest RC PPRQ
>> - upon receipt of CTS:
>> - release the buffer (***)
>> - set endpoint state of CONNECTED and let all pending messages
>> flow... (as it happens today)
>> So it actually doesn't even have to be a handshake -- it's just an
>> additional CTS sent over the newly-created RC QP. Since it's RC, we
>> don't have to do much -- just wait for the CTS to know that the other
>> side has actually posted all the receives that we expect it to have.
>> Since the CTS flows over a PPRQ, there's no issue about receiving the
>> CTS on an SRQ (because the SRQ may not have any buffers posted at any
>> given time).
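Picking where the CTS lands -- "the smallest buffer size BSRQ RC PPRQ" -- amounts to a scan over the parsed receive_queues spec. A minimal sketch, assuming an invented qp_spec_t that is much simpler than the real parsed MCA parameter:

```c
#include <stddef.h>

/* Invented, simplified stand-in for the BSRQ QP descriptions parsed
 * from the receive_queues MCA parameter; not the real data structure. */
typedef enum { QP_PPRQ, QP_SRQ } qp_type_t;

typedef struct {
    qp_type_t type;
    size_t    size;   /* receive buffer size served by this QP */
} qp_spec_t;

/* Return the index of the smallest per-peer (PPRQ) QP, or -1 if the
 * spec contains no PPRQ at all -- in which case, as discussed above,
 * the CTS cannot rely on pre-posted buffers and the RNR/retransmit
 * logic would have to cover it. */
int smallest_pprq(const qp_spec_t *qps, int nqps)
{
    int best = -1;
    for (int i = 0; i < nqps; i++) {
        if (qps[i].type != QP_PPRQ) continue;
        if (best < 0 || qps[i].size < qps[best].size) best = i;
    }
    return best;
}
```

Posting the single CTS receive on the QP this returns is what lets the CTS avoid the SRQ entirely.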
> Correct. Full handshake is not needed. The trick is to allocate those
> initial buffers in a smart way. IMO the initial buffer should be very
> small (a couple of bytes only) and be preallocated at endpoint
> creation. This will solve the locking problem.