Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [ofa-general] [RFC] XRC -- make receiving XRC QP independent of any one user process
From: Tang, Changqing (changquing.tang_at_[hidden])
Date: 2007-12-21 13:22:29

For the heartbeat we use a zero-byte rdma_write. The message goes to the peer QP only, so there is no need to post anything
on the remote side and no need for pinned memory.


> -----Original Message-----
> From: Jack Morgenstein [mailto:jackm_at_[hidden]]
> Sent: Friday, December 21, 2007 12:09 PM
> To: Tang, Changqing
> Cc: pasha_at_[hidden];
> mvapich-discuss_at_[hidden];
> general_at_[hidden]; Open MPI Developers
> Subject: Re: [ofa-general] [RFC] XRC -- make receiving XRC QP
> independent of any one user process
> On Friday 21 December 2007 19:13, Tang, Changqing wrote:
> > This kernel QP is for receiving only, so when there is no activity on
> > this QP, can the kernel send a heartbeat message to check whether the
> > remote sending QP is still there (still connected)? If not, it is
> > safe for the kernel to clean up this QP.
> >
> > So whenever the RC connection is broken, kernel can destroy this QP.
> >
> This increases the XRC complexity considerably:
> 1. We need a separate kernel thread that scans ALL XRC domains on
>    this host for XRC receive QPs. This thread will need to do some
>    form of RDMA_READ/WRITE, because otherwise it will interfere with
>    the remote (sending-side) operation. Furthermore, the
>    sending-side XRC QP may not have anyone listening on an
>    associated XRC SRQ QP -- it is not meant to be set up to receive.
>    We only need an operation that will yield a RETRY_EXCEEDED error
>    completion if the connection has broken.
> 2. This opens the door to all sorts of nasty race conditions, since
>    we will now have a bi-directional protocol. For example, what if
>    this feature is combined with APM (valid for RC QPs), and we are
>    simply in the middle of a migration, with communication
>    temporarily interrupted? We would be killing off the QP without
>    allowing any error recovery mechanism to work.
> 3. The application complexity goes up -- we now need the
>    sending-side QP to declare a memory region and send this
>    region's address to the receiving side, so that the receiving
>    side (the kernel thread mentioned above) can periodically try to
>    read from this region.
> Still, I'll give this some thought. For example, maybe we can
> rdma_read some random (illegal) address -- if the connection is
> alive, we'll get a "remote access error" completion, while if it's
> dead, we'll get retry exceeded (need to check that the bad rdma read
> request does not cause the QPs to enter an error state).
> - Jack
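
Both probes discussed in this thread (the zero-byte rdma_write and the rdma_read of an illegal address) come down to reading the completion status of the probe and deciding whether the peer answered. The following is only a minimal sketch of that decision, not code from either implementation. The enum values are local stand-ins whose names echo the libibverbs completion statuses, and `classify_probe` and `may_destroy_recv_qp` are hypothetical helper names introduced for illustration.

```c
/* Local stand-ins for work-completion statuses. The names echo the
 * libibverbs values IBV_WC_SUCCESS, IBV_WC_REM_ACCESS_ERR and
 * IBV_WC_RETRY_EXC_ERR, but are defined here (an assumption) so the
 * sketch builds without the verbs library. */
enum probe_status {
    PROBE_SUCCESS,        /* probe acknowledged, e.g. a zero-byte write */
    PROBE_REM_ACCESS_ERR, /* peer rejected the illegal address: it answered */
    PROBE_RETRY_EXC_ERR,  /* no answer before the retry count ran out */
    PROBE_OTHER_ERR       /* anything else: no verdict */
};

enum peer_state { PEER_ALIVE, PEER_DEAD, PEER_UNKNOWN };

/* Map one probe completion to a liveness verdict. Any response from the
 * peer, even an error response, counts as "alive". */
enum peer_state classify_probe(enum probe_status status)
{
    switch (status) {
    case PROBE_SUCCESS:
    case PROBE_REM_ACCESS_ERR:
        return PEER_ALIVE;
    case PROBE_RETRY_EXC_ERR:
        return PEER_DEAD;
    default:
        return PEER_UNKNOWN;
    }
}

/* Hypothetical policy helper: only a definite "dead" verdict permits
 * cleanup of the receive QP. This does NOT handle the APM migration
 * race from point 2 of the quoted message, where a retry-exceeded
 * result may be temporary. */
int may_destroy_recv_qp(enum probe_status status)
{
    return classify_probe(status) == PEER_DEAD;
}
```

The sketch leaves open the question raised in the quoted message, namely whether the bad rdma read itself pushes the QPs into an error state. That would have to be checked against the hardware and the specification before relying on the "remote access error" branch.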