Open MPI Development Mailing List Archives

From: Matt Leininger (mlleinin_at_[hidden])
Date: 2006-04-19 15:53:33


Copying the Open MPI folks on this thread.

  - Matt

On Wed, 2006-04-19 at 12:05 -0700, Sean Hefty wrote:
> I'd like to get some feedback regarding the following approach to supporting
> multicast groups in userspace, and in particular for MPI. Based on side
> conversations, I need to know if this approach would meet the needs of MPI
> developers.
>
> To join / leave a multicast group, my proposal is to add the following APIs to
> the rdma_cm. (Note I haven't implemented this yet, so I'm just assuming that
> it's possible at this point.)
>
> /* Asynchronously join a multicast group. */
> int rdma_set_option(struct rdma_cm_id *id, int level, int optname,
>                     void *optval, size_t optlen);
>
> /* Retrieve multicast group information - not usually called. */
> int rdma_get_option(struct rdma_cm_id *id, int level, int optname,
>                     void *optval, size_t optlen);
>
> /*
>  * Post a message on the QP associated with the cm_id for the
>  * specified multicast address.
>  */
> int rdma_sendto(struct rdma_cm_id *id, struct ibv_send_wr *send_wr,
>                 struct sockaddr *to);
>
> ---
>
> As an example of how these APIs would be used:
>
> /* The cm_id provides event handling and context. */
> rdma_create_id(&id, context);
>
> /* Bind to a local interface to attach to a local device. */
> rdma_bind_addr(id, local_addr);
>
> /* Allocate a PD, CQs, etc. */
> pd = ibv_alloc_pd(id->verbs);
> ...
>
> /*
> * Create a UD QP associated with the cm_id.
> * TBD: automatically transition the QP to RTS for UD QP types?
> */
> rdma_create_qp(id, pd, init_attr);
>
> /* Bind to multicast group. */
> mcast_ip = 224.0.0.74.71; /* some fine mcast addr */
> ip_mreq.imr_multiaddr = mcast_ip.in_addr;
> rdma_set_option(id, RDMA_PROTO_IP, IP_ADD_MEMBERSHIP, &ip_mreq,
> sizeof(ip_mreq));
>
> /* Wait for join to complete. */
> rdma_get_cm_event(&event);
> if (event->event == RDMA_CM_EVENT_JOIN_COMPLETE) {
>         /* join worked - we could call rdma_get_option() here */
>         /* The rdma_cm attached the QP to the multicast group for us. */
>         ...
> }
> rdma_ack_cm_event(event);
>
> /*
> * Format a send wr. The ah, remote_qpn, and remote_qkey are
> * filled out by the rdma_cm based on the provided destination
> * address.
> */
> rdma_sendto(id, send_wr, &mcast_ip);
>
> ---
>
> The multicast group information is created / managed by the rdma_cm. The
> rdma_cm defines the mgid, q_key, p_key, sl, flowlabel, tclass, and joinstate.
> Except for mgid, these would most likely match the values used by the ipoib
> broadcast group. The mgid mapping would be similar to that used by ipoib. The
> actual MCMember record would be available to the user by calling
> rdma_get_option.
>
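As an illustration of the ipoib mapping mentioned above, here is a minimal sketch of how IPoIB (RFC 4391) derives an MGID from an IPv4 multicast address. The proposal only says the rdma_cm mapping would be "similar", so this is the ipoib layout and not necessarily what the rdma_cm would use; the function name ipv4_to_mgid is an assumption.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/*
 * Build a 16-byte MGID from an IPv4 multicast address using the
 * IPoIB layout from RFC 4391:
 *
 *   ff1<scope>:401b:<pkey>:0000:0000:0000:<low 28 bits of the group>
 *
 * ipv4_to_mgid is a hypothetical name used only for this sketch.
 */
void ipv4_to_mgid(struct in_addr group, uint16_t pkey, uint8_t scope,
                  uint8_t mgid[16])
{
    uint32_t addr = ntohl(group.s_addr);

    memset(mgid, 0, 16);
    mgid[0]  = 0xff;                    /* multicast GID prefix */
    mgid[1]  = 0x10 | (scope & 0x0f);   /* flags = transient, then scope */
    mgid[2]  = 0x40;                    /* IPv4 signature 0x401b */
    mgid[3]  = 0x1b;
    mgid[4]  = pkey >> 8;               /* partition key */
    mgid[5]  = pkey & 0xff;
    mgid[12] = (addr >> 24) & 0x0f;     /* keep only the low 28 bits */
    mgid[13] = (addr >> 16) & 0xff;
    mgid[14] = (addr >> 8) & 0xff;
    mgid[15] = addr & 0xff;
}
```

With the default partition key 0xffff and link-local scope (0x2), the group 224.0.0.1 maps to ff12:401b:ffff:0000:0000:0000:0000:0001.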
> I don't believe that there would be any restriction on the use of the QP that is
> attached to the multicast group, but it would take more work to support more
> than one multicast group per QP. The purpose of the rdma_sendto() routine is to
> map a given IP address to an allocated address handle and Qkey. At this point,
> rdma_sendto would only work for multicast addresses that have been joined by the
> user.
>
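The address-to-handle mapping described for rdma_sendto() could be sketched as a small lookup table keyed by the joined group address. This is not taken from any rdma_cm implementation: every type and function name below is hypothetical, and a void pointer stands in for the real address handle (struct ibv_ah *).

```c
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GROUPS 8           /* arbitrary limit for this sketch */
#define MCAST_QPN  0xFFFFFFu   /* destination QPN for IB multicast sends */

/* One joined group: what a send needs to reach it. */
struct mcast_dest {
    struct in_addr group;      /* joined IPv4 multicast address */
    void          *ah;         /* stand-in for struct ibv_ah * */
    uint32_t       qkey;
    int            in_use;
};

/* Must be zero-initialized before first use. */
struct mcast_table {
    struct mcast_dest slot[MAX_GROUPS];
};

/* Record a joined group; returns 0, or -1 if the table is full. */
int mcast_table_add(struct mcast_table *t, struct in_addr group,
                    void *ah, uint32_t qkey)
{
    for (size_t i = 0; i < MAX_GROUPS; i++) {
        if (!t->slot[i].in_use) {
            t->slot[i] = (struct mcast_dest){ group, ah, qkey, 1 };
            return 0;
        }
    }
    return -1;
}

/* Look up a destination; NULL means the group was never joined. */
const struct mcast_dest *mcast_table_find(const struct mcast_table *t,
                                          struct in_addr to)
{
    for (size_t i = 0; i < MAX_GROUPS; i++) {
        if (t->slot[i].in_use && t->slot[i].group.s_addr == to.s_addr)
            return &t->slot[i];
    }
    return NULL;
}
```

A send path built on this would fail when mcast_table_find() returns NULL, matching the restriction that only joined addresses are valid destinations. Otherwise it would copy the entry's ah and qkey, together with MCAST_QPN, into the UD fields of the work request before posting it.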
> If a user wanted more control over the multicast group, we could support a call
> such as:
>
> struct ib_mreq {
>         struct ib_sa_mcmember_rec rec;
>         ib_sa_comp_mask comp_mask;
> };
>
> rdma_set_option(id, RDMA_PROTO_IB, IB_ADD_MEMBERSHIP, &ib_mreq,
>                 sizeof(ib_mreq));
>
> Thoughts?
>
> - Sean
> _______________________________________________
> openib-general mailing list
> openib-general_at_[hidden]
> http://openib.org/mailman/listinfo/openib-general