Open MPI Development Mailing List Archives

Subject: [OMPI devel] RFC: Calling BTL directly from PML
From: Rolf vandeVaart (rolf.vandevaart_at_[hidden])
Date: 2010-04-19 15:42:43


WHAT: Change many of the functions in the PML layer to call the BTL
directly instead of going through the BML layer. [NOTE: I am withdrawing
my earlier RFC from 04/07/2010 because it was flawed.]

WHY: Some PMLs (like the failover one I am working on) may add or
delete BTLs while the program is running. Currently, the act of
mapping out a BTL for communication means removing an entry from the
bml_base_btl_array and shuffling the remaining entries. The problem
is that pointers to entries in the bml_base_btl_array are cached in
the descriptor. After mapping out a BTL, these pointers may no longer
be valid since the entries they are pointing to may have been moved.
It turns out there is no need to cache these pointers as we can
just access the BTL information directly from the PML.
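
To make the problem concrete, here is a minimal sketch of the stale
pointer. All of the toy_* types and functions are hypothetical stand-ins
that I made up for illustration; they are not the real
bml_base_btl_array or descriptor definitions. It assumes that removal
shifts the later entries down by one slot, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real Open MPI types. */
typedef struct { int id; } toy_btl_t;                  /* a BTL module */
typedef struct { toy_btl_t *btl; } toy_bml_btl_t;      /* one array entry */
typedef struct {
    toy_bml_btl_t entries[4];
    size_t count;
} toy_btl_array_t;                                     /* the BTL array */
typedef struct { toy_bml_btl_t *context; } toy_descriptor_t; /* caches an entry pointer */

/* Remove the entry at 'index' and shift the later entries down one slot. */
static void toy_array_remove(toy_btl_array_t *arr, size_t index)
{
    assert(index < arr->count);
    memmove(&arr->entries[index], &arr->entries[index + 1],
            (arr->count - index - 1) * sizeof(arr->entries[0]));
    arr->count--;
}

/* Cache a pointer to the array entry for BTL 2, then map out BTL 1.
 * Returns the BTL id that the cached entry pointer now refers to. */
static int toy_demo_stale_entry_pointer(void)
{
    static toy_btl_t btls[3] = { {1}, {2}, {3} };
    toy_btl_array_t arr = { { {&btls[0]}, {&btls[1]}, {&btls[2]} }, 3 };
    toy_descriptor_t des = { &arr.entries[1] };  /* entry for BTL 2 */

    toy_array_remove(&arr, 0);   /* slot 1 now holds the entry for BTL 3 */
    return des.context->btl->id;
}

/* Same scenario, but hold the BTL pointer itself instead of a pointer
 * into the array. Shuffling the array does not affect it. */
static int toy_demo_direct_btl_pointer(void)
{
    static toy_btl_t btls[3] = { {1}, {2}, {3} };
    toy_btl_array_t arr = { { {&btls[0]}, {&btls[1]}, {&btls[2]} }, 3 };
    toy_btl_t *btl = arr.entries[1].btl;         /* BTL 2 itself */

    toy_array_remove(&arr, 0);
    return btl->id;
}
```

In this sketch the cached entry pointer silently ends up referring to
BTL 3 after the removal, while the direct BTL pointer still refers to
BTL 2. Calling the two toy_demo_* functions from a small main() is
enough to see the difference.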

WHERE:
ompi/mca/pml/ob1/pml_ob1_recvreq.c
ompi/mca/pml/ob1/pml_ob1_sendreq.c
ompi/mca/pml/ob1/pml_ob1_recvfrag.c
(and maybe a few other places)

MORE DETAILS:
I sent an email about this last week. In short, I want to make changes
similar to the example shown below. Note that des->des_context is no
longer used and that the descriptor is freed by calling the
btl->btl_free function directly. My concern is that I might be violating
some PML->BML->BTL abstractions. However, the division between these
three layers seems somewhat fluid, so what I am suggesting may be OK.

EXAMPLE:
ORIGINAL CODE:
static void
mca_pml_ob1_rget_completion( mca_btl_base_module_t* btl,
                            struct mca_btl_base_endpoint_t* ep,
                            struct mca_btl_base_descriptor_t* des,
                            int status )
{
   mca_pml_ob1_send_request_t* sendreq =
       (mca_pml_ob1_send_request_t*)des->des_cbdata;
   mca_bml_base_btl_t* bml_btl = (mca_bml_base_btl_t*)des->des_context;
   size_t req_bytes_delivered = 0;

   /* count bytes of user data actually delivered and check for request
    * completion */
   MCA_PML_OB1_COMPUTE_SEGMENT_LENGTH( des->des_src, des->des_src_cnt,
                                       0, req_bytes_delivered );
   OPAL_THREAD_ADD_SIZE_T(&sendreq->req_bytes_delivered,
                          req_bytes_delivered);

   send_request_pml_complete_check(sendreq);
   /* free the descriptor */
   mca_bml_base_free(bml_btl, des);
   MCA_PML_OB1_PROGRESS_PENDING(bml_btl);
}

NEW CODE:
static void
mca_pml_ob1_rget_completion( mca_btl_base_module_t* btl,
                            struct mca_btl_base_endpoint_t* ep,
                            struct mca_btl_base_descriptor_t* des,
                            int status )
{
   mca_pml_ob1_send_request_t* sendreq =
       (mca_pml_ob1_send_request_t*)des->des_cbdata;
   size_t req_bytes_delivered = 0;

   /* count bytes of user data actually delivered and check for request
    * completion */
   MCA_PML_OB1_COMPUTE_SEGMENT_LENGTH( des->des_src, des->des_src_cnt,
                                       0, req_bytes_delivered );
   OPAL_THREAD_ADD_SIZE_T(&sendreq->req_bytes_delivered,
                          req_bytes_delivered);

   send_request_pml_complete_check(sendreq);
   /* free the descriptor */
   btl->btl_free(btl, des);
   MCA_PML_OB1_PROGRESS_PENDING(btl);
}