Open MPI Development Mailing List Archives

Subject: [OMPI devel] Using opal_convertor_t for In-place send buffers in a BTL component
From: Alex Margolin (alex.margolin_at_[hidden])
Date: 2012-04-05 20:30:37


Hi,

First, I'm glad to say my MOSIX component is working and giving good
initial results. Thanks for all your help!
I would like to contribute the code to the Open MPI project, but I'm not
sure how (I know I should fill in some license agreement documents).
Is there an official code-review process? Is there anything else to do
other than test it on some machines and commit it if/when I get the
permissions?

Second, I have a question about in-place send buffers. My
mca_btl_mosix_prepare_src() currently looks like this:

mca_btl_base_descriptor_t*
mca_btl_mosix_prepare_src(struct mca_btl_base_module_t* btl,
                           struct mca_btl_base_endpoint_t* endpoint,
                           struct mca_mpool_base_registration_t* registration,
                           struct opal_convertor_t* convertor,
                           uint8_t order,
                           size_t reserve,
                           size_t* size,
                           uint32_t flags)
{
     mca_btl_mosix_frag_t* frag;
     struct iovec iov;
     uint32_t iov_count = 1;
     size_t result;
     int rc;

     /* Enforce upper message length limit */
     if( OPAL_UNLIKELY((reserve + *size) > btl->btl_max_send_size) ) {
         *size = btl->btl_max_send_size - reserve;
     }

     /* Fetch a fragment to work on */
     if( *size + reserve <= btl->btl_eager_limit ) {
         MCA_BTL_MOSIX_FRAG_ALLOC_EAGER(frag, rc);
     } else {
         MCA_BTL_MOSIX_FRAG_ALLOC_MAX(frag, rc);
     }
     if( OPAL_UNLIKELY(NULL == frag) ) {
         return NULL;
     }
     frag->segments[0].seg_addr.pval = (void*)(frag + 1);
     frag->segments[0].seg_len = reserve;

     /* Fill it with outgoing data */
     iov.iov_len = frag->size - reserve;
     /**************** if( opal_convertor_need_buffers(convertor) ) { ****************/
     if( 0 != reserve ) {
         /* Use existing buffer at the end of the fragment */
         iov.iov_base = (unsigned char*)frag->segments[0].seg_addr.pval + reserve;
         rc = opal_convertor_pack( convertor, &iov, &iov_count, &result );
         if( 0 > rc ) {
             MCA_BTL_MOSIX_FRAG_RETURN(frag);
             return NULL;
         }
         frag->segments[0].seg_len += result;
         frag->base.des_src_cnt = 1;
     } else {
         iov.iov_base = NULL;
         /* Read the iovec for the buffer to be transferred */
         rc = opal_convertor_pack( convertor, &iov, &iov_count, &result );
         if( rc < 0 ) {
             MCA_BTL_MOSIX_FRAG_RETURN(frag);
             return NULL;
         }
         frag->segments[1].seg_addr.pval = iov.iov_base;
         frag->segments[1].seg_len = result;
         frag->base.des_src_cnt = 2;
     }
     frag->base.des_src = frag->segments;
     frag->base.order = MCA_BTL_NO_ORDER;
     frag->base.des_dst = NULL;
     frag->base.des_dst_cnt = 0;
     frag->base.des_flags = flags;
     return &frag->base;
}

Note that the condition on the convertor, which I tried to copy from
the TCP equivalent, is commented out. If I switch to that condition, I get:

[singularity:3774] *** An error occurred in MPI_Barrier
[singularity:3774] *** reported by process [3220963329,0]
[singularity:3774] *** on communicator MPI_COMM_WORLD
[singularity:3774] *** MPI_ERR_TRUNCATE: message truncated
[singularity:3774] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[singularity:3774] *** and potentially your MPI job)
[singularity:03773] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[singularity:03773] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
alex_at_singularity:~/huji/benchmarks/simple$

As I understand it, the buffer sent by the user is currently copied to
(void*)(frag+1), even when it would be better to leave it in place, with
the reserved data in frag->segments[0] and the user buffer in
frag->segments[1]. Does anyone have an idea what would cause the
truncation error when I use the in-place layout? Maybe a problem in the
receiver-side function?
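
To make the intended layout concrete, here is a minimal standalone sketch
of it, outside Open MPI. It only illustrates the idea of a header segment
followed by an untouched user buffer, gathered at send time. The names
seg_t, gather_segments() and send_in_place() are made up for this example
and are not part of the BTL API, and the memcpy stands in for whatever
gathering write the transport would really do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one descriptor segment (address + length). */
typedef struct {
    const void *addr;
    size_t      len;
} seg_t;

/* Copy `count` segments back to back into `wire`, the way a gathering
 * send would put them on the wire. Returns the number of bytes written,
 * or 0 if `wire` is too small to hold all segments. */
static size_t gather_segments(const seg_t *segs, size_t count,
                              void *wire, size_t wire_cap)
{
    size_t off = 0;
    assert(segs != NULL && wire != NULL);
    for (size_t i = 0; i < count; i++) {
        if (segs[i].len > wire_cap - off) {
            return 0;
        }
        memcpy((unsigned char *)wire + off, segs[i].addr, segs[i].len);
        off += segs[i].len;
    }
    return off;
}

/* In-place layout: reserved header in segs[0], user buffer in segs[1].
 * The user buffer is referenced, not copied, until the gather step. */
static size_t send_in_place(const void *hdr, size_t hdr_len,
                            const void *user, size_t user_len,
                            void *wire, size_t wire_cap)
{
    seg_t segs[2] = { { hdr, hdr_len }, { user, user_len } };
    return gather_segments(segs, 2, wire, wire_cap);
}
```

In this sketch the receiver sees hdr_len + user_len bytes only if the send
path walks both segments. If my real send path only transmitted
segments[0], the receiver would get fewer bytes than the header
advertises, which is one thing I still need to rule out.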

Thanks,
Alex

P.S. I know this problem happens with 8-byte messages, but 4-byte
messages pass OK. I don't know if that helps.