Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Using opal_convertor_t for In-place send buffers in a BTL component
From: George Bosilca (bosilca_at_[hidden])
Date: 2012-04-05 22:05:32


Alex,

This is indeed quite strange. You're receiving an error about truncated data during a barrier. MPI_Barrier is purely a synchronization operation and does not move any user data around, so I can hardly see how it can generate a truncation.

You should put a breakpoint in the function recv_request_pml_complete at line 171 (the only place where we set the truncate error), and try to understand how this happens. If you can send the stack trace we might be able to help a little more.
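
To make clear what to look at once the breakpoint hits, here is a rough sketch of what a truncation check of this kind amounts to. It is not the actual Open MPI source: the type, the field names and the `sketch_` function are hypothetical stand-ins. The idea is only that the error is raised when the sender reports more payload than the receiver posted room for. If the barrier is implemented with zero-length messages (an assumption here), any non-zero payload length reported by the BTL would be enough to trip it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not Open MPI definitions. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_TRUNCATE = 1 };

typedef struct {
    size_t bytes_expected;  /* room in the buffer the receiver posted */
    size_t bytes_received;  /* payload bytes reported for the message */
    int    status;          /* SKETCH_SUCCESS or SKETCH_ERR_TRUNCATE  */
} sketch_recv_request_t;

/* Flag truncation when more payload arrived than the receiver posted. */
static int sketch_recv_complete(sketch_recv_request_t *req)
{
    assert(req != NULL);
    if (req->bytes_received > req->bytes_expected) {
        /* Only bytes_expected bytes can be delivered to the user. */
        req->bytes_received = req->bytes_expected;
        req->status = SKETCH_ERR_TRUNCATE;
    } else {
        req->status = SKETCH_SUCCESS;
    }
    return req->status;
}
```

At the breakpoint, the two values worth printing are the equivalents of `bytes_expected` and `bytes_received` in the real request, together with the stack trace.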

  george.

On Apr 5, 2012, at 20:30 , Alex Margolin wrote:

> Hi,
>
> First, I'm glad to say my MOSIX component is working and giving good initial results. Thanks for all your help!
> I'm not sure how (I know I should fill in some license agreement docs), but I would like to contribute the code to the Open MPI project.
> Is there an official code-review process? Is there anything else to do other than test it on some machines and commit it if/when I get the permissions?
>
> Second, I have a question about In-place send buffers. My mca_btl_mosix_prepare_src() currently works like this:
>
> mca_btl_base_descriptor_t*
> mca_btl_mosix_prepare_src(struct mca_btl_base_module_t* btl,
>                           struct mca_btl_base_endpoint_t* endpoint,
>                           struct mca_mpool_base_registration_t* registration,
>                           struct opal_convertor_t* convertor,
>                           uint8_t order,
>                           size_t reserve,
>                           size_t* size,
>                           uint32_t flags)
> {
>     mca_btl_mosix_frag_t* frag;
>     struct iovec iov;
>     uint32_t iov_count = 1;
>     size_t result;
>     int rc;
>
>     /* Enforce upper message length limit */
>     if( OPAL_UNLIKELY((reserve + *size) > btl->btl_max_send_size) ) {
>         *size = btl->btl_max_send_size - reserve;
>     }
>
>     /* Fetch a fragment to work on */
>     if( *size + reserve <= btl->btl_eager_limit ) {
>         MCA_BTL_MOSIX_FRAG_ALLOC_EAGER(frag, rc);
>     } else {
>         MCA_BTL_MOSIX_FRAG_ALLOC_MAX(frag, rc);
>     }
>     if( OPAL_UNLIKELY(NULL == frag) ) {
>         return NULL;
>     }
>     frag->segments[0].seg_addr.pval = (void*)(frag + 1);
>     frag->segments[0].seg_len = reserve;
>
>     /* Fill it with outgoing data */
>     iov.iov_len = frag->size - reserve;
>     /**************** if( opal_convertor_need_buffers(convertor) ) { ****************/
>     if( 0 != reserve ) {
>         /* Use existing buffer at the end of the fragment */
>         iov.iov_base = (unsigned char*)frag->segments[0].seg_addr.pval + reserve;
>         rc = opal_convertor_pack( convertor, &iov, &iov_count, &result );
>         if( 0 > rc ) {
>             MCA_BTL_MOSIX_FRAG_RETURN(frag);
>             return NULL;
>         }
>         frag->segments[0].seg_len += result;
>         frag->base.des_src_cnt = 1;
>     } else {
>         iov.iov_base = NULL;
>         /* Read the iovec for the buffer to be transferred */
>         rc = opal_convertor_pack( convertor, &iov, &iov_count, &result );
>         if( rc < 0 ) {
>             MCA_BTL_MOSIX_FRAG_RETURN(frag);
>             return NULL;
>         }
>         frag->segments[1].seg_addr.pval = iov.iov_base;
>         frag->segments[1].seg_len = result;
>         frag->base.des_src_cnt = 2;
>     }
>     frag->base.des_src = frag->segments;
>     frag->base.order = MCA_BTL_NO_ORDER;
>     frag->base.des_dst = NULL;
>     frag->base.des_dst_cnt = 0;
>     frag->base.des_flags = flags;
>     return &frag->base;
> }
>
> - Notice that the condition on the convertor, which I tried to copy from the TCP equivalent, is commented out. If I switch to that condition I get:
>
> [singularity:3774] *** An error occurred in MPI_Barrier
> [singularity:3774] *** reported by process [3220963329,0]
> [singularity:3774] *** on communicator MPI_COMM_WORLD
> [singularity:3774] *** MPI_ERR_TRUNCATE: message truncated
> [singularity:3774] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> [singularity:3774] *** and potentially your MPI job)
> [singularity:03773] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
> [singularity:03773] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> alex_at_singularity:~/huji/benchmarks/simple$
>
> I understand that at the moment the buffer sent by the user is copied to (void*)(frag+1) even if it would be best for it to be left in its place, with the reserved data at frag->segments[0] and the user buffer at frag->segments[1]. Does anyone have an idea as to what would cause that? Maybe a problem on the receiver-side function?
>
> Thanks,
> Alex
>
> P.S. I know this problem happens with 8-byte messages, but 4-byte messages pass OK. I don't know if it helps.
>
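
The in-place layout Alex describes above (reserved header in `segments[0]`, user buffer left where it is and referenced from `segments[1]`) can be modelled with a small gather routine. This is only a sketch under stated assumptions: `sketch_segment_t` and `sketch_gather` are hypothetical names, not part of the BTL interface, and the "wire" is just a local byte array. It shows that once a descriptor carries two segments, whatever puts it on the wire has to walk both, and the length that goes out is the sum of the two `seg_len` values rather than the length of the first segment alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a BTL segment: an address plus a length. */
typedef struct {
    const void *addr;
    size_t      len;
} sketch_segment_t;

/* Copy every segment, in order, into one contiguous wire buffer.
 * Returns the total number of bytes written, or 0 if the wire buffer
 * is too small to hold all segments. */
static size_t sketch_gather(const sketch_segment_t *segs, size_t count,
                            unsigned char *wire, size_t wire_cap)
{
    size_t total = 0;

    assert(segs != NULL && wire != NULL);
    for (size_t i = 0; i < count; i++) {
        if (segs[i].len > wire_cap - total) {
            return 0;  /* not enough room left for this segment */
        }
        if (segs[i].len > 0) {
            memcpy(wire + total, segs[i].addr, segs[i].len);
        }
        total += segs[i].len;
    }
    return total;
}
```

With a 4-byte header in the first segment and an 8-byte user buffer in the second, the routine produces 12 bytes on the wire. A sender or receiver that only accounts for one of the two segments would disagree with its peer about the message length, which is one place worth checking. Whether that is what happens here is not established.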