Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] BTL receive callback
From: Sebastian Rinke (rinke_at_[hidden])
Date: 2009-07-21 15:45:22


Thank you for the hint. I found that prepare_src() did not
return the correct size. It did the following:

ompi_convertor_pack(...,&max_data);
*size = max_data;

However, after ompi_convertor_pack() returns, max_data == 0 and thus
*size == 0, so free() is called without a prior send() at
pml_ob1_sendreq.c:1064.

I took this order of calls from prepare_src() in btl_openib.c.
It does not seem to cause any problems there, but it does for me.

Thanks for your help.
Sebastian.

Quoting George Bosilca <bosilca_at_[hidden]>:

> Based on your code the only reason I can imagine for the second send
> to never be triggered is that the request is considered completed at
> that point.
>
> I can't imagine how the free is called without a prior send. If I
> look at the code pml_ob1_sendreq.c:1061, the free is only called
> when the send fails, but it is always preceded by a send.
>
> Can you check the return values of the ompi_convertor_pack and
> prepare_src please?
>
> george.
>
> On Jul 21, 2009, at 11:55 , Sebastian Rinke wrote:
>
>> Hello,
>> I am developing a new BTL component (Open MPI v1.3.2) for a new
>> 3D-torus interconnect. During a simple message transfer of 16362 B
>> between two nodes with MPI_Send(), MPI_Recv() I encounter the
>> following:
>>
>> The sender:
>> -----------
>>
>> 1. prepare_src() size: 16304 reserve: 32
>> -> alloc() size: 16336
>> -> ompi_convertor_pack(): 16304
>> 2. send()
>> 3. component_progress()
>> -> send cb ()
>> -> free()
>> 4. component_progress()
>> -> recv cb ()
>> -> prepare_src() size: 58 reserve: 32
>> -> alloc() size: 90
>> -> ompi_convertor_pack(): 58
>> -> free() size: 90 Send is missing !!!
>> 5. NO PROGRESS
>>
>> The receiver:
>> -------------
>>
>> 1. component_progress()
>> -> recv cb ()
>> -> alloc() size: 32
>> -> send()
>> 2. component_progress()
>> -> send cb ()
>> -> free() size: 32
>> 3. component_progress() for ever !!!
>>
>> The problem is that after prepare_src() for the 2nd fragment, the
>> sender calls free() instead of send() in its recv cb. Thus, the 2nd
>> fragment is not being transmitted.
>> As a consequence, the receiver waits for the 2nd fragment.
>>
>> I have found that mca_pml_ob1_recv_frag_callback_ack() is the
>> corresponding recv cb. Before diving into the ob1 code,
>> could you tell me under which conditions this cb calls free()
>> instead of send()
>> so that I can get an idea of where to look for errors in my BTL component?
>>
>> Thank you very much in advance.
>>
>> Sebastian Rinke
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel