
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] BTL receive callback
From: Don Kerr (Don.Kerr_at_[hidden])
Date: 2009-07-22 16:31:17


Hello Sebastian,

Sounds like you are using the openib btl as a starting point, which is a
good choice. I am curious: are you indeed using a new interconnect (new
hardware and protocol), or are requirements of the 3D-torus network that
the openib btl does not address driving the need for a new btl?

-DON

On 07/21/09 11:55, Sebastian Rinke wrote:
> Hello,
> I am developing a new BTL component (Open MPI v1.3.2) for a new
> 3D-torus interconnect. During a simple 16362-byte message transfer
> between two nodes with MPI_Send() and MPI_Recv(), I encounter the following:
>
> The sender:
> -----------
>
> 1. prepare_src() size: 16304 reserve: 32
> -> alloc() size: 16336
> -> ompi_convertor_pack(): 16304
> 2. send()
> 3. component_progress()
> -> send cb ()
> -> free()
> 4. component_progress()
> -> recv cb ()
> -> prepare_src() size: 58 reserve: 32
> -> alloc() size: 90
> -> ompi_convertor_pack(): 58
> -> free() size: 90 Send is missing !!!
> 5. NO PROGRESS
>
> The receiver:
> -------------
>
> 1. component_progress()
> -> recv cb ()
> -> alloc() size: 32
> -> send()
> 2. component_progress()
> -> send cb ()
> -> free() size: 32
> 3. component_progress() forever !!!
>
> The problem is that, after prepare_src() for the 2nd fragment, the
> sender calls free() instead of send() in its recv cb, so the 2nd
> fragment is never transmitted.
> As a consequence, the receiver keeps waiting for the 2nd fragment.
>
> I have found that mca_pml_ob1_recv_frag_callback_ack() is the
> corresponding recv cb. Before diving into the ob1 code, could you
> tell me under which conditions this cb calls free() instead of
> send()? That would give me an idea of where to look for errors in
> my BTL component.
>
> Thank you very much in advance.
>
> Sebastian Rinke
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel