Subject: [OMPI devel] BTL receive callback
From: Sebastian Rinke (rinke_at_[hidden])
Date: 2009-07-21 11:55:43

I am developing a BTL component (Open MPI v1.3.2) for a new
3D-torus interconnect. During a simple 16362-byte message transfer
between two nodes using MPI_Send() and MPI_Recv(), I see the following
sequence of calls:

The sender:

1. prepare_src() size: 16304 reserve: 32
    -> alloc() size: 16336
    -> ompi_convertor_pack(): 16304
2. send()
3. component_progress()
    -> send cb ()
    -> free()
4. component_progress()
    -> recv cb ()
       -> prepare_src() size: 58 reserve: 32
          -> alloc() size: 90
          -> ompi_convertor_pack(): 58
       -> free() size: 90    <-- send() is missing here!

The receiver:

1. component_progress()
    -> recv cb ()
       -> alloc() size: 32
       -> send()
2. component_progress()
    -> send cb ()
    -> free() size: 32
3. component_progress() forever!

The problem is that after prepare_src() for the 2nd fragment, the
sender's recv cb calls free() instead of send(). Thus, the 2nd
fragment is never transmitted, and the receiver waits for it
indefinitely.
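
For reference, the sizes in the sender trace are consistent with each
other: each alloc() size is the reserve plus the packed payload, and the
two payloads add up to the full message. A minimal sketch of that
arithmetic follows; the helper names are hypothetical and are not part
of the Open MPI BTL interface.

```c
#include <stddef.h>

/* Hypothetical helper (not an Open MPI function): number of bytes one
 * fragment needs from alloc(), i.e. the header space reserved for the
 * PML plus the packed payload. */
static size_t frag_alloc_size(size_t reserve, size_t payload)
{
    return reserve + payload;
}

/* Hypothetical helper: payload bytes still to be sent after the first
 * fragment; returns 0 if the first fragment already covers the message. */
static size_t remaining_payload(size_t total, size_t first_payload)
{
    return total > first_payload ? total - first_payload : 0;
}
```

With the values from the trace, frag_alloc_size(32, 16304) gives 16336,
remaining_payload(16362, 16304) gives 58, and frag_alloc_size(32, 58)
gives 90, so the 2nd fragment is prepared with the expected size and
only its send() is missing.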

I have found that mca_pml_ob1_recv_frag_callback_ack() is the
corresponding recv cb. Before diving into the ob1 code, could you
tell me under which conditions this cb calls free() instead of
send()? That would give me an idea of where to look for errors in
my BTL component.

Thank you very much in advance.

Sebastian Rinke