I ran into a hang in a test in which the sender sends less data than the receiver is expecting.  For example, the following shows the receiver expecting twice what the sender is sending.


Rank 0:  MPI_Send(buf, BUFSIZE, MPI_INT, 1, 99, MPI_COMM_WORLD)

Rank 1:  MPI_Recv(buf, BUFSIZE*2, MPI_INT, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE)


This is also reproducible with one of the Intel tests by lowering the eager limit for the openib BTL.

    mpirun -np 2 -host frick,frack -mca btl_openib_eager_limit 56 MPI_Send_overtake_c


In most cases, this works just fine.  However, when the PML uses the RGET protocol, the test hangs.  Below is a proposed fix for this issue.

I believe we want to check against req_bytes_packed rather than req_bytes_expected, because req_bytes_expected is the size the user originally told us in the receive call, which can be larger than what the sender actually sends.

Otherwise, with the current code, we never send a FIN message back to the sender.
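
To make the reasoning concrete, here is a minimal standalone sketch of the two checks. It is not the ob1 code: the struct and function names are hypothetical, and it assumes the packed size reflects what the sender actually sends while the expected size is what the receiver posted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the receive request.
 * Only the three counters discussed above are modeled. */
struct recv_model {
    size_t bytes_expected;  /* size the user posted in the receive */
    size_t bytes_packed;    /* size the sender actually sends (assumption) */
    size_t bytes_received;  /* running total of bytes delivered so far */
};

/* Current check: compares against the size the user posted. */
static bool complete_current(const struct recv_model *r)
{
    return r->bytes_expected <= r->bytes_received;
}

/* Proposed check: compares against the size of the actual message. */
static bool complete_proposed(const struct recv_model *r)
{
    return r->bytes_packed <= r->bytes_received;
}

/* Scenario from this mail: the receiver posts twice what the sender
 * sends, and all of the sender's data has already arrived. */
static struct recv_model short_send_case(size_t sent_bytes)
{
    struct recv_model r;
    r.bytes_expected = 2 * sent_bytes;
    r.bytes_packed   = sent_bytes;
    r.bytes_received = sent_bytes;
    return r;
}
```

With short_send_case(4096), complete_current() stays false even though nothing more will arrive, which in this model corresponds to the FIN never going out; complete_proposed() returns true.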


Any thoughts?


[rvandevaart@sm065 ompi-trunk]$ svn diff ompi/mca/pml/ob1/pml_ob1_recvreq.c
Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
--- ompi/mca/pml/ob1/pml_ob1_recvreq.c        (revision 28633)
+++ ompi/mca/pml/ob1/pml_ob1_recvreq.c     (working copy)
@@ -335,7 +335,7 @@
     /* is receive request complete */
     OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, frag->rdma_length);
-    if (recvreq->req_bytes_expected <= recvreq->req_bytes_received) {
+    if (recvreq->req_recv.req_bytes_packed <= recvreq->req_bytes_received) {






