Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RGET issue when send is less than receive
From: Nathan Hjelm (hjelmn_at_[hidden])
Date: 2013-06-21 11:59:39


Found my original fix (still don't know why I never pushed it), and I think George is correct. This should work in both the single and multiple get cases.

-Nathan
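
To make the single versus multiple get distinction concrete, here is a minimal standalone sketch, not Open MPI code. It assumes each completed get adds its fragment length to a shared counter and that the FIN goes out once the running total reaches the packed size, as in the patch quoted below. The names `sketch_recvreq_t`, `sketch_get_completion`, and `sketch_run_rget` are hypothetical, and C11 atomics stand in for OPAL_THREAD_ADD_SIZE_T.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a receive request; not the real ob1 structure. */
typedef struct {
    size_t bytes_packed;          /* bytes the sender actually has to deliver */
    atomic_size_t bytes_received; /* running total across all completed gets */
    int fin_sent;                 /* how many times a FIN would have been sent */
} sketch_recvreq_t;

/* Completion callback for one get fragment.
 * Returns 1 if this completion is the one that triggers the FIN. */
int sketch_get_completion(sketch_recvreq_t *req, size_t frag_len)
{
    assert(req != NULL);
    /* atomic_fetch_add returns the previous value, so add frag_len again
     * to obtain the total that includes this fragment. */
    size_t total = atomic_fetch_add(&req->bytes_received, frag_len) + frag_len;
    if (req->bytes_packed <= total) {
        req->fin_sent++;
        return 1;
    }
    return 0;
}

/* Simulates an RGET of bytes_packed bytes split into gets of at most
 * max_get bytes each, and returns the number of FINs sent. */
int sketch_run_rget(size_t bytes_packed, size_t max_get)
{
    assert(max_get > 0);
    sketch_recvreq_t req;
    req.bytes_packed = bytes_packed;
    req.fin_sent = 0;
    atomic_init(&req.bytes_received, 0);

    size_t remaining = bytes_packed;
    while (remaining > 0) {
        size_t len = remaining < max_get ? remaining : max_get;
        sketch_get_completion(&req, len);
        remaining -= len;
    }
    return req.fin_sent;
}
```

Under these assumptions, `sketch_run_rget(1024, 1024)` models the single get case and `sketch_run_rget(1024, 256)` the multiple get case. Only the completion that brings the total up to the packed size crosses the threshold, so each run sends one FIN. The sketch runs the completions sequentially and does not model concurrent callbacks or zero-length messages.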

On Fri, Jun 21, 2013 at 05:52:28PM +0200, George Bosilca wrote:
> The number of bytes received is atomically updated in the completion callback, and the completion test is clearly spelled out in the recv_request_pml_complete_check function (minus the lock part, of course). Rolf, I think your patch is correct.
>
> That being said, req_bytes_expected is a special value, one that should only be used to check for truncation. Otherwise, req_bytes_packed is the value we should compare against.
>
> George.
>
> On Jun 21, 2013, at 17:40 , Nathan Hjelm <hjelmn_at_[hidden]> wrote:
>
> > I thought I fixed this problem a while back (though looking at the code, it's possible I never committed the fix). I will have to look through my local repository and see what happened to that fix. Your fix might not work correctly, since an RGET can be broken up into multiple get operations. It may work; I would just need to test it to make sure.
> >
> > -Nathan
> >
> > On Fri, Jun 21, 2013 at 08:25:29AM -0700, Rolf vandeVaart wrote:
> >> I ran into a hang in a test in which the sender sends less data than the receiver is expecting. For example, the following shows the receiver expecting twice what the sender is sending.
> >>
> >> Rank 0: MPI_Send(buf, BUFSIZE, MPI_INT, 1, 99, MPI_COMM_WORLD);
> >> Rank 1: MPI_Recv(buf, BUFSIZE*2, MPI_INT, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
> >>
> >> This is also reproducible using one of the Intel tests and adjusting the eager limit for the openib BTL.
> >>
> >> mpirun -np 2 -host frick,frack -mca btl_openib_eager_limit 56 MPI_Send_overtake_c
> >>
> >> In most cases, this works just fine. However, when the PML protocol used is the RGET protocol, the test hangs. Below is a proposed fix for this issue.
> >> I believe we want to be checking against req_bytes_packed rather than req_bytes_expected as req_bytes_expected is what the user originally told us.
> >> Otherwise, with the current code, we never send a FIN message back to the sender.
> >>
> >> Any thoughts?
> >>
> >> [rvandevaart_at_sm065 ompi-trunk]$ svn diff ompi/mca/pml/ob1/pml_ob1_recvreq.c
> >> Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
> >> ===================================================================
> >> --- ompi/mca/pml/ob1/pml_ob1_recvreq.c (revision 28633)
> >> +++ ompi/mca/pml/ob1/pml_ob1_recvreq.c (working copy)
> >> @@ -335,7 +335,7 @@
> >> /* is receive request complete */
> >> OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, frag->rdma_length);
> >> - if (recvreq->req_bytes_expected <= recvreq->req_bytes_received) {
> >> + if (recvreq->req_recv.req_bytes_packed <= recvreq->req_bytes_received) {
> >> mca_pml_ob1_send_fin(recvreq->req_recv.req_base.req_proc,
> >> bml_btl,
> >> frag->rdma_hdr.hdr_rget.hdr_des,
> >>
> >
> >> _______________________________________________
> >> devel mailing list
> >> devel_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
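
The difference between the two counters discussed in this thread can be sketched in a few lines of standalone C. This is an illustration, not the ob1 source. It assumes, following the thread, that req_bytes_expected is the size the receiver posted and req_bytes_packed is the size of the message the sender actually delivers. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the counters on a receive request. */
typedef struct {
    size_t bytes_expected; /* size of the posted receive (what the user asked for) */
    size_t bytes_packed;   /* size of the message the sender is delivering */
    size_t bytes_received; /* bytes that have arrived so far */
} sketch_req_t;

/* Truncation check: the one place bytes_expected is the right operand.
 * The message is truncated when it is larger than the posted receive. */
bool sketch_is_truncated(const sketch_req_t *r)
{
    assert(r != NULL);
    return r->bytes_packed > r->bytes_expected;
}

/* Completion test as written before the patch: compares against what the
 * receiver posted, so a shorter message never satisfies it. */
bool sketch_complete_vs_expected(const sketch_req_t *r)
{
    assert(r != NULL);
    return r->bytes_expected <= r->bytes_received;
}

/* Completion test as proposed in the patch: compares against what the
 * sender actually sends. */
bool sketch_complete_vs_packed(const sketch_req_t *r)
{
    assert(r != NULL);
    return r->bytes_packed <= r->bytes_received;
}
```

For the reported case, where the receiver posts twice what the sender sends, take a request with `bytes_expected = 8`, `bytes_packed = 4`, and `bytes_received = 4`. It is not truncated, the pre-patch test stays false (so no FIN is sent and the sender waits), and the patched test is true.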