This web mail archive is frozen.
This page is part of a frozen web archive of this mailing list.
You can still navigate around this archive, but know that no new mails
have been added to it since July of 2016.
Click here to be taken to the new web archives of this list; it includes all the mails that are in this frozen archive plus all new mails that have been sent to the list since it was migrated to the new archives.
Hi Open MPI developers,
I found a small bug in Open MPI.
See attached program cancelled.c.
In this program, rank 1 tries to cancel a MPI_Irecv and calls a MPI_Recv
instead if the cancellation succeeds. This program should terminate whether
the cancellation succeeds or not. But it leads a deadlock in MPI_Recv after
printing "MPI_Test_cancelled: 1".
I confirmed it works fine with MPICH2.
The problem is in mca_pml_ob1_recv_request_cancel function in
ompi/mca/pml/ob1/pml_ob1_recvreq.c. It accepts the cancellation unless
the request has been completed. I think it should not accept the
cancellation if the request has been matched. If it want to accept the
cancellation, it must push the recv frag to the unexpected message queue
back and redo matching. Furthermore, the receive buffer must be reverted
if the received message has been written to the receive buffer partially
in a pipeline protocol.
Attached patch cancel-recv.patch is a sample fix for this bug for Open MPI
trunk. Though this patch has 65 lines, main modifications are adding one
if-statement and deleting one if-statement. Other lines are just for
I cannot confirm the MEMCHECKER part is correct. Could anyone review it
MPI development team,