This is the beauty of the MPI standard. Two readings can lead to two different understandings, and therefore to two different implementations ...
On Feb 8, 2011, at 03:24 , Tobias Hilbrich wrote:
> Hi George,
> thanks for your fast answer! I hate the cancel too, but am a tool developer that unfortunately has to deal with it :( Likely, implementing the functionality for something like this in an MPI implementation is far more horrible ...
> My MPI standard says the following (MPI 2.2):
> - "A call to MPI_Cancel marks for cancellation a pending, nonblocking communication operation (send or receive)." ... "If a communication is marked for cancellation, then a MPI_Wait call for that communication is guaranteed to return, ..." (lines 1-10 on page 68)
> (both sentences have seen only minor changes since MPI 1.0)
> So either the communication is no longer pending (the premise of the first sentence is not fulfilled), and thus it would already be completed (Wait would return). Or it is marked for cancellation (independent of whether it completes in the future or not), and the wait would be guaranteed to return (second sentence).
> (Notice that the standard differentiates between marked for cancellation and cancelled)
> As I understand your quoted sentence, the system is free to decide that it was too late to cancel the send, which would mean that it completed, and thus the wait should return. But maybe I got it wrong.
Here is the problem. As the MPI standard states that MPI_Cancel is a _local_ operation, nothing can be done locally for a send if the matching information has already been sent to the peer. As a result, send operations cannot be cancelled in most cases (except when the matching info is still pending in some internal queue, mainly due to network traffic delays).
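Because the outcome of a cancel is therefore implementation-dependent, the only portable approach is to complete the request and then inspect the status. A minimal sketch of that pattern (not from the attached reproducer; the helper name is made up for illustration):

```c
/* Sketch: the portable way to find out whether a cancel actually took
 * effect. MPI_Cancel only *requests* cancellation; the real outcome
 * must be read from the status after the request has completed. */
#include <mpi.h>
#include <stdio.h>

static void cancel_and_check(MPI_Request *req)
{
    MPI_Status status;
    int cancelled;

    MPI_Cancel(req);
    MPI_Wait(req, &status);               /* completes either way */
    MPI_Test_cancelled(&status, &cancelled);

    if (cancelled)
        printf("operation was cancelled\n");
    else
        printf("cancel was too late; operation completed normally\n");
}
```

Note that when the cancel fails on a send, MPI_Wait can only return once the message is matched by the peer, which is exactly where the deadlock discussed below comes from.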
> I tested with MPI_Test_cancelled; for the hang situation I apparently cannot tell you what happens on the sender side, but the receive was cancelled. If you use Bsend or Send (standard mode), the send is actually not cancelled (as it was buffered), while the receives are still cancelled in this situation.
> Now for the fun part, I tested a little bit:
> - SGI MPT, same behavior as for OpenMPI
> - Mvapich(1), aborts with an error that cancel of sends is not implemented :)
> - MPICH, segfault in MPI_Finalize (on the receiver side during cleanup of some sort)
> - MPICH2, works, both communication calls get cancelled
> - lam, crashes in MPI_Ssend_init/MPI_Recv_init
> - Intel MPI, works, both communication calls get cancelled
This is expected. Both MPICH2 and Intel MPI implement remote send cancellation. In other words, in the case of a cancel, a cancellation request is sent to the peer. If the peer has not yet matched the unexpected message, it will accept the cancellation; otherwise the cancel will fail. In Open MPI the cancel is a purely local operation.
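For reference, the pseudo code quoted further down corresponds to roughly the following self-contained program (a sketch, not the attached original; run with exactly two ranks):

```c
/* Sketch of the reproducer: rank 0 starts a persistent synchronous
 * send, rank 1 a persistent receive; both are cancelled immediately
 * after MPI_Start. On implementations with local-only send
 * cancellation, rank 0 can hang in MPI_Wait. */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, buf = 42;
    MPI_Request r;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        MPI_Ssend_init(&buf, 1, MPI_INT, 1, 666, MPI_COMM_WORLD, &r);
    else
        MPI_Recv_init(&buf, 1, MPI_INT, 0, 666, MPI_COMM_WORLD, &r);

    MPI_Start(&r);
    MPI_Cancel(&r);
    MPI_Wait(&r, &status);    /* may block forever on rank 0 */
    MPI_Request_free(&r);

    MPI_Finalize();
    return 0;
}
```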
> Not sure whether I got it all wrong, Tobias
> On Feb 7, 2011, at 8:58 PM, George Bosilca wrote:
>> I forgot to mention that you should test the cancelled status of your request with MPI_TEST_CANCELLED after the MPI_Cancel, as the MPI_Cancel doesn't return an error.
>> On Feb 7, 2011, at 14:52 , George Bosilca wrote:
>>> MPI_Cancel is a tricky beast, and should be handled with extreme care. From my perspective, your problem is not related to a specific implementation, but to your usage of MPI_Cancel.
>>> You state that MPI_Wait is not supposed to hang, but I don't see anything in the MPI standard allowing you to state this. If you are referring to the first paragraph of 3.8 (regarding MPI_Cancel), then I have to disagree with you. You have to pay attention to the wording of the standard to see the trick.
>>>> Either the cancellation succeeds, or the communication succeeds, but not both.
>>> This is the definition of a successful cancellation, which is the basis of every other action that happens on the request. As MPI_Cancel is only defined as a local operation, an MPI library that sends the matching info for the persistent request in MPI_Start will have a hard time cancelling the request.
>>> Now, imagine a run where the receiver manages to cancel its request because it has not yet been matched (and this can be done locally). As the sender sent the matching information in MPI_Start, when it reaches the MPI_Cancel it cannot cancel the request locally, so the cancel will fail. The sender will therefore be blocked in MPI_Wait, while the receiver happily waits in MPI_Finalize.
>>> On Feb 7, 2011, at 04:54 , Tobias Hilbrich wrote:
>>>> Hi all,
>>>> I am with the ZIH developers working on VampirTrace and just discovered possibly erroneous behavior of Open MPI (v1.4.3). I am cancelling an active persistent request created with MPI_Ssend_init; in a subsequent MPI_Wait call the process hangs, even though according to the MPI standard this should never happen.
>>>> The pseudo code is as follows:
>>>> if (rank == 0)
>>>>     MPI_Ssend_init (&buf, 1, MPI_INT, 1, 666, MPI_COMM_WORLD, &r);
>>>> if (rank == 1)
>>>>     MPI_Recv_init (&buf, 1, MPI_INT, 0, 666, MPI_COMM_WORLD, &r);
>>>> MPI_Start (&r);
>>>> MPI_Cancel (&r);
>>>> MPI_Wait (&r, &status);
>>>> MPI_Request_free (&r);
>>>> The full (minimal reproducer) source code along with a dump of ompi_info is attached.
>>>> Either I am missing some passage of the standard mentioning that it is forbidden to cancel a synchronous send, or there appears to be an error in Open MPI's implementation. If it is already fixed, sorry for the spam.
>>>> (Note: changing the Ssend to Send or Bsend removes the hang)
>>>> Dipl.-Inf. Tobias Hilbrich
>>>> Wissenschaftlicher Mitarbeiter
>>>> Technische Universitaet Dresden
>>>> Zentrum fuer Informationsdienste und Hochleistungsrechnen (ZIH)
>>>> (Center for Information Services and High Performance Computing (ZIH))
>>>> Interdisziplinäre Anwenderunterstützung und Koordination
>>>> (Interdisciplinary Application Development and Coordination)
>>>> 01062 Dresden
>>>> Tel.: +49 (351) 463-32041
>>>> Fax: +49 (351) 463-37773
>>>> E-Mail: tobias.hilbrich_at_[hidden]
>>>> devel mailing list