
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Failure to make progress
From: George Bosilca (bosilca_at_[hidden])
Date: 2009-02-23 16:54:47


Ken,

Your interpretation of the MPI standard is way too optimistic.
Unfortunately, there is no asynchronous progress (except on very few
devices) in most MPI libraries. So you should not expect the
non-blocking send to complete without making further calls into MPI
(MPI_Test, for example).
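As an illustration of that point (a sketch, not a complete program; the
destination rank, tag, and sleep interval below are made up for the
example, not taken from this thread), a sender can give the library a
chance to progress a pending MPI_Isend by re-entering MPI with MPI_Test:

----------------------------------------------------------------------
/* Sketch only: drive progress on a pending MPI_Isend by periodically
   calling back into the MPI library. */
#include <unistd.h>
#include "mpi.h"

void send_with_polling(char *msg, int len, int dest)
{
  MPI_Request request;
  int done = 0;

  MPI_Isend(msg, len, MPI_CHAR, dest, 99, MPI_COMM_WORLD, &request);

  /* Without asynchronous progress the message may not move until we
     call back into MPI; each MPI_Test lets the library work on it. */
  while (!done)
    {
      MPI_Test(&request, &done, MPI_STATUS_IGNORE);
      if (!done)
        usleep(1000);   /* or do useful application work here */
    }
}
----------------------------------------------------------------------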

Moreover, your example is not really correct. While the MPI standard
clearly states that MPI_Finalize is collective over all connected
processes, it also states that:

"each process must ensure that all pending non-blocking communications
are (locally) complete before calling MPI_FINALIZE. Further, at the
instant at which the last process calls MPI_FINALIZE, all pending
sends must be matched by a receive, and all pending receives must be
matched by a send.

A successful return from a blocking communication operation or from
MPI_WAIT or MPI_TEST tells the user that the buffer can be reused and
means that the communication is completed by the user, but does not
guarantee that the local process has no more work to do. A successful
return from MPI_REQUEST_FREE with a request handle generated by an
MPI_ISEND nullifies the handle but provides no assurance of operation
completion. The MPI_ISEND is complete only when it is known by some
means that a matching receive has completed. MPI_FINALIZE guarantees
that all local actions required by communications the user has
completed will, in fact, occur before it returns.

MPI_FINALIZE guarantees nothing about pending communications that have
not been completed (completion is assured only by MPI_WAIT, MPI_TEST,
or MPI_REQUEST_FREE combined with some other verification of
completion)."

   george.

On Feb 23, 2009, at 16:24, Ken Olum wrote:

> I'm running OpenMPI 1.2.6 under Red Hat Enterprise Linux Server
> release 5.2 on an x86_64 cluster.
>
> When I send a message with MPI_Isend I think it should eventually be
> delivered (if I have a receive waiting), without my having to make any
> other MPI calls. This appears to be guaranteed by the spec. In MPI
> version 1.1, section 3.7.4, Semantics of Nonblocking Communications,
> it says
>
>    Progress  A call to MPI_WAIT that completes a receive will eventually
>    terminate and return if a matching send has been started, unless the
>    send is satisfied by another receive. In particular, if the matching
>    send is nonblocking, then the receive should complete even if no call
>    is executed by the sender to complete the send.
>
> This appears never to work when my two processes are on different
> nodes. I enclose a test case below. In this simple case, I can work
> around the problem by waiting for the send to complete, but in general
> after a bunch of communication I don't know any way that I can make
> sure that all my sent messages have actually been sent, without
> blocking.
>
> In the following code, rank 0 sends a message to rank 7, sleeps for 5
> seconds, and then calls MPI_Finalize. The output below shows that
> rank 7 doesn't receive the message until finalize is called.
> (Ranks 1-6 exist only to get the scheduler here to dispatch 0 and 7
> to different nodes.)
>
> Ken Olum
>
> ----------------------------------------------------------------------
> #include <stdlib.h>
> #include <stdio.h>
> #include <string.h>
> #include <time.h>
> #include <unistd.h>      /* for sleep() */
> #include "mpi.h"
>
> /* Return the current wall-clock time as an "HH:MM:SS" string. */
> char *timestamp()
> {
>   time_t now;
>   struct tm *data;
>   char *result;
>
>   time(&now);
>   data = localtime(&now);
>
>   result = malloc(9);
>   sprintf(result, "%2d:%02d:%02d", data->tm_hour, data->tm_min, data->tm_sec);
>   return result;
> }
>
> int main(int argc, char **argv)
> {
>   char message[20];
>   int myrank, mysize;
>   MPI_Status status;
>   MPI_Request request;
>
>   MPI_Init(&argc, &argv);
>   MPI_Comm_size(MPI_COMM_WORLD, &mysize);
>   MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
>   printf("Proc %d, %s: initialized\n", myrank, timestamp());
>
>   if (myrank == 0)                    /* code for process zero */
>     {
>       strcpy(message, "TEST");
>       printf("Proc %d, %s: sending '%s'\n", myrank, timestamp(), message);
>       MPI_Isend(message, strlen(message)+1, MPI_CHAR, mysize-1, 99,
>                 MPI_COMM_WORLD, &request);
>       printf("Proc %d, %s: sent\n", myrank, timestamp());
>     }
>   else if (myrank == mysize-1)        /* code for last process */
>     {
>       printf("Proc %d, %s: receiving\n", myrank, timestamp());
>       MPI_Irecv(message, 20, MPI_CHAR, 0, 99, MPI_COMM_WORLD,
>                 &request);            /* Start it */
>       MPI_Wait(&request, &status);    /* Wait for it */
>       printf("Proc %d, %s: received '%s'\n", myrank, timestamp(), message);
>     }
>
>   printf("Proc %d, %s: sleeping\n", myrank, timestamp());
>   sleep(5);
>   printf("Proc %d, %s: finalizing\n", myrank, timestamp());
>   MPI_Finalize();
>   return 0;
> }
> ----------------------------------------------------------------------
>
> Proc 2, 16:16:19: initialized
> Proc 2, 16:16:19: sleeping
> Proc 3, 16:16:19: initialized
> Proc 3, 16:16:19: sleeping
> Proc 0, 16:16:19: initialized
> Proc 0, 16:16:19: sending 'TEST'
> Proc 0, 16:16:19: sent
> Proc 0, 16:16:19: sleeping
> Proc 5, 16:16:19: initialized
> Proc 5, 16:16:19: sleeping
> Proc 6, 16:16:19: initialized
> Proc 6, 16:16:19: sleeping
> Proc 7, 16:16:19: initialized
> Proc 7, 16:16:19: receiving
> Proc 1, 16:16:19: initialized
> Proc 1, 16:16:19: sleeping
> Proc 4, 16:16:19: initialized
> Proc 4, 16:16:19: sleeping
> Proc 3, 16:16:24: finalizing
> Proc 5, 16:16:24: finalizing
> Proc 0, 16:16:24: finalizing
> Proc 2, 16:16:24: finalizing
> Proc 6, 16:16:24: finalizing
> Proc 1, 16:16:24: finalizing
> Proc 4, 16:16:24: finalizing
> Proc 7, 16:16:24: received 'TEST'
> Proc 7, 16:16:24: sleeping
> Proc 7, 16:16:29: finalizing
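
For the more general concern Ken raises above (many outstanding sends,
and no wish to block on any one of them), one possible pattern is to
keep the request handles and poll them together with MPI_Testall. This
is a sketch only; the array bookkeeping and helper names are made up
for the example, not part of the original post:

----------------------------------------------------------------------
/* Sketch only: track every outstanding MPI_Isend and poll the whole
   set with MPI_Testall so the library can make progress without the
   application blocking on any single request. */
#include "mpi.h"

#define MAX_PENDING 128

static MPI_Request pending[MAX_PENDING];
static int         npending = 0;

/* Post a non-blocking send and remember its request handle. */
void post_send(char *buf, int count, int dest, int tag)
{
  MPI_Isend(buf, count, MPI_CHAR, dest, tag, MPI_COMM_WORLD,
            &pending[npending++]);
}

/* Call this regularly from the application's main loop.  Returns 1
   once every posted send is locally complete (MPI_Testall sets
   completed requests to MPI_REQUEST_NULL). */
int sends_complete(void)
{
  int all_done = 0;
  MPI_Testall(npending, pending, &all_done, MPI_STATUSES_IGNORE);
  return all_done;
}
----------------------------------------------------------------------

Once sends_complete() returns 1, or after a final
MPI_Waitall(npending, pending, MPI_STATUSES_IGNORE), it is safe to call
MPI_Finalize.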