
Open MPI User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-02-25 11:17:52


Note that this is correct MPI behavior -- the MPI standard does not
define whether MPI_SEND blocks or not. Indeed, codes that assume
that MPI_SEND blocks (or doesn't block) are technically not correct
MPI codes. The issue is that different networks (e.g., shared memory
vs. TCP) may have different transport characteristics, and the MPI
implementation may need to block in some situations (especially for
large messages).

But don't worry -- it's usually pretty easy to fix such issues in
applications. Check out the MPI-1 document, section 3.5, "Semantics
of point to point communication" -- in particular, example 3.9 (page
33).

For example, if you have code that tries to exchange messages but
relies on the MPI implementation to buffer sends (i.e., assumes that
MPI_SEND won't block), it may look something like this:

if (rank == 0 || rank == 1) {
     MPI_Send(..., 1 - rank, tag, comm);
     MPI_Recv(..., 1 - rank, tag, comm, &status);
}

This code can potentially deadlock if MPI_SEND decides to block.
It's easy enough to fix -- one way is to do something like this:

if (rank == 0) {
     MPI_Send(..., 1, tag, comm);
     MPI_Recv(..., 1, tag, comm, &status);
} else if (rank == 1) {
     MPI_Recv(..., 0, tag, comm, &status);
     MPI_Send(..., 0, tag, comm);
}

This ensures that the send from 0->1 completes before you try to send
1->0. If you want to get concurrency of both sends, then use non-
blocking primitives (e.g., MPI_Isend).
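A minimal sketch of that nonblocking version (not from the original
post -- the buffer arguments, MPI_DOUBLE datatype, and the assumption
that exactly ranks 0 and 1 are exchanging are all placeholders):

```c
#include <mpi.h>

/* Both ranks post a nonblocking send and a nonblocking receive up
   front, then wait for both to complete.  Neither rank sits inside
   a blocking MPI_Send, so the exchange cannot deadlock regardless
   of whether the implementation buffers the message. */
void exchange(double *sendbuf, double *recvbuf, int count,
              int rank, int tag, MPI_Comm comm)
{
    MPI_Request reqs[2];
    int peer = 1 - rank;   /* placeholder: assumes only ranks 0 and 1 */

    MPI_Isend(sendbuf, count, MPI_DOUBLE, peer, tag, comm, &reqs[0]);
    MPI_Irecv(recvbuf, count, MPI_DOUBLE, peer, tag, comm, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}
```

Note that sendbuf and recvbuf must stay untouched until MPI_Waitall
returns -- the requests own the buffers until they complete.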

Good luck.

On Feb 22, 2006, at 10:07 AM, Cezary Sliwa wrote:

>
> My program runs fine with openmpi-1.0.1 when run from the command line
> (5 processes with empty host file), but when I schedule it with
> qsub to
> run on 2 nodes it blocks on MPI_SEND
>
> (gdb) info stack
> #0 0x00000034db30c441 in __libc_sigaction () from
> /lib64/tls/libpthread.so.0
> #1 0x0000000000573002 in opal_evsignal_recalc ()
> #2 0x0000000000582a3c in poll_dispatch ()
> #3 0x00000000005729f2 in opal_event_loop ()
> #4 0x0000000000577e68 in opal_progress ()
> #5 0x00000000004eed4a in mca_pml_ob1_send ()
> #6 0x000000000049abdd in PMPI_Send ()
> #7 0x0000000000499dc0 in pmpi_send__ ()
> #8 0x000000000042d5d8 in MAIN__ () at main.f:90
> #9 0x00000000005877de in main (argc=Variable "argc" is not available.
> )
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/