Subject: Re: [OMPI users] MPI one-sided passive synchronization.
From: Barrett, Brian W (bwbarre_at_[hidden])
Date: 2011-04-13 14:49:14

This is mostly an issue of how MPICH2 and Open MPI implement lock/unlock.
Some might call what I'm about to describe erroneous. I wrote the
one-sided code in Open MPI and may be among those people.

In both implementations, one-sided communication is not necessarily truly
asynchronous. That is, the target of an operation may have to enter the
MPI library (MPI_Wtime does not count as entering the library in this
case) to progress Lock/Unlock calls. So rank 2 calls lock (which is a
no-op in both implementations), calls put, calls unlock, and waits for a
response. Ranks 0 and 1 wait for a second and enter lock, get, and
unlock. At this point, data actually starts to move. Chances are, rank 0
is going to process its own request first, hence the get from rank 0
returning 0. Then rank 0 will perhaps process some other requests before
it leaves unlock (perhaps not), and enter barrier. At this point, it will
progress everything until the other ranks enter barrier, meaning rank 2's
put and the gets from ranks 1 and 2 will finally be processed.
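
The sequence above can be sketched as a toy model in plain C. This is
not MPI code and not taken from either implementation: the names
(struct epoch, replay_epochs) are made up for illustration, and it
assumes the window initially holds 0, which the test program does not
actually guarantee. Each exclusive-lock epoch carries one put or one
get, and the target applies them in whatever order it happens to
service them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types for illustration; none of this is MPI API. */
enum op_kind { OP_PUT, OP_GET };

struct epoch {
    int rank;          /* origin rank holding the exclusive lock */
    enum op_kind kind; /* the single RMA operation inside the epoch */
    int value;         /* value written by a put; unused for a get */
};

/*
 * Replay epochs against a one-integer window in the order the target
 * happens to service them. result[r] receives what rank r's get saw.
 */
void replay_epochs(const struct epoch *order, size_t n,
                   int window, int *result, int nranks)
{
    for (size_t i = 0; i < n; i++) {
        assert(order[i].rank >= 0 && order[i].rank < nranks);
        if (order[i].kind == OP_PUT)
            window = order[i].value;         /* put updates the window */
        else
            result[order[i].rank] = window;  /* get reads current value */
    }
}
```

Replaying rank 0's get, then rank 2's put, then the gets from ranks 2
and 1 against an initial value of 0 gives 0 for rank 0 and 10 for ranks
1 and 2, which matches the reported output.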

In case you're wondering, the specification wasn't disobeyed in the
communication order; the lock description is very loose and is relative to
other MPI events. So if you put the barrier before the lock/get/unlock,
you'd get the answer you wanted because rank 2's lock would have to occur
before rank 0's. With no other MPI synchronization, there's no
requirement that be true, and the locking order could be 0, 1, 2, 2 if it
really wanted to be (i.e., it would be perfectly legal for rank 1 to also
return 0).
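
To make the ordering argument concrete, here is a rough sketch in plain
C (again a toy model, not MPI code) that enumerates every order in which
the four lock epochs of the three-rank test could be granted. The names
and the constraints are my assumptions for illustration: program order
on rank 2 forces its put before its own get, and a barrier placed
between the put and the gets is modeled as forcing the put before every
get. The window is assumed to start at a known initial value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical epoch indices for the three-rank test; not MPI API. */
enum { GET0, GET1, PUT2, GET2, NEPOCH };

/* Smallest and largest value one rank's get can observe. */
struct range { int min, max; };

/* True when epoch a is serviced before epoch b in the given order. */
static bool before(const int *order, int a, int b)
{
    for (int i = 0; i < NEPOCH; i++) {
        if (order[i] == a) return true;
        if (order[i] == b) return false;
    }
    return false;
}

/*
 * Enumerate all lock orders that satisfy the constraints and return the
 * range of values that the get issued by `rank` can observe.
 */
struct range possible_values(int rank, bool barrier,
                             int initial, int put_value)
{
    const int get_of[3] = { GET0, GET1, GET2 };
    struct range r = { 0, 0 };
    bool first = true;
    int o[NEPOCH];

    assert(rank >= 0 && rank < 3);

    for (o[0] = 0; o[0] < NEPOCH; o[0]++)
    for (o[1] = 0; o[1] < NEPOCH; o[1]++)
    for (o[2] = 0; o[2] < NEPOCH; o[2]++)
    for (o[3] = 0; o[3] < NEPOCH; o[3]++) {
        /* Keep only permutations of the four epochs. */
        unsigned mask = (1u << o[0]) | (1u << o[1]) |
                        (1u << o[2]) | (1u << o[3]);
        if (mask != 0xFu)
            continue;
        /* Rank 2's unlock completes its put before its own get. */
        if (!before(o, PUT2, GET2))
            continue;
        /* A barrier after the put orders it before every get. */
        if (barrier &&
            (!before(o, PUT2, GET0) || !before(o, PUT2, GET1)))
            continue;

        int v = before(o, PUT2, get_of[rank]) ? put_value : initial;
        if (first || v < r.min) r.min = v;
        if (first || v > r.max) r.max = v;
        first = false;
    }
    return r;
}
```

With an initial value of 0 and a put of 10, the model says ranks 0 and 1
may see either 0 or 10 when there is no barrier, rank 2 always sees 10,
and with the barrier every rank sees 10.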

This is obviously not ideal, and is one of the areas of focus for the MPI-3
standardization effort. In Open MPI, adding true asynchronous behavior is
difficult. The original design assumed that the lowest communication
layers would be able to provide asynchronous completion events to progress
the one-sided implementation. Thus far, only the authors of the TCP stack
have provided such behavior and it's not as well tested as other modes of


On 4/13/11 12:31 PM, "Abhishek Kulkarni" <abbyzcool_at_[hidden]> wrote:

>I am trying to better understand the semantics of passive synchronization
>in one-sided communication calls. Doesn't MPI_Win_unlock()
>block to ensure that all the preceding RMA calls in that epoch have been
>In that case, why is an undefined value returned when trying to load from
>a local window? (see below)
> MPI_Alloc_mem(128, MPI_INFO_NULL, &ptr);
> MPI_Win_create(ptr, 128, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
> // write to the target window of the head node
> if (rank == (size - 1)) {
>     MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
>     in = 10;
>     MPI_Put(&in, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
>     MPI_Win_unlock(0, win);
> } else {
>     // busy wait
>     start = MPI_Wtime();
>     end = MPI_Wtime();
>     while ((end - start) < 1)
>         end = MPI_Wtime();
> }
> // read from the head node's window
> MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
> MPI_Get(&out, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
> MPI_Win_unlock(0, win);
> printf("R%d: %d\n", rank, out);
>The output of the above program with 1.5.3rc1 (and also with MPICH2
>1.4rc2) is:
>R2: 10
>R1: 10
>R0: 0
>whereas I expect to see:
>R2: 10
>R1: 10
>R0: 10
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories