Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI one-sided passive synchronization.
From: Abhishek Kulkarni (abbyzcool_at_[hidden])
Date: 2011-04-13 16:11:40


On Wed, Apr 13, 2011 at 2:49 PM, Barrett, Brian W <bwbarre_at_[hidden]> wrote:

> This is mostly an issue of how MPICH2 and Open MPI implement lock/unlock.
> Some might call what I'm about to describe erroneous. I wrote the
> one-sided code in Open MPI and may be among those people.
>
> In both implementations, one-sided communication is not necessarily truly
> asynchronous. That is, the target of an operation may have to enter the
> MPI library (MPI_Wtime does not count as entering the library in this
> case) to progress Lock/Unlock calls. So rank 2 calls lock (which is a
> no-op in both implementations), calls put, calls unlock, and waits for a
> response. Ranks 0 and 1 wait for a second and enter lock, get, and
> unlock. At this point, data actually starts to move. Chances are, rank 0
> is going to process its request first, hence the get from rank 0
> returning 0. Then rank 0 will perhaps process some other requests before
> it leaves unlock (perhaps not), and enter barrier. At this point, it will
> progress everything until the other ranks enter barrier, meaning rank 2's
> put and the gets from ranks 1 and 2 will finally be processed.
>
>
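The sequence Brian describes, together with his later point that placing the barrier before the lock/get/unlock changes the answer, can be replayed in a toy, single-threaded C model. This is a hypothetical sketch, not how Open MPI or MPICH2 is actually written: one `int` stands in for rank 0's window, remote requests sit in a queue until rank 0 enters an MPI call, and the names (`post`, `apply`, `progress`, `barrier_last`, `barrier_first`) are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL toy model of deferred one-sided progress. It is not how
   Open MPI or MPICH2 is written; it only replays the ordering described
   in the thread. One int stands in for rank 0's window, and requests from
   other ranks wait in a queue until rank 0 enters an MPI call. */

enum rma_kind { RMA_PUT, RMA_GET };

struct rma_req {
    int origin;          /* rank that issued the request */
    enum rma_kind kind;
    int value;           /* payload for RMA_PUT; unused for RMA_GET */
};

struct target {
    int window;                 /* rank 0's window word, starts at 0 */
    struct rma_req pending[8];  /* arrived but not yet progressed */
    size_t count;
};

/* A remote origin's request arrives at rank 0 but is not applied yet. */
static void post(struct target *t, struct rma_req r) {
    assert(t->count < sizeof t->pending / sizeof t->pending[0]);
    t->pending[t->count++] = r;
}

/* Apply one request; a get records what it read in result[origin]. */
static void apply(struct target *t, struct rma_req r, int result[]) {
    if (r.kind == RMA_PUT)
        t->window = r.value;
    else
        result[r.origin] = t->window;
}

/* Rank 0 drains the queue in arrival order. In this model that only
   happens while rank 0 is inside an MPI call such as MPI_Barrier. */
static void progress(struct target *t, int result[]) {
    for (size_t i = 0; i < t->count; i++)
        apply(t, t->pending[i], result);
    t->count = 0;
}

/* Ordering of the original program (barrier after the gets). */
void barrier_last(int result[3]) {
    struct target t = { 0 };
    post(&t, (struct rma_req){ 2, RMA_PUT, 10 }); /* rank 2, right away */
    post(&t, (struct rma_req){ 1, RMA_GET, 0 });  /* rank 1, after 1 s */
    apply(&t, (struct rma_req){ 0, RMA_GET, 0 }, result); /* rank 0 serves its own get first */
    progress(&t, result);  /* rank 0 in MPI_Barrier: rank 2's put, rank 1's get */
    post(&t, (struct rma_req){ 2, RMA_GET, 0 });  /* rank 2's unlock has returned */
    progress(&t, result);  /* rank 0 still in MPI_Barrier */
}

/* Barrier moved ahead of the lock/get/unlock. */
void barrier_first(int result[3]) {
    struct target t = { 0 };
    post(&t, (struct rma_req){ 2, RMA_PUT, 10 });
    progress(&t, result);  /* rank 0 in MPI_Barrier: rank 2's put lands */
    apply(&t, (struct rma_req){ 0, RMA_GET, 0 }, result);
    post(&t, (struct rma_req){ 1, RMA_GET, 0 });
    post(&t, (struct rma_req){ 2, RMA_GET, 0 });
    progress(&t, result);  /* drained by rank 0's next MPI call */
}
```

In this model `barrier_last` fills `{0, 10, 10}`, matching the R0/R1/R2 output reported in the original question, while `barrier_first` fills `{10, 10, 10}`. The only thing that changes is when rank 0 gets a chance to progress rank 2's put.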
Brian,

Ok, that helps explain what's going on.

I understand the difficulty of implementing truly asynchronous RMA,
especially when the remote process does not periodically yield to the
progress engine. Although the standard is incomplete and ambiguous about
the details of passive synchronization (the ordering of RMA calls and the
behavior of Lock/Unlock), it gives the sense that only the origin process
is explicitly involved in the transfer, and that passive-target
communication can also be used to emulate a shared memory model via MPI
calls.

But given the existing behavior, all bets are off, and passive
synchronization (MPI_Win_unlock) ends up behaving much like active
synchronization (MPI_Win_fence). In trying to emulate a distributed
shared memory model, I was hoping to do things like:

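/* intended atomic read-modify-write on rank 0's window */
MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
MPI_Get(&out, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
/* 'out' is not guaranteed to hold the fetched value until the
   epoch is closed by MPI_Win_unlock */
if (out % 2 == 0)
    out++;
MPI_Accumulate(&out, 1, MPI_INT, 0, 0, 1, MPI_INT, MPI_REPLACE, win);
MPI_Win_unlock(0, win);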

but it is impossible to implement such atomic sections when there are no
semantic guarantees on the ordering of the RMA calls.
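For comparison, the same "if even, increment" section is usually written in shared memory as a compare-and-swap retry loop, which never acts on a value that might be stale. The sketch below uses C11 `<stdatomic.h>`; `make_odd` is a hypothetical name. MPI-3 (which postdates this thread) added `MPI_Compare_and_swap` and `MPI_Fetch_and_op`, which allow a similar retry pattern against a remote window, but this block only shows the shared-memory side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared-memory analogue of the intended atomic section: "if the value
   is even, increment it" as one indivisible update. make_odd is a
   hypothetical name. Returns the value seen just before the update. */
int make_odd(atomic_int *cell) {
    assert(cell != NULL);
    int observed = atomic_load(cell);
    int desired;
    do {
        /* Recompute from the most recently observed value on each retry. */
        desired = (observed % 2 == 0) ? observed + 1 : observed;
        /* On failure, 'observed' is overwritten with the current value. */
    } while (!atomic_compare_exchange_weak(cell, &observed, desired));
    return observed;
}
```

Unlike the lock/get/accumulate/unlock version, the decision is always made on a value known to be current: if another thread changed the cell in between, the compare-and-swap fails and the loop retries with the fresh value.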

Thanks,
Abhishek

> In case you're wondering, the specification wasn't disobeyed in the
> communication order; the lock description is very loose and is relative to
> other MPI events. So if you put the barrier before the lock/get/unlock,
> you'd get the answer you wanted because rank 2's lock would have to occur
> before rank 0's. With no other MPI synchronization, there's no
> requirement that be true, and the locking order could be 0, 1, 2, 2 if it
> really wanted to be (i.e., it would be perfectly legal for rank 1 to also
> return 0).
>
> This is obviously not ideal, and it is one of the areas of focus for the MPI-3
> standardization effort. In Open MPI, adding true asynchronous behavior is
> difficult. The original design assumed that the lowest communication
> layers would be able to provide asynchronous completion events to progress
> the one-sided implementation. Thus far, only the authors of the TCP stack
> have provided such behavior and it's not as well tested as other modes of
> operation.
>
> Brian
>
> On 4/13/11 12:31 PM, "Abhishek Kulkarni" <abbyzcool_at_[hidden]> wrote:
>
> >Hello,
> >
> >I am trying to better understand the semantics of passive synchronization
> >in one-sided communication calls. Doesn't MPI_Win_unlock()
> >block to ensure that all the preceding RMA calls in that epoch have been
> >synced?
> >
> >In that case, why is an undefined value returned when trying to load from
> >a local window? (see below)
> >
> > MPI_Alloc_mem(128, MPI_INFO_NULL, &ptr);
> > MPI_Win_create(ptr, 128, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
> >
> > // write to the target window of the head node
> > if (rank == (size - 1)) {
> >     MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
> >     in = 10;
> >     MPI_Put(&in, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
> >
> >     MPI_Win_unlock(0, win);
> > } else {
> >     // busy wait
> >     start = MPI_Wtime();
> >     end = MPI_Wtime();
> >     while ((end - start) < 1)
> >         end = MPI_Wtime();
> > }
> >
> > // read from the head node's window
> > MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
> > MPI_Get(&out, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
> > MPI_Win_unlock(0, win);
> >
> > MPI_Barrier(MPI_COMM_WORLD);
> >
> > printf("R%d: %d\n", rank, out);
> >
> >The output of the above program with 1.5.3rc1 (and also with MPICH2
> >1.4rc2) is:
> >R2: 10
> >R1: 10
> >R0: 0
> >
> >whereas I expect to see:
> >R2: 10
> >R1: 10
> >R0: 10
> >
> >Thanks,
> >Abhishek
> >
> >_______________________________________________
> >users mailing list
> >users_at_[hidden]
> >http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Brian W. Barrett
> Dept. 1423: Scalable System Software
> Sandia National Laboratories