Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Collective operations and synchronization
From: Ashley Pittman (ashley_at_[hidden])
Date: 2009-03-24 06:10:57

On 23 Mar 2009, at 23:36, Shaun Jackman wrote:
> loop {
> MPI_Ibsend (for every edge of every leaf node)
> MPI_barrier
> MPI_Iprobe/MPI_Recv (until no messages pending)
> MPI_Allreduce (number of nodes removed)
> } until (no nodes removed by any node)
> Previously, I attempted to use a single MPI_Allreduce without the
> MPI_Barrier:

You need both the MPI_Barrier and the synchronisation semantics of the
MPI_Allreduce in this example. It's important that each send matches a
receive from the same iteration, so you need to ensure all sends have
been posted before you call probe, and a Barrier is one way of doing
this. You also need the synchronisation semantics of the Allreduce to
ensure the Iprobe doesn't match a send from the next iteration of the
loop.

Perhaps there is a better way of accomplishing the same thing, however.
MPI_Barrier synchronises all processes, so it is potentially a lot more
heavyweight than it needs to be; in this example you only need to
synchronise with your neighbours, so it might be quicker to use a
send/receive for each of your neighbours containing a true/false value
rather than to rely on the existence (or not) of a message. That is, the
barrier is needed only because you don't know how many messages there
are, and it may well be quicker to have a fixed number of point-to-point
messages than an extra global synchronisation. The added advantage of
doing it this way is that you could remove the Probe as well.

Potentially it would be possible to remove the Allreduce as well and use
the message tag to identify the iteration count, assuming of course you
don't need to know the global number of branches at any iteration. One
problem with this approach can be that a process gets very slow and
swamped with unexpected messages, but assuming your neighbour count is
small this shouldn't be an issue. I'd expect not only a net gain from
changing to this scheme but better scaling of the application as well.

Finally, I've always favoured Irecv/Send over Ibsend/Recv, as in the
majority of cases this tends to be faster; you'd have to benchmark your
specific setup, however.