Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Collective operations and synchronization
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-03-25 18:25:14


Shaun Jackman wrote:

>> On Tue, 2009-03-24 at 07:03 -0800, Eugene Loh wrote:
>>
>>> I'm not sure I understand this suggestion, so I'll say it the way I
>>> understand it. Would it be possible for each process to send an
>>> "all done" message to each of its neighbors? Conversely, each
>>> process would poll its neighbors for messages, either processing
>>> graph operations or collecting "all done" messages depending on
>>> whether the message indicates a graph operation or signals "all done".
>>
> Ashley Pittman wrote:
>
>> Exactly, that way you have a defined number of messages which can be
>> calculated locally for each process and hence there is no need to use
>> Probe and you can get rid of the MPI_Barrier call.
>
> Hi Eugene,
>
> By `poll its neighbours', do you mean posting an MPI_Irecv for each
> neighbour, and working until an `all done' message (sent using
> MPI_Send) has been received from each neighbour?

Yes.

> As long as each process posts its MPI_Irecv before starting the
> MPI_Send, are we guaranteed that two processes won't deadlock by
> MPI_Send-ing to each other?

Yes, I think so.
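For the record, the safe pattern is to have each process post its receive
before entering the blocking send, so both sends always find a matching
receive already pending. A sketch of that exchange between two ranks (the
tag value and int payloads are my assumptions for illustration, not code
from your program):

```c
#include <mpi.h>

/* Two ranks exchange one message each without risk of both blocking
 * in MPI_Send: the receive is posted first, so each send has a
 * matching receive waiting on the other side. */
void exchange(int me, int peer)
{
    int sendbuf = me, recvbuf;
    MPI_Request rreq;

    MPI_Irecv(&recvbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &rreq);
    MPI_Send(&sendbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
    MPI_Wait(&rreq, MPI_STATUS_IGNORE);
}
```

MPI_Sendrecv gives you the same guarantee in a single call, if that
reads more clearly in your code.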

> I avoided this method at first because I didn't understand that the
> MPI_Irecv would stick around regardless of a message being ready to
> receive; I figured that it had no effect if no message was ready to
> receive.

Not sure I understand. A posted MPI_Irecv stays pending until a matching
message arrives, so nothing is lost if no message is ready at the time you
post it. Let's say you post an MPI_Irecv and you get something in a
follow-up MPI_Test or MPI_Wait... but it's not the "all done" signal.
Rather, it's a graph operation. So, you perform that graph operation and
then post the next MPI_Irecv. Something like

MPI_Irecv();
while (1) {
    MPI_Wait();
    if ( all_done ) break;
    /* otherwise it was a graph operation: handle it, then repost */
    MPI_Irecv();
}
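Fleshed out for many neighbors, with one pending receive per neighbor and
a counter for the "all done" messages (a sketch only -- the tag values and
int payloads are assumptions I made for illustration, and I haven't run
this):

```c
#include <mpi.h>

#define TAG_GRAPH_OP 1   /* assumed tag values for this sketch */
#define TAG_ALL_DONE 2

void poll_neighbours(int nranks, int me)
{
    int nneigh = nranks - 1;     /* every other rank is a neighbour */
    MPI_Request req[nneigh];
    int buf[nneigh];             /* one int payload per neighbour */
    int done = 0;

    /* Post one receive per neighbour up front, before any sends. */
    for (int i = 0, r = 0; r < nranks; r++) {
        if (r == me) continue;
        MPI_Irecv(&buf[i], 1, MPI_INT, r, MPI_ANY_TAG,
                  MPI_COMM_WORLD, &req[i]);
        i++;
    }

    while (done < nneigh) {
        int idx;
        MPI_Status st;
        MPI_Waitany(nneigh, req, &idx, &st);
        if (st.MPI_TAG == TAG_ALL_DONE) {
            done++;              /* that neighbour has finished */
        } else {
            /* ... perform the graph operation in buf[idx] ... */
            /* then repost the receive for that neighbour */
            MPI_Irecv(&buf[idx], 1, MPI_INT, st.MPI_SOURCE,
                      MPI_ANY_TAG, MPI_COMM_WORLD, &req[idx]);
        }
    }
}
```

The loop exits after exactly nranks-1 "all done" messages, which is the
locally computable message count Ashley mentioned.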

> In this implementation, the graph is partitioned arbitrarily; the
> vertices are distributed based on a hash function of each vertex's
> unique ID, not based on the topology of the graph (which would be
> nice, but difficult). So, every process is a neighbour of every other
> process.

Okay.

I guess one other piece of advice is this. Start with something that
works; make sure it is easy to reason about its correctness. Doesn't
matter if there is excessive synchronization. Then, start running. If
oversynchronization proves to be a bottleneck, then fix it. But don't
fix it until you have data that indicates that it's a problem. I'm sure
the folks on this list can come up with all sorts of great, minimally
synchronizing algorithms, but maybe you can get by with schemes that are
simpler, more robust, etc.