Eugene Loh wrote:
> Shaun Jackman wrote:
>> Eugene Loh wrote:
>>> Shaun Jackman wrote:
>>>> For my MPI application, each process reads a file and for each line
>>>> sends a message (MPI_Send) to one of the other processes determined
>>>> by the contents of that line. Each process posts a single MPI_Irecv
>>>> and uses MPI_Request_get_status to test for a received message. If a
>>>> message has been received, it processes the message and posts a new
>>>> MPI_Irecv. I believe this situation is not safe and prone to
>>>> deadlock since MPI_Send may block. The receiver would need to post
>>>> as many MPI_Irecv as messages it expects to receive, but it does not
>>>> know in advance how many messages to expect from the other
>>>> processes. How is this situation usually handled in an MPI
>>>> application where the number of messages to receive is unknown?
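>>>> In outline, each process currently does something like this (a rough
>>>> sketch; next_line() and handle() are stand-ins for the real line
>>>> parsing and message processing):
>>>>
>>>> #include <mpi.h>
>>>>
>>>> #define MSG_LEN 64
>>>> #define TAG 0
>>>>
>>>> /* Stand-ins for the application logic. */
>>>> static int next_line(char *line, int *dest) { (void)line; (void)dest; return 0; }
>>>> static void handle(const char *msg) { (void)msg; }
>>>>
>>>> static void run(void)
>>>> {
>>>>     char inbuf[MSG_LEN], line[MSG_LEN];
>>>>     int dest, flag;
>>>>     MPI_Request req;
>>>>
>>>>     MPI_Irecv(inbuf, MSG_LEN, MPI_CHAR, MPI_ANY_SOURCE, TAG,
>>>>               MPI_COMM_WORLD, &req);
>>>>     while (next_line(line, &dest)) {
>>>>         /* May block until dest has a matching receive posted. */
>>>>         MPI_Send(line, MSG_LEN, MPI_CHAR, dest, TAG, MPI_COMM_WORLD);
>>>>         MPI_Request_get_status(req, &flag, MPI_STATUS_IGNORE);
>>>>         if (flag) {
>>>>             MPI_Wait(&req, MPI_STATUS_IGNORE); /* already complete */
>>>>             handle(inbuf);
>>>>             MPI_Irecv(inbuf, MSG_LEN, MPI_CHAR, MPI_ANY_SOURCE, TAG,
>>>>                       MPI_COMM_WORLD, &req);
>>>>         }
>>>>     }
>>>> }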
>>> Each process posts an MPI_Irecv to listen for incoming messages.
>>> Each process enters a loop in which it reads its file and sends out
>>> messages. Within this loop, you also loop on MPI_Test to see if any
>>> message has arrived. If so, process it, post another MPI_Irecv(),
>>> and keep polling. (I'd use MPI_Test rather than
>>> MPI_Request_get_status since you'll have to call something like
>>> MPI_Test anyhow to complete the receive.)
>>> Once you've posted all your sends, send out a special message to
>>> indicate you're finished. I'm thinking of some sort of tree
>>> fan-in/fan-out barrier so that everyone will know when everyone is
>>> finished. Keep polling on MPI_Test, processing further receives or
>>> advancing your fan-in/fan-out barrier.
>>> So, the key ingredients are:
>>> *) keep polling on MPI_Test and reposting MPI_Irecv calls to drain
>>> incoming messages while you're still in your "send" phase
>>> *) have another mechanism for processes to notify one another when
>>> they've finished their send phases
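>>> In code, roughly (an untested sketch: I've simplified the termination
>>> step to counting "done" messages instead of a tree fan-in/fan-out,
>>> and next_line()/handle() are stand-ins for the file parsing and
>>> message processing):
>>>
>>> #include <mpi.h>
>>>
>>> #define MSG_LEN 64
>>> #define TAG_DATA 0
>>> #define TAG_DONE 1
>>>
>>> /* Stand-ins for the application logic. */
>>> static int next_line(char *line, int *dest) { (void)line; (void)dest; return 0; }
>>> static void handle(const char *msg) { (void)msg; }
>>>
>>> static void run(int rank, int nprocs)
>>> {
>>>     char inbuf[MSG_LEN], line[MSG_LEN];
>>>     int dest, flag, done = 0, i;
>>>     MPI_Request req;
>>>     MPI_Status st;
>>>
>>>     MPI_Irecv(inbuf, MSG_LEN, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
>>>               MPI_COMM_WORLD, &req);
>>>
>>>     while (next_line(line, &dest)) {
>>>         /* Drain whatever has already arrived before sending. */
>>>         MPI_Test(&req, &flag, &st);
>>>         while (flag) {
>>>             if (st.MPI_TAG == TAG_DONE) done++;
>>>             else handle(inbuf);
>>>             MPI_Irecv(inbuf, MSG_LEN, MPI_CHAR, MPI_ANY_SOURCE,
>>>                       MPI_ANY_TAG, MPI_COMM_WORLD, &req);
>>>             MPI_Test(&req, &flag, &st);
>>>         }
>>>         MPI_Send(line, MSG_LEN, MPI_CHAR, dest, TAG_DATA, MPI_COMM_WORLD);
>>>     }
>>>
>>>     /* Send phase done: notify everyone else (receivers tell data
>>>      * from "done" by the tag, so the payload doesn't matter). */
>>>     for (i = 0; i < nprocs; i++)
>>>         if (i != rank)
>>>             MPI_Send(line, MSG_LEN, MPI_CHAR, i, TAG_DONE, MPI_COMM_WORLD);
>>>
>>>     /* Keep draining until every peer has reported done.  MPI's
>>>      * pairwise ordering guarantees a peer's data messages match
>>>      * before its "done" message, so no data can be missed. */
>>>     while (done < nprocs - 1) {
>>>         MPI_Wait(&req, &st);
>>>         if (st.MPI_TAG == TAG_DONE) done++;
>>>         else handle(inbuf);
>>>         if (done < nprocs - 1)
>>>             MPI_Irecv(inbuf, MSG_LEN, MPI_CHAR, MPI_ANY_SOURCE,
>>>                       MPI_ANY_TAG, MPI_COMM_WORLD, &req);
>>>     }
>>>
>>>     /* If the last reposted receive can never match, cancel it. */
>>>     if (req != MPI_REQUEST_NULL) {
>>>         MPI_Cancel(&req);
>>>         MPI_Wait(&req, MPI_STATUS_IGNORE);
>>>     }
>>> }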
>> Hi Eugene,
>> Very astute. You've pretty much exactly described how it works now,
>> particularly the loop around MPI_Test and MPI_Irecv to drain incoming
>> messages. So, here's my worry, which I'll demonstrate with an example.
>> We have four processes. Each calls MPI_Irecv once. Each reads one line
>> of its file. Each sends one message with MPI_Send to some other
>> process based on the line that it has read, and then goes into the
>> MPI_Test/MPI_Irecv loop.
>> The events fall out in this order:
>> 2 sends to 0 and does not block (0 has one MPI_Irecv posted)
>> 3 sends to 1 and does not block (1 has one MPI_Irecv posted)
>> 0 receives the message from 2, consuming its MPI_Irecv
>> 1 receives the message from 3, consuming its MPI_Irecv
>> 0 sends to 1 and blocks (1 has no more MPI_Irecv posted)
>> 1 sends to 0 and blocks (0 has no more MPI_Irecv posted)
>> and now processes 0 and 1 are deadlocked.
>> When I say `receives' above, I mean that Open MPI has received the
>> message and copied it into the buffer passed to the MPI_Irecv call,
>> but the application hasn't yet called MPI_Test. The next step would be
>> for all the processes to call MPI_Test, but 0 and 1 are already
>> blocked in MPI_Send.
> I don't get it. Processes should drain aggressively. So, if 0 receives
> a message, it should immediately post the next MPI_Irecv. Before 0
> posts a send, it should MPI_Test (and post the next MPI_Irecv if the
> test received a message).
> Further, you could convert to MPI_Isend.
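> With MPI_Isend the sender never blocks: keep each outgoing message in
> its own buffer until its request completes. A sketch (untested;
> MAX_PENDING and the copy per message are just one way to manage the
> buffers):
>
> #include <mpi.h>
> #include <string.h>
>
> #define MSG_LEN 64
> #define MAX_PENDING 128
> #define TAG_DATA 0
>
> /* One buffer per outstanding send; a slot is free when its request
>  * is MPI_REQUEST_NULL. */
> static char sbuf[MAX_PENDING][MSG_LEN];
> static MPI_Request sreq[MAX_PENDING];
>
> static void send_init(void)
> {
>     int i;
>     for (i = 0; i < MAX_PENDING; i++)
>         sreq[i] = MPI_REQUEST_NULL;
> }
>
> static void send_nb(const char *line, int dest)
> {
>     int slot = -1, i, flag;
>
>     /* Find a free slot.  MPI_Test on MPI_REQUEST_NULL sets flag to
>      * true, and it resets completed requests to MPI_REQUEST_NULL,
>      * so this both finds fresh slots and reclaims finished sends.
>      * (If all slots can stay busy for long, drain receives inside
>      * this loop as well.) */
>     while (slot < 0)
>         for (i = 0; i < MAX_PENDING; i++) {
>             MPI_Test(&sreq[i], &flag, MPI_STATUS_IGNORE);
>             if (flag) { slot = i; break; }
>         }
>
>     memcpy(sbuf[slot], line, MSG_LEN);
>     MPI_Isend(sbuf[slot], MSG_LEN, MPI_CHAR, dest, TAG_DATA,
>               MPI_COMM_WORLD, &sreq[slot]);
> }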
> But maybe I'm missing something.
Before posting a send, the process can call MPI_Test to check for a
received packet, but there's a race condition here: the packet can
arrive after MPI_Test returns false and before the process calls
MPI_Send. I've added the MPI_Test calls to my example scenario:
2 calls MPI_Test. No message is waiting, so 2 decides to send.
2 sends to 0 and does not block (0 has one MPI_Irecv posted)
3 calls MPI_Test. No message is waiting, so 3 decides to send.
3 sends to 1 and does not block (1 has one MPI_Irecv posted)
0 calls MPI_Test. No message is waiting, so 0 decides to send.
0 receives the message from 2, consuming its MPI_Irecv
1 calls MPI_Test. No message is waiting, so 1 decides to send.
1 receives the message from 3, consuming its MPI_Irecv
0 sends to 1 and blocks (1 has no more MPI_Irecv posted)
1 sends to 0 and blocks (0 has no more MPI_Irecv posted)
and now processes 0 and 1 are deadlocked.