On Jan 23, 2009, at 2:36 PM, Hartzman, Leslie D (MS) wrote:
> I'm trying to modify some code that is involved in point-to-point
> communications. Process A has a one way mode of communication with
> Process B. A checks to see if its rank is zero and if so will send
> a command to B (MPI_Issend) about what kind of data is going to
> be coming next. After sending the command to B, A then issues an
> Issend, sending a block of data to B.
> Process B sets up a number of request instances via MPI_Recv_init
> and then issues an MPI_Startall on the requests. B sits in a
> while (1) loop, where the basic processing is a switch statement
> based on the content of the command being sent from A. At the top
> of the loop, B sits at an MPI_Wait until a command comes in. Then
> at each case in the switch, B sits in an MPI_Waitall to make sure
> that all A's have sent their data. B then processes the received
> data, issues an MPI_Startall on the receive request instances,
> exits the switch statement and then issues an MPI_Start on the B
> command request so it can go back to waiting at the top of the loop.
> In the original process A code, prior to sending out a command,
> A will issue an MPI_Wait to make sure that the command request
> instance is free.
I'm not quite sure I understand that statement. Can't you just
compare the request to MPI_REQUEST_NULL? From your description, it
sounds like if you get to this point and the request is not
REQUEST_NULL, there's something else wrong. However, this may simply
be a side-effect from the short description of complex code...?
> After which it sends out the command, followed by the data. So I've
> taken this infrastructure and have tried to add a new command from
> within a function called in A. The function is passed the command
> request instance to be used for the MPI_Wait. I check the status of
> this MPI_Wait, and all is good. I then issue my own MPI_Issend (have
> also tried MPI_Ssend) to process B. The status coming back from
> the send is good. At the end of this function I added in another
> MPI_Wait because this function sends several commands from within a
> loop. None of the commands are received by B, at least not at the
> beginning. After process A goes through an outer loop a few times
> (each time calling my new function with the MPI calls in it),
> process B suddenly gets some of the commands for one pass through
> the function. After that it never comes back from the MPI_Wait at
> the end of the inner function.
It's pretty hard to say without looking at your code.
But one warning is that depending on your network type, progress on
MPI message passing may not occur unless you are in MPI function
calls. So if you MPI_Isend (or MPI_Issend or any other non-blocking
call), the message may or may not go out at that instant (or perhaps
only the first part of it goes out at that instant). It may require
another call into OMPI's progression engine to continue sending the
message. Hence, on the receiver, it may not look like messages have
arrived, but only because they haven't *fully* arrived yet (because
the sender hasn't finished sending them yet).
That being said, I assume that your A process will block in an
MPI_WAITANY, or somesuch, waiting for replies from the B process(es).
Blocking in MPI_WAIT* will trip OMPI's progression engine such that
whatever sends/receives are pending will get progressed as they can.
One clarifying question: why are you using synchronous sends?