
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI_Isend delay
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-07-14 21:37:22


On Jul 14, 2011, at 8:33 PM, dave fournier wrote:

> Sorry I should have said it doesn't get sent until the *master* encounters an MPI_recv.
> Then suddenly the slave finally gets the message and carries on its task.
>
> I know that the slave is waiting because:
> 1.) it doesn't print anything
> 2.) I have attached to it with gdb previously to monitor the behaviour.

Ah -- so you're saying that the master does something like this:

Time = A: Master calls MPI_Isend(msg, ..., &req);
Time = B: Master goes off and does other things
Time = C: Slave calls MPI_Recv(msg, ...);
Time = D: more time passes
Time = E: Master calls MPI_Recv(some_other_msg, ...);

And you're saying that the slave should be getting the message (more or less) instantly at Time=C, but instead gets it at Time=E, right?

If so, it's because Open MPI does not do background progress on non-blocking sends in all cases. Specifically, if you're sending over TCP and the message is "long", the OMPI layer in the master doesn't send the whole message immediately, because it doesn't want to unexpectedly consume a lot of resources in the slave. Instead, the master sends only a small first fragment of the message along with the (communicator, tag) tuple that the receiver needs for matching. When the receiver posts a corresponding MPI_Recv (Time=C), it sends an ACK back to the master, which allows the master to send the rest of the message.

However, since OMPI doesn't support background progress in all situations, the master doesn't see this ACK until it goes into the MPI progression engine -- i.e., when you call MPI_Recv() at Time=E. Then the OMPI layer in the master sees the ACK and sends the rest of the message.

Make sense?
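
A toy model may make the timing easier to see. The sketch below is not Open MPI code: it reduces the behaviour described above to integer "ticks" and assumes the receiver's ACK reaches the master at the same tick the receive is posted. The function name rest_sent_at and its parameters are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the timeline above; this is NOT Open MPI code.
 * The sender issues the non-blocking send at tick 0, which puts only
 * the first fragment on the wire.
 *
 * recv_posted_at : tick at which the receiver posts its receive (Time=C);
 *                  the ACK is assumed to arrive at that same tick.
 * progress_ticks : ticks at which the sender enters the MPI library,
 *                  in ascending order.
 * n              : number of entries in progress_ticks.
 *
 * Returns the tick at which the sender notices the ACK and sends the
 * rest of the message, or -1 if it never enters the library afterwards.
 */
int rest_sent_at(int recv_posted_at, const int *progress_ticks, size_t n)
{
    assert(n == 0 || progress_ticks != NULL);

    for (size_t i = 0; i < n; i++) {
        /* The ACK is only noticed while the sender is inside the library. */
        if (progress_ticks[i] >= recv_posted_at) {
            return progress_ticks[i];
        }
    }
    return -1;
}
```

With the receive posted at tick 2 and the master entering the library only at tick 10 (its own MPI_Recv at Time=E), rest_sent_at(2, ticks, 1) returns 10. If the master also enters the library at ticks 1, 3 and 5, the same call over those four ticks returns 3: the transfer resumes at the first entry after the ACK arrives.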

You can make quick dips into the OMPI progression engine by calling MPI_Test() on the request that you got back from MPI_Isend() -- e.g., you can do this at Time=B, C, and D. This is not as intrusive as calling MPI_Recv(), and may allow your message to be transferred earlier.

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/