On Jun 4, 2009, at 3:53 AM, Lars Andersson wrote:
> In my second test, I simply put a sleep(3) at point 2), and expected
> the MPI_Wait() call at 3) to finish almost instantly, since I assumed
> that the message would have been transferred during the sleep. To my
> disappointment though, it took more or less the same time to finish the
> MPI_Wait as without any sleep.
As you found by googling, and as Bogdan infers, Open MPI doesn't
currently make much progress over TCP-based networks "in the
background." And you're right that putting an MPI_WAIT in a progress
thread would cause that thread to spin heavily, effectively taking
many of your CPU cycles away from you, and possibly even having other
bad effects (e.g., cache thrashing, context switching, etc.).
I'd say that your best workaround here is to intersperse MPI_TEST
calls periodically. This will trigger OMPI's pipelined protocol for
large messages, and should allow partial bursts of progress while
you're presumably off doing useful work. If this is difficult because
the work is being done in library code that you can't change, then
perhaps a pre-spawned "progress" thread could be used to call MPI_TEST
periodically. That way, it won't steal huge amounts of CPU cycles
(like MPI_WAIT would). You still might get some cache thrashing,
context switching, etc. -- YMMV.
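The interspersed-MPI_TEST idea might look something like this sketch;
the buffer size, tag, and the do_work_chunk() placeholder are
illustrative, not from the original post:

```c
/* Sketch: overlap a large MPI_Isend with computation by calling
 * MPI_Test between chunks of work.  do_work_chunk() stands in for
 * a slice of the application's real computation (an assumption). */
#include <mpi.h>

static void do_work_chunk(void) {
    /* ... some slice of useful work ... */
}

void send_with_progress(const double *buf, int count, int dest) {
    MPI_Request req;
    int done = 0;

    MPI_Isend(buf, count, MPI_DOUBLE, dest, /* tag = */ 0,
              MPI_COMM_WORLD, &req);

    while (!done) {
        do_work_chunk();
        /* Each MPI_Test gives Open MPI a chance to push the next
         * fragment of the pipelined large-message protocol. */
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);
    }
}
```

The key point is that MPI_Test returns immediately whether or not the
request has completed, so the work loop keeps running between calls.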
As for exactly how many / how often you should call MPI_TEST, that is
going to be up to you. It's going to depend on a lot of factors -- how
big the message is, how well synchronized you are with the receiver,
what strategy you use to call MPI_TEST, etc.
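For the pre-spawned progress-thread variant, one hedged sketch (the
5 ms interval and the struct layout are arbitrary choices of mine;
note this requires initializing MPI with MPI_THREAD_MULTIPLE):

```c
/* Sketch: a helper thread that nudges progress on a pending request
 * by calling MPI_Test every few milliseconds, instead of spinning in
 * MPI_WAIT.  Assumes MPI was initialized with MPI_Init_thread()
 * requesting MPI_THREAD_MULTIPLE.  The 5 ms sleep is illustrative;
 * tuning it is exactly the "how often" question above. */
#include <mpi.h>
#include <pthread.h>
#include <unistd.h>

struct progress_arg {
    MPI_Request *req;
};

static void *progress_loop(void *p) {
    struct progress_arg *a = p;
    int flag = 0;
    while (!flag) {
        MPI_Test(a->req, &flag, MPI_STATUS_IGNORE);
        if (!flag)
            usleep(5000);   /* sleep ~5 ms between tests */
    }
    return NULL;
}
```

You'd pthread_create() this before diving into the unmodifiable
library code and pthread_join() it afterward; the sleeps keep it from
monopolizing a core the way a spinning MPI_WAIT would.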
Open MPI may someday handle this better, either by having a blocking
form of MPI_WAIT (i.e., not spinning, or spinning considerably less)
or by having true TCP progress in the background. But if I had to
guess, I'd say that we'll likely do the former before the latter.