>> I've been trying to get overlapping computation and data transfer to
>> work, without much success so far.
> If this is so important to you, why do you insist on using Ethernet
> and not a more HPC-oriented interconnect which can make progress in
> the background?
We have a medium-sized cluster connected using Ethernet that works
pretty well for most of our workloads, and we don't have the resources
to simply buy whatever hardware would be better suited.
For most parts of our application, we either have huge data transfers
that can't benefit much from simultaneous computation/overlap, or
small, frequent message passing that works well with the busy-waiting
nature of Open MPI.
However, we are now investigating a problem that would benefit from
overlapping local computation with medium-sized message transfers
(1-10 MB), or would at least be much easier to implement that way. In
short, the problem is having a master decoding image frames and
sending them around to a number of processing slaves, as well as
collecting resulting output for each frame from the slaves.
Since my first post, I've been searching a bit more and found the
"--enable-progress-threads" Open MPI build option. I've tried it
(using Open MPI 1.3.2), but it doesn't seem to make any difference.
So, what is my best bet?
1) Spawning a thread doing MPI_Wait() while doing the local work in
the main thread.
2) Spawning a thread that polls each outstanding request in a loop,
sleeping between passes. What amount of sleep would you recommend here?
3) Trying to intersperse my local computation with MPI_Test() calls?
I don't really like solution 3 because most of the local work is being
done in external library code, which means it's going to be hard/ugly
to intersperse it with MPI calls.
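
To make option 2 concrete, here is a minimal sketch of the kind of
helper thread I have in mind. It is deliberately MPI-free so it compiles
on its own: poll_fn, poller_t, poller_start(), poller_join() and
fake_test() are names I made up for illustration. In the real program
the callback would wrap something like MPI_Testall() on the pending
requests, and I assume MPI would have to be initialized with
MPI_Init_thread() requesting a thread level that allows calls from the
helper thread. The sleep interval is just a parameter here, since I
don't know what value is sensible.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns nonzero once all pending transfers
 * have completed. With MPI this would wrap e.g. MPI_Testall(). */
typedef int (*poll_fn)(void *ctx);

typedef struct {
    poll_fn    poll;      /* completion test to call repeatedly     */
    void      *ctx;       /* opaque argument passed to the callback */
    long       sleep_ns;  /* pause between two polls, nanoseconds   */
    long       polls;     /* number of polls performed so far       */
    atomic_int done;      /* set to 1 when the callback reports completion */
    pthread_t  thread;
} poller_t;

/* Helper thread body: poll, sleep, repeat until completion. */
static void *poller_main(void *arg)
{
    poller_t *p = arg;
    struct timespec pause;
    pause.tv_sec  = p->sleep_ns / 1000000000L;
    pause.tv_nsec = p->sleep_ns % 1000000000L;

    for (;;) {
        p->polls++;
        if (p->poll(p->ctx))
            break;
        nanosleep(&pause, NULL);  /* yield the CPU to the compute thread */
    }
    atomic_store(&p->done, 1);
    return NULL;
}

/* Start the helper thread; returns 0 on success (as pthread_create). */
static int poller_start(poller_t *p, poll_fn poll, void *ctx, long sleep_ns)
{
    assert(poll != NULL && sleep_ns >= 0);
    p->poll = poll;
    p->ctx = ctx;
    p->sleep_ns = sleep_ns;
    p->polls = 0;
    atomic_init(&p->done, 0);
    return pthread_create(&p->thread, NULL, poller_main, p);
}

/* Block until the helper thread has finished; returns the poll count. */
static long poller_join(poller_t *p)
{
    pthread_join(p->thread, NULL);
    return p->polls;
}

/* Stand-in for the MPI test: reports completion on the n-th call,
 * where n is the initial value of the int that ctx points to. */
static int fake_test(void *ctx)
{
    int *remaining = ctx;
    return --(*remaining) <= 0;
}
```

The main thread would call poller_start() right after posting the
nonblocking sends/receives, run the external library code, and then
either check the done flag with atomic_load() or call poller_join()
before reusing the buffers.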
I'd really appreciate it if someone with experience could comment on
this. I hope my problem is clear. How would you solve it?