Patrik Jonsson wrote:
> Hi all,
> I'm seeing performance issues I don't understand in my multithreaded
> MPI code, and I was hoping someone could shed some light on this.
> The code structure is as follows: A computational domain is decomposed
> into MPI tasks. Each MPI task has a "master thread" that receives
> messages from the other tasks and puts those into a local, concurrent
> queue. The tasks then have a few "worker threads" that process the
> incoming messages and, when necessary, send them to other tasks. So for
> each task, there is one thread doing receives and N (typically the number
> of cores minus 1) threads doing sends. All messages are nonblocking, so the
> workers just post the sends and continue with computation, and the
> master repeatedly does a number of test calls to check for incoming
> messages (there are different flavors of these messages so it does
> several tests).
When do you call MPI_Test on the Isends? I have had performance issues on a
number of systems when I used a single queue of MPI_Requests holding Isends to
different ranks and tested them one by one. It appears that some messages are
sent out more efficiently if you test them, presumably because many MPI
implementations only make progress on outstanding requests inside MPI calls.
I found that either using MPI_Testsome, or keeping a map (key = rank, value =
queue of MPI_Requests) and testing the first MPI_Request for each key, resolved
this issue.
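A minimal sketch of the per-rank approach, written in Python against a duck-typed request object. It assumes a request is anything with a Test() method that returns True once the operation has completed, which is how mpi4py's MPI.Request.Test() behaves. The class PerRankSendQueues and its methods are hypothetical. It also assumes a single thread owns the structure. In the worker-thread design described above you would need a lock around it, and MPI initialized with MPI_THREAD_MULTIPLE.

```python
from collections import defaultdict, deque


class PerRankSendQueues:
    """Hypothetical helper (not part of MPI or mpi4py): pending nonblocking
    sends kept in one FIFO per destination rank.

    A request is assumed to be any object whose Test() method returns True
    once the operation has completed (e.g. an mpi4py MPI.Request).
    """

    def __init__(self):
        # destination rank -> deque of outstanding send requests
        self._queues = defaultdict(deque)

    def add(self, dest, request):
        # Record a freshly posted Isend to rank `dest`.
        self._queues[dest].append(request)

    def progress(self):
        """Test the head request of every rank's queue once.

        Completed heads are removed. Returns the number of sends that
        completed during this pass. Every destination is tested on every
        call, so a slow rank cannot hold up testing of sends to other ranks.
        """
        completed = 0
        # Iterate over a snapshot of the keys so empty queues can be deleted.
        for dest in list(self._queues):
            queue = self._queues[dest]
            if queue[0].Test():
                queue.popleft()
                completed += 1
                if not queue:
                    del self._queues[dest]
        return completed

    def pending(self):
        # Total number of sends that have not yet been seen as complete.
        return sum(len(q) for q in self._queues.values())
```

With mpi4py you would call sends.add(dest, comm.Isend(buf, dest=dest, tag=tag)) after posting each send and sends.progress() from the polling loop. MPI_Testsome over a flat array of requests gets a similar effect by testing all active requests in one call and reporting which ones completed.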