Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Asymmetric performance with nonblocking, multithreaded communications
From: Patrik Jonsson (code_at_[hidden])
Date: 2011-12-09 10:34:10


Hi Yiannis,

On Fri, Dec 9, 2011 at 10:21 AM, Yiannis Papadopoulos
<giannis.papadopoulos_at_[hidden]> wrote:
> Patrik Jonsson wrote:
>>
>> Hi all,
>>
>> I'm seeing performance issues I don't understand in my multithreaded
>> MPI code, and I was hoping someone could shed some light on this.
>>
>> The code structure is as follows: A computational domain is decomposed
>> into MPI tasks. Each MPI task has a "master thread" that receives
>> messages from the other tasks and puts those into a local, concurrent
>> queue. The tasks then have a few "worker threads" that process the
>> incoming messages and, when necessary, send them to other tasks. So
>> each task has one thread doing receives and N (typically the number
>> of cores minus 1) threads doing sends. All messages are nonblocking, so the
>> workers just post the sends and continue with computation, and the
>> master repeatedly does a number of test calls to check for incoming
>> messages (there are different flavors of these messages so it does
>> several tests).
>
> When do you call MPI_Test on the Isends? On a number of systems I have had
> performance issues when I used a single queue of MPI_Requests holding
> Isends to different ranks and tested them one by one. It appears that
> some messages are sent out more efficiently if you test them.
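A minimal sketch of the task layout described above, in plain C++17 with no MPI calls: one master thread feeds a mutex-and-condition-variable queue, and N worker threads drain it. `ConcurrentQueue` and `run_task` are hypothetical names, and the master's loop that pushes integers stands in for the real MPI_Test-driven receive loop. In a real run the workers would post MPI_Isend calls while the master tests receives, so MPI would have to be initialized with MPI_Init_thread at the MPI_THREAD_MULTIPLE level.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

// Minimal blocking queue shared by the master (producer) and workers (consumers).
template <typename T>
class ConcurrentQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        cv_.notify_one();
    }

    // Signals that no more items will arrive and wakes every waiting worker.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until an item is available; returns std::nullopt once closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Hypothetical driver for one task: the master pushes the "received" messages
// 1..n_messages, and n_workers threads consume them. Returns the sum of all
// processed messages so the result can be checked.
long run_task(int n_workers, int n_messages) {
    ConcurrentQueue<int> queue;
    std::atomic<long> processed_sum{0};

    std::thread master([&] {
        for (int i = 1; i <= n_messages; ++i) {
            queue.push(i);  // real code: push each message completed by an MPI test call
        }
        queue.close();
    });

    std::vector<std::thread> workers;
    for (int w = 0; w < n_workers; ++w) {
        workers.emplace_back([&] {
            while (auto msg = queue.pop()) {
                processed_sum += *msg;  // real code: compute, then possibly post an MPI_Isend
            }
        });
    }

    master.join();
    for (auto& t : workers) t.join();
    return processed_sum.load();
}
```

For example, `run_task(3, 100)` has three workers consume the messages 1..100 and returns their sum, 5050.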

There are 3 classes of messages that may arrive. The requests for each
are in a vector, and I use boost::mpi::test_some (which I assume just
calls MPI_Testsome) to test them in a round-robin fashion.
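The round-robin polling over the three message classes might look like the sketch below. `PollFn` and `round_robin_pass` are hypothetical names; each `PollFn` stands in for a boost::mpi::test_some / MPI_Testsome call on one class's request vector and returns how many requests completed. What it illustrates is that a class with a large backlog still gets only one test per pass.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for testing one message class: real code would call
// test_some / MPI_Testsome on that class's requests and return the number
// that completed (0 if none).
using PollFn = std::function<int()>;

// One round-robin pass: test every message class exactly once, in order.
// Returns the total number of messages completed during this pass.
int round_robin_pass(const std::vector<PollFn>& classes) {
    int completed = 0;
    for (const auto& poll : classes) {
        completed += poll();
    }
    return completed;
}
```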

>
> I found that either using MPI_Testsome, or keeping a map (key = rank, value =
> queue of MPI_Requests) and testing the first MPI_Request for each key,
> resolved this issue.
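One way to read the per-rank map suggestion is sketched below: outstanding Isend requests are grouped by destination rank, and each sweep tests only the oldest request in each group. `FakeRequest`, `TestFn`, `PendingSends` and `sweep_front_requests` are hypothetical names; in real code the request would be an MPI_Request and the test an MPI_Test call on it.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <iterator>
#include <map>

// Hypothetical request handle; real code would hold an MPI_Request here.
struct FakeRequest {
    int id;
};

// Hypothetical test hook; real code would wrap MPI_Test(&req, &flag, MPI_STATUS_IGNORE).
using TestFn = std::function<bool(FakeRequest&)>;

// Outstanding Isends grouped by destination rank (key = rank).
using PendingSends = std::map<int, std::deque<FakeRequest>>;

// Tests only the oldest request for each destination and retires it if it has
// completed; ranks with no requests left are dropped from the map.
// Returns the number of requests retired in this sweep.
int sweep_front_requests(PendingSends& pending, const TestFn& test) {
    int retired = 0;
    for (auto it = pending.begin(); it != pending.end();) {
        auto& queue = it->second;
        if (!queue.empty() && test(queue.front())) {
            queue.pop_front();
            ++retired;
        }
        it = queue.empty() ? pending.erase(it) : std::next(it);
    }
    return retired;
}
```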

In my case, I know that the overwhelming bulk of the traffic is one kind of
message. What I ended up doing was simply to repeat the test for that
message class immediately whenever the preceding test succeeded, up to 1000
times, before checking the other requests again. This appears to let the
task keep up with the incoming traffic.
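The workaround described above (keep testing the dominant class while its tests keep succeeding, capped at 1000 repeats, then give the other classes one test each) can be sketched as a variant of one polling pass. `PollFn` and `burst_pass` are hypothetical names, and each `PollFn` again stands in for a test_some call on one class's requests.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for one class's test_some call: returns how many
// requests completed (0 if none).
using PollFn = std::function<int()>;

// Tests the dominant class, then repeats that test immediately for as long as
// the previous test succeeded, at most max_repeats extra times. Afterwards
// every other class is tested once. Returns the total number completed.
int burst_pass(const PollFn& dominant, const std::vector<PollFn>& others,
               int max_repeats = 1000) {
    int last = dominant();
    int completed = last;
    for (int repeats = 0; last > 0 && repeats < max_repeats; ++repeats) {
        last = dominant();
        completed += last;
    }
    for (const auto& poll : others) {
        completed += poll();
    }
    return completed;
}
```

Capping the repeats keeps the rarer message classes from being starved when the dominant class never runs dry.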

I guess another possibility would be to have several slots for the
incoming messages. Right now I only post one irecv per source task. By
posting a couple, more messages could arrive while a matching recv is
already posted, and one test could complete more of them. Since that makes
the logic more complicated, I didn't try it.
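The multiple-slots idea can be illustrated with a toy counting model for a single source. `SourceSlots` is a hypothetical name and no MPI calls are made: `slots` is the number of irecvs kept posted, arriving messages fill posted receives first (MPI matches an incoming message against receives that are already posted), anything beyond that is counted as unexpected, and one Testsome-style sweep harvests every filled slot and reposts it. The model assumes a reposted receive immediately picks up a waiting unexpected message, which simplifies what a real MPI library does.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical counting model of the receives posted for one source task.
struct SourceSlots {
    int slots;           // number of irecvs kept posted for this source
    int filled = 0;      // posted receives that have matched a message
    int unexpected = 0;  // arrivals that found no free posted receive

    // n messages arrive from this source between two tests.
    void arrive(int n) {
        int matched = std::min(n, slots - filled);
        filled += matched;
        unexpected += n - matched;
    }

    // One Testsome-style sweep: harvest every completed slot, then repost all
    // slots; reposted receives are assumed to match waiting unexpected messages.
    int test_and_repost() {
        int harvested = filled;
        int matched = std::min(unexpected, slots);
        filled = matched;
        unexpected -= matched;
        return harvested;
    }
};
```

With one slot, four arrivals yield one message per test; with four slots, a single test harvests all four.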