On Apr 24, 2013, at 10:24 AM, Thomas Watson <exascale.system_at_[hidden]> wrote:
> I still have a couple of questions to ask:
> 1. In both MPI_THREAD_FUNNELED and MPI_THREAD_SERIALIZED modes, MPI calls are effectively serialized: in the former, only the main thread of a rank may make MPI calls, while in the latter, the threads must coordinate so that only one of them makes MPI calls at a time. So are there any performance implications in choosing between FUNNELED and SERIALIZED?
In Open MPI, no.
> 2. My current code uses many MPI collective calls (gather, scatter, broadcast, etc.). These calls seem to hurt performance because ALL MPI processes need to wait on each of them. I would like to explore decoupling computation from MPI communication, so that if one thread of an MPI rank is blocked in an MPI call, the other threads can still make progress. Could I still make MPI calls from the other, non-blocked threads in MPI_THREAD_FUNNELED or MPI_THREAD_SERIALIZED mode (assuming that the blocked thread is the rank's main thread)?
MPI-3 introduced non-blocking collectives (e.g., MPI_Igather). Open MPI 1.7.x has preliminary versions of these, but the implementations concentrated on correctness; they haven't been optimized yet. You may want to benchmark a blocking MPI_Gather issued from a separate thread against MPI_Igather to see which performs better for your code.
Also, be aware that not all collectives are synchronizing. Depending on the back-end algorithm that is used to implement any given collective, one MPI process may return much earlier from a collective call than one of its peers in the same collective call. For example, with MPI_Gather of a short message, all non-root processes might do an eager send and return more-or-less immediately. The root will need to block, however, until all messages are received.