Thanks for your detailed info. In my case, I expect to spawn multiple
threads from each MPI process. I could use MPI_THREAD_FUNNELED
or MPI_THREAD_SERIALIZED to do so - I think MPI_THREAD_MULTIPLE is not
supported on InfiniBand, which I am using. Currently, I use OpenMPI +
Boost::Thread - no plan to shift to Boost::MPI yet.
I still have a couple of questions to ask:
1. In both MPI_THREAD_FUNNELED and MPI_THREAD_SERIALIZED modes, only one
thread can be inside an MPI call at any time (in the former case, only the
rank's main thread may make MPI calls, while in the latter case any thread
may, but the threads must be coordinated so that only one does so at a
time). Are there any performance implications associated with choosing
between FUNNELED and SERIALIZED?
2. My current code uses many MPI collective calls
(gather/scatter/broadcast, etc.). It seems that these collective calls have
some negative impact on performance because ALL MPI processes need to wait
on each of these calls. I would like to explore the idea of decoupling
computation from MPI communication, so that if one thread of an MPI rank
is blocked in an MPI call, the other threads can still make progress. Can
the other, non-blocked threads still make MPI calls in
MPI_THREAD_FUNNELED or MPI_THREAD_SERIALIZED mode (assuming that the
blocked thread is the main thread of the rank)?
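To make question 2 concrete, here is a minimal sketch of the FUNNELED-style arrangement it describes: worker threads do the computation and hand every communication request to the main thread, which is the only one that would touch MPI. Everything named here (CommQueue, run_funneled_demo) is hypothetical, std::thread stands in for Boost::Thread, and the MPI call is replaced by a counter increment so the sketch builds without MPI. It is an illustration under those assumptions, not a recommended design.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: funnels communication tasks to a single thread.
class CommQueue {
public:
    // Called by worker threads: hand a communication task to the main thread.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

    // Called once no further tasks will be posted.
    void close() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Called by the main thread only: run tasks until closed and drained.
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return closed_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // closed and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // in real code, the MPI call happens here
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    bool closed_ = false;
};

// Returns how many stand-in "MPI calls" the main thread executed.
int run_funneled_demo(int num_workers, int tasks_per_worker) {
    if (num_workers <= 0) return 0;
    CommQueue queue;
    int calls_made = 0;  // touched only by the main thread
    const std::thread::id main_id = std::this_thread::get_id();
    std::atomic<int> active_workers(num_workers);

    std::vector<std::thread> workers;
    for (int w = 0; w < num_workers; ++w) {
        workers.emplace_back([&] {
            for (int i = 0; i < tasks_per_worker; ++i) {
                // ... local computation would happen here ...
                queue.post([&calls_made, main_id] {
                    assert(std::this_thread::get_id() == main_id);
                    ++calls_made;  // stand-in for an MPI call
                });
            }
            // The last worker to finish lets the main thread leave run().
            if (--active_workers == 0) queue.close();
        });
    }

    queue.run();  // main thread services all communication requests
    for (std::thread& t : workers) t.join();
    return calls_made;
}
```

In this arrangement the workers never call MPI themselves, which is what FUNNELED requires. Note that while the main thread sits inside one blocking call, the queued tasks simply wait, so the workers can keep computing but no other communication proceeds during that time.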
Any advice is highly appreciated!
On Tue, Apr 23, 2013 at 12:46 PM, Nick Edmonds <ngedmond_at_[hidden]> wrote:
> Hi Jacky,
> I'm a regular reader of this list but seldom a poster. In this case
> however I might actually be qualified to answer some questions or provide
> some insight given I'm not sure how many other folks here use Boost.Thread.
> The first question is really what sort of threading model you want to use
> with MPI, which others here are probably more qualified to advise you on.
> In our applications we're using Boost.Thread with MPI_THREAD_MULTIPLE,
> which is a not altogether enjoyable experience because the openib BTL
> lacks support for thread multiple (at least as of the last time I checked).
> That being said, Boost.Thread behaves just like any pthread code on the
> Linux clusters we run on, as well as one BlueGene/P. With
> MPI_THREAD_SERIALIZED, writing hybrid-parallel code is pretty painless.
> Most of the work required involved adding two-stage collectives such that
> threads first perform collectives locally and then a single thread
> participates in the MPI collective operation.
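A minimal sketch of the two-stage collective described above, written as an illustration and not taken from Nick's code. The names two_stage_sum and global_sum are hypothetical, std::thread is used in place of Boost.Thread, and the inter-process stage is passed in as a callback so that the sketch compiles without MPI. In a real program that callback would presumably wrap something like MPI_Allreduce(&in, &out, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD).

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Stage 1: each thread of the rank adds its partial value to a local sum.
// Stage 2: a single thread (here, the caller) hands the local sum to the
// inter-process collective, represented by the global_sum callback.
double two_stage_sum(const std::vector<double>& per_thread_values,
                     const std::function<double(double)>& global_sum) {
    assert(static_cast<bool>(global_sum));  // a callback must be supplied

    double local_sum = 0.0;
    std::mutex local_mutex;

    // Stage 1: thread-level reduction inside the process.
    std::vector<std::thread> workers;
    for (double value : per_thread_values) {
        workers.emplace_back([&local_sum, &local_mutex, value] {
            std::lock_guard<std::mutex> guard(local_mutex);
            local_sum += value;
        });
    }
    for (std::thread& t : workers) t.join();

    // Stage 2: only the calling thread takes part in the MPI collective.
    return global_sum(local_sum);
}
```

Because only one thread ever reaches the second stage, this pattern needs no more than MPI_THREAD_FUNNELED or MPI_THREAD_SERIALIZED. For a single-rank dry run, the callback can simply return its argument, e.g. two_stage_sum(values, [](double x) { return x; }).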
> If you end up using Boost.MPI you could probably even write your own
> wrappers to encapsulate the local computation required for MPI collective
> operations. Unfortunately Boost.MPI currently lacks full support for even
> MPI-2 but if it includes the subset of functionality you need it may be
> worthwhile. Extensions are fairly straightforward to implement as well.
> I've implemented a few different approaches to MPI + threading in the
> context of Boost, from explicit thread management to thread pools, and
> currently a complete runtime system. Most of it is research code, though
> there's no reason it couldn't be released, and some of it probably will be
> eventually. If you'd like to describe your intended use case I'm happy to
> offer any advice I can based on what I've learned.
> On Apr 22, 2013, at 3:25 PM, Thomas Watson wrote:
> > Hi,
> > I would like to create a pool of threads (using Boost::Thread) within
> > each OpenMPI process to accelerate my application on multicore CPUs. My
> > application is already built on OpenMPI, but it currently exploits
> > parallelism only at the process level.
> > I am wondering if anyone can point me to some good
> > tutorials/documents/examples on how to integrate Boost multithreading with
> > OpenMPI applications?
> > Thanks!
> > Jacky
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users