
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Troubles using MPI_Isend/MPI_Irecv/MPI_Waitany and MPI_Allreduce
From: Pedro Gonnet (gonnet_at_[hidden])
Date: 2011-11-13 06:15:13


Sorry for the long delay on my behalf too.

Using MPI_Init_thread with MPI_THREAD_MULTIPLE fixes this problem!
Should have had a closer look at the documentation...
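
For the archives, the change amounts to roughly the following (a minimal
sketch, not the actual mpitest.c; the relevant change is just the
initialization call):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {

    int provided;

    /* Ask for full thread support instead of calling plain MPI_Init. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    /* The library may grant less than requested, so check the result. */
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available (got %d)\n",
                provided);
        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }

    /* ... spawn the pthreads and do the communication as before ... */

    MPI_Finalize();
    return 0;
}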

Cheers,
Pedro

> Sorry for the delay in replying.
> I think you need to use MPI_INIT_THREAD with a level of
> MPI_THREAD_MULTIPLE instead of MPI_INIT. This sets up internal locking
> in Open MPI to protect against multiple threads inside the progress
> engine, etc.
> Be aware that only some of Open MPI's transports are THREAD_MULTIPLE
> safe -- see the README for more detail.
> On Oct 23, 2011, at 1:11 PM, Pedro Gonnet wrote:
> >
> > Hi again,
> >
> > As promised, I implemented a small program reproducing the error.
> >
> > The program's main routine spawns a pthread which calls the function
> > "exchange". "exchange" uses MPI_Isend/MPI_Irecv/MPI_Waitany to exchange
> > a buffer of double-precision numbers with all other nodes.
> >
> > At the same time, the "main" routine exchanges the sum of all the
> > buffers using MPI_Allreduce.
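> >
> > In outline, the program looks something like this (a simplified sketch
> > of the pattern only; the attached mpitest.c is the real thing):
> >
> > #include <mpi.h>
> > #include <pthread.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> >
> > #define N 1024
> >
> > static double buf[N], sum[N];
> >
> > /* thread (i): exchange buf with every other rank using
> >    MPI_Isend/MPI_Irecv/MPI_Waitany */
> > static void *exchange(void *dummy) {
> >     int rank, size, k, nr = 0, idx;
> >     MPI_Request *reqs;
> >     double *in;
> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >     MPI_Comm_size(MPI_COMM_WORLD, &size);
> >     reqs = malloc(2 * size * sizeof(MPI_Request));
> >     in = malloc((size_t)size * N * sizeof(double));
> >     for (k = 0; k < size; k++) {
> >         if (k == rank) continue;
> >         MPI_Isend(buf, N, MPI_DOUBLE, k, 0, MPI_COMM_WORLD, &reqs[nr++]);
> >         MPI_Irecv(&in[k * N], N, MPI_DOUBLE, k, 0, MPI_COMM_WORLD,
> >                   &reqs[nr++]);
> >     }
> >     /* completed requests become MPI_REQUEST_NULL, so this drains all */
> >     for (k = 0; k < nr; k++)
> >         MPI_Waitany(nr, reqs, &idx, MPI_STATUS_IGNORE);
> >     free(reqs); free(in);
> >     return NULL;
> > }
> >
> > int main(int argc, char *argv[]) {
> >     pthread_t tid;
> >     MPI_Init(&argc, &argv);    /* plain MPI_Init -- no thread support */
> >     pthread_create(&tid, NULL, exchange, NULL);    /* thread (i) */
> >     usleep(1000);              /* twiddle so (i) is mid-exchange */
> >     /* thread (ii) == main: collective while (i) may still be in Waitany */
> >     MPI_Allreduce(buf, sum, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
> >     pthread_join(tid, NULL);
> >     MPI_Finalize();
> >     return 0;
> > }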
> >
> > To compile and run the program, do the following:
> >
> > mpicc -g -Wall mpitest.c -pthread
> > mpirun -np 8 ./a.out
> >
> > Timing is, of course, of the essence and you may have to run the program
> > a few times or twiddle with the value of "usleep" in line 146 for it to
> > hang. To see where things go bad, you can do the following:
> >
> > mpirun -np 8 xterm -e gdb -ex run ./a.out
> >
> > Things go bad when MPI_Allreduce is called while any of the threads are
> > in MPI_Waitany. The value of "usleep" in line 146 should be long enough
> > for all the nodes to have started exchanging data but small enough so
> > that they are not done yet.
> >
> > Cheers,
> > Pedro
> >
> >
> >
> > On Thu, 2011-10-20 at 11:25 +0100, Pedro Gonnet wrote:
> >> Short update:
> >>
> >> I just installed version 1.4.4 from source (compiled with
> >> --enable-mpi-threads), and the problem persists.
> >>
> >> I should also point out that if, in thread (ii), I wait for the
> >> nonblocking communication in thread (i) to finish, nothing bad happens.
> >> But this makes the nonblocking communication somewhat pointless.
> >>
> >> Cheers,
> >> Pedro
> >>
> >>
> >> On Thu, 2011-10-20 at 10:42 +0100, Pedro Gonnet wrote:
> >>> Hi all,
> >>>
> >>> I am currently working on a multi-threaded hybrid parallel simulation
> >>> which uses both pthreads and OpenMPI. The simulation uses several
> >>> pthreads per MPI node.
> >>>
> >>> My code uses the nonblocking routines MPI_Isend/MPI_Irecv/MPI_Waitany
> >>> quite successfully to implement the node-to-node communication. When I
> >>> try to interleave other computations during this communication, however,
> >>> bad things happen.
> >>>
> >>> I have two MPI nodes with two threads each: one thread (i) doing the
> >>> nonblocking communication and the other (ii) doing other computations.
> >>> At some point, the threads (ii) need to exchange data using
> >>> MPI_Allreduce, which fails if the first thread (i) has not completed all
> >>> the communication, i.e. if thread (i) is still in MPI_Waitany.
> >>>
> >>> Using the in-place MPI_Allreduce, I get a re-run of this bug:
> >>> http://www.open-mpi.org/community/lists/users/2011/09/17432.php. If I
> >>> don't use in-place, the call to MPI_Waitany (thread i) on one of the
> >>> MPI nodes waits forever.
> >>>
> >>> My guess is that when thread (ii) calls MPI_Allreduce, it gets
> >>> whatever the other node sent with MPI_Isend to thread (i), drops
> >>> whatever it should have been getting from the other node's
> >>> MPI_Allreduce, and the call to MPI_Waitany hangs.
> >>>
> >>> Is this a known issue? Is MPI_Allreduce not designed to work alongside
> >>> the nonblocking routines? Is there a "safe" variant of MPI_Allreduce I
> >>> should be using instead?
> >>>
> >>> I am using OpenMPI version 1.4.3 (version 1.4.3-1ubuntu3 of the package
> >>> openmpi-bin in Ubuntu). Both MPI nodes are run on the same dual-core
> >>> computer (Lenovo x201 laptop).
> >>>
> >>> If you need more information, please do let me know! I'll also try to
> >>> cook up a small program reproducing this problem...
> >>>
> >>> Cheers and kind regards,
> >>> Pedro
> >>>
> >>>
> >>>
> >>>
> >>
> >
> > <mpitest.c>
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>