Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Troubles using MPI_Isend/MPI_Irecv/MPI_Waitany and MPI_Allreduce
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-11-13 10:34:39


Glad it worked!

On Nov 13, 2011, at 6:15 AM, Pedro Gonnet wrote:

>
> Sorry for the long delay on my behalf too.
>
> Using MPI_Init_thread with MPI_THREAD_MULTIPLE fixes this problem!
> Should have had a closer look at the documentation...
>
> Cheers,
> Pedro
>
>
>
>> Sorry for the delay in replying.
>> I think you need to use MPI_INIT_THREAD with a level of
>> MPI_THREAD_MULTIPLE instead of MPI_INIT. This sets up internal locking
>> in Open MPI to protect against multiple threads inside the progress
>> engine, etc.
>> Be aware that only some of Open MPI's transports are THREAD_MULTIPLE
>> safe -- see the README for more detail.
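>> For illustration, a minimal sketch of that change (the check on
>> "provided" is an added precaution, since the library may grant a
>> lower level than requested):
>>
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> int main(int argc, char *argv[]) {
>>     int provided;
>>
>>     /* Request full multi-threaded support instead of MPI_Init(). */
>>     MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>>
>>     /* The library may return a lower level than requested. */
>>     if (provided < MPI_THREAD_MULTIPLE) {
>>         fprintf(stderr, "no MPI_THREAD_MULTIPLE (got %d)\n", provided);
>>         MPI_Abort(MPI_COMM_WORLD, 1);
>>     }
>>
>>     /* ... spawn threads and communicate as before ... */
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>>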
>> On Oct 23, 2011, at 1:11 PM, Pedro Gonnet wrote:
>>>
>>> Hi again,
>>>
>>> As promised, I implemented a small program reproducing the error.
>>>
>>> The program's main routine spawns a pthread which calls the function
>>> "exchange". "exchange" uses MPI_Isend/MPI_Irecv/MPI_Waitany to
>>> exchange a buffer of double-precision numbers with all other nodes.
>>>
>>> At the same time, the "main" routine exchanges the sum of all the
>>> buffers using MPI_Allreduce.
>>>
>>> To compile and run the program, do the following:
>>>
>>> mpicc -g -Wall mpitest.c -pthread
>>> mpirun -np 8 ./a.out
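>>>
>>> In outline, the structure is roughly the following (a hand-written
>>> sketch with made-up names and sizes, not the attached mpitest.c):
>>>
>>> #include <mpi.h>
>>> #include <pthread.h>
>>> #include <stdlib.h>
>>> #include <unistd.h>
>>>
>>> #define N 1024
>>>
>>> static double sendbuf[N];
>>>
>>> /* Thread (i): exchange sendbuf with every other rank, waiting for
>>>    the requests one at a time with MPI_Waitany. */
>>> static void *exchange(void *arg) {
>>>     int rank, size, k, idx, count = 0;
>>>     double *recvbuf;
>>>     MPI_Request *reqs;
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>     recvbuf = malloc(size * N * sizeof(double));
>>>     reqs = malloc(2 * (size - 1) * sizeof(MPI_Request));
>>>     for (k = 0; k < size; k++) {
>>>         if (k == rank) continue;
>>>         MPI_Isend(sendbuf, N, MPI_DOUBLE, k, 0, MPI_COMM_WORLD,
>>>                   &reqs[count++]);
>>>         MPI_Irecv(&recvbuf[k * N], N, MPI_DOUBLE, k, 0, MPI_COMM_WORLD,
>>>                   &reqs[count++]);
>>>     }
>>>     for (k = 0; k < count; k++)
>>>         MPI_Waitany(count, reqs, &idx, MPI_STATUS_IGNORE);
>>>     free(reqs);
>>>     free(recvbuf);
>>>     return NULL;
>>> }
>>>
>>> int main(int argc, char *argv[]) {
>>>     pthread_t thread;
>>>     double local = 0.0, total;
>>>     MPI_Init(&argc, &argv);   /* plain MPI_Init: the failing setup */
>>>     pthread_create(&thread, NULL, exchange, NULL);
>>>     usleep(1000);             /* timing-sensitive, see below */
>>>     /* Main thread: reduce while the exchange may still be inside
>>>        MPI_Waitany. */
>>>     MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM,
>>>                   MPI_COMM_WORLD);
>>>     pthread_join(thread, NULL);
>>>     MPI_Finalize();
>>>     return 0;
>>> }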
>>>
>>> Timing is, of course, of the essence and you may have to run the
>>> program a few times or twiddle with the value of "usleep" in line 146
>>> for it to hang. To see where things go bad, you can do the following:
>>>
>>> mpirun -np 8 xterm -e gdb -ex run ./a.out
>>>
>>> Things go bad when MPI_Allreduce is called while any of the threads
>>> are in MPI_Waitany. The value of "usleep" in line 146 should be long
>>> enough for all the nodes to have started exchanging data but small
>>> enough so that they are not done yet.
>>>
>>> Cheers,
>>> Pedro
>>>
>>>
>>>
>>> On Thu, 2011-10-20 at 11:25 +0100, Pedro Gonnet wrote:
>>>> Short update:
>>>>
>>>> I just installed version 1.4.4 from source (compiled with
>>>> --enable-mpi-threads), and the problem persists.
>>>>
>>>> I should also point out that if, in thread (ii), I wait for the
>>>> nonblocking communication in thread (i) to finish, nothing bad happens.
>>>> But this makes the nonblocking communication somewhat pointless.
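>>>>
>>>> (For illustration, a minimal sketch of one way to do that wait, with
>>>> made-up helper names, shared between the two threads:)
>>>>
>>>> #include <pthread.h>
>>>>
>>>> static int exchange_done = 0;
>>>> static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
>>>> static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
>>>>
>>>> /* Thread (i) calls this after its last MPI_Waitany returns. */
>>>> static void signal_exchange_done(void) {
>>>>     pthread_mutex_lock(&lock);
>>>>     exchange_done = 1;
>>>>     pthread_cond_signal(&cond);
>>>>     pthread_mutex_unlock(&lock);
>>>> }
>>>>
>>>> /* Thread (ii) calls this before MPI_Allreduce, so that only one
>>>>    thread is ever inside MPI at a time. */
>>>> static void wait_for_exchange(void) {
>>>>     pthread_mutex_lock(&lock);
>>>>     while (!exchange_done)
>>>>         pthread_cond_wait(&cond, &lock);
>>>>     pthread_mutex_unlock(&lock);
>>>> }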
>>>>
>>>> Cheers,
>>>> Pedro
>>>>
>>>>
>>>> On Thu, 2011-10-20 at 10:42 +0100, Pedro Gonnet wrote:
>>>>> Hi all,
>>>>>
>>>>> I am currently working on a multi-threaded hybrid parallel
>>>>> simulation which uses both pthreads and Open MPI. The simulation
>>>>> uses several pthreads per MPI node.
>>>>>
>>>>> My code uses the nonblocking routines
>>>>> MPI_Isend/MPI_Irecv/MPI_Waitany quite successfully to implement
>>>>> the node-to-node communication. When I try to interleave other
>>>>> computations during this communication, however, bad things happen.
>>>>>
>>>>> I have two MPI nodes with two threads each: one thread (i) doing
>>>>> the nonblocking communication and the other (ii) doing other
>>>>> computations. At some point, the threads (ii) need to exchange
>>>>> data using MPI_Allreduce, which fails if the first thread (i) has
>>>>> not completed all the communication, i.e. if thread (i) is still
>>>>> in MPI_Waitany.
>>>>>
>>>>> Using the in-place MPI_Allreduce, I get a re-run of this bug:
>>>>> http://www.open-mpi.org/community/lists/users/2011/09/17432.php.
>>>>> If I don't use in-place, the call to MPI_Waitany (thread i) on one
>>>>> of the MPI nodes waits forever.
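>>>>>
>>>>> (For reference, the two variants look like this; the buffers are
>>>>> illustrative:)
>>>>>
>>>>> double sum = 0.0, result;
>>>>>
>>>>> /* in-place: send and receive in the same buffer */
>>>>> MPI_Allreduce(MPI_IN_PLACE, &sum, 1, MPI_DOUBLE, MPI_SUM,
>>>>>               MPI_COMM_WORLD);
>>>>>
>>>>> /* separate send and receive buffers */
>>>>> MPI_Allreduce(&sum, &result, 1, MPI_DOUBLE, MPI_SUM,
>>>>>               MPI_COMM_WORLD);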
>>>>>
>>>>> My guess is that when thread (ii) calls MPI_Allreduce, it gets
>>>>> whatever the other node sent with MPI_Isend to thread (i), drops
>>>>> whatever it should have been getting from the other node's
>>>>> MPI_Allreduce, and the call to MPI_Waitany hangs.
>>>>>
>>>>> Is this a known issue? Is MPI_Allreduce not designed to work
>>>>> alongside the nonblocking routines? Is there a "safe" variant of
>>>>> MPI_Allreduce I should be using instead?
>>>>>
>>>>> I am using Open MPI version 1.4.3 (version 1.4.3-1ubuntu3 of the
>>>>> package openmpi-bin in Ubuntu). Both MPI nodes are run on the same
>>>>> dual-core computer (Lenovo x201 laptop).
>>>>>
>>>>> If you need more information, please do let me know! I'll also try
>>>>> to cook up a small program reproducing this problem...
>>>>>
>>>>> Cheers and kind regards,
>>>>> Pedro
>>>>>
>>>>
>>>
>>> <mpitest.c>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/