Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI process hangs if OpenMPI is compiled with --enable-thread-multiple
From: Dominique Orban (dominique.orban_at_[hidden])
Date: 2013-11-25 22:52:46


On 2013-11-25, at 9:02 PM, Ralph Castain <rhc.openmpi_at_[hidden]> wrote:

> On Nov 25, 2013, at 5:04 PM, Pierre Jolivet <jolivet_at_[hidden]> wrote:
>
>>
>> On Nov 24, 2013, at 3:03 PM, Jed Brown <jedbrown_at_[hidden]> wrote:
>>
>>> Ralph Castain <rhc_at_[hidden]> writes:
>>>
>>>> Given that we have no idea what Homebrew uses, I don't know how we
>>>> could clarify/respond.
>>>
>>
>> Ralph, it is pretty easy to know what Homebrew uses; cf. https://github.com/mxcl/homebrew/blob/master/Library/Formula/open-mpi.rb (sorry if you meant something else).
>
> Might be a surprise, but I don't track all these guys :-)
>
> Homebrew is new to me
>
>>
>>> Pierre provided a link to MacPorts saying that all of the following
>>> options were needed to properly enable threads.
>>>
>>> --enable-event-thread-support --enable-opal-multi-threads --enable-orte-progress-threads --enable-mpi-thread-multiple
>>>
>>> If that is indeed the case, and if passing some subset of these options
>>> results in deadlock, it's not exactly user-friendly.
>>>
>>> Maybe --enable-mpi-thread-multiple is enough, in which case MacPorts is
>>> doing something needlessly complicated and Pierre's link was a red
>>> herring?
>>
>> That is very likely, though on the other hand, Homebrew is doing something pretty straightforward. I just wanted a quick and easy fix back when I had the same hanging issue, but there should be a better explanation if --enable-mpi-thread-multiple is indeed enough.
>
> It is enough - we set all required things internally

Is that certain? My original message stems from a hang in the PETSc tests, and I get quite different results depending on whether I compile OpenMPI with --enable-mpi-thread-multiple only or with the full set of flags.
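
For what it's worth, nothing PETSc-specific should be needed to exercise this code path: a tiny program that calls MPI_Init_thread the same way PETSc does (requesting MPI_THREAD_FUNNELED) and prints the provided level is, I think, the minimal thing to try. A sketch (file name and exact behaviour are assumptions; I have not yet isolated the hang outside PETSc):

        #include <mpi.h>
        #include <stdio.h>

        /* Request the same thread level PETSc asks for and report
           what the library actually provides. */
        int main(int argc, char **argv)
        {
            int provided, rank;
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            printf("rank %d: provided thread level = %d\n", rank, provided);
            MPI_Finalize();
            return 0;
        }

Run under two ranks (e.g. mpirun -n 2 ./init_test) against each of the two OpenMPI builds; if the hang really is in MPI initialization, this should be enough to show it.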

I recompiled PETSc with debugging enabled against OpenMPI built with the "correct" flags mentioned by Pierre, and this is the stack trace I get:

$ mpirun -n 2 xterm -e gdb ./ex5

        ^C
        Program received signal SIGINT, Interrupt.
        0x00007fff991160fa in __psynch_cvwait ()
           from /usr/lib/system/libsystem_kernel.dylib
        (gdb) where
        #0 0x00007fff991160fa in __psynch_cvwait ()
           from /usr/lib/system/libsystem_kernel.dylib
        #1 0x00007fff98d6ffb9 in ?? () from /usr/lib/system/libsystem_c.dylib

        ^C
        Program received signal SIGINT, Interrupt.
        0x00007fff991160fa in __psynch_cvwait ()
           from /usr/lib/system/libsystem_kernel.dylib
        (gdb) where
        #0 0x00007fff991160fa in __psynch_cvwait ()
           from /usr/lib/system/libsystem_kernel.dylib
        #1 0x00007fff98d6ffb9 in ?? () from /usr/lib/system/libsystem_c.dylib

If I recompile PETSc against OpenMPI built with --enable-mpi-thread-multiple only (leaving out the other flags, which Pierre suggested would be wrong), I get the following traces:

        ^C
        Program received signal SIGINT, Interrupt.
        0x00007fff991160fa in __psynch_cvwait ()
           from /usr/lib/system/libsystem_kernel.dylib
        (gdb) where
        #0 0x00007fff991160fa in __psynch_cvwait ()
           from /usr/lib/system/libsystem_kernel.dylib
        #1 0x00007fff98d6ffb9 in ?? () from /usr/lib/system/libsystem_c.dylib

        ^C
        Program received signal SIGINT, Interrupt.
        0x0000000101edca28 in mca_common_sm_init ()
           from /usr/local/Cellar/open-mpi/1.7.3/lib/libmca_common_sm.4.dylib
        (gdb) where
        #0 0x0000000101edca28 in mca_common_sm_init ()
           from /usr/local/Cellar/open-mpi/1.7.3/lib/libmca_common_sm.4.dylib
        #1 0x0000000101ed8a38 in mca_mpool_sm_init ()
           from /usr/local/Cellar/open-mpi/1.7.3/lib/openmpi/mca_mpool_sm.so
        #2 0x0000000101c383fa in mca_mpool_base_module_create ()
           from /usr/local/lib/libmpi.1.dylib
        #3 0x0000000102933b41 in mca_btl_sm_add_procs ()
           from /usr/local/Cellar/open-mpi/1.7.3/lib/openmpi/mca_btl_sm.so
        #4 0x0000000102929dfb in mca_bml_r2_add_procs ()
           from /usr/local/Cellar/open-mpi/1.7.3/lib/openmpi/mca_bml_r2.so
        #5 0x000000010290a59c in mca_pml_ob1_add_procs ()
           from /usr/local/Cellar/open-mpi/1.7.3/lib/openmpi/mca_pml_ob1.so
        #6 0x0000000101bd859b in ompi_mpi_init () from /usr/local/lib/libmpi.1.dylib
        #7 0x0000000101bf24da in MPI_Init_thread () from /usr/local/lib/libmpi.1.dylib
        #8 0x00000001000724db in PetscInitialize (argc=0x7fff5fbfed48,
            args=0x7fff5fbfed40, file=0x0,
            help=0x1000061c0 "Bratu nonlinear PDE in 2d.\nWe solve the Bratu (SFI - solid fuel ignition) problem in a 2D rectangular\ndomain, using distributed arrays(DMDAs) to partition the parallel grid.\nThe command line options"...)
            at /tmp/petsc-3.4.3/src/sys/objects/pinit.c:675
        #9 0x0000000100000d8c in main ()

Line 675 of pinit.c is

        ierr = MPI_Init_thread(argc,args,MPI_THREAD_FUNNELED,&provided);CHKERRQ(ierr);
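
So PETSc only asks for MPI_THREAD_FUNNELED here, and in the trace above the hang is inside MPI_Init_thread itself (in the shared-memory component setup), not in anything that uses threads afterwards. For completeness, a small check of what a given OpenMPI build actually grants would look roughly like this (illustrative sketch, not part of the PETSc test):

        #include <mpi.h>
        #include <stdio.h>

        /* Ask for the highest thread level and print what the build
           grants; a library built without full thread support will
           typically return something lower than MPI_THREAD_MULTIPLE. */
        int main(int argc, char **argv)
        {
            int provided;
            MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
            printf("requested MPI_THREAD_MULTIPLE (%d), provided %d\n",
                   MPI_THREAD_MULTIPLE, provided);
            MPI_Finalize();
            return 0;
        }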

Dominique