
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI process hangs if OpenMPI is compiled with --enable-thread-multiple
From: Dominique Orban (dominique.orban_at_[hidden])
Date: 2013-11-24 15:15:13


Pierre,

Thank you for pointing out the erroneous flags. I am indeed building with Homebrew. After switching to the flags given in the link you sent, this is the output of Ralph's test program:

$ mpirun -n 2 ./testmpi2
Calling MPI_Init_thread...
Calling MPI_Init_thread...
MPI_Init_thread returned, provided = 3
MPI_Init_thread returned, provided = 3
[warn] select: Bad file descriptor
[warn] select: Bad file descriptor

It doesn't hang anymore, but I'm not sure what to make of the warnings. Some runs don't trigger them at all. Please pardon my MPI ignorance.

My question originally came from a hang in the PETSc tests, similar to the one I described in my first message. Those tests still hang after I corrected the Open MPI compile flags. I'm also in touch with the PETSc developers about this.

Dominique

On 2013-11-23, at 9:22 PM, Pierre Jolivet <jolivet_at_[hidden]> wrote:

> Dominique,
> It looks like you are building Open MPI with Homebrew. The flags their formula uses when --enable-mpi-thread-multiple is requested are wrong.
> Cf. a similar problem with MacPorts: https://lists.macosforge.org/pipermail/macports-tickets/2013-June/138145.html.
>
> Pierre
>
> On Nov 23, 2013, at 4:56 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> Hmmm...well, it seems to work for me:
>>
>> $ mpirun -n 4 ./thread_init
>> Calling MPI_Init_thread...
>> Calling MPI_Init_thread...
>> Calling MPI_Init_thread...
>> Calling MPI_Init_thread...
>> MPI_Init_thread returned, provided = 3
>> MPI_Init_thread returned, provided = 3
>> MPI_Init_thread returned, provided = 3
>> MPI_Init_thread returned, provided = 3
>> $
>>
>> This is with the current 1.7 code branch, so it's possible something has been updated. You might try it with the next nightly tarball and see if it helps.
>>
>> BTW: The correct configure option is --enable-mpi-thread-multiple
>>
>> My test program:
>>
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> int main(int argc, char *argv[]) {
>>     int provided = -1;
>>     printf("Calling MPI_Init_thread...\n");
>>     /* Request full thread support; 'provided' receives the level actually granted. */
>>     MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>>     printf("MPI_Init_thread returned, provided = %d\n", provided);
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>>
>> On Nov 21, 2013, at 1:36 PM, Dominique Orban <dominique.orban_at_[hidden]> wrote:
>>
>>> Hi,
>>>
>>> I'm compiling the example code at the bottom of the following page, which illustrates MPI_Init_thread():
>>>
>>> http://mpi.deino.net/mpi_functions/mpi_init_thread.html
>>>
>>> I have Open MPI 1.7.3 installed on OS X 10.8.5, built with --enable-thread-multiple using clang-425.0.28. I can reproduce the following on OS X 10.9 (clang-500), and another user was able to reproduce it on some flavor of Linux:
>>>
>>> $ mpicc -g -o testmpi testmpi.c -lmpi
>>> $ mpirun -n 2 ./testmpi
>>> $ # hangs forever
>>>
>>> I don't know how to debug MPI programs, but someone suggested that I do this:
>>>
>>> $ mpirun -n 2 xterm -e gdb ./testmpi
>>>
>>> In the first xterm, I type 'run' in gdb, interrupt the program after a while, and get a backtrace:
>>>
>>> ^C
>>> Program received signal SIGINT, Interrupt.
>>> 0x00007fff99116322 in select$DARWIN_EXTSN ()
>>> from /usr/lib/system/libsystem_kernel.dylib
>>> (gdb) where
>>> #0 0x00007fff99116322 in select$DARWIN_EXTSN ()
>>> from /usr/lib/system/libsystem_kernel.dylib
>>> #1 0x00000001001963c2 in select_dispatch ()
>>> from /usr/local/Cellar/open-mpi/1.7.3/lib/libopen-pal.6.dylib
>>> #2 0x000000010018f178 in opal_libevent2021_event_base_loop ()
>>> from /usr/local/Cellar/open-mpi/1.7.3/lib/libopen-pal.6.dylib
>>> #3 0x000000010015f059 in opal_progress ()
>>> from /usr/local/Cellar/open-mpi/1.7.3/lib/libopen-pal.6.dylib
>>> #4 0x0000000100019321 in ompi_mpi_init () from /usr/local/lib/libmpi.1.dylib
>>> #5 0x00000001000334da in MPI_Init_thread () from /usr/local/lib/libmpi.1.dylib
>>> #6 0x0000000100000ddb in main (argc=1, argv=0x7fff5fbfedc0) at testmpi.c:9
>>> (gdb)
>>>
>>> In the second xterm window:
>>>
>>> ^C
>>> Program received signal SIGINT, Interrupt.
>>> 0x00000001002e9a28 in mca_common_sm_init ()
>>> from /usr/local/Cellar/open-mpi/1.7.3/lib/libmca_common_sm.4.dylib
>>> (gdb) where
>>> #0 0x00000001002e9a28 in mca_common_sm_init ()
>>> from /usr/local/Cellar/open-mpi/1.7.3/lib/libmca_common_sm.4.dylib
>>> #1 0x00000001002e5a38 in mca_mpool_sm_init ()
>>> from /usr/local/Cellar/open-mpi/1.7.3/lib/openmpi/mca_mpool_sm.so
>>> #2 0x00000001000793fa in mca_mpool_base_module_create ()
>>> from /usr/local/lib/libmpi.1.dylib
>>> #3 0x000000010053fb41 in mca_btl_sm_add_procs ()
>>> from /usr/local/Cellar/open-mpi/1.7.3/lib/openmpi/mca_btl_sm.so
>>> #4 0x0000000100535dfb in mca_bml_r2_add_procs ()
>>> from /usr/local/Cellar/open-mpi/1.7.3/lib/openmpi/mca_bml_r2.so
>>> #5 0x000000010051e59c in mca_pml_ob1_add_procs ()
>>> from /usr/local/Cellar/open-mpi/1.7.3/lib/openmpi/mca_pml_ob1.so
>>> #6 0x000000010001959b in ompi_mpi_init () from /usr/local/lib/libmpi.1.dylib
>>> #7 0x00000001000334da in MPI_Init_thread () from /usr/local/lib/libmpi.1.dylib
>>> #8 0x0000000100000ddb in main (argc=1, argv=0x7fff5fbfedc0) at testmpi.c:9
>>> (gdb)
>>>
>>>
>>> The output of `ompi_info --parsable` is here: https://gist.github.com/7590040
>>>
>>> Thanks in advance.
>>>
>>> Dominique
>>>
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>