Open MPI Development Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI devel] Intermittent mpirun crash?
From: Rolf vandeVaart (rvandevaart_at_[hidden])
Date: 2014-01-30 11:56:55


I did build with --enable-debug. I also experimented with sending signal 11 to mpirun to see what output I would get, and in that case I get a nice backtrace.
Weird.

[rvandevaart_at_drossetti-ivy0 intel_tests]$ mpirun -np 2 sleep 20
[drossetti-ivy0:14033] *** Process (mpirun)received signal ***
[drossetti-ivy0:14033] Signal: Segmentation fault (11)
[drossetti-ivy0:14033] Signal code: (0)
[drossetti-ivy0:14033] Failing at address: 0x7e5500005ace
[drossetti-ivy0:14033] End of signal information - not sleeping
[drossetti-ivy0:14033] *** Return value from opal_backtrace_buffer is 0 ***
[drossetti-ivy0:14033] [ 0] /lib64/libpthread.so.0(+0xf500) [0x7f27b2fd8500]
[drossetti-ivy0:14033] [ 1] /lib64/libc.so.6(__poll+0x53) [0x7f27b2d15293]
[drossetti-ivy0:14033] [ 2] /geppetto/home/rvandevaart/ompi/ompi-v1.7/64-nocuda/lib/libopen-pal.so.6(+0x963e5) [0x7f27b3d283e5]
[drossetti-ivy0:14033] [ 3] /geppetto/home/rvandevaart/ompi/ompi-v1.7/64-nocuda/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x26e) [0x7f27b3d1cdfc]
[drossetti-ivy0:14033] [ 4] mpirun(orterun+0x137d) [0x4052b6]
[drossetti-ivy0:14033] [ 5] mpirun(main+0x20) [0x4037b4]
[drossetti-ivy0:14033] [ 6] /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f27b2c54cdd]
[drossetti-ivy0:14033] [ 7] mpirun() [0x4036d9]
[drossetti-ivy0:14033] *** End of error message ***
[rvandevaart_at_drossetti-ivy0 intel_tests]$
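
The difference between a signal sent by hand and a real crash can be reproduced outside Open MPI. What follows is a minimal sketch under stated assumptions, not Open MPI code: it assumes Linux with glibc, and `segv_handler` and `probe_segv_code` are hypothetical names made up for illustration. A SIGSEGV sent with kill() arrives with si_code SI_USER (0), which would be consistent with the "Signal code: (0)" line above, and in that case the "Failing at address" value is not a real fault address. A genuine access to an unmapped page arrives with SEGV_MAPERR (1), which is reported as "Address not mapped". The handler uses backtrace() from execinfo, the same call that appears in the gdb frames of the original report.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* SIGSEGV handler: dump a raw backtrace to stderr, then report si_code
 * to the parent through the exit status (100 + si_code). */
static void segv_handler(int signo, siginfo_t *info, void *ctx)
{
    void *frames[32];
    (void)signo;
    (void)ctx;
    int depth = backtrace(frames, 32);
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
    _exit((info->si_code >= 0 && info->si_code < 100) ? 100 + info->si_code : 99);
}

/* Hypothetical helper: fork a child, deliver SIGSEGV to it either with
 * kill() (real_fault == 0) or by touching an unmapped page (real_fault != 0),
 * and return the si_code the handler saw, or -1 on failure. */
int probe_segv_code(int real_fault)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct sigaction sa;
        void *warmup[1];

        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = segv_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGSEGV, &sa, NULL) != 0)
            _exit(98);

        /* The first backtrace() call may load libgcc; do it outside the handler. */
        backtrace(warmup, 1);

        if (real_fault) {
            long page = sysconf(_SC_PAGESIZE);
            char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED)
                _exit(98);
            munmap(p, (size_t)page);
            *(volatile char *)p = 1;   /* page is no longer mapped */
        } else {
            kill(getpid(), SIGSEGV);   /* stands in for `kill -11 <pid>` */
        }
        _exit(97);                     /* handler did not run */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) >= 100)
        return WEXITSTATUS(status) - 100;
    return -1;
}
```

Calling probe_segv_code(0) and probe_segv_code(1) from a small main() should return SI_USER and SEGV_MAPERR respectively on Linux, with a backtrace printed to stderr each time. The warm-up call is there because backtrace() is not guaranteed to be async-signal-safe, so a handler that calls it can itself fail when the process state is already damaged.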
_____________________
From: devel [devel-bounces_at_[hidden]] On Behalf Of Ralph Castain [rhc_at_[hidden]]
Sent: Thursday, January 30, 2014 11:51 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] Intermittent mpirun crash?

Huh - not much info there, I'm afraid. I gather you didn't build this with --enable-debug?

On Jan 30, 2014, at 8:26 AM, Rolf vandeVaart <rvandevaart_at_[hidden]> wrote:

> I am seeing this happen very intermittently: mpirun appears to be getting a SEGV. Is anyone else seeing this?
> This is 1.7.4, built yesterday. (Note that I added some things to what is printed, so the message differs slightly from stock 1.7.4 output.)
>
> mpirun -np 6 -host drossetti-ivy0,drossetti-ivy1,drossetti-ivy2,drossetti-ivy3 --mca btl_openib_warn_default_gid_prefix 0 -- `pwd`/src/MPI_Waitsome_p_c
> MPITEST info (0): Starting: MPI_Waitsome_p: Persistent Waitsome using two nodes
> MPITEST_results: MPI_Waitsome_p: Persistent Waitsome using two nodes all tests PASSED (742)
> [drossetti-ivy0:10353] *** Process (mpirun)received signal ***
> [drossetti-ivy0:10353] Signal: Segmentation fault (11)
> [drossetti-ivy0:10353] Signal code: Address not mapped (1)
> [drossetti-ivy0:10353] Failing at address: 0x7fd31e5f208d
> [drossetti-ivy0:10353] End of signal information - not sleeping
> gmake[1]: *** [MPI_Waitsome_p_c] Segmentation fault (core dumped)
> gmake[1]: Leaving directory `/geppetto/home/rvandevaart/public/ompi-tests/trunk/intel_tests'
>
> (gdb) where
> #0 0x00007fd31f620807 in ?? () from /lib64/libgcc_s.so.1
> #1 0x00007fd31f6210b9 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
> #2 0x00007fd31fb2893e in backtrace () from /lib64/libc.so.6
> #3 0x00007fd320b0d622 in opal_backtrace_buffer (message_out=0x7fd31e5e33a0, len_out=0x7fd31e5e33ac)
> at ../../../../../opal/mca/backtrace/execinfo/backtrace_execinfo.c:57
> #4 0x00007fd320b0a794 in show_stackframe (signo=11, info=0x7fd31e5e3930, p=0x7fd31e5e3800) at ../../../opal/util/stacktrace.c:354
> #5 <signal handler called>
> #6 0x00007fd31e5f208d in ?? ()
> #7 0x00007fd31e5e46d8 in ?? ()
> #8 0x000000000000c2a8 in ?? ()
> #9 0x0000000000000000 in ?? ()
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
