Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Intermittent mpirun crash?
From: Rolf vandeVaart (rvandevaart_at_[hidden])
Date: 2014-01-30 12:29:36


I just retested with --mca mpi_leave_pinned 0 and that made no difference. I still see the mpirun crash.
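
For reference, a retest along these lines can be sketched as follows. The host list, process count, and test binary are copied from the command in the quoted report; that this is the exact invocation used for the retest is an assumption. `CMD` and `HOSTS` are illustrative variable names.

```shell
# Sketch only: arguments are taken from the quoted report, with
# "--mca mpi_leave_pinned 0" added. The exact retest command is an assumption.
HOSTS=drossetti-ivy0,drossetti-ivy1,drossetti-ivy2,drossetti-ivy3
CMD="mpirun -np 6 -host $HOSTS --mca mpi_leave_pinned 0 --mca btl_openib_warn_default_gid_prefix 0 -- $(pwd)/src/MPI_Waitsome_p_c"

# Print the command instead of running it, so it can be inspected first.
echo "$CMD"
```

The same MCA parameter can also be set through the environment (`export OMPI_MCA_mpi_leave_pinned=0`) rather than on the mpirun command line.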

>-----Original Message-----
>From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of George Bosilca
>Sent: Thursday, January 30, 2014 11:59 AM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] Intermittent mpirun crash?
>
>I got something similar 2 days ago, with a large software package that abuses
>MPI_Waitany/MPI_Waitsome (and that was working seamlessly a month ago). I
>had to find a quick fix. Once I found that turning leave_pinned off fixed
>the problem, I did not investigate any further.
>
>Do you see a similar behavior?
>
> George.
>
>On Jan 30, 2014, at 17:26 , Rolf vandeVaart <rvandevaart_at_[hidden]> wrote:
>
>> I am seeing this happen very intermittently. It looks like mpirun is
>> getting a SEGV. Is anyone else seeing this?
>> This is 1.7.4, built yesterday. (Note that I added some stuff to what
>> is being printed out, so the message is slightly different from the
>> 1.7.4 output.)
>>
>> mpirun -np 6 -host drossetti-ivy0,drossetti-ivy1,drossetti-ivy2,drossetti-ivy3 --mca btl_openib_warn_default_gid_prefix 0 -- `pwd`/src/MPI_Waitsome_p_c
>> MPITEST info (0): Starting: MPI_Waitsome_p: Persistent Waitsome using two nodes
>> MPITEST_results: MPI_Waitsome_p: Persistent Waitsome using two nodes all tests PASSED (742)
>> [drossetti-ivy0:10353] *** Process (mpirun)received signal ***
>> [drossetti-ivy0:10353] Signal: Segmentation fault (11)
>> [drossetti-ivy0:10353] Signal code: Address not mapped (1)
>> [drossetti-ivy0:10353] Failing at address: 0x7fd31e5f208d
>> [drossetti-ivy0:10353] End of signal information - not sleeping
>> gmake[1]: *** [MPI_Waitsome_p_c] Segmentation fault (core dumped)
>> gmake[1]: Leaving directory `/geppetto/home/rvandevaart/public/ompi-tests/trunk/intel_tests'
>>
>> (gdb) where
>> #0 0x00007fd31f620807 in ?? () from /lib64/libgcc_s.so.1
>> #1 0x00007fd31f6210b9 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
>> #2 0x00007fd31fb2893e in backtrace () from /lib64/libc.so.6
>> #3 0x00007fd320b0d622 in opal_backtrace_buffer (message_out=0x7fd31e5e33a0, len_out=0x7fd31e5e33ac)
>>    at ../../../../../opal/mca/backtrace/execinfo/backtrace_execinfo.c:57
>> #4 0x00007fd320b0a794 in show_stackframe (signo=11, info=0x7fd31e5e3930, p=0x7fd31e5e3800)
>>    at ../../../opal/util/stacktrace.c:354
>> #5 <signal handler called>
>> #6 0x00007fd31e5f208d in ?? ()
>> #7 0x00007fd31e5e46d8 in ?? ()
>> #8 0x000000000000c2a8 in ?? ()
>> #9 0x0000000000000000 in ?? ()
>>
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel