Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [patch] async-signal-safe signal handler
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-12-18 08:52:17


This patch looks good to me (sorry for the delay in replying -- MPI Forum + OMPI dev meeting got in the way).

Brian -- do you have any opinions on it?

On Dec 11, 2013, at 1:43 AM, Kawashima, Takahiro <t-kawashima_at_[hidden]> wrote:

> Hi,
>
> Open MPI's signal handler (the show_stackframe function defined in
> opal/util/stacktrace.c) calls functions that are not async-signal-safe,
> which can cause a deadlock.
>
> See the attached mpisigabrt.c. Passing corrupted memory to realloc(3)
> raises SIGABRT, which invokes the show_stackframe function.
> On some systems, show_stackframe then deadlocks in backtrace_symbols(3),
> because backtrace_symbols(3) calls malloc(3) internally and waits for
> the malloc mutex that the interrupted realloc call still holds.
>
> The attached mpisigabrt.gstack.txt shows the stack trace obtained
> with gdb in this deadlock situation on Ubuntu 12.04 LTS (precise)
> x86_64. Although I could not reproduce this behavior on RHEL 5/6,
> I can also reproduce it on the K computer and its successor, PRIMEHPC FX10.
> Passing non-heap memory to free(3) and double-freeing also cause
> this deadlock.
>
> malloc (and backtrace_symbols) is not marked as async-signal-safe
> in POSIX or in current glibc, though it seems to have been marked
> so in old glibc. So we should not call it in the signal handler now.
>
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
> http://cygwin.com/ml/libc-help/2013-06/msg00005.html
>
> I wrote a patch to address this issue. See the attached
> async-signal-safe-stacktrace.patch.
>
> This patch calls backtrace_symbols_fd(3) instead of backtrace_symbols(3).
> Although backtrace_symbols_fd is not declared async-signal-safe,
> its man page says that it does not call malloc internally, so it
> should be safer.
>
> The output format of the show_stackframe function is not changed by
> this patch. However, the interface of the opal_backtrace_print function
> (backtrace framework) is changed to keep the output format compatible.
> This requires changes in some additional files (ompi_mpi_abort.c
> etc.).
>
> This patch also removes unnecessary fflush(3) calls, which have no
> effect on output written with the write(2) system call but might
> cause a similar problem.
>
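
To illustrate the point about fflush(3): output sent with write(2) never passes through a stdio buffer, so there is nothing to flush. The sketch below formats a number into a stack buffer and writes it directly. The helpers format_decimal, write_all and report_signal are hypothetical names, not functions from Open MPI or from the patch.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: formats v as decimal into buf (no terminating NUL).
 * Returns the number of characters written, or 0 if buf is too small. */
size_t format_decimal(long v, char *buf, size_t len)
{
    char tmp[24];   /* enough for a 64-bit long including the sign */
    size_t n = 0, i;
    unsigned long u = (v < 0) ? 0UL - (unsigned long)v : (unsigned long)v;

    /* Produce the digits in reverse order. */
    do {
        tmp[n++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    if (v < 0)
        tmp[n++] = '-';
    if (n > len)
        return 0;
    for (i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    return n;
}

/* Writes the whole buffer, retrying after partial writes.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0)
            return -1;
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Prints "Signal: <number>" using only write(2); no fflush is needed
 * because no stdio buffer is involved. */
int report_signal(int fd, int signo)
{
    static const char prefix[] = "Signal: ";
    char num[24];
    size_t n = format_decimal(signo, num, sizeof(num));

    if (write_all(fd, prefix, sizeof(prefix) - 1) != 0)
        return -1;
    if (write_all(fd, num, n) != 0)
        return -1;
    return write_all(fd, "\n", 1);
}
```

A handler could call report_signal(STDERR_FILENO, signo) in place of an fprintf/fflush pair; this avoids stdio's internal locking and buffering in the signal handler.
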
> What do you think about this patch?
>
> Takahiro Kawashima,
> MPI development team,
> Fujitsu
> <async-signal-safe-stacktrace.patch><mpisigabrt.c><mpisigabrt.gstack.txt>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/