Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Parent terminates when child crashes/terminates (without finalizing)
From: N.M. Maclaren (nmm1_at_[hidden])
Date: 2010-12-17 07:38:04

On Dec 17 2010, Suraj Prabhakaran wrote:
>I am observing a behavior where when the parent spawns a child and when
>the child terminates abruptly (for example with exit() before
>MPI_Finalize() ), the parent also terminates even after both the child
>and parent have explicitly called a MPI_disconnect. This turns out to be
>a disadvantage. ...

Indeed. But that is what will sometimes happen, and it's not primarily
an OpenMPI issue - though clearly OpenMPI should try to avoid it when
possible. It is what happens under some circumstances on some systems.
You really don't want to know why, I assure you :-( The root cause is
a combination of shoddy interface design and too many programs being too
clever by half.

The following is key information to provide:

    The name and precise variants of the operating system, compilers
and any libraries used for both parent AND child.

    Whether the MPI was being run under a batch scheduler or similar
controlling application and, if so, the precise variant of that.

    The way in which the child failed (e.g. the signal number AND how
that signal was generated). If you are sure that it happens with a
plain exit(), you have answered this one already.

    And, heaven help us all, sometimes the operating system, compiler,
library and controller configuration, and the precise environment that
the MPI program was running under. Sometimes even other actions of the
child can matter.
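Parts of the first and third items can be checked without MPI at all. As a minimal standalone sketch (plain POSIX C, not Open MPI code, assuming a POSIX system such as Linux), the hypothetical helpers `describe_os`, `describe_status` and `run_and_describe` below report the operating system as `uname()` sees it, and whether a forked child ended with a plain `exit()` status or was killed by a signal, and if so which one. Under `mpirun` the parent does not reap a spawned child itself, so this only shows what to look for, not how Open MPI reports it.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/utsname.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write "sysname release version machine" from uname()
   into buf. Returns 0 on success, -1 if uname() fails. */
static int describe_os(char *buf, size_t len)
{
    struct utsname u;

    assert(buf != NULL && len > 0);
    if (uname(&u) != 0)
        return -1;
    snprintf(buf, len, "%s %s %s %s", u.sysname, u.release, u.version, u.machine);
    return 0;
}

/* Hypothetical helper: turn a waitpid() status into text, separating a
   normal exit (and its exit code) from death by a signal (and its number). */
static void describe_status(int status, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (WIFEXITED(status))
        snprintf(buf, len, "exited with status %d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    else
        snprintf(buf, len, "other state change (raw status 0x%x)", (unsigned)status);
}

/* Hypothetical helper: fork a child that runs child_fn, wait for it and
   describe how it ended. Returns 0 on success, -1 if fork/waitpid fails. */
static int run_and_describe(void (*child_fn)(void), char *buf, size_t len)
{
    int status;
    pid_t pid;

    fflush(NULL);              /* stop the child re-flushing our stdio buffers */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        child_fn();
        _exit(0);              /* reached only if child_fn returns */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    describe_status(status, buf, len);
    return 0;
}

/* Two illustrative ways for a child to die without finalizing anything. */
static void plain_exit(void) { exit(3); }
static void self_kill(void)  { raise(SIGKILL); }  /* SIGKILL cannot be caught or ignored */
```

A diagnostic run would call `describe_os()` once and `run_and_describe()` with whatever the real child does in place of `plain_exit` or `self_kill`. With those two, `buf` ends up as `exited with status 3` and as `killed by signal` followed by the local value of `SIGKILL`, respectively; signal numbers differ between systems, which is one more reason to report the operating system precisely.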

Finding the last needs considerable expertise, even for an experienced
administrator, so start with the first three. All of them are critical
to this issue, unfortunately.

Nick Maclaren.