
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Parent terminates when child crashes/terminates (without finalizing)
From: Suraj Prabhakaran (suraj.prabhakaran_at_[hidden])
Date: 2010-12-17 14:50:21

On 12/17/2010 06:24 PM, George Bosilca wrote:
> Let me try to round the edges on this one. It is not that we couldn't or wouldn't like to have a more "MPI" compliant approach on this, but the definition of connected processes in the MPI standard is [kind of] shady. One thing is clear, however: it is a transitive relationship. If A is "connected" to B, and B is "connected" to C, then A and C are "connected" even if they don't know anything about each other. In other words, when you call disconnect, it is difficult to compute the peers that have to be "disconnected", as even if you disconnected them in one communicator they can still be connected some other way. Therefore, we chose the simplest path: once connected, the processes remain connected until the end of the execution.
> However, as Ralph pointed out, if you call MPI_Finalize as requested by the MPI standard, we handle the case nicely without forcing every process to abort.
> If you're looking for a winter break project, we do accept contributions from the community ...
> george.

Yes, with MPI_Finalize() called before an abrupt exit() it is clean. But
speaking more generally about releasing connections: if Process A and
Process B are connected through MPI_Comm_connect/accept and later call
MPI_Comm_disconnect, an abrupt exit of Process B (for example) *after*
the disconnect does *NOT* affect Process A! I also tried a triangular
connect/disconnect among three processes, and it is quite clean as well.
So the problem I described occurs only between a spawned child and its
parent (after they disconnect), and *does not* occur between two
processes that connect via a port and later disconnect. Perhaps that
makes the problem easier to corner?

P.S.: I also mentioned in another mail that processes trying to connect
through a port *sometimes* block at the connect/accept call, or one of
the processes sometimes blocks indefinitely at the disconnect call. I
stress *sometimes*. Any input on this one?