
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Parent terminates when child crashes/terminates (without finalizing)
From: George Bosilca (bosilca_at_[hidden])
Date: 2010-12-17 12:24:27


Let me try to round the edges on this one. It is not that we couldn't or wouldn't like to have a more MPI-compliant approach here, but the definition of connected processes in the MPI standard is rather vague. One thing is clear, however: it is a transitive relationship. If A is "connected" to B, and B is "connected" to C, then A and C are "connected" even if they know nothing about each other. In other words, when you call disconnect it is difficult to compute which peers have to be "disconnected", because even if you disconnected them in one communicator they can still be connected some other way. Therefore we chose the simplest path: once connected, processes remain connected until the end of the execution.
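
To make the transitivity point concrete, here is a minimal sketch in plain C (no MPI calls). It is not how Open MPI tracks connections internally; the names MAX_PROCS, parent_of, reset_links, link_procs, find_root and are_connected are all hypothetical and exist only for this illustration. It models "connected" as membership in the same union-find set, which captures the transitive closure described above.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 16   /* hypothetical upper bound on processes in this sketch */

/* parent_of[i] is the union-find parent of process i. */
int parent_of[MAX_PROCS];

/* Start with every process connected only to itself. */
void reset_links(void)
{
    for (int i = 0; i < MAX_PROCS; i++)
        parent_of[i] = i;
}

/* Return the representative of the connected set containing proc. */
int find_root(int proc)
{
    assert(proc >= 0 && proc < MAX_PROCS);
    while (parent_of[proc] != proc) {
        parent_of[proc] = parent_of[parent_of[proc]];  /* path halving */
        proc = parent_of[proc];
    }
    return proc;
}

/* Record that a and b share a communicator (e.g. after a spawn or connect). */
void link_procs(int a, int b)
{
    parent_of[find_root(a)] = find_root(b);
}

/* Two processes are "connected" if they end up in the same set. */
bool are_connected(int a, int b)
{
    return find_root(a) == find_root(b);
}
```

After reset_links(), calling link_procs(0, 1) and link_procs(1, 2) makes are_connected(0, 2) true even though processes 0 and 2 never shared a communicator. Note that this structure has no cheap inverse: once sets are merged, undoing a single link would require re-examining every remaining communicator, which is the difficulty with disconnect described above.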

However, as Ralph pointed out, if you call MPI_Finalize as requested by the MPI standard, we handle the case nicely without forcing every process to abort.

If you're looking for a winter break project, we do accept contributions from the community ...

  george.

On Dec 17, 2010, at 09:43 , Ralph Castain wrote:

> That is the expected behavior designed into Open MPI. If any process calls MPI_Init and then terminates without calling MPI_Finalize, we flag that as an abnormal termination and abort the entire job.
>
> We don't provide any option for avoiding that behavior.
>
> On Dec 17, 2010, at 5:13 AM, Suraj Prabhakaran wrote:
>
>> Hello,
>>
>> I am observing that when a parent spawns a child and the child terminates abruptly (for example with exit() before MPI_Finalize()), the parent also terminates, even after both the child and the parent have explicitly called MPI_Comm_disconnect. This turns out to be a disadvantage. A sample program is as follows:
>>
>> Parent:
>>
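>> #include <mpi.h>
>> #include <stdio.h>
>> #include <unistd.h>   /* sleep() */
>>
>> int main (int argc, char *argv[])
>> {
>>     MPI_Comm child_comm;
>>
>>     MPI_Init(&argc, &argv);
>>     /* Return error codes instead of aborting on MPI errors. */
>>     MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>     MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0, MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);
>>     printf("spawned a child\n");
>>     MPI_Comm_disconnect(&child_comm);
>>     printf("Disconnected from the child\n");
>>     sleep(5000);   /* stay alive long enough to see whether the child's exit takes the parent down */
>>     MPI_Finalize();
>>     return 0;
>> }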
>>
>> Child:
>>
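>> #include <mpi.h>
>> #include <stdio.h>
>> #include <stdlib.h>   /* exit() */
>>
>> int main (int argc, char *argv[])
>> {
>>     MPI_Comm parent, parent1;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_get_parent(&parent);
>>     MPI_Comm_disconnect(&parent);   /* sets parent to MPI_COMM_NULL on success */
>>     if (parent == MPI_COMM_NULL)
>>         printf("Child: Disconnected from the parent, Exiting\n\n");
>>
>>     /* Ask for the parent again to check that the disconnect took effect. */
>>     MPI_Comm_get_parent(&parent1);
>>     if (parent1 != MPI_COMM_NULL)
>>         printf("Child: yes, i got my parent again\n");
>>
>>     exit(1);   /* abrupt end: exits without calling MPI_Finalize */
>>
>>     MPI_Finalize();   /* never reached */
>>     return 0;
>> }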
>>
>> In the above simple example, the second printf is not displayed, clearly indicating that the child really is disconnected from the parent. However, when the child calls exit(), the parent terminates too. Is there perhaps a way to avoid this kind of automatic cleanup?
>>
>> Thanks,
>> Suraj Prabhakaran
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel