Open MPI User's Mailing List Archives

From: Laurent.POREZ_at_[hidden]
Date: 2006-10-26 10:19:10


I developed a launcher application:
an MPI application (say main_exe) launches 2 MPI applications (say exe1 and exe2), using MPI_Comm_spawn_multiple.

Now, I'm looking at the behavior when an exe crashes.

What I can see is the following:
1) When everything is launched, I see the following processes using 'ps':
- the 'mpiexec -v -d -n 1 ./main_exe' command
- the orted server used for 'main_exe' (say 'orted1')
- main_exe
- the orted server used for 'exe1' and 'exe2' (say 'orted2')
- exe1
- exe2

2) I use kill -9 to 'crash' exe2

3) orted2 and exe1 finish.

4) With ps, I see that the following processes remain: mpiexec, 'orted1', main_exe

5) main_exe tries to send a message to exe1, using MPI_Bsend:
main_exe gets killed by a SIGPIPE signal!

So what I see is that when one part of an MPI application crashes, the whole application crashes!
Is there a way to get different behavior? For example, MPI_Bsend could return an error.

Some additional information:
- I work on Linux, with Open MPI 1.1.1.
- I'm developing in C and C++.