Open MPI User's Mailing List Archives

From: Laurent.POREZ_at_[hidden]
Date: 2006-10-26 10:19:10


I developed a launcher application:
an MPI application (say main_exe) launches two MPI applications (say exe1 and exe2) using MPI_Comm_spawn_multiple.

Now, I'm looking at the behavior when an exe crashes.

Here is what I observe:
1) once everything is launched, 'ps' shows the following processes:
- the 'mpiexec -v -d -n 1 ./main_exe' command
- the orted server used for 'main_exe' (say 'orted1')
- main_exe
- the orted server used for 'exe1' and 'exe2' (say 'orted2')
- exe1
- exe2

2) I use kill -9 to 'crash' exe2

3) orted2 and exe1 finish.

4) 'ps' shows that the following processes remain: mpiexec, 'orted1', main_exe

5) main_exe tries to send a message to exe1 using MPI_Bsend:
main_exe gets killed by a SIGPIPE signal!

So what I see is that when one part of an MPI application crashes, the whole application crashes!
Is there a way to get a different behavior? For example, MPI_Bsend could return an error code instead.
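
To illustrate the kind of behavior I am hoping for, here is a minimal sketch in plain POSIX C. It is not Open MPI code, and the function name write_to_dead_peer is made up for the example. It assumes the SIGPIPE comes from a write to a pipe or socket whose other end has gone away, which is the usual cause of that signal. With SIGPIPE ignored, the same write fails with EPIPE instead of killing the process. Whether Open MPI 1.1.1 would then surface that failure as an error from MPI_Bsend is something this sketch does not establish.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: writes one byte to a pipe whose read end is
 * already closed, with SIGPIPE ignored.
 * Returns 0 if the write succeeded, the errno set by write() if it
 * failed, or -1 if the pipe could not be created. */
int write_to_dead_peer(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* Without this line, the write below would raise SIGPIPE, whose
     * default action is to terminate the process. */
    signal(SIGPIPE, SIG_IGN);

    /* Simulate the peer disappearing: nobody can read from the pipe. */
    close(fds[0]);

    ssize_t n = write(fds[1], "x", 1);
    int err = (n < 0) ? errno : 0;

    close(fds[1]);
    return err;
}
```

On Linux, calling write_to_dead_peer() returns EPIPE and the caller keeps running, which is the sort of error report I would like to get back from the send.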

Some additional information:
- I work on Linux, with Open MPI 1.1.1.
- I'm developing in C and C++.