Open MPI User's Mailing List Archives

From: George Bosilca (bosilca_at_[hidden])
Date: 2007-08-17 12:25:17


The MPI standard states that the correct way to abort/kill an MPI
application is to call the MPI_Abort function. Unless you're doing
some kind of fault-tolerance work, there is no reason to end one of
your MPI processes via exit.

   Thanks,
     george.

On Aug 16, 2007, at 12:04 PM, Daniel Spångberg wrote:

> Dear Open-MPI user list members,
>
> I currently have a user with an application where one of the MPI
> processes dies, but the Open MPI system does not kill the rest of
> the application.
>
> Since the mpirun man page states the following, I would expect it to
> take care of killing the application if a process exits without
> calling MPI_Finalize:
>
> Process Termination / Signal Handling
>     During the run of an MPI application, if any rank dies abnormally
>     (either exiting before invoking MPI_FINALIZE, or dying as the
>     result of a signal), mpirun will print out an error message and
>     kill the rest of the MPI application.
>
> The following test program demonstrates the behaviour (the program
> hangs until it is killed by the user or batch system):
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <mpi.h>
>
> #define RANK_DEATH 1
>
> int main(int argc, char **argv)
> {
>     int rank;
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     sleep(10);
>     if (rank == RANK_DEATH)
>         exit(1);    /* exits without calling MPI_Finalize */
>     sleep(10);
>     MPI_Finalize();
>     return 0;
> }
>
> I have tested this on Open MPI 1.2.1 as well as the latest stable
> 1.2.3. I am on Linux x86_64.
>
> Is this a bug, or are there some flags I can use to force mpirun
> (or orted, or...) to kill the whole MPI program when this happens?
>
> If one of the application processes dies from a signal (I have
> tested SEGV and FPE) rather than just exiting, the whole application
> is indeed killed.
>
> Best regards
> Daniel Spångberg