Open MPI User's Mailing List Archives

Subject: [OMPI users] SIGTERM propagation across MPI processes
From: Júlio Hoffimann (julio.hoffimann_at_[hidden])
Date: 2012-03-22 18:49:53


Dear all,

I'm trying to handle signals inside an MPI task farming model. The following
pseudo-code shows what I'm trying to achieve:

volatile sig_atomic_t unexpected_error_occurred = 0;

void my_handler( int sig ){
    unexpected_error_occurred = 1;
}

// somewhere in the code...
signal(SIGTERM, my_handler);

if (root process) {

    // do stuff

    if ( unexpected_error_occurred ) {

        // save something

        // re-raise the SIGTERM, but now with the default handler
        signal(SIGTERM, SIG_DFL);
        raise(SIGTERM);
    }
} else { // slave process

    // do different stuff

    if ( unexpected_error_occurred ) {

        // just propagate the signal to the root
        signal(SIGTERM, SIG_DFL);
        raise(SIGTERM);
    }
}

signal(SIGTERM, SIG_DFL); // reassign default handler
// continues the code...

As can be seen, the signal handling is required for implementing a restart
feature. The whole problem lies in my assumption that, when one process
receives a SIGTERM, all processes in the communicator will receive one as a
side effect. Is this a valid assumption? How does the actual MPI
implementation deal with such scenarios?
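
As a point of reference, here is a minimal sketch in plain POSIX C, with no MPI involved, of what the handler-plus-re-raise pattern does on its own. It uses fork() as a stand-in for two ranks, which is an assumption made purely for illustration; the names run_demo and demo_result are hypothetical. Since raise() delivers the signal to the calling process only, the child takes the "save" branch and then dies from SIGTERM, while the parent's flag stays at 0. The sketch says nothing about what mpirun or the MPI runtime does afterwards; that part is implementation-specific.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Written only by the handler; sig_atomic_t is the portable type for that. */
static volatile sig_atomic_t unexpected_error_occurred = 0;

static void my_handler(int sig) {
    (void)sig;
    unexpected_error_occurred = 1;
}

/* Hypothetical result record, used only for this illustration. */
struct demo_result {
    int child_saved;        /* 1 if the child reached its "save" step */
    int child_term_signal;  /* signal that terminated the child, or 0 */
    int parent_flag;        /* parent's flag after the child has died */
};

/* Returns 0 on success, -1 if pipe, fork or waitpid failed. */
int run_demo(struct demo_result *out) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    /* Installed before fork(), so both processes have the handler. */
    signal(SIGTERM, my_handler);

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                    /* child: stands in for one worker */
        close(fds[0]);
        raise(SIGTERM);                /* delivered to this process only */
        if (unexpected_error_occurred) {
            char byte = 1;             /* "save something" */
            if (write(fds[1], &byte, 1) != 1) _exit(1);
            signal(SIGTERM, SIG_DFL);  /* restore the default action */
            raise(SIGTERM);            /* this time the child terminates */
        }
        _exit(0);                      /* reached only if the re-raise failed */
    }

    /* parent: stands in for the root; it never sees the child's signal */
    close(fds[1]);
    char byte = 0;
    ssize_t n = read(fds[0], &byte, 1);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    signal(SIGTERM, SIG_DFL);

    out->child_saved = (n == 1 && byte == 1);
    out->child_term_signal = WIFSIGNALED(status) ? WTERMSIG(status) : 0;
    out->parent_flag = unexpected_error_occurred;
    return 0;
}
```

Calling run_demo() from a main function and printing the three fields shows the child saving and terminating by SIGTERM while the parent's flag is untouched. In other words, any delivery of SIGTERM to the other ranks would have to come from the launcher or the MPI runtime, not from raise() itself.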

I also tried replacing all the raise() calls with MPI_Abort(), which,
according to the documentation (
http://www.open-mpi.org/doc/v1.5/man3/MPI_Abort.3.php), sends a SIGTERM to
all associated processes. The undesired behaviour persists: when I kill a
slave process, the save section in the root branch is not executed.

I appreciate any help,
Júlio.