Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] SIGTERM propagation across MPI processes
From: Júlio Hoffimann (julio.hoffimann_at_[hidden])
Date: 2012-03-25 09:19:34


Dear Ralph,

Thank you for your prompt reply. I confirmed what you said by reading the
*Signal Propagation* and *Process Termination / Signal Handling* sections
of the mpirun man page.

"During the run of an MPI application, if any rank dies
abnormally (either exiting before invoking MPI_FINALIZE, or dying as the
result of a signal), mpirun will print out an error message and kill the
rest of the MPI application."

If I understood correctly, SIGKILL is sent to every process when one of them
dies prematurely. In my view, this is a bug: if Open MPI allows handling
signals such as SIGTERM, the other processes in the communicator should also
have the opportunity to die gracefully. Perhaps I'm missing something?
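The reason SIGKILL rules out a graceful exit is that POSIX does not allow it to be caught, blocked or ignored, whereas SIGTERM can be caught. A minimal POSIX sketch (not Open MPI code; `try_install_handler`, `on_sigterm` and `got_sigterm` are hypothetical names) shows the difference: installing a handler for SIGTERM succeeds, while the same sigaction() call for SIGKILL fails with EINVAL.

```cpp
#include <cerrno>
#include <cstring>
#include <signal.h>

// Set by the handler. A volatile sig_atomic_t is the type a signal
// handler may safely write to.
static volatile sig_atomic_t got_sigterm = 0;

static void on_sigterm(int) { got_sigterm = 1; }

// Hypothetical helper: tries to install on_sigterm for `sig` and returns
// true on success. POSIX does not allow SIGKILL (or SIGSTOP) to be caught,
// so for those signals sigaction() fails and sets errno to EINVAL.
bool try_install_handler(int sig)
{
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, nullptr) == 0;
}
```

After try_install_handler(SIGTERM), a raise(SIGTERM) merely sets got_sigterm and the program keeps running; for SIGKILL there is no way to run any user code at all.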

Assuming that is the actual behaviour, I think it would be great to mention
SIGKILL explicitly in the man page or, even better, to fix the implementation
to send SIGTERM instead, making it possible for the user to clean up all
processes before they exit.
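The SIGTERM-then-SIGKILL sequence Ralph describes in the quoted reply below can be sketched with plain POSIX calls. This only illustrates the pattern and is not Open MPI's implementation: `terminate_with_grace` is a hypothetical helper, and its `grace_ms` argument merely plays the role that the odls_base_sigkill_timeout MCA parameter plays for mpirun. A child whose SIGTERM handler finishes within the grace period exits on its own; one that does not is killed with SIGKILL.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

// Hypothetical helper, not Open MPI code: stop child `pid` by sending
// SIGTERM, give it up to `grace_ms` milliseconds to exit on its own, and
// only then send SIGKILL. Returns the waitpid() status, or -1 on error.
int terminate_with_grace(pid_t pid, long grace_ms)
{
    if (kill(pid, SIGTERM) != 0)
        return -1;

    const long step_ms = 10; // polling interval
    for (long waited = 0; waited < grace_ms; waited += step_ms) {
        int status = 0;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return status; // child exited during the grace period
        if (r == -1)
            return -1;
        struct timespec ts = {0, step_ms * 1000000L};
        nanosleep(&ts, nullptr);
    }

    // Grace period over: SIGKILL cannot be caught, so no cleanup runs now.
    kill(pid, SIGKILL);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

For example, a child that handles SIGTERM by calling _exit(0) is reported with WIFEXITED and exit status 0, while a child that ignores SIGTERM ends with WIFSIGNALED and WTERMSIG equal to SIGKILL once the grace period expires.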

I solved my particular problem by adding another flag, *unexpected_error_on_slave*:

#include <boost/mpi.hpp>
#include <csignal>
#include <unistd.h> // pause()

namespace mpi = boost::mpi;

volatile std::sig_atomic_t unexpected_error_occurred = 0;
int unexpected_error_on_slave = 0;
enum tag { work_tag, die_tag };
const int root = 0;

void my_handler(int sig)
{
    unexpected_error_occurred = 1;
}

// ... somewhere in the code (world is the mpi::communicator) ...
std::signal(SIGTERM, my_handler);
if (world.rank() == root) {

    // do stuff

    // collect one status message from each slave
    for (int i = 1; i < world.size(); ++i) {
        int slave_failed = 0;
        world.recv(mpi::any_source, die_tag, slave_failed);
        if (slave_failed)
            unexpected_error_on_slave = 1;
    }
    if (unexpected_error_occurred || unexpected_error_on_slave) {

        // save something

        world.abort(SIGABRT);
    }
} else { // slave process

    // do different stuff

    if (unexpected_error_occurred) {

        // just communicate the problem to the root
        world.send(root, die_tag, 1);
        std::signal(SIGTERM, SIG_DFL);
        while (true)
            pause(); // sleep until killed; the master will take care of this
    }
    world.send(root, die_tag, 0); // everything is fine
}
std::signal(SIGTERM, SIG_DFL); // reassign default handler
// continues the code...

Note that the slave must hang so that the save operation gets executed at
the root; otherwise we are back to the previous scenario. In principle it
should be unnecessary to send MPI messages to accomplish the desired
cleanup, and in more complex applications this can turn into a nightmare.
As we know, asynchronous events are notoriously hard to debug.
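One way to keep the cleanup local to each process, without extra MPI messages, is to do the minimum in the handler (set a flag), poll that flag between units of work, save state, and then re-raise SIGTERM with the default action, as the original pseudocode quoted below does. The sketch is plain POSIX and assumes each unit of work finishes well inside the launcher's SIGTERM-to-SIGKILL grace period; `worker_loop`, `spawn_worker` and `terminated_by` are hypothetical names, and `save_state` stands in for "save something".

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Set asynchronously by the SIGTERM handler, polled by the work loop.
static volatile sig_atomic_t stop_requested = 0;
static void request_stop(int) { stop_requested = 1; }

// Worker body: runs do_unit() repeatedly and checks the flag between
// units. Once SIGTERM has arrived it calls save_state(), restores the
// default action and re-raises SIGTERM, so the process still terminates
// by that signal (and its parent can see that it did).
[[noreturn]] static void worker_loop(void (*do_unit)(), void (*save_state)())
{
    signal(SIGTERM, request_stop);
    for (;;) {
        if (stop_requested) {
            save_state();
            signal(SIGTERM, SIG_DFL);
            raise(SIGTERM);
            _exit(1); // only reached if SIGTERM were blocked
        }
        do_unit();
    }
}

// Hypothetical supervisor side of a restart feature: start a worker in a
// child process; returns the child's pid, or -1 if fork() failed.
pid_t spawn_worker(void (*do_unit)(), void (*save_state)())
{
    pid_t pid = fork();
    if (pid == 0)
        worker_loop(do_unit, save_state);
    return pid;
}

// Waits for child `pid` and reports whether it was terminated by `sig`.
bool terminated_by(pid_t pid, int sig)
{
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return false;
    return WIFSIGNALED(status) && WTERMSIG(status) == sig;
}
```

A supervisor could restart from the saved state whenever terminated_by(pid, SIGTERM) returns true. This only helps within the grace period: if a unit of work outlasts it, SIGKILL ends the process before the flag is checked.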

Best regards,
Júlio.

P.S.: Open MPI 1.4.3 from the Ubuntu 11.10 repositories.

2012/3/23 Ralph Castain <rhc_at_[hidden]>

> Well, yes and no. When a process abnormally terminates, OMPI will kill the
> job - this is done by first hitting each process with a SIGTERM, followed
> shortly thereafter by a SIGKILL. So you do have a short time on each
> process to attempt to cleanup.
>
> My guess is that your signal handler actually is getting called, but we
> then kill the process before you can detect that it was called.
>
> You might try adjusting the time between SIGTERM and SIGKILL using
> the odls_base_sigkill_timeout MCA param:
>
> mpirun -mca odls_base_sigkill_timeout N
>
> should cause it to wait for N seconds before issuing the SIGKILL. Not sure
> if that will help or not - it used to work for me, but I haven't tried it
> for a while. What versions of OMPI are you using?
>
>
> On Mar 22, 2012, at 4:49 PM, Júlio Hoffimann wrote:
>
> Dear all,
>
> I'm trying to handle signals inside an MPI task-farming model. The following
> is pseudocode of what I'm trying to achieve:
>
> #include <csignal>
>
> volatile std::sig_atomic_t unexpected_error_occurred = 0;
>
> void my_handler(int sig)
> {
>     unexpected_error_occurred = 1;
> }
>
> // ... somewhere in the code (world is the communicator, root its root rank) ...
> std::signal(SIGTERM, my_handler);
> if (world.rank() == root) {
>
>     // do stuff
>
>     if (unexpected_error_occurred) {
>
>         // save something
>
>         // re-raise SIGTERM, but now with the default handler
>         std::signal(SIGTERM, SIG_DFL);
>         std::raise(SIGTERM);
>     }
> } else { // slave process
>
>     // do different stuff
>
>     if (unexpected_error_occurred) {
>
>         // just propagate the signal to the root
>         std::signal(SIGTERM, SIG_DFL);
>         std::raise(SIGTERM);
>     }
> }
> std::signal(SIGTERM, SIG_DFL); // reassign default handler
> // continues the code...
>
>
> As can be seen, the signal handling is needed to implement a restart
> feature. The whole problem lies in my assumption that all processes in
> the communicator will receive a SIGTERM as a side effect. Is that a valid
> assumption? How does the actual MPI implementation deal with such
> scenarios?
>
> I also tried replacing all the raise() calls with MPI_Abort(), which,
> according to the documentation
> (http://www.open-mpi.org/doc/v1.5/man3/MPI_Abort.3.php), sends a SIGTERM
> to all associated processes. The undesired behaviour persists: when I kill
> a slave process, the save section in the root branch is not executed.
>
> Appreciate any help,
> Júlio.
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users