
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] SIGTERM propagation across MPI processes
From: Júlio Hoffimann (julio.hoffimann_at_[hidden])
Date: 2012-03-25 13:28:20


I mentioned the version in a previous P.S.: Open MPI 1.4.3 from the Ubuntu
11.10 repositories. :-)

Thanks for the clarifications!

2012/3/25 Ralph Castain <rhc_at_[hidden]>

>
> On Mar 25, 2012, at 10:57 AM, Júlio Hoffimann wrote:
>
> I forgot to mention: I tried setting odls_base_sigkill_timeout as you
> suggested. Even 5 s was not sufficient for the root to execute its task and,
> more importantly, the kill was instantaneous; there was no 5 s delay. My
> erroneous conclusion: SIGKILL was being sent instead of SIGTERM.
>
>
> Which version are you using? Could be a bug in there - I can take a look.
>
>
> About the man page: at least for me, the word "kill" alone is not clear.
> Mentioning SIGTERM+SIGKILL explicitly would be unambiguous.
>
>
> I'll clarify it - thanks!
>
>
> Regards,
> Júlio.
>
> 2012/3/25 Ralph Castain <rhc_at_[hidden]>
>
>>
>> On Mar 25, 2012, at 7:19 AM, Júlio Hoffimann wrote:
>>
>> Dear Ralph,
>>
>> Thank you for your prompt reply. I confirmed what you just said by
>> reading the mpirun man page, in the sections *Signal Propagation* and *Process
>> Termination / Signal Handling*.
>>
>> "During the run of an MPI application, if any rank dies
>> abnormally (either exiting before invoking MPI_FINALIZE, or dying as the
>> result of a signal), mpirun will print out an error message and kill the
>> rest of the MPI application."
>>
>> If I understood correctly, SIGKILL is sent to every process when one of
>> them dies prematurely.
>>
>>
>> Each process receives a SIGTERM, and then a SIGKILL if it doesn't exit
>> within a specified time frame. I told you how to adjust that time period in
>> the prior message.
>>
>> From my point of view, this is a bug. If Open MPI allows handling
>> signals such as SIGTERM, the other processes in the communicator should
>> also have the opportunity to die gracefully. Perhaps I'm missing something?
>>
>>
>> Yes, you are - you do get a SIGTERM first, but you are required to exit
>> in a timely fashion. You are not allowed to continue running. This is
>> required in order to ensure proper cleanup of the job, per the MPI standard.
>>
>>
>> Given the behaviour described in the last paragraph, I think it would be
>> great to explicitly mention SIGKILL in the man page or, even better, fix
>> the implementation to send SIGTERM instead, making it possible for the
>> user to clean up all processes before exiting.
>>
>>
>> We already do, as described above.
>>
>>
>> I solved my particular problem by adding another flag, *unexpected_error_on_slave*:
>>
>> volatile sig_atomic_t unexpected_error_occurred = 0;
>> int unexpected_error_on_slave = 0;
>> enum tag { work_tag, die_tag };
>>
>> void my_handler( int sig )
>> {
>>     unexpected_error_occurred = 1;
>> }
>>
>> //
>> // somewhere in the code...
>> //
>> signal(SIGTERM, my_handler);
>>
>> if (root process) {
>>
>>     // do stuff
>>
>>     world.recv(mpi::any_source, die_tag, unexpected_error_on_slave);
>>     if ( unexpected_error_occurred || unexpected_error_on_slave ) {
>>
>>         // save something
>>
>>         world.abort(SIGABRT);
>>     }
>> }
>> else { // slave process
>>
>>     // do different stuff
>>
>>     if ( unexpected_error_occurred ) {
>>
>>         // just communicate the problem to the root
>>         world.send(root, die_tag, 1);
>>         signal(SIGTERM, SIG_DFL);
>>         while (true)
>>             ; // wait, master will take care of this
>>     }
>>     world.send(root, die_tag, 0); // everything is fine
>> }
>>
>> signal(SIGTERM, SIG_DFL); // reassign default handler
>>
>> // continues the code...
>>
>>
>> Note the slave must hang for the save operation to be executed at the
>> root; otherwise we are back to the previous scenario. It should in theory
>> be unnecessary to send MPI messages to accomplish the desired cleanup, and
>> in more complex applications this can turn into a nightmare. As we know,
>> asynchronous events are insane to debug.
>>
>> Best regards,
>> Júlio.
>>
>> P.S.: MPI 1.4.3 from Ubuntu 11.10 repositories.
>>
>> 2012/3/23 Ralph Castain <rhc_at_[hidden]>
>>
>>> Well, yes and no. When a process abnormally terminates, OMPI will kill
>>> the job - this is done by first hitting each process with a SIGTERM,
>>> followed shortly thereafter by a SIGKILL. So you do have a short time on
>>> each process to attempt to clean up.
>>>
>>> My guess is that your signal handler actually is getting called, but we
>>> then kill the process before you can detect that it was called.
>>>
>>> You might try adjusting the time between sigterm and sigkill using
>>> the odls_base_sigkill_timeout MCA param:
>>>
>>> mpirun -mca odls_base_sigkill_timeout N
>>>
>>> should cause it to wait N seconds before issuing the sigkill. Not sure
>>> if that will help or not - it used to work for me, but I haven't tried it
>>> in a while. What version of OMPI are you using?
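For concreteness, an illustrative invocation (the program name, process count, and the 10-second value are placeholders; the parameter can equally be set in an MCA parameter file):

```shell
# allow 10 seconds between the SIGTERM and the follow-up SIGKILL
mpirun -mca odls_base_sigkill_timeout 10 -np 4 ./my_app
```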
>>>
>>>
>>> On Mar 22, 2012, at 4:49 PM, Júlio Hoffimann wrote:
>>>
>>> Dear all,
>>>
>>> I'm trying to handle signals inside an MPI task-farming model. The
>>> following is pseudo-code of what I'm trying to achieve:
>>>
>>> volatile sig_atomic_t unexpected_error_occurred = 0;
>>>
>>> void my_handler( int sig )
>>> {
>>>     unexpected_error_occurred = 1;
>>> }
>>>
>>> //
>>> // somewhere in the code...
>>> //
>>> signal(SIGTERM, my_handler);
>>>
>>> if (root process) {
>>>
>>>     // do stuff
>>>
>>>     if ( unexpected_error_occurred ) {
>>>
>>>         // save something
>>>
>>>         // re-raise the SIGTERM, but now with the default handler
>>>         signal(SIGTERM, SIG_DFL);
>>>         raise(SIGTERM);
>>>     }
>>> }
>>> else { // slave process
>>>
>>>     // do different stuff
>>>
>>>     if ( unexpected_error_occurred ) {
>>>
>>>         // just propagate the signal to the root
>>>         signal(SIGTERM, SIG_DFL);
>>>         raise(SIGTERM);
>>>     }
>>> }
>>>
>>> signal(SIGTERM, SIG_DFL); // reassign default handler
>>>
>>> // continues the code...
>>>
>>>
>>> As can be seen, the signal handling is required for implementing a
>>> restart feature. The whole problem resides in my assumption that all
>>> processes in the communicator will receive a SIGTERM as a side effect. Is
>>> that a valid assumption? How does the actual MPI implementation deal with
>>> such scenarios?
>>>
>>> I also tried to replace all the raise() calls with MPI_Abort(), which,
>>> according to the documentation (
>>> http://www.open-mpi.org/doc/v1.5/man3/MPI_Abort.3.php), sends a SIGTERM
>>> to all associated processes. The undesired behaviour persists: when
>>> killing a slave process, the save section in the root branch is not
>>> executed.
>>>
>>> Appreciate any help,
>>> Júlio.
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>
>
>
>