Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Don't crash on node failures
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-04-14 09:58:58


Yes - followed a few microseconds later by a SIGKILL if the process hasn't terminated. The daemon exits shortly thereafter, and if the proc is -still- somehow alive, it kills itself once it sees that the daemon is gone.
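
For anyone who wants to react to that SIGTERM, here is a minimal sketch in C of catching it in the application. The names install_sigterm_handler and sigterm_received are made up for illustration and are not part of Open MPI. The sketch assumes the application installs its own handler after MPI_Init and polls the flag from its main loop. SIGKILL cannot be caught, so whatever the process does on SIGTERM has to finish before the SIGKILL arrives, and given the short interval described above there is no guarantee that it will.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Flag set by the handler and polled by the application's main loop.
 * sig_atomic_t is the only object type that is safe to write from a
 * signal handler. */
static volatile sig_atomic_t got_sigterm = 0;

/* Handler: only record that SIGTERM arrived; do the real work elsewhere. */
static void on_sigterm(int signo)
{
    (void)signo;
    got_sigterm = 1;
}

/* Hypothetical helper: install the SIGTERM handler.
 * Returns 0 on success, -1 on failure (as sigaction does). */
int install_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical helper: returns 1 once SIGTERM has been seen, else 0. */
int sigterm_received(void)
{
    return got_sigterm != 0;
}
```

A process would call install_sigterm_handler() once at startup and check sigterm_received() between work steps, flushing or saving state as quickly as it can when the flag is set. Keeping the handler down to a single flag write avoids calling functions that are not async-signal-safe from signal context.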

On Apr 14, 2010, at 7:29 AM, Jürgen Kaiser wrote:

> What happens exactly when a job or node crashes? Does orte send a
> SIGTERM to each process?
>
> Best regards,
> Jürgen
>
> Durga Choudhury wrote:
>> This would be a very welcome new feature for me as well. My two
>> thumbs up when it happens.
>>
>> Best regards
>> Durga
>>
>>
>> On Tue, Apr 13, 2010 at 10:28 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>>> Not right now, but coming later this year...
>>>
>>> On Apr 13, 2010, at 7:21 AM, Jürgen Kaiser wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> Can I force MPI to not abort the whole job when a node crashes? I would
>>>> like to let the remaining MPI-processes perform some action in that case
>>>> and then proceed.
>>>>
>>>> Thanks,
>>>> Jürgen
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>
>