Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Fault Tolerant Features in OpenMPI
From: Edson Tavares de Camargo (etcamargo_at_[hidden])
Date: 2013-08-11 09:33:29


Thanks a lot for your reply, Ralph!

Could you tell me in what situation the error handler would be called in
the 1.6.5 version?

I had thought that a failure in a process would be caught by the error
handler. Wouldn't killing or aborting the process result in the same behaviour?

In the 1.7.4 release, if a process is killed, will the error handler be
called?

Thanks,

Edson
---------------------

> The error handler wouldn't be called in that situation - we simply abort
> the job. We expect to provide that integration in something like the 1.7.4
> release milestone.
>
>
> On Aug 10, 2013, at 11:07 AM, Edson Tavares de Camargo
> <etcamargo_at_[hidden]> wrote:
>
>> Hi All,
>>
>> I was looking for posts about fault tolerance in MPI and I found the post
>> below:
>>
>> http://www.open-mpi.org/community/lists/users/2012/06/19658.php
>>
>> I am trying to understand the work on failure detection present in
>> Open MPI. So I began with a simple application, a ring application
>> (ring.c), to understand error handlers. But it seems that it didn't
>> work. Why not? (The code is below.)
>>
>> All the processes of the application were running on the same machine,
>> started with the following command line:
>>
>> $ mpiexec -n 4 ring
>>
>> While the ring application was running, one of the processes was killed.
>> The entire application stopped (OK so far), but it didn't show me the
>> error message. Shouldn't the line if(error != MPI_SUCCESS) have worked?
>>
>> I am using the mpiexec (OpenRTE) 1.6.5.
>>
>> Thanks in advance,
>>
>> Edson
>>
>> -----------------------------------------------
>> #include <stdio.h>
>> #include <mpi.h>
>> #include <time.h>
>>
>> int main( int argc, char *argv[] )
>> {
>>     int rank, size;
>>     int n = 0;
>>     int tag = 0;
>>     int error;
>>     int root = 0;
>>     int next, previous;
>>     double start = 0;
>>     double finish = 0;
>>
>>     MPI_Status status;
>>
>>     MPI_Init( &argc, &argv );
>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>     // error handler
>>     MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>
>>     do {
>>         next = (rank + 1) % (size);
>>         n++;
>>
>>         if(rank != 0){
>>             previous = (rank - 1);
>>         }else{
>>             previous = size - 1;
>>         }
>>
>> if (rank =