Open MPI User's Mailing List Archives

Subject: [OMPI users] Fault Tolerant Features in OpenMPI
From: Edson Tavares de Camargo (etcamargo_at_[hidden])
Date: 2013-08-10 14:07:33


Hi All,

I was looking for posts about fault tolerance in MPI and found the post
below:

http://www.open-mpi.org/community/lists/users/2012/06/19658.php

I am trying to understand the failure detection work present in
Open MPI. I began with a simple ring application (ring.c) to understand
error handlers, but it does not seem to work. Why not? (The code is
below.)

The application's processes were all running on the same machine, started
with the following command line:

$ mpiexec -n 4 ring

While the ring application was running, one of the processes was killed.
The entire application then stopped (fine so far), but no error message
was shown. Shouldn't the check if(error != MPI_SUCCESS) have fired?

I am using mpiexec (OpenRTE) 1.6.5.

Thanks in advance,

Edson

-----------------------------------------------
#include <stdio.h>
#include <mpi.h>
#include <time.h>

int main( int argc, char *argv[] )
{
    int rank, size;
    int n = 0;
    int tag = 0;
    int error;
    int root = 0;
    int next, previous;
    double start = 0;
    double finish = 0;

    MPI_Status status;

    MPI_Init( &argc, &argv );
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // error handler
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    do {
        next = (rank + 1) % (size);
        n++;

        if(rank != 0){
            previous = (rank - 1);
        }else{
            previous = size - 1;
        }

        if (rank == root) {

            error = MPI_Send( &n, 1, MPI_INT, next, tag, MPI_COMM_WORLD );

            //if an error happens print the message
            if(error != MPI_SUCCESS){
                printf("error");
            }

            error = MPI_Recv( &n, 1, MPI_INT, previous, tag, MPI_COMM_WORLD, &status );

            //if an error happens print the message
            if(error != MPI_SUCCESS){
                printf("error");
            }
        }
        else {

            error = MPI_Recv( &n, 1, MPI_INT, previous, tag, MPI_COMM_WORLD, &status );

            //if an error happens print the message
            if(error != MPI_SUCCESS){
                printf("error");
            }

            error = MPI_Send( &n, 1, MPI_INT, next, tag, MPI_COMM_WORLD );

            //if an error happens print the message
            if(error != MPI_SUCCESS){
                printf("error");
            }
        }
        printf( "Process %d got %d\n", rank, n );

        // wait a bit
        start = MPI_Wtime();
        finish = start;

        while ( (finish - start) < 1 ){
            finish = MPI_Wtime();
        }

    } while (n < 100);

    MPI_Finalize();
    return 0;
}
----------------------------
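
The neighbour computation in the listing above (next and previous) can
also be written without the if/else branch. The sketch below is only an
illustration: ring_next, ring_prev and ring_self_check are hypothetical
names that do not appear in ring.c, and the code assumes size > 0 and
0 <= rank < size.

```c
#include <assert.h>

/* Hypothetical helpers, not part of the original ring.c. */

/* Rank that receives from `rank` in a ring of `size` processes. */
int ring_next(int rank, int size)
{
    return (rank + 1) % size;
}

/* Rank that sends to `rank`; adding `size` keeps the value non-negative
 * for rank 0, so no special case is needed. */
int ring_prev(int rank, int size)
{
    return (rank + size - 1) % size;
}

/* Sanity check: stepping forward and then back (or back and then
 * forward) must return to the starting rank. */
void ring_self_check(int size)
{
    int r;

    assert(size > 0);
    for (r = 0; r < size; r++) {
        assert(ring_prev(ring_next(r, size), size) == r);
        assert(ring_next(ring_prev(r, size), size) == r);
    }
}
```

With the 4 processes used in the mpiexec line, rank 3 sends to rank 0 and
rank 0 receives from rank 3, which matches what the original branch
computes.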