Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Fault Tolerant Features in OpenMPI
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-08-10 23:55:49


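The error handler wouldn't be called in that situation - when a process is killed we simply abort the job, so MPI_Send and MPI_Recv never return an error code to your program. We expect to provide that integration (reporting process failures through the error handler) in something like the 1.7.4 release milestone.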

On Aug 10, 2013, at 11:07 AM, Edson Tavares de Camargo <etcamargo_at_[hidden]> wrote:

> Hi All,
>
> I was looking for posts about fault tolerance in MPI and I found the post
> below:
>
> http://www.open-mpi.org/community/lists/users/2012/06/19658.php
>
> I am trying to understand all the work on failure detection present in
> Open MPI. So I began with a simple application, a ring application
> (ring.c), to understand error handlers. But it seems that it didn't
> work. Why not? (The code is below.)
>
> All of the application's processes were running on the same machine,
> launched with the following command line:
>
> $ mpiexec -n 4 ring
>
> While the ring application was running, one of the processes was killed.
> The entire application then stopped (fine so far), but it didn't show me the
> error message. Shouldn't the line if (error != MPI_SUCCESS) have worked?
>
> I am using the mpiexec (OpenRTE) 1.6.5.
>
> Thanks in advance,
>
> Edson
>
> -----------------------------------------------
> #include <stdio.h>
> #include <mpi.h>
>
> int main( int argc, char *argv[] )
> {
>     int rank, size;
>     int n = 0;
>     int tag = 0;
>     int error;
>     int root = 0;
>     int next, previous;
>     double start = 0;
>     double finish = 0;
>
>     MPI_Status status;
>
>     MPI_Init( &argc, &argv );
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     // error handler: return error codes to the caller instead of aborting
>     MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>
>     // ring neighbours; these do not change between iterations
>     next = (rank + 1) % size;
>     previous = (rank + size - 1) % size;
>
>     do {
>         n++;
>
>         if (rank == root) {
>             // the root starts each lap: send first, then wait for the token
>             error = MPI_Send( &n, 1, MPI_INT, next, tag, MPI_COMM_WORLD );
>
>             // if an error happens, print the message
>             if (error != MPI_SUCCESS) {
>                 fprintf(stderr, "rank %d: error in MPI_Send\n", rank);
>             }
>
>             error = MPI_Recv( &n, 1, MPI_INT, previous, tag,
>                               MPI_COMM_WORLD, &status );
>
>             // if an error happens, print the message
>             if (error != MPI_SUCCESS) {
>                 fprintf(stderr, "rank %d: error in MPI_Recv\n", rank);
>             }
>         }
>         else {
>             // every other rank receives the token, then forwards it
>             error = MPI_Recv( &n, 1, MPI_INT, previous, tag,
>                               MPI_COMM_WORLD, &status );
>
>             // if an error happens, print the message
>             if (error != MPI_SUCCESS) {
>                 fprintf(stderr, "rank %d: error in MPI_Recv\n", rank);
>             }
>
>             error = MPI_Send( &n, 1, MPI_INT, next, tag, MPI_COMM_WORLD );
>
>             // if an error happens, print the message
>             if (error != MPI_SUCCESS) {
>                 fprintf(stderr, "rank %d: error in MPI_Send\n", rank);
>             }
>         }
>         printf( "Process %d got %d\n", rank, n );
>
>         // wait a bit (busy-wait for about one second)
>         start = MPI_Wtime();
>         finish = start;
>
>         while ( (finish - start) < 1 ) {
>             finish = MPI_Wtime();
>         }
>
>     } while (n < 100);
>
>     MPI_Finalize();
>     return 0;
> }
> ----------------------------