Open MPI User's Mailing List Archives

Subject: [OMPI users] mpiexec option for node failure
From: Rob Stewart (robstewart57_at_[hidden])
Date: 2011-09-12 19:52:50


Hi,

I have implemented a simple fault-tolerant ping-pong program in C with
MPI, here: http://pastebin.com/7mtmQH2q

MPICH2 offers a parameter with mpiexec:
$ mpiexec -disable-auto-cleanup

This is described here: http://trac.mcs.anl.gov/projects/mpich2/ticket/1421

With this option, the program is fault tolerant in the sense that when I
ssh to one of the nodes in the hosts file and kill the relevant process,
the MPI job is not terminated. The ping simply gets no pong from the dead
node, and the ping-pong keeps running on the remaining live nodes.

Is such a feature available in Open MPI, either via mpiexec or by some
other means?

-- 
Rob Stewart