The Open MPI community did consider such an option, but deemed it uninteresting. However, we (the UTK team) have a patched version that supports several fault-tolerant modes, including the one you described in your email. If you are interested, please contact me directly.
On Sep 12, 2011, at 20:43, Ralph Castain wrote:
> We don't have anything similar in OMPI. There are fault tolerance modes, but not like the one you describe.
> On Sep 12, 2011, at 5:52 PM, Rob Stewart wrote:
>> I have implemented a simple fault-tolerant ping-pong C program with MPI, here: http://pastebin.com/7mtmQH2q
>> MPICH2 offers an mpiexec option for this:
>> $ mpiexec -disable-auto-cleanup
>> as described here: http://trac.mcs.anl.gov/projects/mpich2/ticket/1421
>> It is fault tolerant in the sense that when I ssh to one of the nodes in the hosts file and kill the relevant process, the MPI job is not terminated. The ping simply gets no pong from the dead node, and the ping-pong continues indefinitely on the remaining live nodes.
>> Is such a feature available for Open MPI, either via mpiexec or some other means?
>> Rob Stewart
>> users mailing list