
Open MPI User's Mailing List Archives


Subject: [OMPI users] OpenMPI Checkpoint/Restart components
From: Andreea Costea (andre.costea_at_[hidden])
Date: 2009-12-06 02:23:07

Hi there

Lately I've been reading lots of papers about fault tolerance for MPI
applications. They all seemed very nice and clear. But as soon as I moved
past the reading and started testing, I was surprised to find that I could
not locate any implementations. The best I could find was the ability to
manually checkpoint and restart the application: no checkpoint protocol, no
checkpoint manager, no recovery protocol.
Could you please point me to a user-transparent fault-tolerance
implementation for MPI applications?

Thanks a lot,