
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Checkpointing an MPI application with OMPI
From: Constantinos Makassikis (cmakassikis_at_[hidden])
Date: 2013-01-30 07:53:42

On Wed, Jan 30, 2013 at 3:02 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> If your node hardware is the problem, or you decide you do want/need to
> pursue an FT solution, then you might look at the OMPI-based solutions from
> parties such as or the MPICH2 folks.

Just as Ralph said, you may look into alternatives. From what I have seen,
MPICH2 provides fault tolerance using BLCR.
The same goes for Intel's MPI. Though not free, you may try it during
a 30-day evaluation period.
It can be interesting to see how the two MPI implementations fare with
respect to BLCR-based FT.

Another alternative which may be worth considering is DMTCP from Northeastern
University, for which there has been an interesting podcast recently.

Finally, depending on the application, you may be interested in adding
checkpoint-based fault tolerance at the application level with the help of
libraries such as SCR. Though
you'll need to spend some time modifying the application source code,
it may be better than system-level alternatives in the long run.
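
To give an idea of what application-level checkpointing involves, here is a
minimal single-process sketch in Python. It does not use SCR's API or MPI.
The function names (save_checkpoint, load_checkpoint, run), the pickle file
format and the fail_at parameter are all made up for illustration. In a real
MPI code each rank would typically write its own file, and the ranks would
have to agree on which checkpoint is complete, which is the kind of
bookkeeping a library such as SCR is meant to help with.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace is atomic on the same filesystem, so a crash during the
    # write never leaves a half-written checkpoint under the final name.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, total_steps, checkpoint_every, fail_at=None):
    """Sum the step indices 0..total_steps-1, resuming from a checkpoint.

    fail_at is a hypothetical knob that simulates a crash at a given step.
    """
    state = load_checkpoint(path) or {"step": 0, "total": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["step"]  # the "real work" of one iteration
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

If a run dies partway through, calling run() again with the same path picks
up from the last saved step instead of starting over. The part that takes
effort in a real application is deciding what goes into the state and where
in the main loop it is safe to save it.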