
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Question about fault tolerance checkpointing
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2008-01-29 13:16:42

Not at the moment.

This would be a neat addition to Open MPI if application developers
see a need for it. As with any feature, there are open design issues,
most of which concern what an application expects and requires from
such an API. One such question is whether an MPI_Checkpoint function
would imply a coordinated global checkpoint with a barrier, a local
uncoordinated checkpoint, or something else.
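
To make the difference between those two options concrete, here is a toy
sketch in Python. It is not Open MPI code and does not use any MPI call:
threads stand in for ranks, threading.Barrier stands in for a barrier, and
run_ranks, checkpoint_at and the per-rank schedule are all hypothetical
names chosen for illustration.

```python
import threading

def run_ranks(n_ranks, steps, checkpoint_at, coordinated):
    """Simulate n_ranks workers; return {rank: (step, state)} per snapshot."""
    barrier = threading.Barrier(n_ranks)
    lock = threading.Lock()
    snapshots = {}

    def worker(rank):
        state = 0
        for step in range(steps):
            state += rank + 1  # stand-in for real computation
            if coordinated:
                if step == checkpoint_at:
                    # Coordinated: nobody saves until every rank is at this step.
                    barrier.wait()
                    with lock:
                        snapshots[rank] = (step, state)
            elif step == checkpoint_at + rank:
                # Uncoordinated: each rank saves on its own (made-up) schedule.
                with lock:
                    snapshots[rank] = (step, state)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return snapshots
```

In the coordinated run every snapshot is taken at the same step, so the
saved states form one consistent cut. In the uncoordinated run the
snapshots come from different steps, so a restart would need some extra
mechanism to reconcile them.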

The checkpoint/restart framework in Open MPI was designed to allow for
some exposure of the checkpoint/restart routines. So, depending on what
you are looking for, it may be fairly straightforward to expose a
simple checkpoint/restart API.

I have not heard many requests for such an API, but I may be willing
to help investigate if users are interested.


On Jan 29, 2008, at 11:37 AM, Wong, Wayne wrote:

> Are there plans to provide an API that would allow a fault-tolerance-
> enabled program to invoke checkpointing directly?
> -Wayne