Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Question about fault tolerance checkpointing
From: Leonardo Fialho (leonardofialho_at_[hidden])
Date: 2008-01-29 13:24:51


Josh,

At the moment I'm working on uncoordinated checkpointing, and I'll
probably have some tools to collect data from the process and the
environment, and probably from the application as well.

As for the application, I'm considering the possibility of doing
something like this (MPI_Checkpoint??).
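
To make the coordinated vs. uncoordinated distinction from Josh's reply
concrete, here is a minimal sketch. It is only a toy model: Python threads
stand in for MPI processes, and `checkpoint()` is a hypothetical stand-in
for an MPI_Checkpoint-style call, which Open MPI does not provide. Nothing
is actually saved; the function just records the step at which each rank
would have checkpointed. The names `run_ranks` and `checkpoint` are
assumptions made for illustration.

```python
import threading


def run_ranks(num_ranks, steps, coordinated):
    """Simulate ranks as threads; return {rank: step at which it checkpointed}.

    Assumes steps >= num_ranks so every rank checkpoints in both modes.
    """
    barrier = threading.Barrier(num_ranks)
    lock = threading.Lock()
    snapshots = {}

    def checkpoint(rank, step):
        # Hypothetical stand-in for an application-level checkpoint call.
        with lock:
            snapshots[rank] = step

    def worker(rank):
        for step in range(1, steps + 1):
            if coordinated:
                # Global checkpoint: all ranks meet at a barrier first,
                # so every snapshot is taken at the same step.
                if step == steps // 2:
                    barrier.wait()
                    checkpoint(rank, step)
            else:
                # Local checkpoint: each rank picks its own moment and
                # does not synchronize with the others.
                if step == rank + 1:
                    checkpoint(rank, step)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return snapshots
```

In coordinated mode every rank records the same step, so the snapshots form
one consistent global state. In uncoordinated mode the recorded steps differ
per rank, which is why a real uncoordinated scheme typically needs extra
work (for example, message logging) to restart consistently.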

Leonardo Fialho

2008/1/29, Josh Hursey <jjhursey_at_[hidden]>:
> Not at the moment.
>
> This would be a neat addition to Open MPI if application developers
> see a need for it. There are many issues surrounding this type of
> feature (like any feature). Most of them surround what an application
> expects and requires from such an API. One such question is whether an
> MPI_Checkpoint function would imply a coordinated global checkpoint
> with barrier or a local uncoordinated checkpoint or something else.
>
> The checkpoint/restart framework in Open MPI was designed to allow for
> some exposure of the checkpoint/restart routines. So depending on what
> you are looking for, it may be fairly straightforward to expose a
> simple checkpoint/restart API.
>
> I have not heard many requests for such an API, but I may be willing
> to help investigate if users are interested.
>
> Cheers,
> Josh
>
> On Jan 29, 2008, at 11:37 AM, Wong, Wayne wrote:
>
> > Are there plans to provide an API that would allow a fault tolerant
> > enabled program to invoke checkpointing directly?
> >
> > -Wayne
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users