Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Checkpoint an MPI process
From: Lloyd Brown (lloyd_brown_at_[hidden])
Date: 2012-01-19 10:18:33

Since you're looking for a function call, I'm going to assume that you
are writing this application yourself, and that it's not a pre-compiled,
commercial application. Given that, you will be significantly better off
with an internal, application-level checkpointing mechanism, where the
application serializes and stores its own data, than with an external,
application-agnostic checkpointing mechanism like BLCR or similar. The
application knows which data is important and how to store it most
efficiently. A generic library has to assume that everything is
important, and store it all.

Don't get me wrong. Libraries like BLCR are great for applications that
don't have that visibility, and they can even serve as a tool inside an
application-internal checkpointing mechanism (where the application
deliberately interacts with the library to annotate what's important to
store, how to store it, etc.). But if you're writing the application,
you're better off handling checkpointing internally than externally.
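
To make the internal approach a bit more concrete, here is a minimal
sketch in Python, not tied to any particular MPI binding. The names
save_checkpoint, load_checkpoint and run, the pickle format, and the
ckpt_rank<N>.pkl file naming are all assumptions made for illustration;
in a real program the rank would come from MPI_Comm_rank (or the
equivalent in your binding). Each rank stores only the loop counter and
the small state it needs to resume, and writes to a temporary file
before renaming it, so a crash mid-write should not corrupt the
previous checkpoint.

```python
import os
import pickle
import tempfile


def checkpoint_path(directory, rank):
    # Hypothetical naming scheme: one checkpoint file per rank.
    return os.path.join(directory, f"ckpt_rank{rank}.pkl")


def save_checkpoint(directory, rank, step, state):
    """Store only what is needed to resume: the step counter and the state."""
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=f".ckpt_rank{rank}_")
    with os.fdopen(fd, "wb") as f:
        pickle.dump({"step": step, "state": state}, f)
        f.flush()
        os.fsync(f.fileno())
    # Rename over the old checkpoint only after the new one is fully written.
    os.replace(tmp, checkpoint_path(directory, rank))


def load_checkpoint(directory, rank):
    """Return (step, state), or (0, None) if this rank has no checkpoint."""
    path = checkpoint_path(directory, rank)
    if not os.path.exists(path):
        return 0, None
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["step"], data["state"]


def run(directory, rank, total_steps, checkpoint_every=10, stop_after=None):
    """Toy compute loop that resumes from the last checkpoint, if any.

    stop_after is an illustrative knob that simulates an interruption.
    """
    step, state = load_checkpoint(directory, rank)
    if state is None:
        state = {"accumulator": 0}
    while step < total_steps:
        state["accumulator"] += step  # stand-in for the real work
        step += 1
        if step % checkpoint_every == 0:
            save_checkpoint(directory, rank, step, state)
        if stop_after is not None and step >= stop_after:
            break
    return step, state
```

Calling run() again with the same directory and rank picks up from the
last saved step instead of starting over. This only covers saving and
restoring one rank's own data; coordinating a restart across ranks is a
separate problem.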

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University

On 01/19/2012 08:05 AM, Josh Hursey wrote:
> Currently Open MPI only supports the checkpointing of the whole
> application. There has been some work on uncoordinated checkpointing
> with message logging, though I do not know the state of that work with
> regards to availability. That work has been undertaken by the University
> of Tennessee Knoxville, so maybe they can provide more information.
> -- Josh
> On Wed, Jan 18, 2012 at 3:24 PM, Rodrigo Oliveira
> <rsilva.oliveira_at_[hidden]> wrote:
> Hi,
> I'd like to know if there is a way to checkpoint a specific process
> running under an mpirun call. In other words, is there a function
> CHECKPOINT(rank) in which I can pass the rank of the process I want
> to checkpoint? I do not want to checkpoint the entire application,
> but just one of its processes.
> Thanks
> _______________________________________________
> users mailing list
> users_at_[hidden]
> --
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory