Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] questions about checkpoint/restart on multiple clusters of MPI
From: Fernando Lemos (fernandotcl_at_[hidden])
Date: 2010-03-23 11:34:34

On Tue, Mar 23, 2010 at 12:27 PM, fengguang tian <fernyabc_at_[hidden]> wrote:
> I have created the shared file system, but I created /mirror in the root
> directory, not in the $HOME directory. Is that the
> problem? Thank you.

Others might be able to give you a more accurate explanation. The way
I understood it, in OpenMPI 1.4 you need to write all checkpoints to
a single, shared location. That's why you generally want a shared file
system.

Now, I'm pretty sure you can change the directory to which the
checkpoints are written. If your $HOME isn't a shared directory, you
can point OpenMPI to write the checkpoints to the shared directory
(/mirror in your case) instead.
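A minimal sketch of what that could look like, assuming your Open MPI 1.4 build has checkpoint/restart support enabled and that the global snapshot directory is controlled by the `snapc_base_global_snapshot_dir` MCA parameter (check `ompi_info --param snapc all` on your install to confirm). The helper name `build_cr_mpirun`, the application `./my_app`, and the subdirectory `/mirror/checkpoints` are hypothetical; the function only assembles the `mpirun` command line and does not launch anything.

```python
import shlex


def build_cr_mpirun(app, nprocs, snapshot_dir):
    """Return an argv list that launches `app` with checkpoint/restart enabled.

    Hypothetical helper: assumes the global snapshot directory is set through
    the snapc_base_global_snapshot_dir MCA parameter, and that `snapshot_dir`
    is on a file system shared by all nodes (e.g. under /mirror).
    """
    return [
        "mpirun",
        "-np", str(nprocs),
        "-am", "ft-enable-cr",  # aggregate MCA file that turns on C/R
        "-mca", "snapc_base_global_snapshot_dir", snapshot_dir,
        app,
    ]


if __name__ == "__main__":
    argv = build_cr_mpirun("./my_app", 4, "/mirror/checkpoints")
    # Print a shell-ready command line; pass argv to subprocess.run() to launch it.
    print(shlex.join(argv))
```

The same parameter could instead go in a per-user MCA parameter file so you don't have to repeat it on every `mpirun`; whichever way you set it, the directory has to be visible from every node for the checkpoint to be gathered and later restarted.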

In OpenMPI 1.5 (unstable), some magic allows you to create the
checkpoints and restore them without a shared directory.