Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] questions about checkpoint/restart on multiple clusters of MPI
From: Fernando Lemos (fernandotcl_at_[hidden])
Date: 2010-03-23 11:34:34


On Tue, Mar 23, 2010 at 12:27 PM, fengguang tian <fernyabc_at_[hidden]> wrote:
> I have created the shared file system. but I created a /mirror at root
> directory,not at the $HOME directory,is that the
> problem? thank you

Others might be able to give you a more accurate explanation. The way
I understood it, in OpenMPI 1.4, you need to write all checkpoints to
a single, shared location. That's why you generally want a shared file
system.

Now I'm pretty sure you can change the directory to which the
checkpoints are written. If your $HOME isn't a shared directory, you
can point OpenMPI to write the checkpoints to the shared directory
instead.
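
As a rough sketch of what that might look like: as far as I remember,
the global snapshot directory is controlled by the
snapc_base_global_snapshot_dir MCA parameter, which can be set in
$HOME/.openmpi/mca-params.conf. Please check the parameter name with
ompi_info on your installation before relying on it. The
/mirror/checkpoints path below is only an assumed example based on your
/mirror mount.

```
# $HOME/.openmpi/mca-params.conf
# Assumed example path on the shared /mirror file system; the
# directory must exist and be writable from every node.
snapc_base_global_snapshot_dir=/mirror/checkpoints
```

Since /mirror sits at the root rather than under $HOME, a setting along
these lines should matter more than where the mount point itself lives.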

In OpenMPI 1.5 (unstable), some magic allows you to create the
checkpoints and restore them without a shared directory.

Regards,