Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] questions about checkpoint/restart on multiple clusters of MPI
From: fengguang tian (fernyabc_at_[hidden])
Date: 2010-03-23 11:39:13


OK, thank you. I will try moving the checkpoint file into the shared
directory.

Regards
fengguang

On Tue, Mar 23, 2010 at 10:34 AM, Fernando Lemos <fernandotcl_at_[hidden]> wrote:

> On Tue, Mar 23, 2010 at 12:27 PM, fengguang tian <fernyabc_at_[hidden]>
> wrote:
> > I have created the shared file system, but I created it as /mirror in
> > the root directory, not under the $HOME directory. Is that the
> > problem? Thank you.
>
> Others might be able to give you a more accurate explanation. The way
> I understood it, in OpenMPI 1.4, you need to write all checkpoints to
> a single, shared location. That's why you generally want a shared file
> system.
>
> Now I'm pretty sure you can change the directory to which the
> checkpoints are written. If your $HOME isn't a shared directory, you
> can point OpenMPI to write the checkpoints to the shared directory
> instead.
>
> In OpenMPI 1.5 (unstable), some magic allows you to create the
> checkpoints and restore them without a shared directory.
>
> Regards,
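
The suggestion above, pointing Open MPI at the shared directory instead of $HOME, could look roughly like the sketch below. It only builds and prints an mpirun command line; it does not run anything. Several things here are assumptions rather than facts from this thread: that the 1.4 series selects the global snapshot location through the MCA parameter snapc_base_global_snapshot_dir, that checkpointing is enabled with "-am ft-enable-cr", and the names /mirror/checkpoints and ./my_app, which are purely illustrative. Please confirm the parameter on your own installation (for example with "ompi_info --param snapc base") before relying on it.

```python
import os
import shlex


def build_mpirun_command(shared_dir, num_procs, program):
    """Build an mpirun argv that asks Open MPI to write global snapshots
    under shared_dir.

    Assumption: the MCA parameter name below is the one used by the
    Open MPI 1.4 checkpoint/restart support; verify it with ompi_info.
    """
    if not os.path.isabs(shared_dir):
        raise ValueError("shared_dir must be an absolute path")
    return [
        "mpirun",
        "-np", str(num_procs),
        "-am", "ft-enable-cr",  # assumed: enables checkpoint/restart support
        "-mca", "snapc_base_global_snapshot_dir", shared_dir,  # assumed name
        program,
    ]


# Hypothetical values: a subdirectory of the shared /mirror mount and a
# placeholder application name.
cmd = build_mpirun_command("/mirror/checkpoints", 4, "./my_app")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The directory passed in must be visible at the same path on every node, which is why a mount such as /mirror should work just as well as a shared $HOME.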