Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] questions about checkpoint/restart on multiple clusters of MPI
From: fengguang tian (fernyabc_at_[hidden])
Date: 2010-03-23 11:39:13

OK, thank you. I will try to move the checkpoint file into the shared


On Tue, Mar 23, 2010 at 10:34 AM, Fernando Lemos <fernandotcl_at_[hidden]> wrote:

> On Tue, Mar 23, 2010 at 12:27 PM, fengguang tian <fernyabc_at_[hidden]>
> wrote:
> > I have created the shared file system, but I created /mirror in the root
> > directory, not in the $HOME directory. Is that the problem? Thank you.
>
> Others might be able to give you a more accurate explanation. The way
> I understood it, in Open MPI 1.4, you need to write all checkpoints to
> a single, shared location. That's why you generally want a shared file
> system.
>
> Now I'm pretty sure you can change the directory to which the
> checkpoints are written. If your $HOME isn't a shared directory, you
> can point Open MPI to write the checkpoints to the shared directory
> instead.
>
> In Open MPI 1.5 (unstable), some magic allows you to create the
> checkpoints and restore them without a shared directory.
>
> Regards,
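
The suggestion in the quoted reply, pointing Open MPI at a shared directory for its checkpoints, might look roughly like the sketch below. It only builds the mpirun command line and runs nothing. The MCA parameter name snapc_base_global_snapshot_dir and the "-am ft-enable-cr" option are as I recall them from the Open MPI 1.4-era checkpoint/restart documentation, so please verify them against your own installation (for example with ompi_info) before relying on them. The path /mirror/checkpoints and the program name ./my_app are hypothetical placeholders.

```python
import posixpath
import shlex


def build_mpirun_command(snapshot_dir, num_procs, program):
    """Return an mpirun argument list that asks Open MPI to write
    global snapshots to snapshot_dir.

    Assumption: the parameter name and the ft-enable-cr option follow
    the Open MPI 1.4-era checkpoint/restart docs; check your install.
    """
    # Every node must resolve the same location, so require an absolute path.
    if not posixpath.isabs(snapshot_dir):
        raise ValueError("snapshot_dir must be an absolute path: %r" % snapshot_dir)
    return [
        "mpirun",
        "-np", str(num_procs),
        "-am", "ft-enable-cr",  # enable checkpoint/restart support
        "-mca", "snapc_base_global_snapshot_dir", snapshot_dir,
        program,
    ]


if __name__ == "__main__":
    # Hypothetical shared directory and application name.
    cmd = build_mpirun_command("/mirror/checkpoints", 4, "./my_app")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The same parameter can presumably also be set once in an MCA parameter file instead of on every command line. Either way, the directory has to be mounted at the same path on every node.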