Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI checkpoint/restart on multiple nodes
From: Andreea Costea (andre.costea_at_[hidden])
Date: 2010-02-08 08:35:48

I asked this question because checkpointing to NFS is successful, but
checkpointing without a mounted filesystem or shared storage produces the
following warning and error:

WARNING: Could not preload specified file: File already exists.
Fileset: /home/andreea/checkpoints/global/ompi_global_snapshot_7426.ckpt/0
Host: X

Will continue attempting to launch the process.

filem:rsh: wait_all(): Wait failed (-1)
[[62871,0],0] ORTE_ERROR_LOG: Error in file snapc_full_global.c at line 1054

even if I set the MCA parameters like this:


and the nodes can connect through ssh without a password.


On Mon, Feb 8, 2010 at 12:59 PM, Andreea Costea <andre.costea_at_[hidden]> wrote:

> Hi,
> Let's say I have an MPI application running on several hosts. Is there any
> way to checkpoint this application without having a shared storage between
> the nodes?
> I already took a look at the examples here, but it seems that in both
> cases there is a globally mounted file system.
> Thanks,
> Andreea