Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI checkpoint/restart
From: Joshua Hursey (jjhursey_at_[hidden])
Date: 2010-01-14 09:33:17

On Jan 14, 2010, at 2:50 AM, Andreea Costea wrote:

> Hi there
> I have some questions regarding checkpoint/restart:
> 1. Until recently I thought that ompi-checkpoint and ompi-restart are used to checkpoint a process inside an MPI application. Now I reread this and realized that what it actually does is checkpoint the mpirun process. Does this mean that if I run my application with multiple processes on multiple nodes in my network, the checkpoint file will contain the states of all the processes of my MPI application?

I think you slightly misread the entry. ompi-checkpoint checkpoints the entire MPI application, across node boundaries. It requires that the user pass the PID of mpirun to serve as a reference point for the command. This way a user can run multiple mpiruns from the same machine and checkpoint only a subset of them.
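
To make the workflow concrete, here is a rough sketch of the command sequence. It only prints the commands instead of running them, so it is safe to try without a checkpoint-enabled build. The application name ./my_app, the process count 4, and the PID 1234 are hypothetical placeholders, and I am assuming the job is launched with the ft-enable-cr parameter set and that the global snapshot reference follows the usual ompi_global_snapshot_<PID>.ckpt naming; check the output of ompi-checkpoint on your system for the actual reference.

```shell
#!/bin/sh
# Sketch only: each helper prints the command you would type, it does not run it.
# Assumptions (not from the original message): ./my_app, -np 4 and PID 1234
# are placeholders; the snapshot reference name is the typical default.

# Launch the job with checkpoint/restart support enabled.
launch_cmd() {
    printf 'mpirun -np %s -am ft-enable-cr %s\n' "$1" "$2"
}

# Checkpoint the whole MPI job, identified by the PID of its mpirun.
checkpoint_cmd() {
    printf 'ompi-checkpoint %s\n' "$1"
}

# Restart the whole job from the global snapshot reference.
restart_cmd() {
    printf 'ompi-restart %s\n' "$1"
}

launch_cmd 4 ./my_app
checkpoint_cmd 1234
restart_cmd ompi_global_snapshot_1234.ckpt
```

Note that the only handle you give ompi-checkpoint is the mpirun PID. The per-process state on the other nodes is collected for you, which is why two mpiruns started from the same machine can be checkpointed independently.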

> 2. Can I restart the application on a different node?

Yes. If you have trouble doing this, then I would suggest following the directions in the BLCR FAQ entry below (it usually addresses 99% of the problems people have doing this):

-- Josh

> Thanks a lot,
> Andreea