Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] checkpointing multi node and multi process applications
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2010-01-11 16:42:32

On Dec 19, 2009, at 7:42 AM, Jean Potsam wrote:

> Hi Everyone,
> I am trying to checkpoint an MPI application
> running on multiple nodes. However, I get some error messages when I
> trigger the checkpointing process.
> Error: expected_component: PID information unavailable!
> Error: expected_component: Component Name information unavailable!
> I am using Open MPI 1.3 and BLCR 0.8.1.

Can you try the v1.4 release and see if the problem persists?

> I execute my application as follows:
> mpirun -am ft-enable-cr -np 3 --hostfile hosts gol.
> My question:
> Does Open MPI with BLCR support checkpointing an MPI application
> that runs across multiple nodes? If so, can you provide me with some
> information on how to achieve this?

Open MPI is able to checkpoint a multi-node application (that's what
it was designed to do). There are some examples at the link below:
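
Separately from the linked examples, the basic checkpoint/restart cycle can be sketched as the dry-run script below. It only prints the three commands involved rather than running them. The launch line is taken from the original post (without the trailing period); `plan` and its PID argument are hypothetical helpers for illustration, and the snapshot name is an assumption based on the usual `ompi_global_snapshot_<PID>.ckpt` pattern, so use whatever reference `ompi-checkpoint` actually reports.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for one checkpoint/restart cycle.
# Nothing is executed; "plan" is a hypothetical helper, not an Open MPI tool.

plan() {
    # $1: PID of the running mpirun process (placeholder value in this sketch)
    pid=$1

    # 1. Launch the job with checkpoint/restart support enabled
    #    (command line as given in the original post).
    echo "mpirun -am ft-enable-cr -np 3 --hostfile hosts gol"

    # 2. From another shell on the node where mpirun runs, request a
    #    checkpoint of the whole job by passing mpirun's PID.
    echo "ompi-checkpoint $pid"

    # 3. Later, restart all processes from the global snapshot.
    #    The snapshot name is assumed; check the output of ompi-checkpoint.
    echo "ompi-restart ompi_global_snapshot_$pid.ckpt"
}

# Example: show the plan for an mpirun process with PID 12345.
plan 12345
```

The PID given to `ompi-checkpoint` is that of `mpirun` itself, not of an individual application process; `mpirun` coordinates the checkpoint across all nodes in the hostfile.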

-- Josh

> Cheers,
> Jean.