On Dec 19, 2009, at 7:42 AM, Jean Potsam wrote:
> Hi Everyone,
> I am trying to checkpoint an MPI application
> running on multiple nodes. However, I get some error messages when I
> trigger the checkpointing process.
> Error: expected_component: PID information unavailable!
> Error: expected_component: Component Name information unavailable!
> I am using Open MPI 1.3 and BLCR 0.8.1.
Can you try the v1.4 release and see if the problem persists?
> I execute my application as follows:
> mpirun -am ft-enable-cr -np 3 --hostfile hosts gol
> My question:
> Does Open MPI with BLCR support checkpointing an MPI application
> that runs on multiple nodes? If so, can you provide me with some
> information on how to achieve this?
Open MPI is able to checkpoint a multi-node application (that's what
it was designed to do). There are some examples at the link below:
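
In outline, the workflow is: launch the job with the ft-enable-cr
parameter set, checkpoint it from a second terminal by giving
ompi-checkpoint the PID of mpirun, and later restart from the global
snapshot reference that ompi-checkpoint reports. The following is only
a minimal sketch of that sequence, not a tested recipe for your
cluster. The function names and the RUN dry-run switch are hypothetical
conveniences, and it assumes Open MPI was built with checkpoint/restart
and BLCR support.

```shell
# Optional dry run: set RUN=echo to print each command instead of executing it.
RUN=${RUN:-}

# Launch the job with checkpoint/restart enabled (command line from the original post).
start_job() { $RUN mpirun -am ft-enable-cr -np 3 --hostfile hosts gol; }

# Checkpoint the running job from another terminal. $1 is the PID of mpirun.
checkpoint_job() { $RUN ompi-checkpoint "$1"; }

# Restart from a checkpoint. $1 is the global snapshot reference
# reported by ompi-checkpoint.
restart_job() { $RUN ompi-restart "$1"; }
```

For example, if mpirun had PID 12345 (a made-up value), you would run
"checkpoint_job 12345" while the job is still running, then pass the
snapshot reference it prints to restart_job.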