Hi Everyone,
I am trying to checkpoint an MPI application running on multiple nodes. However, I get the following error messages when I trigger the checkpoint:
Error: expected_component: PID information unavailable!
Error: expected_component: Component Name information unavailable!
I am using Open MPI 1.3 and BLCR 0.8.1.
I execute my application as follows:
mpirun -am ft-enable-cr -np 3 --hostfile hosts gol
My question:
Does Open MPI with BLCR support checkpointing an MPI application that runs across multiple nodes? If so, could you point me to some information on how to achieve this?
Cheers,
Jean.