Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] checkpointing multi node and multi process applications
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2010-01-11 16:42:32


On Dec 19, 2009, at 7:42 AM, Jean Potsam wrote:

> Hi Everyone,
> I am trying to checkpoint an MPI application
> running on multiple nodes. However, I get some error messages when I
> trigger the checkpointing process.
>
> Error: expected_component: PID information unavailable!
> Error: expected_component: Component Name information unavailable!
>
> I am using Open MPI 1.3 and BLCR 0.8.1.

Can you try the v1.4 release and see if the problem persists?

>
> I execute my application as follows:
>
> mpirun -am ft-enable-cr -np 3 --hostfile hosts gol
>
> My question:
>
> Does Open MPI with BLCR support checkpointing of a multi-node
> execution of an MPI application? If so, can you provide me with some
> information on how to achieve this?

Open MPI is able to checkpoint a multi-node application (that's what
it was designed to do). There are some examples at the link below:
   http://www.osl.iu.edu/research/ft/ompi-cr/examples.php
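
For reference, a minimal sketch of the usual checkpoint/restart sequence,
assuming the job was started with the mpirun line quoted above. The PID
value is a hypothetical placeholder (look up the real mpirun PID with ps),
and the snapshot name assumes the default naming that ompi-checkpoint
normally reports. By default the script only prints the commands; set
DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical placeholder: PID of the running mpirun process (e.g. from `ps`).
MPIRUN_PID="${MPIRUN_PID:-12345}"

# Assumed default name of the global snapshot reference.
SNAPSHOT="ompi_global_snapshot_${MPIRUN_PID}.ckpt"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Already running in another terminal:
#   mpirun -am ft-enable-cr -np 3 --hostfile hosts gol

# Checkpoint every rank on every node by pointing ompi-checkpoint at mpirun.
run ompi-checkpoint "$MPIRUN_PID"

# Later: restart the whole job from the global snapshot reference.
run ompi-restart "$SNAPSHOT"
```

ompi-checkpoint is pointed at the mpirun process rather than at the
individual ranks, so the request is coordinated across all nodes from
that one process.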

-- Josh

>
> Cheers,
>
> Jean.