Hi,
I compiled and installed Open MPI with C/R support as Josh described.
When the build finished, the C/R tools were available: ompi-checkpoint and
ompi-restart.
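To double-check that the tools really are on the PATH, a small shell check along these lines can be used. This is only a sketch: check_cr_tools is a made-up helper name, and it only confirms that the commands can be found, not that checkpointing itself works.

```shell
# check_cr_tools is a hypothetical helper name, used only for illustration.
# It reports whether each C/R tool mentioned above can be found on the PATH.
check_cr_tools() {
    for tool in ompi-checkpoint ompi-restart; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found at $(command -v "$tool")"
        else
            echo "$tool: not found on PATH"
        fi
    done
}

check_cr_tools
```

If either tool is reported as not found, the "command not found" problem from the earlier message is probably still present (for example, the install directory is missing from the PATH).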
Then I tried an example (hello_c.c from the examples folder, edited to add a
for loop that prints "Hello..." 1,000,000 times),
but I got this error:
Error: The application (PID = 23573) failed to checkpoint properly.
Returned -1.
The steps I followed:
# mpicc hello_c.c -o hello
# mpirun -np 4 -am ft-enable-cr hello
I got the PID of this mpirun from another shell and ran:
# ompi-checkpoint 23573
Error: The application (PID = 23573) failed to checkpoint properly.
Returned -1.
What is causing this error?
Could you give me an example of how to use C/R in Open MPI?
Hiep
hello_c.c
#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, size, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    for (i = 0; i < 1000000; i++) {
        printf("%d Hello, world, I am %d of %d\n", i, rank, size);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
On 8/22/07, Josh Hursey <jjhursey_at_[hidden]> wrote:
>
> Hello,
>
> There are a few things you need to do to build Open MPI with
> Checkpoint/Restart support. By default Open MPI is configured without
> checkpoint/restart support.
> 1) Make sure you have BLCR successfully installed and loaded on your
> system(s)
> 2) configure Open MPI with the "--with-ft=cr" option, which enables
> checkpoint/restart fault tolerance
> Note: you may also have to specify the install directory of BLCR
> with the "--with-blcr=/path/to/blcr"
> 3) make and make install
>
> The resultant build will have support for checkpoint/restart and the
> tools (e.g., ompi-checkpoint, ompi-restart) will become available.
>
> Looking at the documentation, it doesn't seem to include these steps.
> I'll fix that later today and post an updated file to the wiki.
> Sorry about that. :(
>
> Hope this helps,
> Josh
>
> On Aug 21, 2007, at 1:09 PM, Hiep Bui Hoang wrote:
>
> > Hello,
> > I'm Hiep, and I'm trying to use the checkpoint/restart feature in Open MPI.
> > I read about this feature at
> > https://svn.open-mpi.org/trac/ompi/wiki/ProcessFT_CR and in
> > Open-MPI-FT-CR-Draft-v1.pdf. I built Open MPI from the "trunk" checked out
> > with Subversion, but I don't know how to enable checkpoint/restart fault
> > tolerance in Open MPI.
> > As a result, I get this error when I try the ompi-checkpoint command:
> > bash: ompi-checkpoint: command not found
> > How do I build and use the checkpoint/restart feature in Open MPI?
> > Please explain in detail; I'm a new user.
> > Thanks!
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users