
Open MPI Development Mailing List Archives


Subject: [OMPI devel] OPEN-MPI Fault-Tolerance for GASNet
From: Thomas CI Yoon (workciyoon_at_[hidden])
Date: 2009-11-22 19:13:13


Dear all.

Thanks to the Open MPI fault-tolerance developers, the checkpoint/restart function works very well for my MPI applications.
However, checkpointing does not work for my GASNet applications, which use the MPI conduit.
Is there anyone who can help me?

I wrote some code with the GASNet API (Global-Address Space Networking: http://gasnet.cs.berkeley.edu/) and used the MPI conduit for my GASNet application, so the program runs well under Open MPI's mpirun. I therefore assumed I could also use the transparent checkpoint/restart function that Open MPI supports through BLCR. Contrary to my expectation, it does not work and shows the following error message:
--------------------------------------------------------------------------
Error: The process with PID 13896 is not checkpointable.
       This could be due to one of the following:
        - An application with this PID doesn't currently exist
        - The application with this PID isn't checkpointable
        - The application with this PID isn't an OPAL application.
       We were looking for the named files:
         /tmp/opal_cr_prog_write.13896
         /tmp/opal_cr_prog_read.13896
--------------------------------------------------------------------------
1 more process has sent help message help-opal-checkpoint.txt
Set MCA parameter "orte_base_help_aggregate" to 0 to see all help
 0] 13896) Step 53
 0] 15100) Step 53
 0] 13896) Step 54
 0] 15100) Step 54
 0] 13896) Step 55
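To rule out a simple path problem, a quick check of whether the two files named in the error exist while the job is running can be sketched in C. This is a minimal sketch, assuming the files live in /tmp with exactly the names printed above; the helper names (cr_file_path, path_exists, count_cr_files) are hypothetical and not part of Open MPI, and the check only tests existence, not why the files might be missing.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Build a path like /tmp/opal_cr_prog_write.13896. The /tmp directory and
   the two prefixes are copied from the error text (assumption: other
   installations may place these files elsewhere). */
static void cr_file_path(char *buf, size_t len, const char *prefix, long pid)
{
    int n = snprintf(buf, len, "/tmp/%s.%ld", prefix, pid);
    assert(n > 0 && (size_t)n < len); /* the path must fit in buf */
}

/* Return 1 if anything exists at path, 0 otherwise (POSIX access()). */
static int path_exists(const char *path)
{
    return access(path, F_OK) == 0;
}

/* Report and count how many of the two files from the error exist (0..2). */
static int count_cr_files(long pid)
{
    static const char *prefixes[] = { "opal_cr_prog_write", "opal_cr_prog_read" };
    char path[256];
    int found = 0;

    for (size_t i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++) {
        cr_file_path(path, sizeof path, prefixes[i], pid);
        if (path_exists(path)) {
            printf("found:   %s\n", path);
            found++;
        } else {
            printf("missing: %s\n", path);
        }
    }
    return found;
}
```

Calling count_cr_files(13896) from a small driver while the job runs would show whether either file is present for that PID.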

In my application, MPI_Initialized() reports that MPI is initialized.

Thank you for reading, and have a great day.