Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI Checkpoint/Restart is failed
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2010-05-18 14:46:20


(Sorry for the delay in replying, more below)

On Apr 12, 2010, at 6:36 AM, Hideyuki Jitsumoto wrote:

> Hi Members,
>
> I tried to use checkpoint/restart by openmpi.
> But I can not get collect checkpoint data.
> I prepared execution environment as follows, the strings in () mean
> name of output file which attached on next e-mail ( for mail size
> limitation ):
>
> 1. installed BLCR and checked BLCR is working correctly by "make
> check"
> 2. executed ./configure with some parameters on openMPI source dir
> (config.output / config.log)
> 3. executed make and make install (make.output.2 / install.output.2)
> 4. confirmed that mca_crs_blcr.[la|so], mca_crs_self.[la|so] on
> /${INSTALL_DIR}/lib/openmpi
> 5. make ~/.openmpi/mca-params.conf (mca-params.conf)
> 6. compiled NPB and executed with -am ft-enable-cr
> 7. invoked ompi-checkpoint <MPIRUN_PID>
>
> As result, I got the message "Checkpoint failed: no processes
> checkpointed."
> (cr_test_cg)

It is unclear from the output what caused the checkpoint to fail. Can
you turn on some verbose arguments and send me the output?

Put the following options in your ~/.openmpi/mca-params.conf:
#---------------
orte_debug_daemons=1
snapc_full_verbose=20
crs_base_verbose=10
opal_cr_verbose=10
#---------------
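
If it is easier than editing the file by hand, here is a rough sketch (assuming a POSIX shell) that appends those four settings to the per-user parameter file and skips any key that is already set. MCA_CONF is a hypothetical override variable used only in this sketch; it is not something Open MPI reads.

```shell
#!/bin/sh
# Sketch: add the four verbose settings above to the per-user MCA
# parameter file. MCA_CONF is a hypothetical override for this script only.
MCA_CONF="${MCA_CONF:-$HOME/.openmpi/mca-params.conf}"

# Make sure the directory and file exist before appending.
mkdir -p "$(dirname "$MCA_CONF")"
touch "$MCA_CONF"

for setting in \
    orte_debug_daemons=1 \
    snapc_full_verbose=20 \
    crs_base_verbose=10 \
    opal_cr_verbose=10
do
    key=${setting%%=*}
    # Append only when no existing line already assigns this key.
    if ! grep -q "^[[:space:]]*${key}[[:space:]]*=" "$MCA_CONF"; then
        printf '%s\n' "$setting" >> "$MCA_CONF"
    fi
done
```

Note that a key already present with a different value is left untouched, so check the file afterwards if you had set any of these before.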

>
> In addition, when I confirmed open_info output as your demo movie, I
> got
> "MCA crs: none (MCA v2.0, API v2.0, Component
> v1.4.1)" (open_info.output)

This is actually a known bug in ompi_info. I have a fix in the works,
and it should be available soon. Until then, the ticket is linked
below:
   https://svn.open-mpi.org/trac/ompi/ticket/2097

>
> How should I do for checkpointing ?
> Any guidance in this regard would be highly appreciated.

Let's see what the verbose output tells us, and go from there. What
version of BLCR are you using?

-- Josh

>
> Thank you,
> Hideyuki
>
> --
> Sincerely Yours,
> Hideyuki Jitsumoto (jitumoto_at_[hidden])
> Tokyo Institute of Technology
> Global Scientific Information and Computing center (Matsuoka Lab.)