Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] bug in ompi-restart
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2009-05-19 09:45:37


Currently ompi-restart does not know how to handle an absolute or
relative path in the command-line argument for the global snapshot
handle. It always prepends the value of the MCA parameter
snapc_base_global_snapshot_dir
which defaults to $HOME.

So what you are seeing is (currently) expected. If you set that MCA
parameter to the directory that contains the snapshot, and pass only
the snapshot handle as the argument to ompi-restart, then it should
work (something like the below):
ompi-restart -mca snapc_base_global_snapshot_dir $HOME ompi_global_snapshot_7056.ckpt
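
Until the ticket is resolved, the workaround above can be scripted. The
following is only a sketch: it splits a full snapshot path into the
directory (for the MCA parameter) and the bare handle, then builds the
same command line shown above. The helper name build_restart_command and
the /scratch/ckpt path are made up for illustration; only the
ompi-restart options already shown in this message are used.

```python
import os
import shlex


def build_restart_command(snapshot_path):
    """Return an argv list for ompi-restart given a full snapshot path.

    Hypothetical helper: the directory part goes into the
    snapc_base_global_snapshot_dir MCA parameter and only the bare
    handle is passed as the positional argument.
    """
    # Drop any trailing slash so os.path.split() yields the handle, not "".
    full = os.path.abspath(os.path.expanduser(snapshot_path.rstrip("/")))
    snapshot_dir, handle = os.path.split(full)
    return [
        "ompi-restart",
        "-mca", "snapc_base_global_snapshot_dir", snapshot_dir,
        handle,
    ]


if __name__ == "__main__":
    # /scratch/ckpt is an example location, not one from this thread.
    argv = build_restart_command("/scratch/ckpt/ompi_global_snapshot_7056.ckpt/")
    print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run() as-is, which avoids
shell quoting problems if the directory name contains spaces.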

I opened a bug to add this capability to orte-restart. You can track
it at the link below:
https://svn.open-mpi.org/trac/ompi/ticket/1924

I am not 100% sure when I will have a chance to get to it, but
hopefully in the next few weeks.

As a side note, if you want to move the global snapshot directory to
another location you will need to update the
'global_snapshot_meta.data' file located at the root of the global
snapshot directory to reflect the path changes for the 'Snapshot
Location:' key.
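
If you need to do this for more than one snapshot, the edit can be
scripted. The sketch below makes an assumption that is not confirmed
anywhere in this message: that 'Snapshot Location:' appears literally on
a line of 'global_snapshot_meta.data', followed by the path. Check the
file by hand (and keep a backup) before relying on it. The function name
update_snapshot_location is made up for illustration.

```python
import os
import tempfile

KEY = "Snapshot Location:"


def update_snapshot_location(meta_path, new_location):
    """Replace the value after each 'Snapshot Location:' key.

    Assumes (unverified) that the key and its path sit on one line.
    Everything before and including the key is preserved; the rest of
    the line is replaced. Returns the number of lines changed.
    """
    with open(meta_path, "r") as f:
        lines = f.readlines()

    changed = 0
    out = []
    for line in lines:
        idx = line.find(KEY)
        if idx != -1:
            prefix = line[: idx + len(KEY)]
            line = f"{prefix} {new_location}\n"
            changed += 1
        out.append(line)

    # Write to a temporary file in the same directory, then swap it in,
    # so an interrupted run does not leave a half-written metadata file.
    dir_name = os.path.dirname(os.path.abspath(meta_path))
    with tempfile.NamedTemporaryFile("w", dir=dir_name, delete=False) as tmp:
        tmp.writelines(out)
    os.replace(tmp.name, meta_path)
    return changed
```

A return value of 0 means the key was not found, in which case the
assumption about the file layout does not hold and the file should be
edited manually.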

Cheers,
Josh

On May 14, 2009, at 12:49 PM, Bouguerra mohamed slim wrote:

> Hello,
> I think there is a problem with ompi-restart from release r-21197.
> In fact, ompi-restart can restart only if the checkpoint directory
> is $HOME.
> For example, the checkpoint folder is $HOME.
> If I try ompi-restart -i $HOME/ompi_global_snapshot_7056.ckpt/ it
> doesn't work and I get
>
> msbouguerra_at_sol-5:~$ ompi-restart -i $HOME/ompi_global_snapshot_7056.ckpt/
> --------------------------------------------------------------------------
> Error: The filename (/home/grenoble/msbouguerra/ompi_global_snapshot_7056.ckpt/)
> is invalid because either you have not provided a filename
> or provided an invalid filename.
> Please see --help for usage.
>
> --------------------------------------------------------------------------
>
>
> And when I try ompi-restart -i ompi_global_snapshot_7056.ckpt/ it
> works and I get
>
>
> msbouguerra_at_sol-5:~$ ompi-restart -i ompi_global_snapshot_7056.ckpt/
> [sol-5.sophia.grid5000.fr:07466] Sequences: 1
> [sol-5.sophia.grid5000.fr:07466] Seq: 0
> [sol-5.sophia.grid5000.fr:07466] Begin Timestamp: Thu May 14 18:23:00 2009
> [sol-5.sophia.grid5000.fr:07466] OPAL CRS Component: blcr
> [sol-5.sophia.grid5000.fr:07466] Snapshot Reference: ompi_global_snapshot_7056.ckpt/
> [sol-5.sophia.grid5000.fr:07466] Snapshot Location: /home/grenoble/msbouguerra/ompi_global_snapshot_7056.ckpt
> [sol-5.sophia.grid5000.fr:07466] End Timestamp: Thu May 14 18:23:00 2009
> [sol-5.sophia.grid5000.fr:07466] Processes: 4
>
> msbouguerra_at_sol-5:~$
>
> So when I use another folder as the checkpoint directory, the
> restart fails.
>
>
> --
> Cordialement,
> Mohamed-Slim BOUGUERRA PhD student INRIA-Grenoble / Projet MOAIS
> ENSIMAG - antenne de Montbonnot
> ZIRST 51, avenue Jean Kuntzmann
> 38330 MONTBONNOT SAINT MARTIN France
> Tel :+33 (0)4 76 61 20 79
> Fax :+33 (0)4 76 61 20 99