Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Use unique collective ids for the checkpoint/restart code
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-02-03 15:42:39


Looks okay to me. I see you left a "printf" statement in plm_base_launch_support.c, so you might want to make that an opal_output_verbose or something.

On Feb 3, 2014, at 12:19 PM, Adrian Reber <adrian_at_[hidden]> wrote:

> This patch
>
> https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=14ec7f42baab882e345948ff79c4f75f5084bbbf
>
> introduces unique collective ids for the checkpoint/restart code and
> with this applied it seems to work pretty well. As this patch also
> touches non-CR code it would be good if someone could have a look at it.
>
> With this patch applied the code seems to work up to the point where
> orterun actually pauses all processes and tries to create the
> checkpoints. The checkpoint creation does not work for me as CRS does
> not yet include support for checkpoint/restart using CRIU which would be
> my next step.
>
> Adrian
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel