On Tue, Feb 18, 2014 at 03:46:58PM +0100, Adrian Reber wrote:
> > >>> I tried to implement something like you described. It is not yet event
> > >>> driven, but before continuing I wanted to get some feedback on whether
> > >>> it is at least the right start:
> > >>>
> > >>> https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=5048a9cec2cd0bc4867eadfd7e48412b73267706
> > >>>
> > >>> I looked at the other ORTE_OOB_* macros and tried to model my
> > >>> functionality a bit after what I have seen there. Right now it is still
> > >>> a simple function which just tries to call ft_event() on all oob
> > >>> components. Does this look right so far?
> > >>
> > >> Sorry for the delay - yes, that looks like the right direction. I would suggest doing it via the current state machine, though, by simply defining another job or proc state in orte/mca/plm/plm_types.h, and then registering a callback function using orte_state.add_job[proc]_state(state, function to be called, ORTE_ERR_PRI). Then you can activate it by calling ORTE_ACTIVATE_JOB[PROC]_STATE(NULL, state) and it will be handled in the proper order.
> > >
> > > What is a job/proc in the Open MPI context?
> > A "job" is the entire application, while a "proc" is just one process in that application. In this case you could use either one as you are checkpointing the entire job, but all this activity is occurring inside each proc. So I'd suggest defining it as a proc state since it only really involves local actions.
> > If you like, I can define the required code in the trunk and let you fill in the event functionality.
> That would be great.
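
For anyone following along, the pattern suggested above (define a proc state, register a callback for it, then activate the state) can be sketched as a standalone toy in C. This is only an illustration under stated assumptions: every toy_* name is hypothetical and none of this is ORTE code. It also calls the callback synchronously, whereas the real state machine is meant to hand the activation to the event loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; NOT the ORTE API. */
typedef enum {
    TOY_PROC_STATE_RUNNING = 0,
    TOY_PROC_STATE_FT_EVENT,   /* the newly defined state */
    TOY_PROC_STATE_MAX
} toy_proc_state_t;

typedef void (*toy_state_cbfunc_t)(toy_proc_state_t state, void *cbdata);

/* One callback slot per state; NULL means nothing is registered. */
static toy_state_cbfunc_t toy_callbacks[TOY_PROC_STATE_MAX];

/* Register a callback for a state. Returns 0 on success, -1 on bad input. */
static int toy_add_proc_state(toy_proc_state_t state, toy_state_cbfunc_t cb)
{
    if ((unsigned)state >= TOY_PROC_STATE_MAX || NULL == cb) {
        return -1;
    }
    toy_callbacks[state] = cb;
    return 0;
}

/* Activate a state. This toy calls the callback directly; it does not
 * model priorities or event-loop ordering. Returns -1 if the state is
 * out of range or has no registered callback. */
static int toy_activate_proc_state(toy_proc_state_t state, void *cbdata)
{
    if ((unsigned)state >= TOY_PROC_STATE_MAX || NULL == toy_callbacks[state]) {
        return -1;
    }
    toy_callbacks[state](state, cbdata);
    return 0;
}

/* Counts how often the callback ran. */
static int toy_ft_event_calls = 0;

/* Example callback: this is where the per-component ft_event() calls
 * would go in the real code. */
static void toy_ft_event_cb(toy_proc_state_t state, void *cbdata)
{
    (void)cbdata;
    assert(TOY_PROC_STATE_FT_EVENT == state);
    toy_ft_event_calls++;
}
```

Registering toy_ft_event_cb for TOY_PROC_STATE_FT_EVENT and then activating that state runs the callback once. Activating a state that has no callback registered returns -1.
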
Thanks for your changes. When using --with-ft there are a few compiler
errors, which I tried to fix with the following patch: