On Fri, Feb 14, 2014 at 02:51:51PM -0800, Ralph Castain wrote:
On Feb 13, 2014, at 11:26 AM, Adrian Reber <firstname.lastname@example.org> wrote:
I tried to implement something like you described. It is not yet event
driven, but before continuing I wanted to get some feedback on whether
it is at least the right start:
I looked at the other ORTE_OOB_* macros and tried to model my
functionality a bit after what I have seen there. Right now it is still
a simple function which just tries to call ft_event() on all oob
components. Does this look right so far?
Sorry for the delay - yes, that looks like the right direction. I would suggest doing it via the current state machine, though, by simply defining another job or proc state in orte/mca/plm/plm_types.h, and then registering a callback function using orte_state.add_job[proc]_state(state, function to be called, ORTE_ERR_PRI). Then you can activate it by calling ORTE_ACTIVATE_JOB[PROC]_STATE(NULL, state) and it will be handled in the proper order.
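To make the register-then-activate pattern above concrete, here is a minimal, self-contained sketch in plain C. It is not ORTE code: every demo_* name, the callback signature, and the table layout are hypothetical stand-ins, and the real orte_state calls and ORTE_ACTIVATE_*_STATE macros may well differ in arguments and behavior. In particular, this sketch runs the callback synchronously, whereas the state machine described above handles activations "in the proper order" via its own event handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT the real
 * ORTE types, functions, or macros. */
typedef enum {
    DEMO_PROC_STATE_RUNNING = 0,
    DEMO_PROC_STATE_FT_EVENT,   /* the newly defined state */
    DEMO_PROC_STATE_MAX
} demo_proc_state_t;

typedef void (*demo_state_cbfunc_t)(demo_proc_state_t state, void *cbdata);

typedef struct {
    demo_state_cbfunc_t cbfunc; /* function to run when the state is activated */
    int priority;               /* stored only; a real state machine would use it for ordering */
} demo_state_entry_t;

static demo_state_entry_t demo_state_table[DEMO_PROC_STATE_MAX];

/* Register a callback for a state. Returns 0 on success, -1 on bad input. */
int demo_add_proc_state(demo_proc_state_t state,
                        demo_state_cbfunc_t cbfunc, int priority)
{
    if ((int)state < 0 || state >= DEMO_PROC_STATE_MAX || cbfunc == NULL) {
        return -1;
    }
    demo_state_table[state].cbfunc = cbfunc;
    demo_state_table[state].priority = priority;
    return 0;
}

/* Activate a state: look up its callback and run it.
 * Returns -1 if no callback has been registered for the state. */
int demo_activate_proc_state(demo_proc_state_t state, void *cbdata)
{
    if ((int)state < 0 || state >= DEMO_PROC_STATE_MAX ||
        demo_state_table[state].cbfunc == NULL) {
        return -1;
    }
    demo_state_table[state].cbfunc(state, cbdata);
    return 0;
}

/* Counts invocations; stands in for "call ft_event() on all oob components". */
static int demo_ft_event_calls = 0;

static void demo_ft_event_cb(demo_proc_state_t state, void *cbdata)
{
    (void)cbdata;
    assert(state == DEMO_PROC_STATE_FT_EVENT);
    demo_ft_event_calls++;
}
```

A caller would register once at startup, e.g. demo_add_proc_state(DEMO_PROC_STATE_FT_EVENT, demo_ft_event_cb, 0), and later trigger the work with demo_activate_proc_state(DEMO_PROC_STATE_FT_EVENT, NULL). The point of the indirection is that the code requesting the checkpoint only names a state and does not need to know which function handles it.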
What is a job/proc in the Open MPI context?
A "job" is the entire application, while a "proc" is just one process in that application. In this case you could use either one as you are checkpointing the entire job, but all this activity is occurring inside each proc. So I'd suggest defining it as a proc state since it only really involves local actions.
If you like, I can define the required code in the trunk and let you fill in the event functionality.