Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] C/R and orte_oob
From: Adrian Reber (adrian_at_[hidden])
Date: 2014-02-06 17:16:05


Josh explained to me a few days ago that after a checkpoint has been
received, TCP should no longer be used, so that no messages are lost. The
communication then happens over named pipes, and therefore (I think) the
OOB ft_event() is used to quiesce everything besides the pipes. This all
seems to work, but I was confused because the ft_event() functions in
oob/tcp and oob/ud do not seem to contain any functionality.
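
To make concrete what I would have expected a non-empty ft_event() in a
transport to do, here is a minimal sketch. It is not the actual oob/tcp
code: the state values, the flag, and every sketch_* name are hypothetical
stand-ins I made up for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the states the C/R framework passes in. */
enum sketch_ft_state { SKETCH_FT_CHECKPOINT, SKETCH_FT_CONTINUE, SKETCH_FT_RESTART };

/* True while the (hypothetical) TCP transport must stay silent. */
static bool sketch_tcp_quiesced = false;

/* Hypothetical transport-level handler: quiesce on checkpoint,
 * resume on continue or restart, reject unknown states. */
static int sketch_tcp_ft_event(int state)
{
    switch (state) {
    case SKETCH_FT_CHECKPOINT:
        sketch_tcp_quiesced = true;
        break;
    case SKETCH_FT_CONTINUE:
    case SKETCH_FT_RESTART:
        sketch_tcp_quiesced = false;
        break;
    default:
        return -1;
    }
    return 0;
}

/* Hypothetical send path: refuse to send while quiesced so that
 * nothing is in flight on TCP during the checkpoint. */
static int sketch_tcp_send(const char *msg)
{
    if (sketch_tcp_quiesced) {
        return -1;
    }
    printf("tcp send: %s\n", msg);
    return 0;
}
```

With something like this, a send attempted between the checkpoint and
continue events would fail instead of being lost, while the named pipes
would stay usable because they never consult the flag.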

So should I fix the ft_event() function in oob/base/ so that it calls the
registered ft_event() functions, which do nothing, or should I just remove
the call to the orte oob ft_event()?
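
For the first option, a minimal sketch of the dispatch I have in mind is
below. It is only an illustration under my own assumptions: the struct
layout and all sketch_* names are hypothetical and do not match the real
oob base or component structures.

```c
#include <stddef.h>

/* Hypothetical signature for a per-component ft_event handler. */
typedef int (*sketch_ft_event_fn)(int state);

/* Hypothetical, much reduced view of a registered OOB component. */
struct sketch_oob_component {
    const char *name;
    sketch_ft_event_fn ft_event; /* may be NULL if not implemented */
};

/* What the current handlers effectively do: nothing. */
static int sketch_noop_ft_event(int state)
{
    (void)state;
    return 0;
}

/* Base-level dispatch: forward the event to every registered component
 * and stop at the first failure. Because it only calls DOWN into the
 * components, it cannot recurse into itself the way my earlier change
 * to orte_rml_oob_ft_event() did. */
static int sketch_oob_base_ft_event(const struct sketch_oob_component *comps,
                                    size_t ncomps, int state)
{
    for (size_t i = 0; i < ncomps; ++i) {
        if (NULL == comps[i].ft_event) {
            continue; /* component has no handler, skip it */
        }
        int rc = comps[i].ft_event(state);
        if (0 != rc) {
            return rc;
        }
    }
    return 0;
}
```

As long as every handler is a no-op like sketch_noop_ft_event(), this
returns success for any state, so it would behave the same as removing
the call entirely.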

On Thu, Feb 06, 2014 at 10:49:25AM -0800, Ralph Castain wrote:
> The only reason I can think of for an OOB ft-event would be to tell the OOB to stop sending any messages. You would need to push that into the event library and use a callback event to let you know when it was done.
>
> Of course, once you did that, the OOB would no longer be available to, for example, tell the local daemon that the app is ready for checkpoint :-)
>
> Afraid I'll have to defer to Josh H for any further guidance.
>
>
> On Feb 6, 2014, at 8:15 AM, Adrian Reber <adrian_at_[hidden]> wrote:
>
> > When I initially made the C/R code compile again, I made the following
> > change:
> >
> > diff --git a/orte/mca/rml/oob/rml_oob_component.c b/orte/mca/rml/oob/rml_oob_component.c
> > index f0b22fc..90ed086 100644
> > --- a/orte/mca/rml/oob/rml_oob_component.c
> > +++ b/orte/mca/rml/oob/rml_oob_component.c
> > @@ -185,8 +185,7 @@ orte_rml_oob_ft_event(int state) {
> > ;
> > }
> >
> > - if( ORTE_SUCCESS !=
> > - (ret = orte_oob.ft_event(state)) ) {
> > + if( ORTE_SUCCESS != (ret = orte_rml_oob_ft_event(state)) ) {
> > ORTE_ERROR_LOG(ret);
> > exit_status = ret;
> > goto cleanup;
> >
> >
> >
> > This is, of course, wrong. Now the function calls itself recursively
> > until it crashes. Looking at orte/mca/oob there is still an ft_event()
> > function, but it is disabled using "#if 0". Looking at other functions
> > it seems I would need to create something like
> >
> > #define ORTE_OOB_FT_EVENT(m)
> >
> > Looking at the modules in orte/mca/oob/ it seems ft_event is implemented
> > in some places but it never seems to have any real functionality. Is
> > ft_event() actually needed there?
> >
> > Adrian