Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [PATCH] make orte-checkpoint communicate with orterun again
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2014-01-23 12:27:42


+1

On Thu, Jan 23, 2014 at 10:16 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> Looks correct to me - you are right in that you cannot release the buffer
> until after the send completes. We don't copy the data underneath to save
> memory and time.
>
>
> On Jan 23, 2014, at 6:51 AM, Adrian Reber <adrian_at_[hidden]> wrote:
>
> > The following patch makes orte-checkpoint communicate with orterun again:
> >
> > diff --git a/orte/tools/orte-checkpoint/orte-checkpoint.c b/orte/tools/orte-checkpoint/orte-checkpoint.c
> > index 7106342..8539f34 100644
> > --- a/orte/tools/orte-checkpoint/orte-checkpoint.c
> > +++ b/orte/tools/orte-checkpoint/orte-checkpoint.c
> > @@ -834,7 +834,7 @@ static int notify_process_for_checkpoint(opal_crs_base_ckpt_options_t *options)
> > }
> >
> > if (ORTE_SUCCESS != (ret = orte_rml.send_buffer_nb(&(orterun_hnp->name), buffer,
> > - ORTE_RML_TAG_CKPT, hnp_receiver,
> > + ORTE_RML_TAG_CKPT, orte_rml_send_callback,
> > NULL))) {
> > exit_status = ret;
> > goto cleanup;
> > @@ -845,11 +845,6 @@ static int notify_process_for_checkpoint(opal_crs_base_ckpt_options_t *options)
> > ORTE_JOBID_PRINT(jobid));
> >
> > cleanup:
> > - if( NULL != buffer) {
> > - OBJ_RELEASE(buffer);
> > - buffer = NULL;
> > - }
> > -
> > if( ORTE_SUCCESS != exit_status ) {
> > opal_show_help("help-orte-checkpoint.txt", "unable_to_connect", true,
> > orte_checkpoint_globals.pid);
> >
> >
> > Before committing the code into the repository I wanted to make
> > sure it is the correct way to fix it.
> >
> > The first change switches the callback to orte_rml_send_callback().
> > When I initially made the code compile again, I passed hnp_receiver()
> > while converting the call from blocking to non-blocking, and that was
> > wrong.
> >
> > The second change (removal of OBJ_RELEASE(buffer)) is necessary
> > because releasing the buffer there seems to free it while the
> > non-blocking send is still in progress, and then everything breaks badly.
> >
> > Adrian
> > _______________________________________________
> > devel mailing list
> > devel_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Joshua Hursey
Assistant Professor of Computer Science
University of Wisconsin-La Crosse
http://cs.uwlax.edu/~jjhursey