Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-02-16 20:13:19


r20569 fixes the problem, but I'm not 100% sure it's the Right Way.

Short version: now that we're guaranteeing to free the event base,
we're exercising a code path that was never used before. Apparently
the orted initializes the ev->timebase min_heap_t structure, but then
never uses it. So the pointer to the array of events in the heap is
still NULL when we get to the destructor. Previously, the destructor
just unconditionally freed the array. I put in a NULL check, which
avoids the problem.
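
For illustration, the constructed-but-never-used case can be sketched in C as below. This is a minimal sketch under stated assumptions, not the libevent or r20569 code: `sketch_heap_t`, its fields, and the `sketch_heap_*` functions are hypothetical stand-ins for `min_heap_t`, and the push does no heap ordering because only the allocation lifecycle matters here. In standard C `free(NULL)` is a no-op, so the guard matters mainly when a debugging allocator reports a NULL free as invalid, which is assumed to be what the "malloc debug" output in the quoted run reflects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for min_heap_t; the field names are assumptions. */
typedef struct sketch_heap {
    void   **items;     /* array of event pointers; stays NULL until first push */
    size_t   count;     /* number of stored pointers */
    size_t   capacity;  /* allocated slots in items */
} sketch_heap_t;

/* Constructor: allocates nothing, so an unused heap keeps items == NULL. */
static void sketch_heap_ctor(sketch_heap_t *h)
{
    h->items = NULL;
    h->count = 0;
    h->capacity = 0;
}

/* Append only (no heap ordering); the array is first allocated here. */
static int sketch_heap_push(sketch_heap_t *h, void *item)
{
    if (h->count == h->capacity) {
        size_t new_cap = h->capacity ? h->capacity * 2 : 8;
        void **p = realloc(h->items, new_cap * sizeof *p);
        if (p == NULL) {
            return -1;
        }
        h->items = p;
        h->capacity = new_cap;
    }
    h->items[h->count++] = item;
    return 0;
}

/* Destructor with the NULL check: skip the free when nothing was allocated. */
static void sketch_heap_dtor(sketch_heap_t *h)
{
    if (h->items != NULL) {
        free(h->items);
        h->items = NULL;
    }
    h->count = 0;
    h->capacity = 0;
}

/* Exercises both paths: a heap that is never used and one that is. */
int sketch_heap_demo(void)
{
    sketch_heap_t unused, used;
    int value = 42;

    sketch_heap_ctor(&unused);
    sketch_heap_dtor(&unused);          /* items was never allocated */
    assert(unused.items == NULL);

    sketch_heap_ctor(&used);
    if (sketch_heap_push(&used, &value) != 0) {
        return -1;
    }
    assert(used.count == 1);
    sketch_heap_dtor(&used);            /* items was allocated, so it is freed */
    assert(used.items == NULL && used.count == 0);
    return 0;
}
```

Calling `sketch_heap_demo()` walks through both lifecycles; the first one corresponds to the initialize-then-destroy path described above.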

But it raises the question: why is that data structure being
initialized/freed if we're never using it? Is it something inherent
in libevent?

On Feb 16, 2009, at 7:49 PM, Jeff Squyres (jsquyres) wrote:

> Unfortunately, this doesn't fully fix the problem -- I'm still getting
> bad frees:
>
> [16:47] svbu-mpi:~/mpi % ./hello
> stdout: Hello, world! I am 0 of 1 (svbu-mpi.cisco.com)
> stderr: Hello, world! I am 0 of 1 (svbu-mpi.cisco.com)
> malloc debug: Invalid free (min_heap.h, 58)
>
> [16:48] svbu-mpi:~/mpi % mpirun -np 1 hello
> [svbu-mpi001:27549] ********** Parsing receive_queues
> stdout: Hello, world! I am 0 of 1 (svbu-mpi001)
> stderr: Hello, world! I am 0 of 1 (svbu-mpi001)
> malloc debug: Invalid free (min_heap.h, 58)
>
>
> On Feb 16, 2009, at 7:20 PM, bosilca_at_[hidden] wrote:
>
> > Author: bosilca
> > Date: 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
> > New Revision: 20568
> > URL: https://svn.open-mpi.org/trac/ompi/changeset/20568
> >
> > Log:
> > Make sure we correctly unregister all persistent events
> > and signal handlers.
> >
> > Text files modified:
> > trunk/orte/orted/orted_main.c | 8 ++++++++
> > trunk/orte/runtime/orte_wait.c | 4 ++--
> > 2 files changed, 10 insertions(+), 2 deletions(-)
> >
> > Modified: trunk/orte/orted/orted_main.c
> > ======================================================================
> > --- trunk/orte/orted/orted_main.c (original)
> > +++ trunk/orte/orted/orted_main.c 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
> > @@ -754,6 +754,14 @@
> > exit(ORTE_ERROR_DEFAULT_EXIT_CODE);
> > }
> >
> > + /* Release all local signal handlers */
> > + opal_event_del(&term_handler);
> > + opal_event_del(&int_handler);
> > +#ifndef __WINDOWS__
> > + opal_signal_del(&sigusr1_handler);
> > + opal_signal_del(&sigusr2_handler);
> > +#endif /* __WINDOWS__ */
> > +
> > /* Finalize and clean up ourselves */
> > ret = orte_finalize();
> > exit(ret);
> >
> > Modified: trunk/orte/runtime/orte_wait.c
> > ======================================================================
> > --- trunk/orte/runtime/orte_wait.c (original)
> > +++ trunk/orte/runtime/orte_wait.c 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
> > @@ -517,8 +517,8 @@
> > /* define the event to fire when someone writes to the pipe */
> > opal_event_set(*event, p[0], OPAL_EV_READ, cbfunc, NULL);
> >
> > - /* Add it to the active events, without a timeout */
> > - opal_event_add(*event, NULL);
> > + /* Add it to the active events, without a timeout */
> > + opal_event_add(*event, NULL);
> >
> > /* all done */
> > return ORTE_SUCCESS;
> > _______________________________________________
> > svn-full mailing list
> > svn-full_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
Cisco Systems