I withdraw my comment on this; it turns out I "misspoke" (in other words, I was wrong about the class cleanup). The base class structures are stored as objects in the memory region of the shared library that defines them, and that region becomes unavailable once the shared library is unloaded. As a result, we are unable to clean up the classes at the OPAL layer after the other shared libraries have been unloaded.
Moreover, Nathan was right in his proposal: the only possible cleanup approach is to use the destructor attribute in the OPAL library to clean up the mess once all libraries are unloaded.
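To make the idea concrete, here is a minimal sketch of how the teardown could move out of the finalize path and into a library destructor. All demo_* names are hypothetical stand-ins and not actual OPAL symbols, and the single char* buffer only represents the kind of state OPAL would hold. The destructor attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for library state; not the real OPAL globals. */
static int demo_init_count = 0;   /* outstanding init calls */
static char *demo_param = NULL;   /* stands in for a framework's char* parameter */

/* Set up state on first use; reuse it if it survived an earlier finalize. */
int demo_init_util(void)
{
    ++demo_init_count;
    if (NULL != demo_param) {
        return 0;
    }
    demo_param = malloc(16);
    if (NULL == demo_param) {
        --demo_init_count;
        return -1;
    }
    snprintf(demo_param, 16, "%s", "default");
    return 0;
}

/* Only drop the reference. Nothing is freed here, so a later init stays safe. */
void demo_finalize_util(void)
{
    assert(demo_init_count > 0);  /* finalize without a matching init is a caller bug */
    --demo_init_count;
}

/* Runs at normal process exit or when the library is closed with dlclose(). */
void __attribute__((destructor)) demo_library_cleanup(void)
{
    free(demo_param);
    demo_param = NULL;
    demo_init_count = 0;
}
```

With this layout an init/finalize/init sequence reuses the existing state instead of touching freed memory, and the actual release happens exactly once, when the library goes away.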
On July 15, 2014 at 1:17:26 AM, George Bosilca (bosilca_at_[hidden]) wrote:
> Fixing the classes to correctly tear down everything was a two-line patch. However,
> this doesn't fix the bigger issue, which is that not all frameworks are correctly
> torn down, that when they are they leave behind char* parameters not set to NULL,
> and that the framework infrastructure is not keen on being reinitialized due
> to too many globals not correctly handled.
> If I correctly understand the meaning of the proposed destructor approach, it is only
> called when the library is being unloaded or when the application exits. Thus, adding
> the destructor is a band-aid, addressing a marginal annoyance (partially keeping valgrind
> happy) without addressing the real issue (being able to call MPI_Init after MPI_T_finalize).
> On July 14, 2014 at 6:07:08 PM, Nathan Hjelm (hjelmn_at_[hidden]) wrote:
> > What: Add a library destructor function to OPAL. The new function would
> > take care of cleaning up some of OPAL's state (closing frameworks,
> > shutting down MCA, etc).
> > Why: OPAL cannot currently be re-initialized. There are numerous
> > problems throughout the project that will make it difficult (but not
> > impossible) to get OPAL into a state where we can allow
> > re-initialization. Additionally, there are probably arguments against
> > making OPAL re-initializable.
> > OPAL not being re-initializable would not normally be a problem except
> > that the following code sequence always crashes:
> > MPI_T_init_thread (); <-- Calls opal_init_util()
> > MPI_T_finalize (); <-- Calls opal_finalize_util()
> > MPI_Init (); <-- SEGV
> > This happens because MPI_T_finalize() calls opal_finalize_util() to
> > ensure maximum valgrind cleanness. This call causes OPAL to tear down
> > OPAL classes (among other things) leading to the SEGV on the next call
> > to opal_init()/opal_init_util(). There is an open ticket on this issue:
> > https://svn.open-mpi.org/trac/ompi/ticket/4490
> > To fix this problem I want to add a destructor function to OPAL. This
> > function would take on some of the current functionality of
> > opal_finalize_util(). This would solve the above issue without having to
> > update OPAL to allow re-initialization.
> > For those not familiar with destructor functions: they are called
> > at normal program exit or when the library is closed
> > (dlclose). Multiple destructor functions can be defined. Marking a
> > function as a destructor is simple:
> > void __attribute__((destructor)) foo (void);
> > When: Setting a timeout for next Friday (July 25).
> > -Nathan
> > Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15140.php