Open MPI Development Mailing List Archives

This page is part of a frozen web archive of this mailing list. You can still navigate around it, but no new mails have been added to it since July 2016.

Subject: Re: [OMPI devel] bug in mca framework?
From: Mike Dubman (miked_at_[hidden])
Date: 2013-12-03 13:38:56


Thanks.
What magic "-mca base_verbose" param should print it?

On Tue, Dec 3, 2013 at 6:59 PM, Nathan Hjelm <hjelmn_at_[hidden]> wrote:

> This usually happens when a string that belongs to the MCA system is freed
> elsewhere. Can you find out the name of the variable that is being
> destructed in frame 2?
>
> -Nathan Hjelm
> Application Readiness, HPC-5, LANL
>
> On Tue, Dec 03, 2013 at 02:53:29PM +0200, Mike Dubman wrote:
> > Hi,
> > We observe a crash during shmem_finalize() (in trunk) with the new MCA
> > framework.
> > After investigating, we found that the MCA teardown process can access
> > previously released memory (reproduced with the oshmem_hello_c.c test).
> > #0 0x00007fffed3d51d0 in ?? ()
> > #1 <signal handler called>
> > #2 0x00007ffff710e21e in var_destructor (var=0x6fa7e0) at mca_base_var.c:1605
> > #3 0x00007ffff710ae99 in opal_obj_run_destructors (object=0x6fa7e0) at ../../../opal/class/opal_object.h:448
> > #4 0x00007ffff710ca18 in mca_base_var_finalize () at mca_base_var.c:954
> > #5 0x00007ffff710a7e2 in mca_base_param_finalize () at mca_base_param.c:643
> > #6 0x00007ffff70e08e2 in opal_finalize_util () at runtime/opal_finalize.c:77
> > #7 0x00007ffff7aa5319 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:407
> > #8 0x00007ffff7d900cc in oshmem_shmem_finalize () at runtime/oshmem_shmem_finalize.c:75
> > #9 0x00007ffff7d91119 in shmem_finalize () at shmem_finalize.c:24
> > #10 0x00007ffff7d89b8f in __do_global_dtors_aux () from /install/lib/libshmem.so.0
> > #11 0x0000000000000000 in ?? ()
> > The crash can be resolved by the following patch:
> > diff --git a/opal/mca/base/mca_base_var.c b/opal/mca/base/mca_base_var.c
> > index 9966627..48028d8 100644
> > --- a/opal/mca/base/mca_base_var.c
> > +++ b/opal/mca/base/mca_base_var.c
> > @@ -773,7 +773,7 @@ static int var_find_by_name (const char *full_name, int *index, bool invalidok)
> >  
> >      (void) var_get ((int)(uintptr_t) tmp, &var, false);
> >  
> > -    if (invalidok || VAR_IS_VALID(var[0])) {
> > +    if (VAR_IS_VALID(var[0])) {
> >          *index = (int)(uintptr_t) tmp;
> >          return OPAL_SUCCESS;
> >      }
> > I'm not sure we understand yet why it fixes the problem, or what the race is.
> > Could someone with knowledge of the MCA flows look at it and comment?
> > The "invalidok" flag was introduced by Jeff's commit.
> > Thanks
> > M
>
> > _______________________________________________
> > devel mailing list
> > devel_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>