Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] bug in mca framework?
From: Mike Dubman (miked_at_[hidden])
Date: 2013-12-03 13:38:56


Thanks.
Which "-mca base_verbose" parameter should print it?

On Tue, Dec 3, 2013 at 6:59 PM, Nathan Hjelm <hjelmn_at_[hidden]> wrote:

> This usually happens when a string that belongs to the MCA system is freed
> elsewhere. Can you find out the name of the variable that is being
> destructed in frame 2?
>
> -Nathan Hjelm
> Application Readiness, HPC-5, LANL
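
The failure mode described above (a string owned by a registry being freed by someone else) can be sketched in a few lines of C. This is only an illustration under stated assumptions: `demo_var_t`, `demo_strdup`, `demo_var_init`, `demo_var_destruct` and `demo_owned_string` are hypothetical names, not Open MPI's `mca_base_var` code. The sketch shows the safe pattern, where the registry copies the string at registration so that its destructor frees only memory it allocated itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a registered variable (not mca_base_var_t). */
typedef struct {
    char *name;   /* owned by the registry; freed only in demo_var_destruct() */
} demo_var_t;

/* Portable string copy helper (hypothetical; avoids relying on strdup). */
char *demo_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Register by copying the caller's string, so the registry owns its buffer.
 * Storing the caller's pointer instead would leave var->name dangling as
 * soon as the caller frees it, and the destructor would then free it again. */
int demo_var_init(demo_var_t *var, const char *name)
{
    var->name = demo_strdup(name);
    return var->name != NULL ? 0 : -1;
}

/* Destructor: frees the registry's copy and clears the pointer, so a
 * repeated call is harmless (free(NULL) is a no-op). */
void demo_var_destruct(demo_var_t *var)
{
    free(var->name);
    var->name = NULL;
}

/* Walks through the safe sequence; returns 0 on success. */
int demo_owned_string(void)
{
    demo_var_t var = { NULL };
    char *caller_copy = demo_strdup("demo_param");

    if (caller_copy == NULL || demo_var_init(&var, caller_copy) != 0) {
        free(caller_copy);
        return -1;
    }

    free(caller_copy);   /* the caller releases its own buffer early */

    /* The registry's private copy is still valid here. */
    int ok = (strcmp(var.name, "demo_param") == 0);

    demo_var_destruct(&var);
    demo_var_destruct(&var);   /* second call sees NULL and does nothing */
    assert(var.name == NULL);
    return ok ? 0 : -1;
}
```

Calling `demo_owned_string()` from a `main` exercises the whole sequence. If `demo_var_init` stored `name` directly rather than copying it, the same sequence would read and free released memory in the destructor, which is the kind of crash a backtrace ending in a variable destructor would show.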
>
> On Tue, Dec 03, 2013 at 02:53:29PM +0200, Mike Dubman wrote:
> > Hi,
> > We observe a crash during shmem_finalize() (in trunk) with the new MCA
> > framework.
> > After investigation, we found that the MCA tear-down process can access
> > previously released memory (reproduced with the oshmem_hello_c.c test).
> > #0 0x00007fffed3d51d0 in ?? ()
> > #1 <signal handler called>
> > #2 0x00007ffff710e21e in var_destructor (var=0x6fa7e0) at mca_base_var.c:1605
> > #3 0x00007ffff710ae99 in opal_obj_run_destructors (object=0x6fa7e0) at ../../../opal/class/opal_object.h:448
> > #4 0x00007ffff710ca18 in mca_base_var_finalize () at mca_base_var.c:954
> > #5 0x00007ffff710a7e2 in mca_base_param_finalize () at mca_base_param.c:643
> > #6 0x00007ffff70e08e2 in opal_finalize_util () at runtime/opal_finalize.c:77
> > #7 0x00007ffff7aa5319 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:407
> > #8 0x00007ffff7d900cc in oshmem_shmem_finalize () at runtime/oshmem_shmem_finalize.c:75
> > #9 0x00007ffff7d91119 in shmem_finalize () at shmem_finalize.c:24
> > #10 0x00007ffff7d89b8f in __do_global_dtors_aux () from /install/lib/libshmem.so.0
> > #11 0x0000000000000000 in ?? ()
> > The crash can be resolved by the following patch:
> > diff --git a/opal/mca/base/mca_base_var.c b/opal/mca/base/mca_base_var.c
> > index 9966627..48028d8 100644
> > --- a/opal/mca/base/mca_base_var.c
> > +++ b/opal/mca/base/mca_base_var.c
> > @@ -773,7 +773,7 @@ static int var_find_by_name (const char *full_name, int *index, bool invalidok)
> >
> > (void) var_get ((int)(uintptr_t) tmp, &var, false);
> >
> > - if (invalidok || VAR_IS_VALID(var[0])) {
> > + if (VAR_IS_VALID(var[0])) {
> > *index = (int)(uintptr_t) tmp;
> > return OPAL_SUCCESS;
> > }
> > I'm not sure we understand yet why it fixes the problem or what the
> > race is.
> > Could someone with knowledge of the MCA flows look at it and comment?
> > The "invalidok" parameter was introduced by Jeff's commit.
> > Thanks
> > M
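
To make the effect of the one-line change concrete, here is a minimal sketch of a name lookup with an "invalid ok" flag. It is not the Open MPI implementation: `demo_entry_t`, `demo_table`, `demo_find_by_name` and `demo_lookup_check` are hypothetical, and the `valid` field only stands in for whatever `VAR_IS_VALID` tests. It shows what callers can get back, not why the crash occurs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry; "valid" stands in for the VAR_IS_VALID check. */
typedef struct {
    const char *full_name;
    bool valid;
} demo_entry_t;

/* Hypothetical table: one live entry and one that has been invalidated. */
static demo_entry_t demo_table[] = {
    { "demo_alpha", true  },
    { "demo_beta",  false },
};

/* Returns 0 and sets *index when full_name is found and the entry is either
 * valid or the caller passed invalidok; returns -1 otherwise. */
int demo_find_by_name(const char *full_name, bool invalidok, int *index)
{
    size_t count = sizeof demo_table / sizeof demo_table[0];

    for (size_t i = 0; i < count; ++i) {
        if (strcmp(demo_table[i].full_name, full_name) != 0) {
            continue;
        }
        if (invalidok || demo_table[i].valid) {
            *index = (int) i;
            return 0;
        }
        return -1;   /* name exists, but the entry is invalid */
    }
    return -1;       /* name not present at all */
}

/* Small self-check covering the three cases; returns 0 when all hold. */
int demo_lookup_check(void)
{
    int index = -1;

    assert(demo_find_by_name("demo_alpha", false, &index) == 0 && index == 0);
    assert(demo_find_by_name("demo_beta", false, &index) == -1);
    assert(demo_find_by_name("demo_beta", true, &index) == 0 && index == 1);
    return 0;
}
```

With the flag honoured, a caller can receive the index of an entry that is no longer valid. Dropping the flag from the condition, as the patch does, means only valid entries are ever returned. Whether that is the root cause here or only masks it is the open question in this thread.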
>
> > _______________________________________________
> > devel mailing list
> > devel_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel