
Open MPI Development Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-08-13 09:23:07


FWIW: we fixed this recently in the openib BTL by ensuring that all
registered memory is freed during the BTL finalize (vs. the mpool
finalize).

This is a new issue because the mpool finalize was just recently
expanded to un-register all of its memory as part of the NIC-restart
effort (and will likely also be needed for checkpoint/restart...?).
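
A minimal sketch of that ordering, in C. Every name here (mpool_t, btl_t, btl_deregister, and so on) is a hypothetical stand-in rather than a real Open MPI type or function, and the "loaded" flag only models whether the component's code is still mapped. It is meant to illustrate the idea (release registrations while the BTL's callback is still valid), not to reproduce the actual openib change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGS 8

/* Hypothetical stand-ins; NOT the real Open MPI structures. */
typedef struct registration { void *base; bool in_use; } registration_t;

typedef struct btl {
    bool loaded;    /* models whether the component is still loaded */
    int  live_regs; /* registrations currently held with the NIC */
} btl_t;

typedef struct mpool {
    registration_t regs[MAX_REGS];
    /* Callback whose code lives inside the BTL component. */
    int (*deregister)(btl_t *btl, registration_t *reg);
    btl_t *btl;
} mpool_t;

/* BTL-owned deregistration routine. Once the BTL is unloaded, a real
 * call through this pointer would jump into unmapped code; the sketch
 * reports an error instead of crashing. */
static int btl_deregister(btl_t *btl, registration_t *reg)
{
    if (!btl->loaded) return -1;
    reg->in_use = false;
    reg->base = NULL;
    btl->live_regs--;
    return 0;
}

void mpool_init(mpool_t *mp, btl_t *btl)
{
    for (size_t i = 0; i < MAX_REGS; i++) {
        mp->regs[i].base = NULL;
        mp->regs[i].in_use = false;
    }
    mp->deregister = btl_deregister;
    mp->btl = btl;
}

int mpool_register(mpool_t *mp, void *base)
{
    for (size_t i = 0; i < MAX_REGS; i++) {
        if (!mp->regs[i].in_use) {
            mp->regs[i].base = base;
            mp->regs[i].in_use = true;
            mp->btl->live_regs++;
            return 0;
        }
    }
    return -1; /* table full */
}

/* Release every live registration through the BTL callback, if any. */
int mpool_release_all(mpool_t *mp)
{
    if (mp->deregister == NULL) return 0; /* BTL already detached */
    for (size_t i = 0; i < MAX_REGS; i++) {
        if (mp->regs[i].in_use &&
            mp->deregister(mp->btl, &mp->regs[i]) != 0) {
            return -1;
        }
    }
    return 0;
}

/* The fix: free registered memory during BTL finalize, then detach the
 * callback so the later mpool finalize never calls into the BTL. */
void btl_finalize(btl_t *btl, mpool_t *mp)
{
    int rc = mpool_release_all(mp);
    assert(rc == 0 && btl->live_regs == 0);
    (void)rc; /* silence unused warning when NDEBUG is defined */
    mp->deregister = NULL;
    mp->btl = NULL;
    btl->loaded = false; /* stands in for unloading the component */
}

/* Runs after the BTLs are gone; by then it has nothing left to free. */
int mpool_finalize(mpool_t *mp)
{
    return mpool_release_all(mp);
}

/* Walks the shutdown sequence; returns 0 when it completes cleanly. */
int demo_shutdown_order(void)
{
    btl_t btl = { .loaded = true, .live_regs = 0 };
    mpool_t mp;
    int a = 0, b = 0;

    mpool_init(&mp, &btl);
    if (mpool_register(&mp, &a) != 0 || mpool_register(&mp, &b) != 0)
        return -1;
    btl_finalize(&btl, &mp);
    if (btl.live_regs != 0) return -1;
    return mpool_finalize(&mp);
}
```

If btl_finalize instead skipped the release and only marked the BTL as unloaded, mpool_finalize would still hold the stale callback and hit the failure path, which is the analogue of the segfault in the report below.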

On Aug 13, 2007, at 9:11 AM, Tim Prins wrote:

> Hi folks,
>
> I have run into a problem with mca_mpool_rdma_finalize as implemented in
> r15557. With the t_win onesided test, running over gm, it segfaults. What
> appears to be happening is that some memory is registered with gm, and then
> gets freed by mca_mpool_rdma_finalize. But the free function that it is using
> is in the gm btl, and the btls are unloaded before the mpool is shut down. So
> the function call segfaults.
>
> If I change the code so we never unload the btls (and we don't free the gm
> port), it works fine.
>
> Note that the openib btl works just fine.
>
> Forgive me if this is a known problem; I am trying to catch up from my
> vacation...
>
> Tim
>
> ---
> If anyone cares, here is the callstack:
> (gdb) bt
> #0 0x404de825 in ?? () from /lib/libgcc_s.so.1
> #1 0x4048081a in mca_mpool_rdma_finalize (mpool=0x925b690) at mpool_rdma_module.c:431
> #2 0x400caca9 in mca_mpool_base_close () at base/mpool_base_close.c:57
> #3 0x40060094 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:304
> #4 0x4009a4c9 in PMPI_Finalize () at pfinalize.c:44
> #5 0x08049946 in main (argc=1, argv=0xbfe16924) at t_win.c:214
> (gdb)
> gdb shows that at this point the gm btl is no longer loaded.

-- 
Jeff Squyres
Cisco Systems