
Open MPI Development Mailing List Archives


This web mail archive is frozen.


You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.


From: Pavel Shamis (Pasha) (pasha_at_[hidden])
Date: 2007-08-13 10:00:37

Jeff Squyres wrote:
> FWIW: we fixed this recently in the openib BTL by ensuring that all
> registered memory is freed during the BTL finalize (vs. the mpool
> finalize).
> This is a new issue because the mpool finalize was just recently
> expanded to un-register all of its memory as part of the NIC-restart
> effort (and will likely also be needed for checkpoint/restart...?).
The rdma mpool finalize was an empty function. I changed it to do the
actual "finalize" job: go over all registered segments in the mpool and
release them one by one.
The mpool uses a reference counter for each memory region, which protects
us from double-free bugs. In the openib BTL, all memory that was
registered through the mpool is also unregistered through the mpool
during the finalize stage.
So maybe in gm the memory (that was registered with the mpool) is
released directly (not via the mpool), and that causes the segfault.


> On Aug 13, 2007, at 9:11 AM, Tim Prins wrote:
>> Hi folks,
>> I have run into a problem with mca_mpool_rdma_finalize as implemented
>> in r15557. With the t_win onesided test, running over gm, it
>> segfaults. What appears to be happening is that some memory is
>> registered with gm, and then gets freed by mca_mpool_rdma_finalize.
>> But the free function that it is using is in the gm btl, and the btls
>> are unloaded before the mpool is shut down. So the function call
>> segfaults.
>> If I change the code so we never unload the btls (and we don't free
>> the gm port), it works fine.
>> Note that the openib btl works just fine.
>> Forgive me if this is a known problem; I am trying to catch up after my
>> vacation...
>> Tim
>> ---
>> If anyone cares, here is the callstack:
>> (gdb) bt
>> #0 0x404de825 in ?? () from /lib/
>> #1 0x4048081a in mca_mpool_rdma_finalize (mpool=0x925b690) at mpool_rdma_module.c:431
>> #2 0x400caca9 in mca_mpool_base_close () at base/mpool_base_close.c:57
>> #3 0x40060094 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:304
>> #4 0x4009a4c9 in PMPI_Finalize () at pfinalize.c:44
>> #5 0x08049946 in main (argc=1, argv=0xbfe16924) at t_win.c:214
>> (gdb)
>> gdb shows that at this point the gm btl is no longer loaded.