Jeff Squyres wrote:
> Background: Pasha added a call in the openib BTL finalize function
> that will only succeed if all registered memory has been released
> (ibv_dealloc_pd()). Since the test app didn't call MPI_FREE_MEM,
> there was some memory that was still registered, and therefore the
> call in finalize failed. We treated this as a fatal error. Last
> night's MTT runs turned up several apps that exhibited this fatal error.
> While we're examining this problem, Pasha has removed the call to
> ibv_dealloc_pd() in the trunk openib BTL finalize.
> I examined 1 of the tests that was failing last night in MTT:
> onesided/t.f90. This test has an MPI_ALLOC_MEM with no corresponding
> MPI_FREE_MEM. To investigate this problem, I restored the call to
> ibv_dealloc_pd() and re-ran the t.f90 test -- the problem still
> occurs. Good.
> However, once I got the right MPI_FREE_MEM call in t.f90, the test
> started passing. I.e., ibv_dealloc_pd(hca->ib_pd) succeeds because
> all registered memory has been released. Hence, the test itself was
> incorrect.
> However, I don't think we should *error* if we fail to ibv_dealloc_pd
> (hca->ib_pd); it's a user error, but it's not catastrophic unless
> we're trying to do an HCA restart scenario. Specifically: during a
> normal MPI_FINALIZE, who cares?
> I think we should do the following:
> 1. If we're not doing an HCA restart/checkpoint and we fail to
> ibv_dealloc_pd(), just move on (i.e., it's not a warning/error unless
> we *want* a warning, such as if an MCA parameter
> btl_openib_warn_if_finalize_fail is enabled, or somesuch).
> 2. If we *are* doing an HCA restart/checkpoint and ibv_dealloc_pd()
> fails, then we have to gracefully fail to notify upper layers that
> Bad Things happened (I suspect that we need mpool finalize
> implemented to properly implement checkpointing for RDMA networks).
> 3. Add a new MCA parameter named mpi_show_mpi_alloc_mem_leaks that,
> when enabled, shows a warning in ompi_mpi_finalize() if there is
> still memory allocated by MPI_ALLOC_MEM that was not freed by
> MPI_FREE_MEM (this MCA parameter will parallel the already-existing
> mpi_show_handle_leaks MCA param which displays warnings if the app
> creates MPI objects but does not free them).
> My points:
> - leaked MPI_ALLOC_MEM memory should be reported by the MPI layer,
> not a BTL or mpool
> - failing to ibv_dealloc_pd() during MPI_FINALIZE should only trigger
> a warning if the user wants to see it
> - failing to ibv_dealloc_pd() during an HCA restart or checkpoint
> should gracefully fail upwards
In addition, I will add code that flushes all user data from the mpool,
so that normal IB finalization can proceed.
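
To make the proposed policy concrete, here is a rough sketch of how the
result of ibv_dealloc_pd() could be handled in finalize. It is only an
illustration, not the actual openib BTL code: the struct, the field names,
the return codes and the function handle_dealloc_pd_result() are all
hypothetical. The only real fact it relies on is that ibv_dealloc_pd()
returns 0 on success and a non-zero value on failure.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical return codes for this sketch; the real BTL would use
 * whatever status codes the surrounding code already uses. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERROR = -1 };

/* Hypothetical settings.
 * warn_if_finalize_fail mirrors the suggested MCA parameter
 * btl_openib_warn_if_finalize_fail.
 * in_restart stands in for "we are doing an HCA restart/checkpoint". */
struct finalize_opts {
    bool warn_if_finalize_fail;
    bool in_restart;
};

/* dealloc_rc is the value returned by ibv_dealloc_pd():
 * 0 on success, non-zero if the protection domain could not be released
 * (for example because memory is still registered against it). */
static int handle_dealloc_pd_result(int dealloc_rc,
                                    const struct finalize_opts *opts)
{
    if (0 == dealloc_rc) {
        return SKETCH_SUCCESS;
    }

    if (opts->in_restart) {
        /* Point 2: during a restart/checkpoint the failure must be
         * reported to the upper layers. */
        return SKETCH_ERROR;
    }

    if (opts->warn_if_finalize_fail) {
        /* Point 1: during a normal finalize, warn only on request. */
        fprintf(stderr,
                "warning: ibv_dealloc_pd() failed (rc=%d); "
                "some memory is probably still registered\n",
                dealloc_rc);
    }

    /* Normal MPI_FINALIZE: not fatal, just move on. */
    return SKETCH_SUCCESS;
}
```

With this shape, a failed dealloc during a normal finalize returns success
(silently, or with a warning if requested), and only the restart/checkpoint
path propagates an error upwards.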