Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] openib unloaded before last mem dereg
From: Steve Wise (swise_at_[hidden])
Date: 2013-01-28 12:12:56


On 1/25/2013 12:19 PM, Steve Wise wrote:
> Hello,
>
> I'm tracking an issue I see in openmpi-1.6.3. Running this command on
> my chelsio iwarp/rdma setup causes a seg fault every time:
>
> /usr/mpi/gcc/openmpi-1.6.3-dbg/bin/mpirun --np 2 --host
> hpc-hn1,hpc-cn2 --mca btl openib,sm,self --mca
> btl_openib_ipaddr_include "192.168.170.0/24"
> /usr/mpi/gcc/openmpi-1.6.3/tests/IMB-3.2/IMB-MPI1 pingpong
>
> The segfault is during finalization, and I've debugged this to the
> point where I see a call to dereg_mem() after the openib btl is
> unloaded via dlclose(). dereg_mem() dereferences a function pointer
> to call the btl-specific dereg function, which in this case is
> openib_dereg_mr(). However, since that btl has already been unloaded,
> the dereference causes a seg fault. It happens every time with the
> above mpi job.
>
> Now, I tried this same experiment with openmpi-1.7rc6 and I don't see
> the seg fault, and I don't see a call to dereg_mem() after the openib
> btl is unloaded. That's all well and good. :) But I'd like to get this
> fix pushed into 1.6 since that is the current stable release.
>
> Question: Can someone point me to the fix in 1.7?
>
> Thanks,
>
> Steve.

It appears that ompi_mpi_finalize() calls mca_pml_base_close(), which
unloads the openib btl. Further down, ompi_mpi_finalize() calls
mca_mpool_base_close(), which ends up calling dereg_mem(), and that
seg faults trying to call into the unloaded openib btl.
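
To make the ordering problem concrete, here is a minimal C sketch of the
sequence described above. It is a simulation only: it does not use the real
Open MPI code or dlopen()/dlclose(), and every name in it (component_t,
mpool_t, fake_dereg_mr, finalize_buggy_order, finalize_safe_order) is
hypothetical. Closing the component clears the function pointer so the misuse
can be detected and reported instead of crashing. The "safe" ordering shown is
just one way the problem could be avoided, and I am not claiming it is what
1.7 does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dlopen()ed component such as the openib btl. */
typedef struct {
    int loaded;               /* 1 while the component's code is mapped */
    int (*dereg_fn)(void *);  /* component-specific deregistration hook */
} component_t;

/* Hypothetical stand-in for a memory pool with live registrations. */
typedef struct {
    component_t *owner;       /* component whose hook releases registrations */
    int registrations;        /* registrations still outstanding */
} mpool_t;

/* Placeholder for a component-specific dereg function. */
static int fake_dereg_mr(void *reg) { (void)reg; return 0; }

void component_open(component_t *c)
{
    c->loaded = 1;
    c->dereg_fn = fake_dereg_mr;
}

/* Models dlclose(): in the real case dereg_fn would now point at unmapped
 * code. It is set to NULL here so the simulation can detect the misuse. */
void component_close(component_t *c)
{
    c->loaded = 0;
    c->dereg_fn = NULL;
}

/* Releases all registrations through the owner's hook.
 * Returns 0 on success, -1 where the real code would seg fault. */
int mpool_close(mpool_t *p)
{
    assert(p != NULL && p->owner != NULL);
    while (p->registrations > 0) {
        if (!p->owner->loaded || p->owner->dereg_fn == NULL)
            return -1;        /* hook belongs to an unloaded component */
        if (p->owner->dereg_fn(NULL) != 0)
            return -1;
        p->registrations--;
    }
    return 0;
}

/* Order described in the report: component unloaded, then pool closed. */
int finalize_buggy_order(void)
{
    component_t c;
    mpool_t p;
    component_open(&c);
    p.owner = &c;
    p.registrations = 1;
    component_close(&c);
    return mpool_close(&p);
}

/* Assumed alternative: close the pool while the component is still loaded. */
int finalize_safe_order(void)
{
    component_t c;
    mpool_t p;
    int rc;
    component_open(&c);
    p.owner = &c;
    p.registrations = 1;
    rc = mpool_close(&p);
    component_close(&c);
    return rc;
}
```

In this model, finalize_buggy_order() reports the failure because the pool
still holds a registration when its owner is closed, while
finalize_safe_order() releases the registration first and succeeds.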

Anybody have thoughts? Anybody care? :)

Steve.