
Open MPI Development Mailing List Archives


From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2006-05-23 06:36:44


Neil --

Many thanks for the detailed report!

Our Memory Hooks Guy(tm) (Brian Barrett) is in-processing at his summer
internship over the next 24-48 hours; this may delay his reply a little
bit.

> -----Original Message-----
> From: devel-bounces_at_[hidden]
> [mailto:devel-bounces_at_[hidden]] On Behalf Of Neil Ludban
> Sent: Monday, May 22, 2006 6:36 PM
> To: devel_at_[hidden]
> Subject: [OMPI devel] memory_malloc_hooks.c and dlclose()
>
> Hello,
>
> I'm getting a core dump when using openmpi-1.0.2 with the MPI extensions
> we're developing for the MATLAB interpreter. This same build of openmpi
> is working great with C programs and our extensions for GNU Octave. The
> machine is AMD64 running Linux:
>
> Linux kodos 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:29:47 EST 2005
> x86_64 x86_64 x86_64 GNU/Linux
>
> I believe there's a bug in that opal_memory_malloc_hooks_init() links
> itself into the __free_hook chain during initialization, but then it
> never unlinks itself at shutdown. In the interpreter environment,
> libopal.so is dlclose()d and unmapped from memory long before the
> interpreter is done with dynamic memory. A quick check of the nightly
> trunk snapshot reveals some function name changes, but no new shutdown
> code.
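
For reference, the glibc hook pattern being described looks roughly like the
sketch below. The dispatch side is modeled on the memory_malloc_hooks.c
listing further down in this message; the names hooks_init()/hooks_fini()
are hypothetical, and hooks_fini() only illustrates what the missing
shutdown step might look like, not an actual Open MPI function:

    #include <malloc.h>    /* glibc's __free_hook and malloc_usable_size() */
    #include <stdlib.h>

    static void (*old_free_hook)(void *, const void *);

    /* stands in for opal_mem_free_free_hook(); the real one also calls
       opal_mem_free_release_hook(ptr, malloc_usable_size(ptr)) first */
    static void my_free_hook(void *ptr, const void *caller)
    {
        (void) caller;
        __free_hook = old_free_hook;    /* step out of the hook chain */
        free(ptr);                      /* call the next free() down */
        old_free_hook = __free_hook;    /* re-save and re-install ourselves */
        __free_hook = my_free_hook;
    }

    void hooks_init(void)               /* roughly what the init path does */
    {
        old_free_hook = __free_hook;
        __free_hook = my_free_hook;
    }

    void hooks_fini(void)               /* the missing shutdown counterpart: */
    {                                   /* unlink from the chain before      */
        __free_hook = old_free_hook;    /* libopal.so is dlclose()d/unmapped */
    }
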
>
> After running this trivial MPI program on a single processor:
> MPI_Init();
> MPI_Finalize();
> I'm back to the MATLAB prompt, and break into the debugger:
>
> >>> ^C
> (gdb) info sharedlibrary
> From                To                  Syms Read   Shared Object Library
> ...
> 0x0000002aa0b50740  0x0000002aa0b50a28  Yes  .../mexMPI_Init.mexa64
> 0x0000002aa0c52a50  0x0000002aa0c54318  Yes  .../lib/libbcmpi.so.0
> 0x0000002aa0dcef90  0x0000002aa0e37398  Yes  /usr/lib64/libstdc++.so.6
> 0x0000002aa0fa9ec0  0x0000002aa102e118  Yes  .../lib/libmpi.so.0
> 0x0000002aa1178560  0x0000002aa11af708  Yes  .../lib/liborte.so.0
> 0x0000002aa12cffb0  0x0000002aa12f2988  Yes  .../lib/libopal.so.0
> 0x0000002aa1424180  0x0000002aa14249d8  Yes  /lib64/libutil.so.1
> 0x0000002aa152a760  0x0000002aa1536368  Yes  /lib64/libnsl.so.1
> 0x0000002aa3540b80  0x0000002aa3551077  Yes  /usr/local/ibgd-1.8.0/driver/infinihost/lib64/libvapi.so
> 0x0000002aa365e0a0  0x0000002aa3664a86  Yes  /usr/local/ibgd-1.8.0/driver/infinihost/lib64/libmosal.so
> 0x0000002aa470db50  0x0000002aa4719438  Yes  /usr/local/ibgd-1.8.0/driver/infinihost/lib64/librhhul.so
> 0x0000002ac4e508c0  0x0000002ac4e50ed8  Yes  .../mexMPI_Constants.mexa64
> 0x0000002ac4f52740  0x0000002ac4f52a28  Yes  .../mexMPI_Finalize.mexa64
>
> (gdb) c
> >> exit
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 182992729024 (LWP 21848)]
> opal_mem_free_free_hook (ptr=0x7fbfff96d0, caller=0xa8d4f8)
> at memory_malloc_hooks.c:65
>
> (gdb) info sharedlibrary
> From                To                  Syms Read   Shared Object Library
> ...
> 0x0000002aa1424180  0x0000002aa14249d8  Yes  /lib64/libutil.so.1
> 0x0000002aa152a760  0x0000002aa1536368  Yes  /lib64/libnsl.so.1
> 0x0000002aa3540b80  0x0000002aa3551077  Yes  /usr/local/ibgd-1.8.0/driver/infinihost/lib64/libvapi.so
> 0x0000002aa365e0a0  0x0000002aa3664a86  Yes  /usr/local/ibgd-1.8.0/driver/infinihost/lib64/libmosal.so
> 0x0000002aa470db50  0x0000002aa4719438  Yes  /usr/local/ibgd-1.8.0/driver/infinihost/lib64/librhhul.so
>
> (gdb) list
> 63     static void
> 64     opal_mem_free_free_hook (void *ptr, const void *caller)
> 65     {
> 66         /* dispatch about the pending free */
> 67         opal_mem_free_release_hook(ptr, malloc_usable_size(ptr));
> 68
> 69         __free_hook = old_free_hook;
> 70
> 71         /* call the next chain down */
> 72         free(ptr);
> 73
> 74         /* save the hooks again and restore our hook again */
>
> (gdb) print ptr
> $2 = (void *) 0x7fbfff96d0
> (gdb) print caller
> $3 = (const void *) 0xa8d4f8
> (gdb) print __free_hook
> $4 = (void (*)(void *, const void *)) 0x2aa12f1d79
> <opal_mem_free_free_hook>
> (gdb) print old_free_hook
> Cannot access memory at address 0x2aa1422800
>
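
The "Cannot access memory" result is consistent with the failure mode
described above: __free_hook still points at opal_mem_free_free_hook inside
the now-unmapped libopal.so, and the static old_free_hook it would read
lived in those same pages. The crash can be reproduced outside Open MPI
with a toy shared object; the file and symbol names below are hypothetical,
and whether dlclose() actually unmaps the object can vary with the glibc
version:

    /* hookedlib.c -- build with:  gcc -shared -fPIC -o libhooked.so hookedlib.c */
    #include <malloc.h>
    #include <stdlib.h>

    static void (*saved_free_hook)(void *, const void *);

    static void hook(void *ptr, const void *caller)
    {
        (void) caller;
        __free_hook = saved_free_hook;   /* same pattern as memory_malloc_hooks.c */
        free(ptr);
        saved_free_hook = __free_hook;
        __free_hook = hook;
    }

    void install(void)
    {
        saved_free_hook = __free_hook;
        __free_hook = hook;              /* note: never restored at unload */
    }

    /* main.c -- build with:  gcc -o crash main.c -ldl */
    #include <dlfcn.h>
    #include <stdlib.h>

    int main(void)
    {
        void *h = dlopen("./libhooked.so", RTLD_NOW);
        void (*install_fn)(void) = (void (*)(void)) dlsym(h, "install");

        install_fn();        /* __free_hook now points into libhooked.so   */
        dlclose(h);          /* the object is unmapped, the hook is not... */
        free(malloc(16));    /* ...so this jumps into unmapped memory      */
        return 0;
    }
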
>
> Before I start blindly hacking a workaround, can somebody who's familiar
> with the Open MPI internals verify that this is a real bug, suggest a
> correct fix, and/or comment on other potential problems with running in
> an interpreter?
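
Until the hook is unlinked at shutdown, one possible interim workaround,
offered only as a sketch and not verified against the Open MPI internals,
is to pin libopal.so in memory from the MEX side so that a later dlclose()
never unmaps the code the hook points at. glibc's RTLD_NODELETE flag
(glibc >= 2.2) does exactly that; the function name and library path below
are illustrative:

    #define _GNU_SOURCE        /* RTLD_NODELETE is a GNU extension */
    #include <dlfcn.h>

    /* Call once from the MEX extension (e.g. the mexMPI_Init entry point),
       before MPI_Init(). The extra reference plus RTLD_NODELETE keeps
       libopal.so mapped across any later dlclose(), so a stale __free_hook
       would still point at valid code. */
    static void pin_libopal(void)
    {
        (void) dlopen("libopal.so.0", RTLD_NOW | RTLD_GLOBAL | RTLD_NODELETE);
    }
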
>
> Thanks-
>
> -Neil
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>