Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Memory allocation error when linking with MPI libraries
From: Nicolas Deladerriere (nicolas.deladerriere_at_[hidden])
Date: 2010-08-12 09:00:55


Building Open MPI with the "--without-memory-manager" option fixed my problem.

What exactly does compiling with this option imply?
I guess every malloc call then uses the libc functions instead of Open MPI's
own, but does it have an effect on performance or anything else?

Nicolas

2010/8/8 Nysal Jan <jnysal_at_[hidden]>

> What interconnect are you using? InfiniBand? Use the
> "--without-memory-manager" option when building ompi in order to disable
> ptmalloc.
>
> Regards
> --Nysal
>
>
> On Sun, Aug 8, 2010 at 7:49 PM, Nicolas Deladerriere <
> nicolas.deladerriere_at_[hidden]> wrote:
>
>> Yes, I am using a machine with 24 GB of memory and a 64-bit Linux OS.
>> If I compile without the wrapper, I do not get any problems.
>>
>> It seems that when I link with Open MPI, my program uses a malloc
>> implemented by Open MPI. Is it possible to switch it off in order to use
>> only malloc from libc?
>>
>> Nicolas
>>
>> 2010/8/8 Terry Frankcombe <terry_at_[hidden]>
>>
>>> You're trying to do a 6GB allocate. Can your underlying system handle
>>> that? If you compile without the wrapper, does it work?
>>>
>>> I see your executable is using the OMPI memory stuff. IIRC there are
>>> switches to turn that off.
>>>
>>>
>>> On Fri, 2010-08-06 at 15:05 +0200, Nicolas Deladerriere wrote:
>>> > Hello,
>>> >
>>> > I am getting a SIGSEGV error when running a simple program compiled and
>>> > linked with Open MPI.
>>> > I have reproduced the problem using really simple Fortran code. It
>>> > does not actually even use MPI; it just links with the MPI shared
>>> > libraries. (The problem does not appear when I do not link with the MPI
>>> > libraries.)
>>> > % cat allocate.F90
>>> > program test
>>> > implicit none
>>> > integer, dimension(:), allocatable :: z
>>> > integer(kind=8) :: l
>>> >
>>> > write(*,*) "l ?"
>>> > read(*,*) l
>>> >
>>> > ALLOCATE(z(l))
>>> > z(1) = 111
>>> > z(l) = 222
>>> > DEALLOCATE(z)
>>> >
>>> > end program test
>>> >
>>> > I am using openmpi 1.4.2 and gfortran for my tests. Here is the
>>> > compilation :
>>> >
>>> > % ./openmpi-1.4.2/build/bin/mpif90 --showme -g -o testallocate
>>> > allocate.F90
>>> > gfortran -g -o testallocate allocate.F90
>>> > -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/include -pthread
>>> > -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib
>>> > -L/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib -lmpi_f90
>>> > -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl
>>> > -lutil -lm -ldl -pthread
>>> >
>>> > When I run that test with different lengths, I sometimes get a
>>> > "Segmentation fault" error. Here are two examples using two specific
>>> > values, but the error happens for many other values of length (I did not
>>> > manage to find which values of length give that error).
>>> >
>>> > % ./testallocate
>>> > l ?
>>> > 1600000000
>>> > Segmentation fault
>>> > % ./testallocate
>>> > l ?
>>> > 2000000000
>>> >
>>> > I ran a debugger against a version of Open MPI recompiled with the debug
>>> > flag. I got the following error in function sYSMALLOc:
>>> >
>>> > Program received signal SIGSEGV, Segmentation fault.
>>> > 0x00002aaaab70b3b3 in sYSMALLOc (nb=6400000016, av=0x2aaaab930200)
>>> > at malloc.c:3239
>>> > 3239 set_head(remainder, remainder_size | PREV_INUSE);
>>> > Current language: auto; currently c
>>> > (gdb) bt
>>> > #0 0x00002aaaab70b3b3 in sYSMALLOc (nb=6400000016,
>>> > av=0x2aaaab930200) at malloc.c:3239
>>> > #1 0x00002aaaab70d0db in opal_memory_ptmalloc2_int_malloc
>>> > (av=0x2aaaab930200, bytes=6400000000) at malloc.c:4322
>>> > #2 0x00002aaaab70b773 in opal_memory_ptmalloc2_malloc
>>> > (bytes=6400000000) at malloc.c:3435
>>> > #3 0x00002aaaab70a665 in opal_memory_ptmalloc2_malloc_hook
>>> > (sz=6400000000, caller=0x2aaaabf8534d) at hooks.c:667
>>> > #4 0x00002aaaabf8534d in _gfortran_internal_free ()
>>> > from /usr/lib64/libgfortran.so.1
>>> > #5 0x0000000000400bcc in MAIN__ () at allocate.F90:11
>>> > #6 0x0000000000400c4e in main ()
>>> > (gdb) display
>>> > (gdb) list
>>> > 3234 if ((unsigned long)(size) >= (unsigned long)(nb +
>>> > MINSIZE)) {
>>> > 3235 remainder_size = size - nb;
>>> > 3236 remainder = chunk_at_offset(p, nb);
>>> > 3237 av->top = remainder;
>>> > 3238 set_head(p, nb | PREV_INUSE | (av != &main_arena ?
>>> > NON_MAIN_ARENA : 0));
>>> > 3239 set_head(remainder, remainder_size | PREV_INUSE);
>>> > 3240 check_malloced_chunk(av, p, nb);
>>> > 3241 return chunk2mem(p);
>>> > 3242 }
>>> > 3243
>>> >
>>> >
>>> > I also did the same test in C and I got the same problem.
>>> >
>>> > Does anyone have any idea that could help me understand what's going
>>> > on?
>>> >
>>> > Regards
>>> > Nicolas
>>> >
>>> > _______________________________________________
>>> > users mailing list
>>> > users_at_[hidden]
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users