Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Memory allocation error when linking with MPI libraries
From: Nicolas Deladerriere (nicolas.deladerriere_at_[hidden])
Date: 2010-08-31 08:36:11


Hi,

Thanks Nysal for these details.

I also fixed my memory allocation issue by setting the environment variable
OMPI_MCA_memory_ptmalloc2_disable, which is much easier (at least in my
case) than compiling and installing a new Open MPI package.
The problem is that information about this variable is hard to find
(it seems to be a secret variable!). I have read that it cannot be used
as a normal MCA parameter and cannot be set in a configuration file
( http://www.open-mpi.org/community/lists/users/2010/06/13208.php ).

When using this variable, I added the -x OMPI_MCA_memory_ptmalloc2_disable
option to my mpirun command line. Do I really have to do that?
Is the environment variable (plus the -x option, if required) still the only
way to set this parameter to 1?
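
For reference, the workaround described above can be sketched as the
following shell session. It is only an illustration under stated
assumptions: the variable name and the -x option are the ones mentioned in
this message, ./testallocate is the test program from the original report
quoted below, and the process count of 1 and the input length are arbitrary
choices. The mpirun line is guarded so that the snippet does nothing on a
machine where Open MPI or the test binary is missing.

```shell
# Set the variable discussed in this thread and export it so that
# child processes inherit it from this shell.
export OMPI_MCA_memory_ptmalloc2_disable=1

# Launch through mpirun; -x asks mpirun to export the named
# variable to the processes it starts.
# Guarded: runs only if mpirun and the test binary are both present.
if command -v mpirun >/dev/null 2>&1 && [ -x ./testallocate ]; then
    # The test program reads the array length from standard input.
    echo 1600000000 | mpirun -x OMPI_MCA_memory_ptmalloc2_disable -np 1 ./testallocate
else
    echo "mpirun or ./testallocate not found;" \
         "OMPI_MCA_memory_ptmalloc2_disable=$OMPI_MCA_memory_ptmalloc2_disable"
fi
```

Whether the -x option is strictly required in addition to the export is
exactly the open question of this message; the sketch simply uses both.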

Regards,
Nicolas

2010/8/15 Nysal Jan <jnysal_at_[hidden]>

> >What exactly does compiling with this option imply?
> Open MPI's internal malloc library (ptmalloc) will not be built/used. If
> you are using an RDMA capable interconnect such as Infiniband, you will not
> be able to use the "mpi_leave_pinned" feature. mpi_leave_pinned might
> improve performance for applications that reuse/repeatedly send from the
> same buffer. If you are not using such interconnects then there is no impact
> on performance. For more details see the FAQ entries (24-28) -
> http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned
>
> --Nysal
>
>
>
> On Thu, Aug 12, 2010 at 6:30 PM, Nicolas Deladerriere <
> nicolas.deladerriere_at_[hidden]> wrote:
>
>> Building Open MPI with the "--without-memory-manager" option fixed my
>> problem.
>>
>> What exactly does compiling with this option imply?
>> I guess every malloc call then uses the libc functions instead of the
>> Open MPI ones, but does it affect performance or anything else?
>>
>> Nicolas
>>
>> 2010/8/8 Nysal Jan <jnysal_at_[hidden]>
>>
>>> What interconnect are you using? InfiniBand? Use the
>>> "--without-memory-manager" option while building Open MPI in order to
>>> disable ptmalloc.
>>>
>>> Regards
>>> --Nysal
>>>
>>>
>>> On Sun, Aug 8, 2010 at 7:49 PM, Nicolas Deladerriere <
>>> nicolas.deladerriere_at_[hidden]> wrote:
>>>
>>>> Yes, I am using a 24 GB machine with a 64-bit Linux OS.
>>>> If I compile without the wrapper, I do not get any problems.
>>>>
>>>> It seems that when I link with Open MPI, my program uses a malloc
>>>> implemented by Open MPI. Is it possible to switch it off in order to use
>>>> only malloc from libc?
>>>>
>>>> Nicolas
>>>>
>>>> 2010/8/8 Terry Frankcombe <terry_at_[hidden]>
>>>>
>>>>> You're trying to do a 6 GB allocation. Can your underlying system handle
>>>>> that? If you compile without the wrapper, does it work?
>>>>>
>>>>> I see your executable is using the OMPI memory stuff. IIRC there are
>>>>> switches to turn that off.
>>>>>
>>>>>
>>>>> On Fri, 2010-08-06 at 15:05 +0200, Nicolas Deladerriere wrote:
>>>>> > Hello,
>>>>> >
>>>>> > I am getting a SIGSEGV error when running a simple program compiled and
>>>>> > linked with Open MPI.
>>>>> > I have reproduced the problem using really simple Fortran code. It
>>>>> > does not even use MPI; it just links with the MPI shared
>>>>> > libraries. (The problem does not appear when I do not link with the MPI
>>>>> > libraries.)
>>>>> > % cat allocate.F90
>>>>> > program test
>>>>> >   implicit none
>>>>> >   integer, dimension(:), allocatable :: z
>>>>> >   integer(kind=8) :: l
>>>>> >
>>>>> >   write(*,*) "l ?"
>>>>> >   read(*,*) l
>>>>> >
>>>>> >   ALLOCATE(z(l))
>>>>> >   z(1) = 111
>>>>> >   z(l) = 222
>>>>> >   DEALLOCATE(z)
>>>>> >
>>>>> > end program test
>>>>> >
>>>>> > I am using Open MPI 1.4.2 and gfortran for my tests. Here is the
>>>>> > compilation:
>>>>> >
>>>>> > % ./openmpi-1.4.2/build/bin/mpif90 --showme -g -o testallocate
>>>>> > allocate.F90
>>>>> > gfortran -g -o testallocate allocate.F90
>>>>> > -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/include -pthread
>>>>> > -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib
>>>>> > -L/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib -lmpi_f90
>>>>> > -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl
>>>>> > -lutil -lm -ldl -pthread
>>>>> >
>>>>> > When I run that test with different lengths, I sometimes get a
>>>>> > "Segmentation fault" error. Here are two examples using two specific
>>>>> > values, but the error happens for many other lengths (I did not
>>>>> > manage to find which lengths give that error).
>>>>> >
>>>>> > % ./testallocate
>>>>> > l ?
>>>>> > 1600000000
>>>>> > Segmentation fault
>>>>> > % ./testallocate
>>>>> > l ?
>>>>> > 2000000000
>>>>> >
>>>>> > I ran a debugger against a version of Open MPI recompiled with the
>>>>> > debug flag. I got the following error in function sYSMALLOc:
>>>>> >
>>>>> > Program received signal SIGSEGV, Segmentation fault.
>>>>> > 0x00002aaaab70b3b3 in sYSMALLOc (nb=6400000016, av=0x2aaaab930200)
>>>>> > at malloc.c:3239
>>>>> > 3239 set_head(remainder, remainder_size | PREV_INUSE);
>>>>> > Current language: auto; currently c
>>>>> > (gdb) bt
>>>>> > #0 0x00002aaaab70b3b3 in sYSMALLOc (nb=6400000016,
>>>>> > av=0x2aaaab930200) at malloc.c:3239
>>>>> > #1 0x00002aaaab70d0db in opal_memory_ptmalloc2_int_malloc
>>>>> > (av=0x2aaaab930200, bytes=6400000000) at malloc.c:4322
>>>>> > #2 0x00002aaaab70b773 in opal_memory_ptmalloc2_malloc
>>>>> > (bytes=6400000000) at malloc.c:3435
>>>>> > #3 0x00002aaaab70a665 in opal_memory_ptmalloc2_malloc_hook
>>>>> > (sz=6400000000, caller=0x2aaaabf8534d) at hooks.c:667
>>>>> > #4 0x00002aaaabf8534d in _gfortran_internal_free ()
>>>>> > from /usr/lib64/libgfortran.so.1
>>>>> > #5 0x0000000000400bcc in MAIN__ () at allocate.F90:11
>>>>> > #6 0x0000000000400c4e in main ()
>>>>> > (gdb) display
>>>>> > (gdb) list
>>>>> > 3234 if ((unsigned long)(size) >= (unsigned long)(nb +
>>>>> > MINSIZE)) {
>>>>> > 3235 remainder_size = size - nb;
>>>>> > 3236 remainder = chunk_at_offset(p, nb);
>>>>> > 3237 av->top = remainder;
>>>>> > 3238 set_head(p, nb | PREV_INUSE | (av != &main_arena ?
>>>>> > NON_MAIN_ARENA : 0));
>>>>> > 3239 set_head(remainder, remainder_size | PREV_INUSE);
>>>>> > 3240 check_malloced_chunk(av, p, nb);
>>>>> > 3241 return chunk2mem(p);
>>>>> > 3242 }
>>>>> > 3243
>>>>> >
>>>>> >
>>>>> > I also did the same test in C and got the same problem.
>>>>> >
>>>>> > Does anyone have any idea that could help me understand what's going
>>>>> > on?
>>>>> >
>>>>> > Regards
>>>>> > Nicolas
>>>>> >
>>>>> > _______________________________________________
>>>>> > users mailing list
>>>>> > users_at_[hidden]
>>>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users