Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] MALLOC_MMAP_MAX (and MALLOC_MMAP_THRESHOLD)
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-01-11 11:52:17


Arrgh -- if only the Linux kernel community had accepted ummunotify, this would now be a moot point (i.e., the argument would be solely with the OS/glibc, not the MPI!).

On Jan 9, 2010, at 10:45 PM, Barrett, Brian W wrote:

> We should absolutely not change this. For simple applications, yes, things work if large blocks are allocated on the heap. However, ptmalloc (and most allocators, really) can't rationally cope with repeated allocations and deallocations of large blocks. It would be *really bad* (as we've seen before) to change the behavior of our version of ptmalloc from that which is provided by Linux. Pain and suffering is all that path has ever led to.
>
> Just my $0.02, of course.
>
> Brian
>
> ________________________________________
> From: devel-bounces_at_[hidden] [devel-bounces_at_[hidden]] On Behalf Of Eugene Loh [Eugene.Loh_at_[hidden]]
> Sent: Saturday, January 09, 2010 9:55 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] MALLOC_MMAP_MAX (and MALLOC_MMAP_THRESHOLD)
>
> Jeff Squyres wrote:
>
> >I'm not sure I follow -- are you saying that Open MPI is disabling the large mmap allocations, and we shouldn't?
> >
> >
> Basically the reverse. The default (I think this means Linux, whether
> with gcc, gfortran, Sun f90, etc.) is to use mmap to malloc large
> allocations. We don't change this, but arguably we should.
>
> Try this:
>
> #include <stdlib.h>
> #include <stdio.h>
> #include <stdint.h>
>
> int main(void) {
>     size_t size, nextsize;
>     char *ptr, *nextptr;   /* char * so pointer subtraction is legal C */
>
>     size = 1;
>     ptr = malloc(size);
>     while (size < 1000000) {
>         nextsize = 1.1 * size + 1;
>         nextptr = malloc(nextsize);
>         printf("%9zu %18zx %18lx %18lx\n", size, size,
>                (unsigned long)(nextptr - ptr),
>                (unsigned long)(uintptr_t)ptr);
>         size = nextsize;
>         ptr = nextptr;
>     }
>
>     return 0;
> }
>
> Here is sample output:
>
>   # bytes    # bytes (hex)    # bytes to next ptr (hex)    ptr (hex)
>
>
> 58279 e3a7 e3b0 58f870
> 64107 fa6b fa80 59dc20
> 70518 11376 11380 5ad6a0
> 77570 12f02 12f10 5bea20
> 85328 14d50 14d60 5d1930
> 93861 16ea5 16eb0 5e6690
> 103248 19350 19360 5fd540
> 113573 1bba5 1bbb0 6168a0
> 124931 1e803 2b3044655bc0 632450
> 137425 218d1 22000 2b3044c88010
> 151168 24e80 25000 2b3044caa010
> 166285 2898d 29000 2b3044ccf010
> 182914 2ca82 2d000 2b3044cf8010
> 201206 311f6 294000 2b3044d25010
> 221327 3608f 37000 2b3044fb9010
> 243460 3b704 3c000 2b3044ff0010
>
> So, for allocations below 128K, pointers are allocated at successively
> higher addresses, each one just barely far enough along to make room for
> the previous allocation. E.g., an allocation of 0xE3A7 bytes pushes the
> "high-water mark" up by 0xE3B0.
>
> Beyond the 128K threshold, allocations are page aligned. The pointers
> all end in 0x010. That is, whole numbers of pages are allocated and the
> returned address is 16 bytes (0x10) into the first page. The sizes of
> the allocations are the requested amounts, plus a few bytes of padding,
> rounded up to the nearest whole multiple of the page size.
>
> The motivation to change, in my case, is performance. I don't know how
> widespread this problem is, but...
>
> >On Jan 8, 2010, at 9:25 AM, Sylvain Jeaugey wrote:
> >
> >
> >>On Thu, 7 Jan 2010, Eugene Loh wrote:
> >>
> >>>setenv MALLOC_MMAP_MAX_ 0
> >>>setenv MALLOC_TRIM_THRESHOLD_ -1
> >>>
> >>>
> >>But yes, this set of settings is the number one tweak on HPC code that I'm
> >>aware of.
> >>
> >>
> Wow! I might vote for "compiling with -O", but let's not pick nits here.
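For completeness, the tweak Sylvain describes is usually applied from the shell before launching the application. A hedged sketch (the trailing underscores are part of the glibc variable names; ./a.out stands in for whatever binary you actually run):

```shell
# Disable glibc's mmap-based malloc and heap trimming for one run.
MALLOC_MMAP_MAX_=0 MALLOC_TRIM_THRESHOLD_=-1 ./a.out
```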
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>

-- 
Jeff Squyres
jsquyres_at_[hidden]