Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] MALLOC_MMAP_MAX (and MALLOC_MMAP_THRESHOLD)
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-01-09 06:33:41


I'm not sure I follow -- are you saying that Open MPI is disabling the large mmap allocations, and we shouldn't?

On Jan 8, 2010, at 9:25 AM, Sylvain Jeaugey wrote:

> On Thu, 7 Jan 2010, Eugene Loh wrote:
>
> > Could someone tell me how these settings are used in OMPI or give any
> > guidance on how they should or should not be used?
> This is a very good question :-) So is this whole e-mail, though it's hard
> (in my opinion) to give it a Good (TM) answer.
>
> > This means that if you loop over the elements of multiple large arrays
> > (which is common in HPC), you can generate a lot of cache conflicts,
> > depending on the cache associativity.
> On the other hand, high buffer alignment sometimes gives better
> performance (e.g. Infiniband QDR bandwidth).
>
> > There are multiple reasons one might want to modify the behavior of the
> > memory allocator, including high cost of mmap calls, wanting to register
> > memory for faster communications, and now this cache-conflict issue. The
> > usual solution is
> >
> > setenv MALLOC_MMAP_MAX_ 0
> > setenv MALLOC_TRIM_THRESHOLD_ -1
> >
> > or the equivalent mallopt() calls.
> But yes, this set of settings is the number one tweak on HPC code that I'm
> aware of.
>
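
For reference, the "equivalent mallopt() calls" mentioned above might look roughly like the sketch below. It assumes glibc (M_MMAP_MAX and M_TRIM_THRESHOLD are glibc parameters, not portable ones), and the helper name disable_mmap_and_trim is made up for illustration; it is not something Open MPI provides.

```c
#include <malloc.h>   /* mallopt, M_MMAP_MAX, M_TRIM_THRESHOLD (glibc) */
#include <stdlib.h>

/*
 * Hypothetical helper: programmatic counterpart of
 *     setenv MALLOC_MMAP_MAX_ 0
 *     setenv MALLOC_TRIM_THRESHOLD_ -1
 * Returns 1 if glibc accepted both settings, 0 otherwise.
 */
static int disable_mmap_and_trim(void)
{
    /* 0 mmap'd chunks allowed: large requests are served from the heap. */
    int ok_mmap = mallopt(M_MMAP_MAX, 0);

    /* -1 disables trimming: freed heap memory is not returned to the OS. */
    int ok_trim = mallopt(M_TRIM_THRESHOLD, -1);

    /* mallopt() returns 1 on success and 0 on error. */
    return ok_mmap == 1 && ok_trim == 1;
}
```

An application would call this once at the top of main(), before its first large allocation. As discussed in this thread, the trade-off is that freed memory stays with the process, so some codes may end up wasting memory.
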
> > This issue becomes an MPI issue for at least three reasons:
> >
> > *) MPI may care about these settings due to memory registration and pinning.
> > (I invite you to explain to me what I mean. I'm talking over my head here.)
> Avoiding mmap is good since it avoids calls to munmap (a function we
> need to hack to prevent data corruption).
>
> > *) (Related to the previous bullet), MPI performance comparisons may reflect
> > these effects. Specifically, in comparing performance of OMPI, Intel MPI,
> > Scali/Platform MPI, and MVAPICH2, some tests (such as HPCC and SPECmpi) have
> > shown large performance differences between the various MPIs when, it seems,
> > none were actually spending much time in MPI. Rather, some MPI
> > implementations were turning off large-malloc mmaps and getting good
> > performance (and sadly OMPI looked bad in comparison).
> I don't think this bullet is related to the previous one. The first one is
> a good reason, this one is typically the Bad reason. Bad, but
> unfortunately true: competitors' MPI libraries are faster because ...
> they do much more than MPI (accelerating malloc being the main difference).
> Which I think is Bad, because all these settings should be left in
> developers' hands. You'll always find an application where these settings
> will waste memory and prevent it from running.
>
> > *) These settings seem to be desirable for HPC codes since they don't do
> > much allocation/deallocation and they do tend to have loop nests that wade
> > through multiple large arrays at once. For best "out of the box"
> > performance, a software stack should turn these settings on for HPC. Codes
> > don't typically identify themselves as "HPC", but some indicators include
> > Fortran, OpenMP, and MPI.
> In practice, I agree. Most HPC codes benefit from it. But I also ran into
> codes where the memory waste was a problem.
>
> > I don't know the full scope of the problem, but I've run into this with at
> > least HPCC STREAM (which shouldn't depend on MPI at all, but OMPI looks much
> > slower than Scali/Platform on some tests) and SPECmpi (primarily one or two
> > codes, though it depends also on problem size).
> I also had those codes in mind. That's also why I don't like those MPI
> "benchmarks", since they benchmark much more than MPI. They hence
> encourage MPI providers to incorporate into their libraries things that
> have (more or less) nothing to do with MPI.
>
> But again, yes, from the (basic) user's point of view, library X seems
> faster than library Y. When there is nothing left to improve on MPI, start
> optimizing the rest ... maybe we should reimplement a faster libc inside
> MPI :-)
>
> Sylvain
>

-- 
Jeff Squyres
jsquyres_at_[hidden]