Open MPI Development Mailing List Archives

From: Rich L. Graham (rlgraham_at_[hidden])
Date: 2005-08-12 22:39:43


Sounds reasonable - I am for being able to turn on optional things
that will improve performance...

Thanks,
Rich

On Aug 12, 2005, at 9:14 PM, Brian Barrett wrote:

> On Aug 12, 2005, at 9:43 PM, Rich L. Graham wrote:
>
>> Sounds like I got off the call a bit too early ;-)
>> Can we choose to use standard platform libraries, or are we pinning
>> ourselves into a corner? I.e., is this optional?
>
> Yes - the code is all built around trying to use the standard
> platform libraries. And yes, everything is optional. In many cases
> (pretty much everywhere but single-threaded Linux), the default will be to
> not do any memory manager tricks at all. Of course, not having any
> memory manager hooks lessens the performance of the BTLs since we
> have to do pin/rdma pipelining, but that's the price we have to pay.
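>
> For reference, the pin/rdma pipelining fallback looks roughly like the
> sketch below - overlap registering the next chunk with the RDMA of the
> current one so part of the pinning cost is hidden. Every function and
> type name in it is a stand-in for illustration, not the actual BTL API:
>
>   #include <stddef.h>
>
>   #define CHUNK (1 << 20)  /* pipeline stage size; value is arbitrary */
>
>   /* Stubs standing in for a real registration/RDMA API. */
>   typedef int rdma_req_t;
>   static void *pin_memory(void *p, size_t n) { (void) n; return p; }
>   static rdma_req_t post_rdma_write(void *h, size_t n)
>   { (void) h; (void) n; return 0; }
>   static void wait_rdma(rdma_req_t r) { (void) r; }
>
>   /* Without free() interception there is no pin cache, so every send
>      re-pins its buffer.  Pipelining hides part of that cost by
>      overlapping the registration of chunk i+1 with the RDMA of
>      chunk i. */
>   static void pipelined_send(char *buf, size_t len)
>   {
>       rdma_req_t prev = -1;
>       for (size_t off = 0; off < len; off += CHUNK) {
>           size_t n = (len - off < CHUNK) ? (len - off) : CHUNK;
>           void *h = pin_memory(buf + off, n); /* register this chunk */
>           if (prev != -1) wait_rdma(prev);    /* drain previous chunk */
>           prev = post_rdma_write(h, n);
>       }
>       if (prev != -1) wait_rdma(prev);
>   }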
>
>> What sort of problems are we getting into playing with pre-load
>> options? I would be VERY careful here, and do plenty of testing,
>> especially with C++ codes, before you decide to do this. We used to
>> use some of these tricks in LA-MPI, but backed off because of loader
>> ordering issues.
>
> Agreed - I'm one of the ones who was very against doing it in the
> first place :). Currently, the default everywhere but single-threaded
> Linux is to not have any memory manager hooks at all. On
> single-threaded Linux, we use the hooks provided by glibc to do
> "something" before the actual free/realloc occurs. Because these are
> official, recommended ways of doing things, they should work with any
> C, C++, or Fortran code, even if it is statically linked. I've
> tested them with C++ apps, and they work as the documentation implies
> they would.
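>
> For the curious, the glibc hook interface amounts to roughly the
> following (the unpin callback below is a stand-in name, not our real
> symbol; the pattern itself is the one the glibc documentation gives):
>
>   #include <malloc.h>
>   #include <stdlib.h>
>
>   static void (*old_free_hook)(void *, const void *);
>
>   /* Stand-in for the real unpin callback. */
>   static void mem_release_cb(void *ptr) { (void) ptr; }
>
>   static void my_free_hook(void *ptr, const void *caller)
>   {
>       (void) caller;
>       __free_hook = old_free_hook;  /* deactivate so free() can't recurse */
>       mem_release_cb(ptr);          /* e.g. evict ptr from the pin cache */
>       free(ptr);
>       old_free_hook = __free_hook;  /* re-save and reinstall ourselves */
>       __free_hook = my_free_hook;
>   }
>
>   static void my_init_hook(void)
>   {
>       old_free_hook = __free_hook;
>       __free_hook = my_free_hook;
>   }
>
>   /* glibc runs this before the first allocation, even in static links. */
>   void (*__malloc_initialize_hook)(void) = my_init_hook;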
>
> I don't think that the ldpreload tricks should ever be the default.
> I'd like to provide them, because on threaded builds (where the glibc
> hooks aren't available), they provide a much better solution than
> using ptmalloc2. The sysadmin/user would have to set up his
> environment to load the preload library. If the module fails to
> preload, there is a facility in place for the memory code to tell the
> mpools that there is no memory manager intercept and to fall back to
> the unpin-after-use mode. Further, the ldpreload module (not yet
> committed, but half written) can run just fine even if the app
> started isn't an opal code (with little if any performance
> difference). I don't envision us ever explicitly setting
> LD_PRELOAD in the pls components or anything like that. Instead, I
> see us documenting "Add this to your LD_PRELOAD or /etc/ld.so.preload
> and OMPI goes faster".
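>
> To make the mechanism concrete: the preload library amounts to
> something like the sketch below - interpose free() (and munmap(),
> the same way) via dlsym(RTLD_NEXT, ...), and export a marker symbol
> the memory code can look up to decide whether the interposition
> actually took effect. Every name other than the libc calls is
> invented for this sketch:
>
>   /* cc -shared -fPIC -o preload.so preload.c -ldl
>      LD_PRELOAD=./preload.so ./app */
>   #define _GNU_SOURCE
>   #include <dlfcn.h>
>   #include <stdlib.h>
>
>   /* Exported marker; the opal side could dlsym() for this and fall
>      back to unpin-after-use when it is absent.  Hypothetical name. */
>   int opal_preload_active = 1;
>
>   /* Stand-in for the real unpin callback. */
>   static void mem_release_cb(void *ptr) { (void) ptr; }
>
>   void free(void *ptr)
>   {
>       static void (*real_free)(void *);
>       if (real_free == NULL)
>           real_free = (void (*)(void *)) dlsym(RTLD_NEXT, "free");
>       mem_release_cb(ptr);  /* notify before the memory is released */
>       real_free(ptr);
>   }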
>
>> As you can tell, I am VERY leery of these sorts of tricks for a
>> production-grade bit of code. If it is easy to decide at run-time
>> whether to use these tricks (w/o a performance penalty), that is a
>> different question.
>
> Some of these will be very difficult to turn off at runtime (the
> LD_PRELOAD probably being the exception - you can at least turn that
> off any time before the application starts running). However, I
> don't think this is a problem because the defaults are going to be so
> pessimistic that we shouldn't get into a situation where the user is
> going to have to turn them off. I'm thinking big, annoying warnings
> in the installation document about turning the less-safe ones on.
>
> Brian
>
>
>> Begin forwarded message:
>>
>>
>>> From: Brian Barrett <brbarret_at_[hidden]>
>>> Date: August 12, 2005 7:47:45 PM MDT
>>> To: Open MPI Developers <devel_at_[hidden]>
>>> Subject: [O-MPI devel] Memory manager changes
>>> Reply-To: Open MPI Developers <devel_at_[hidden]>
>>>
>>> Hi all -
>>>
>>> For those not on the telecon Tuesday, we finally broke down and
>>> decided we needed to do all the system nastiness to intercept free()
>>> and munmap() and the like for high-speed interconnects so that we can
>>> do pinned page caching and not take the pinning performance hit on
>>> applications like NetPIPE (and, to be fair, many user applications).
>>> Unlike LAM, however, we're going to try to make this not be the
>>> center of all pain and suffering ;). While we'll support the
>>> ptmalloc2 trick that LAM and MPICH-gm use, it will not be on by
>>> default and we're trying to find better alternatives. Below are your
>>> current choices for intercepting memory releases back to the
>>> operating system. The default is malloc_hooks on platforms that
>>> support it when threads aren't enabled. Otherwise the current
>>> default is "none".
>>>
>>> In all cases, in addition to dealing with free() and realloc(), we
>>> provide intercepts for munmap() to catch the user doing his own
>>> memory management. We may also want to intercept SysV shared memory
>>> functions.
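>>>
>>> As a sketch of what a munmap() intercept amounts to (shown with the
>>> dlsym(RTLD_NEXT) approach the ldpreload flavor uses; the callback
>>> name is invented for illustration):
>>>
>>>   #define _GNU_SOURCE
>>>   #include <dlfcn.h>
>>>   #include <sys/mman.h>
>>>
>>>   /* Stand-in for the real unpin callback. */
>>>   static void mem_release_cb(void *addr, size_t len)
>>>   { (void) addr; (void) len; }
>>>
>>>   int munmap(void *addr, size_t len)
>>>   {
>>>       static int (*real_munmap)(void *, size_t);
>>>       if (real_munmap == NULL)
>>>           real_munmap = (int (*)(void *, size_t))
>>>               dlsym(RTLD_NEXT, "munmap");
>>>       /* let the pin cache evict these pages before they go away */
>>>       mem_release_cb(addr, len);
>>>       return real_munmap(addr, len);
>>>   }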
>>>
>>> You can choose exactly which "memory manager" to use with the
>>> --with-memory-manager=TYPE option to configure, where TYPE is one of
>>> "ptmalloc2", "malloc_hooks", "darwin7", or "ldpreload". Of course,
>>> you can also use --without-memory-manager or
>>> --with-memory-manager=none to disable the feature entirely.
>>>
>>> * PTMALLOC2
>>>
>>> + Very fast implementation of the full malloc/free suite.
>>> Directly used by glibc as its memory manager.
>>> + Works properly in threaded environments
>>> + Only calls unpin callbacks when giving memory back to the
>>> OS (i.e., when sbrk() or munmap() is called)
>>> - Does not work properly in some situations (abacus linker
>>> tricks, for example) that appear to be within the
>>> spirit of using the MPI library
>>> - Does not work on many platforms (everywhere but Linux, really)
>>> - Feels massively icky
>>>
>>> * MALLOC_HOOKS
>>>
>>> + Uses the hooks provided by ptmalloc2 (and therefore glibc)
>>> to get callbacks when free(), realloc(), etc. are called
>>> + No "corner cases" that cause unexpected behavior like with
>>> ptmalloc2
>>> - Does not support threads (disables itself if either
>>> progress or MPI threads are enabled)
>>> - Has to call unpin callbacks when memory is free()d or
>>> realloc()ed, not when giving memory back to the OS
>>> - Small performance penalty (1-2%) on calling free() when
>>> there are no mpools registering callbacks
>>>
>>> * LDPRELOAD
>>>
>>> + Thread safe
>>> + No "corner cases" that cause unexpected behavior like with
>>> ptmalloc2
>>> + Should work on every platform that supports LD_PRELOAD and
>>> dlsym()
>>> - Requires doing ldpreload tricks
>>> - On some platforms, have to call unpin callbacks when
>>> memory is free()d or realloc()ed, not when giving back
>>> to the OS
>>> - Did I mention, it requires doing ldpreload?
>>> + If the LD_PRELOAD doesn't succeed, opal can properly determine
>>> this and will just say free() interception is unavailable
>>>
>>> * DARWIN7
>>>
>>> + Thread safe
>>> - Requires some nasty linker tricks to make it work. The
>>> user application must be linked with mpicc or a long list
>>> of special flags
>>> + If the application is not linked with the special sauce,
>>> opal should be able to properly determine this and just
>>> say free() interception is unavailable.
>>> - Total hack of linker tricks
>>>
>>> LD Preload is not yet implemented, but should be by the end of the
>>> weekend. The initial version will most likely only support making
>>> callbacks every time free() / realloc() is called, rather than every
>>> time memory is given back to the OS. Not optimal, but better than
>>> nothing.
>>>
>>> I'm going to talk with some Darwin developers about better ways to do
>>> things on Darwin, but probably won't have any results on that front
>>> until sometime in the middle of next week.
>>>
>>>
>>> Brian
>>>
>>> --
>>> Brian Barrett
>>> Open MPI developer
>>> http://www.open-mpi.org/
>>>
>>>
>
> --
> Brian Barrett
> Open MPI developer
> http://www.open-mpi.org/