Open MPI Development Mailing List Archives

Subject: [OMPI devel] Notes from mem hooks call today
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-05-28 16:46:46

brian: 1st point: propose remove opal/mca/memory/darwin (memory hooks
   on OS X). Rationale:
   - mvapi support is gone
   - gm would be only user
   - no one is supporting the code anymore (it ain't broke, but...)
   --> patrick says: no problem. only myri osx customers have a special
       mpich-mx, so it's ok.
   --> jeff will svn rm mca/memory/darwin

discussion about current state of ptmalloc2
- only really useful for benchmarks (i.e., --mca mpi_leave_pinned 1)
- why have it in the way for apps that don't use mpi_leave_pinned?
- it gets in the way of MX (we "sorta" get away with it)
- also, we can't use ptmalloc2 for sun -- would be nice to do
   something that they can use
- also remember that we hacked our copy of ptmalloc2 to make it work
   nicely (e.g., because OF deregister calls malloc/free)
   - note that our ptmalloc2 hacks are basically equivalent to mallopt:
     we rarely return memory to the OS (e.g., very large allocations,
     when ptmalloc uses its munmap case)
   --> brian will double check this point

4 proposals:

1. patrick proposes to use the MMU notifiers -- likely to be in linux
   - network driver will need to implement reg cache functionality
   - these MMU notifiers will not be visible to OMPI; OMPI simply
     *always* registers (a system call) and the driver implements the
     cache and does the de-register for you when the memory is freed
   - gleb asks: don't we want to avoid the system call when possible?
   - patrick: a single syscall can be/is cheaper than a reg cache
     lookup in user space
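
For reference, the user-space path that proposal 1 would move into the driver looks roughly like the sketch below. This is not OMPI's rcache code: `reg_cache_t`, `reg_cache_acquire()`, `reg_cache_invalidate()`, and `fake_register()` are hypothetical names, and the "registration" is just a counter standing in for the real system call. It is only meant to show what a hit, a miss, and an invalidation on free look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REG_CACHE_SLOTS 8

typedef struct {
    uintptr_t base;   /* start of the registered range */
    size_t    len;    /* length of the registered range */
    int       valid;
} reg_entry_t;

typedef struct {
    reg_entry_t   slot[REG_CACHE_SLOTS];
    unsigned      next;           /* round-robin eviction cursor */
    unsigned long registrations;  /* counts simulated registration calls */
} reg_cache_t;

/* Hypothetical stand-in for the real (expensive) registration call. */
static void fake_register(reg_cache_t *c) { c->registrations++; }

/* Returns 1 on a cache hit, 0 if the buffer had to be registered. */
static int reg_cache_acquire(reg_cache_t *c, const void *buf, size_t len)
{
    assert(c != NULL && len > 0);
    uintptr_t lo = (uintptr_t)buf, hi = lo + len;

    for (int i = 0; i < REG_CACHE_SLOTS; i++) {
        const reg_entry_t *e = &c->slot[i];
        if (e->valid && e->base <= lo && hi <= e->base + e->len)
            return 1;  /* already pinned: no system call needed */
    }

    /* Miss: register and remember the range. A real cache would also
     * deregister whatever entry it evicts here. */
    fake_register(c);
    reg_entry_t *e = &c->slot[c->next];
    c->next = (c->next + 1) % REG_CACHE_SLOTS;
    e->base = lo;
    e->len = len;
    e->valid = 1;
    return 0;
}

/* Called when memory goes back to the OS: drop every overlapping entry. */
static void reg_cache_invalidate(reg_cache_t *c, const void *addr, size_t len)
{
    assert(c != NULL);
    uintptr_t lo = (uintptr_t)addr, hi = lo + len;

    for (int i = 0; i < REG_CACHE_SLOTS; i++) {
        reg_entry_t *e = &c->slot[i];
        if (e->valid && e->base < hi && lo < e->base + e->len)
            e->valid = 0;
    }
}
```

The invalidate call is the part that needs memory hooks today; with MMU notifiers the driver would learn about the release on its own.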

2. patrick also proposes dlmalloc
   - not as efficient as ptmalloc2 (no fine-grained thread locks)
   - but is more robust and simpler than ptmalloc2 (mpich-mx switched
     to it long ago)
   - has the same linker issues as ptmalloc2 (e.g., will be problematic
     with apps that require their own allocator)
   --> better for longer term (e.g., OMPI v1.4) because dlmalloc
       handles large numbers of short malloc/free's better than
   --> upgrading to dlmalloc is also subject to points at bottom of
       these notes (don't call free() during de-register code paths)

3. brian proposes mallopt
   - patrick says: you have to check if registering memory is on the
     stack. what do we do now?
   - neither brian nor galen remembers offhand; we'll need to check
   - we will have problems with apps that do lots of small allocations,
     but still better than ptmalloc2 because can turn off mallopt via
     MCA param (i.e., just tell users: "don't use mpi_leave_pinned")
     instead of recompiling/reinstalling OMPI to disable ptmalloc2

4. patrick also mentions: can simply use pipeline (take the bw perf
   hit). Unfortunately, not feasible for benchmarks. :-(


For v1.3, gravitating towards the following: leave ptmalloc2 as
   in the v1.3 tarball, but don't build it unless explicitly requested,
   and ensure that the mallopt() protocol stuff works.

   - note that the mallopt code is currently enabled by 2nd mca param
   - patrick: no guarantee that malloc will comply; it's only a hint.
     need to have a run-time test to ensure that it works: set the trim
     threshold to large. then malloc something just over the
     threshold and free it, and see if munmap hooks were called.
   - brian: we'll need to add the hooks for munmap (probably move them
     from where they are currently located)
   - patrick: what about case like CHARMM where they have their own
     allocator and don't support mallopt() hints?
   - brian: same as today -- if you provide your own allocator,
     leave_pinned doesn't work. benefit here is that if you're *not*
     using leave_pinned, then don't have heavyweight ptmalloc2 in the
     way. but you are hosed if you try to have your own allocator with
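
A rough sketch of what a run-time check along these lines could look like on glibc. It does not reproduce the exact test patrick describes (it watches the program break with sbrk(0) instead of munmap hooks, and uses a small probe rather than one sized to the threshold), so treat it as one possible variation. `mallopt_hints_hold()` is a hypothetical name, and the 1 GiB threshold and the choice to set M_MMAP_MAX to 0 are assumptions made for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <malloc.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Returns  1 if a freed block stayed inside the process,
 *          0 if the allocator gave the memory back (hint ignored),
 *         -1 if the check could not be carried out.
 */
static int mallopt_hints_hold(size_t probe_bytes)
{
    assert(probe_bytes > 0);

    /* Assumed settings: no mmap()ed chunks, and do not trim the heap
     * until 1 GiB of free space has piled up at the top. */
    if (mallopt(M_MMAP_MAX, 0) != 1)
        return -1;
    if (mallopt(M_TRIM_THRESHOLD, 1 << 30) != 1)
        return -1;

    void *before = sbrk(0);
    if (before == (void *)-1)
        return -1;

    /* volatile stores keep the compiler from eliding malloc/free. */
    volatile char *p = malloc(probe_bytes);
    if (p == NULL)
        return -1;
    p[0] = 1;
    p[probe_bytes - 1] = 1;

    void *grown = sbrk(0);
    free((void *)p);
    void *after = sbrk(0);

    if (grown == before)
        return -1;  /* heap did not grow via brk: inconclusive */
    return after == grown ? 1 : 0;
}
```

Something like mallopt_hints_hold(4u << 20) early in startup would give a yes/no/unknown answer before leave_pinned is turned on.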

*** brian's proposal for v1.3:
   - disable building ptmalloc2 unless specifically requested
   - add a component for intercepting munmap
   - enable mallopt by default (currently in the mpool base) if all of
     the following is true:
       - you are using the munmap-intercept component (we can check
         this at run-time)
       - leave_pinned was requested
       - mallopt hints work
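
To illustrate the munmap-intercept idea and the three-way condition above, here is a minimal Linux-only sketch. It is an assumption about how such a component might look, not the actual OMPI component: `on_release()`, `release_events`, `released_bytes`, and `should_enable_mallopt()` are hypothetical names. Defining munmap() in the executable makes the application's own calls land here; the C library's internal calls may still bypass it, which is part of why a run-time check is needed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static unsigned long release_events;  /* how many munmap calls we saw */
static size_t        released_bytes;  /* total bytes handed back */

/* Hypothetical callback: a real one would invalidate registrations. */
static void on_release(void *addr, size_t len)
{
    (void)addr;
    release_events++;
    released_bytes += len;
}

/* Same signature as the libc function, so it takes its place at link
 * time. The real work is forwarded straight to the kernel. */
int munmap(void *addr, size_t len)
{
    on_release(addr, len);
    return (int)syscall(SYS_munmap, addr, len);
}

/* The three conditions from the proposal, all of which must hold. */
static int should_enable_mallopt(int munmap_intercept_active,
                                 int leave_pinned_requested,
                                 int mallopt_hints_work)
{
    return munmap_intercept_active
        && leave_pinned_requested
        && mallopt_hints_work;
}
```

The run-time check for "are we using the intercept" could be as simple as mapping a page, unmapping it, and seeing whether release_events moved.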


- gleb: random note: if you call free from a callback in a threaded
   build, we can deadlock
   - brian: because OpenFabrics unregister calls malloc/free, and this
     causes problems. we added a hack-ish loop to try to handle this.
     probably not completely correct; don't really know what *to* do.
   - gleb: solved in openib btl -- we simply don't unreg on callback
     (just save it on a list to unregister later). but there are other
     places it can/does happen.
   - brian: yes, it's likely to be a big problem to cleanup. unlikely
     to happen for v1.3.
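
The "save it on a list and unregister later" approach gleb describes could be sketched like this. This is not the openib btl code: `registration_t`, `defer_unregister()`, `drain_pending()`, and `fake_unregister()` are hypothetical, and the sketch is single-threaded. A threaded build would need to protect the list with a lock that is not held while the real unregister runs.

```c
#include <assert.h>
#include <stddef.h>

typedef struct registration {
    void  *base;
    size_t len;
    int    registered;
    struct registration *next_pending;  /* intrusive link: no allocation */
} registration_t;

typedef struct {
    registration_t *head;
} pending_list_t;

/* Safe to call from inside a free/munmap callback: it only links the
 * entry into the list and never calls malloc or free itself. */
static void defer_unregister(pending_list_t *l, registration_t *r)
{
    assert(l != NULL && r != NULL);
    r->next_pending = l->head;
    l->head = r;
}

/* Hypothetical stand-in for the real unregister call, which may itself
 * call malloc/free and so must not run inside the callback. */
static void fake_unregister(registration_t *r) { r->registered = 0; }

/* Called later from a normal code path. Returns the number of entries
 * that were unregistered. */
static int drain_pending(pending_list_t *l)
{
    assert(l != NULL);
    registration_t *r = l->head;
    int n = 0;

    l->head = NULL;  /* detach first so new deferrals start a fresh list */
    while (r != NULL) {
        registration_t *next = r->next_pending;
        r->next_pending = NULL;
        fake_unregister(r);
        n++;
        r = next;
    }
    return n;
}
```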

Jeff Squyres
Cisco Systems