Sounds good to me. Thanks Brian!
On Jun 3, 2008, at 12:04 PM, Brian W. Barrett wrote:
> Hi all -
> Sorry this is so late, but it took a couple of iterations with a couple of
> people to get right from a technology standpoint. All mistakes in the
> proposal are my fault.
> What: Fix the ptmalloc2 problem
> How: Remove it from the default path
> When: This weekend? For the 1.3 branch
> The problem: On Linux today, we by default build a copy of ptmalloc2 into
> libopen-pal.so so that RDMA networks can get better bandwidth using
> leave_pinned. Normally users don't use or need leave_pinned, but we need
> to have it available for benchmarks and the few apps that gain from
> the more independent-ish progress. However, by having it there, we're
> screwing with the memory manager, which has a number of bad side effects.
> First, it can cause numerous crashes if the user is providing his/her own
> allocator. Second, there is growing evidence that the ptmalloc2 in Open
> MPI has an evil corner case we can't pin down that causes explosive
> growth in memory utilization.
> The proposal: Remove ptmalloc2 from libopen-pal.so and make it a
> standalone library (for forward compatibility, currently called
> libompi-malloc.so), which the user can explicitly link in. This will
> allow users to turn ptmalloc2 support on/off at application link time
> instead of MPI compile time. Given the limited number of leave_pinned
> users, this seems to be a good near-term compromise between greater
> stability for most users and fast performance for power users.
> The mallopt() solution, in which free() never gives memory back to the
> OS (but does reuse it) and which works well for benchmarks, will still
> be available at all times.
> The work: Some autoconf magic to move most (but not all -- in
> the munmap() support) of the ptmalloc2 component into its own library.
> This is extremely low risk, and actually lowers the risk of Open MPI
> breaking by removing code from the critical path. There will also be a
> small number of enhancements to the mpool base code to better detect
> situations where leave_pinned is used but we can't sense memory being
> given back to the OS.
> I'd like this for 1.3, as we're running into more and more situations
> where this code isn't working. Also, the lone supporter of the code (me)
> doesn't want to do it anymore, and removing the code from the critical
> path will lower my workload (I'm the retired guy who's doing this for
> fun).
> If you have objections, please let me know before Friday. I'd like to
> commit these changes to the trunk this weekend.
> devel mailing list
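For readers unfamiliar with the mallopt() fallback mentioned in the proposal (free() reuses memory instead of returning it to the OS), here is a minimal sketch of how that behavior can be requested from glibc's malloc. It assumes a glibc-based Linux system; the helper name `keep_freed_memory()` is hypothetical, and the exact settings Open MPI itself applies are not stated in the proposal, so treat this as an illustration of the technique, not Open MPI's code.

```c
#include <malloc.h>  /* mallopt(), M_TRIM_THRESHOLD, M_MMAP_MAX (glibc-specific) */

/* Hypothetical helper (not an Open MPI API): ask glibc malloc to keep
 * freed memory inside the process so it can be reused, rather than
 * handing it back to the OS. Returns 0 on success, -1 on failure.
 * mallopt() returns 1 on success and 0 on error. */
int keep_freed_memory(void)
{
    /* -1 disables trimming completely: free() never shrinks the heap
     * back to the OS via sbrk(). */
    if (mallopt(M_TRIM_THRESHOLD, -1) != 1)
        return -1;

    /* 0 disables mmap()-backed allocations, so large blocks come from
     * the heap and free() never munmap()s them. */
    if (mallopt(M_MMAP_MAX, 0) != 1)
        return -1;

    return 0;
}
```

Calling this early (before the application allocates heavily) means pages that were registered with the network stay mapped after free(), which is what makes it attractive for leave_pinned benchmarks, at the cost of the process never shrinking its memory footprint.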