Open MPI Development Mailing List Archives

Subject: [OMPI devel] memory binding
From: David Singleton (David.Singleton_at_[hidden])
Date: 2010-12-10 16:56:33

Is there any plan to support NUMA memory binding for tasks?

Even with bind-to-core and memory affinity in 1.4.3 we were seeing 15-20%
variation in run times on a Nehalem cluster. This turned out to be mostly due
to bad page placement. Residual pagecache pages from the last job on a node (or
the memory of a suspended job in the case of preemption) could occasionally cause
a lot of non-local page placement. We hacked the libnuma module to MPOL_BIND
tasks to their local memory and eliminated the majority of this variability.
We are currently running with this as default behaviour since it's "the right
thing" for 99% of jobs (we have an environment variable to back off to affinity
for the rest).
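
For readers unfamiliar with MPOL_BIND, the idea described above can be sketched roughly as follows. This is not the actual patch to the libnuma module, only a minimal Linux-only illustration that calls the set_mempolicy(2) system call directly. The environment variable name EXAMPLE_MEMBIND_BACKOFF, the helper names, and the use of MPOL_PREFERRED as a stand-in for the "affinity" fallback are all assumptions made for illustration; the post does not give those details.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Mode values from the Linux set_mempolicy(2) ABI (MPOL_PREFERRED, MPOL_BIND). */
enum { POLICY_PREFERRED = 1, POLICY_BIND = 2 };

/* Hypothetical variable name; the real one is not given in the post. */
#define BACKOFF_ENV "EXAMPLE_MEMBIND_BACKOFF"

/* Strict binding by default; a softer preferred policy if the user sets "1". */
int choose_policy(const char *backoff_value)
{
    if (backoff_value != NULL && strcmp(backoff_value, "1") == 0)
        return POLICY_PREFERRED;
    return POLICY_BIND;
}

/* One-word node mask with only `node` set; 0 means the node is out of range. */
unsigned long node_mask(int node)
{
    if (node < 0 || node >= (int)(8 * sizeof(unsigned long)))
        return 0UL;
    return 1UL << node;
}

/* Set the calling thread's memory policy to the given NUMA node.
 * Returns 0 on success, -1 with errno set on failure. */
int bind_memory_to_node(int node)
{
    unsigned long mask = node_mask(node);
    if (mask == 0UL) {
        errno = EINVAL;
        return -1;
    }

    int mode = choose_policy(getenv(BACKOFF_ENV));
    assert(mode == POLICY_BIND || mode == POLICY_PREFERRED);

    /* The kernel treats maxnode as one more than the number of usable
     * bits, so pass the mask width plus one (libnuma does the same). */
    return (int)syscall(SYS_set_mempolicy, mode, &mask,
                        8 * sizeof(mask) + 1);
}
```

A task would call bind_memory_to_node() with its local node early on, before any large allocations, since the policy only affects pages allocated afterwards. The call can fail (for example under a restrictive seccomp profile), so the return value should be checked.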

I'm guessing/hoping doing the above based on hwloc will be easier/more
maintainable. As a first pass, when is that likely to be an option?