On Dec 10, 2010, at 4:56 PM, David Singleton wrote:
> Is there any plan to support NUMA memory binding for tasks?
For some details on what we're planning for affinity, see the BOF slides that I presented at SC'10 on the OMPI web site (under "publications").
> Even with bind-to-core and memory affinity in 1.4.3 we were seeing 15-20%
> variation in run times on a Nehalem cluster. This turned out to be mostly due
> to bad page placement. Residual pagecache pages from the last job on a node (or
> the memory of a suspended job in the case of preemption) could occasionally cause
> a lot of non-local page placement. We hacked the libnuma module to MPOL_BIND
> tasks to their local memory and eliminated the majority of this variability.
> We are currently running with this as default behaviour since it's "the right
> thing" for 99% of jobs (we have an environment variable to back off to affinity
> for the rest).
What OS and libnuma version are you running? In my experience, libnuma can lie on RHEL 5 and earlier. My (possibly flawed) understanding is that this is due to a lack of proper kernel support; that "proper" kernel support was only added fairly recently (around 2.6.30).
That aside, it's somewhat disappointing that MPOL_PREFERRED is not working well and that you had to switch to MPOL_BIND. :-(
Should we add an MCA parameter to switch between BIND and PREFERRED, and perhaps default to BIND?
> I'm guessing/hoping doing the above based on hwloc will be easier/more
> maintainable. As a first pass, when is that likely to be an option?
The first pass of hwloc support will *only* be replacing the paffinity modules. Memory support using hwloc is definitely planned, but if there are kernel issues, hwloc won't be any better than libnuma.