Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] memory binding
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-12-13 09:29:42


On Dec 10, 2010, at 4:56 PM, David Singleton wrote:

> Is there any plan to support NUMA memory binding for tasks?

Yes.

For some details on what we're planning for affinity, see the BOF slides that I presented at SC'10 on the OMPI web site (under "publications").

> Even with bind-to-core and memory affinity in 1.4.3 we were seeing 15-20%
> variation in run times on a Nehalem cluster. This turned out to be mostly due
> to bad page placement. Residual pagecache pages from the last job on a node (or
> the memory of a suspended job in the case of preemption) could occasionally cause
> a lot of non-local page placement. We hacked the libnuma module to MPOL_BIND
> tasks to their local memory and eliminated the majority of this variability.
> We are currently running with this as default behaviour since it's "the right
> thing" for 99% of jobs (we have an environment variable to back off to affinity
> for the rest).

What OS and libnuma version are you running? It has been my experience that libnuma can lie on RHEL 5 and earlier. My (possibly flawed) understanding is that this is because of a lack of proper kernel support; such "proper" kernel support was only added fairly recently (2.6.30something).

That aside, it's somewhat disappointing that MPOL_PREFERRED is not working well and that you had to switch to MPOL_BIND. :-(

Should we add an MCA parameter to switch between BIND and PREFERRED, and perhaps default to BIND?
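
To make the idea concrete, here is a rough sketch of what such a switch could look like at the system-call level. It is only an illustration under stated assumptions: the setting values ("bind" / "preferred") and the helper names policy_from_string() and apply_policy() are hypothetical and are not existing Open MPI or libnuma interfaces. The sketch calls the Linux set_mempolicy system call directly so that it does not depend on libnuma headers, and it defaults to BIND when the setting is missing or unrecognized.

```c
/* Sketch only: choose a NUMA memory policy from a string setting.
 * Helper names and setting values are hypothetical, not Open MPI APIs. */
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Mode values as defined for MPOL_PREFERRED and MPOL_BIND in the
 * Linux header <linux/mempolicy.h>; renamed here to avoid clashes. */
enum { POL_PREFERRED = 1, POL_BIND = 2 };

/* Map a user setting to a policy mode.
 * Missing or unrecognized values fall back to BIND. */
int policy_from_string(const char *s)
{
    if (s != NULL && strcmp(s, "preferred") == 0)
        return POL_PREFERRED;
    return POL_BIND;
}

/* Apply the policy to the calling thread for a single NUMA node.
 * Returns 0 on success and -1 on failure (errno is set). */
int apply_policy(int mode, int node)
{
    unsigned long mask;

    /* Keep the sketch to a one-word node mask; 32 is a conservative bound. */
    assert(node >= 0 && node < 32);
    mask = 1UL << node;

    return (int)syscall(SYS_set_mempolicy, (long)mode, &mask,
                        (unsigned long)(8 * sizeof mask));
}
```

A caller would do something like apply_policy(policy_from_string(getenv("SOME_SETTING")), local_node) after binding the process to its core, where SOME_SETTING and local_node are placeholders. With BIND, allocations are restricted to the given node; with PREFERRED, the kernel may fall back to other nodes when the local one is short of free memory, which is the behaviour that led to the non-local page placement described above.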

> I'm guessing/hoping doing the above based on hwloc will be easier/more
> maintainable. As a first pass, when is that likely to be an option?

The first pass of hwloc support will *only* be replacing the paffinity modules. Memory support using hwloc is definitely planned, but if there are kernel issues, hwloc won't be any better than libnuma.

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/