
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] memory binding
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-12-13 17:06:41


On Dec 13, 2010, at 4:22 PM, David Singleton wrote:

> I didn't see memory binding in there explicitly.

You're correct; sorry, I was just referring to some general slides that showed some of the ideas that we're working on for next-generation affinity stuff. But memory binding will be included as well.

>> What OS and libnuma version are you running? It has been my experience that libnuma can lie on RHEL 5 and earlier. My (possibly flawed) understanding is that this is because of lack of proper kernel support; such "proper" kernel support was only added fairly recently (2.6.30something).
>
> That's interesting. By "lie", do you mean processes are not really memory bound?

I mean that even when using a strict memory binding policy, if you numa_alloc*() on node X, you can get memory on node Y.

> We're running 2.6.27.55 (and numactl 0.9.8-11.el5) and I've done quite a bit of
> testing that always looks correct.

That could well be.

On RHEL 5 (2.6.18 and numactl-0.9.8), the above "bad" behavior happens. With RHEL 6 (2.6.32 and numactl-2.0.3), it seems to be correct. Where exactly the issue was fixed, I'm not entirely sure.

>> That aside, it's somewhat disappointing that MPOL_PREFERRED is not working well and that you had to switch to MPOL_BIND. :-(
>
> I'm not sure it's disappointing - I think it's just to be expected. For sites that
> drop caches or run a whole node memhog or reboot nodes between jobs, MPOL_PREFERRED
> will do the right thing. For sites that are not so careful or use suspend/resume
> scheduling, memory overcommits and some amount of page reclaim or paging on job
> startup will happen occasionally. Paying the extra cost of making sure that page
> reclaim or paging results in ideal locality is definitely a big win for a job
> overall. (Paging suspended jobs back in after they are resumed can undo some of
> their ideal placement but that can be handled.)

Fair enough.

>> Should we add an MCA parameter to switch between BIND and PREFERRED, and perhaps default to BIND?
>
> I'm not sure BIND should be the default for everyone - memory imbalanced jobs might
> page badly in this case. But, yes, we would like an MCA to choose and allow sites
> to select BIND as their default if they wish. An mpirun option like --bind-to-mem
> would need a preferred/affinity alternative, and I'm not sure of a nice
> notation/syntax for that.

How about:

  --mca maffinity_libnuma_policy bind|preferred

I can do that for the v1.5 series, if you'd like. I can't really do it for v1.4 because that series is in "bug fix only" mode. However, given that we're revamping all of our affinity support, I don't know what the future interface will look like -- so the name may change, or ...
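For concreteness, the proposed parameter would presumably be used along these lines. This is hypothetical: the parameter name "maffinity_libnuma_policy" is only a proposal from this thread and, as noted above, may change before (or if) it ships.

```shell
# Hypothetical invocation -- the parameter name is only proposed here:
mpirun --mca maffinity libnuma \
       --mca maffinity_libnuma_policy bind \
       -np 4 ./my_mpi_app
```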

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/