Jeff Squyres, on Tue 18 Jan 2011 20:00:42 +0100, wrote:
> On Jan 12, 2011, at 10:10 AM, Samuel Thibault wrote:
> > This is not what I meant: hwloc_alloc_membind_policy's purpose is only
> > to allocate bound memory. It happens that hwloc_alloc_membind_policy
> > _may_ change the process policy in order to be able to bind memory
> > at all (when the underlying OS does not have a directed allocation
> > primitive), but that's not necessary. If hwloc can simply call a
> > directed allocation primitive, it will do it. If the OS doesn't support
> > binding at all, then hwloc will just allocate memory.
> How's this?
> * Setting this policy will cause the OS to try to bind a new memory
> * allocation to the specified set.
Err, no, again hwloc_alloc_membind_policy's purpose is _not_ to set a
policy for future allocations, but _only_ to allocate data. It just
_happens_ to possibly have to change the current process policy in order
to achieve the binding, but that's only a side effect. Think of it as
"allocate bound memory, possibly changing the policy just for that".
> As a side effect, some operating
> * systems may change the current memory binding policy;
It's not really the system that changes the current memory binding
policy, it's hwloc which explicitly requests the operating system to do so, in
order to actually get the desired binding.
I have rephrased it.
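
To make the intended semantics concrete, here is a rough sketch of the decision described above. It is only an illustration, not hwloc's actual code: `struct os_caps`, `choose_strategy` and `alloc_bound` are hypothetical names, and `malloc` stands in for whatever the OS-specific backend would really call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical description of what the OS offers; not an hwloc type. */
struct os_caps {
    int has_directed_alloc; /* can allocate directly on the given nodes */
    int has_set_policy;     /* can only change the process-wide policy */
};

enum strategy { DIRECTED_ALLOC, POLICY_THEN_ALLOC, PLAIN_ALLOC };

/* Pick the least intrusive way to obtain bound memory. */
static enum strategy choose_strategy(const struct os_caps *caps)
{
    assert(caps != NULL);
    if (caps->has_directed_alloc)
        return DIRECTED_ALLOC;    /* no policy change needed */
    if (caps->has_set_policy)
        return POLICY_THEN_ALLOC; /* policy change is only a side effect */
    return PLAIN_ALLOC;           /* no binding support: just allocate */
}

/* The goal is always "give me memory"; *policy_changed reports the side effect. */
static void *alloc_bound(const struct os_caps *caps, size_t len, int *policy_changed)
{
    enum strategy s = choose_strategy(caps);
    *policy_changed = (s == POLICY_THEN_ALLOC);
    /* Stand-in for the real OS call chosen by s (assumption for this sketch). */
    return malloc(len);
}
```

The point of the sketch is that the caller always asks for memory; whether the process policy gets touched depends only on what the OS provides.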
> >> + HWLOC_MEMBIND_INTERLEAVE = 3, /**< \brief Allocate memory on
> > This is not really correct: if the threads were splitting the memory
> > amongst themselves, FIRSTTOUCH should be used instead, to migrate pages
> > close to where they are referenced from. I have rephrased that
> What's a good simple example scenario when it would be good to use INTERLEAVE, then?
Well, this is what I have put instead:
"Interleaving can be useful when threads distributed across the
specified NUMA nodes will all be accessing the whole memory range
concurrently, since the interleave will then balance the memory
By "the whole", I really mean _all_ the threads will access the _whole_
range, without known separation, e.g. a coefficient vector that all
threads need to read to perform some computation.
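
As a toy model of why interleaving helps in that case (this is not hwloc code; the placement functions and `count_refs` are made up for illustration, and it assumes every thread reads every page exactly once):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8

/* Toy placement rules; these are not hwloc calls. */
static int node_interleaved(size_t page, int nnodes)
{
    return (int)(page % (size_t)nnodes); /* round-robin pages over nodes */
}

static int node_single(size_t page, int nnodes)
{
    (void)page;
    (void)nnodes;
    return 0; /* everything lands on node 0 */
}

/* Every thread reads the whole range; count the references each node serves. */
static void count_refs(size_t npages, int nthreads, int nnodes,
                       int (*place)(size_t, int),
                       unsigned long refs[MAX_NODES])
{
    assert(nnodes > 0 && nnodes <= MAX_NODES);
    for (int n = 0; n < MAX_NODES; n++)
        refs[n] = 0;
    for (int t = 0; t < nthreads; t++)
        for (size_t p = 0; p < npages; p++)
            refs[place(p, nnodes)]++;
}
```

With 8 pages, 4 threads and 2 nodes, the interleaved placement makes each node serve 16 references, whereas the single-node placement makes node 0 serve all 32.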