Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] questions about memory binding flags
From: Samuel Thibault (samuel.thibault_at_[hidden])
Date: 2011-01-05 05:20:16

Jeff Squyres, on Tue 04 Jan 2011 21:57:56 +0100, wrote:
> Is it correct to assume that any hwloc_membind_flags_t flags can be or'ed together except _THREAD and _PROCESS?

Yes, they really are flags (except _THREAD and _PROCESS which are
exclusive of course).
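
The rule above can be sketched as a small validity check. This is only an
illustration: the DEMO_* names and their bit values are hypothetical
stand-ins, not hwloc's real constants, and demo_flags_valid() is not an
hwloc function.

```c
#include <stdbool.h>

/* Illustrative stand-ins only: NOT hwloc's real names or values. */
enum demo_membind_flag {
    DEMO_FLAG_PROCESS = 1 << 0,  /* apply to the whole process */
    DEMO_FLAG_THREAD  = 1 << 1,  /* apply to the current thread only */
    DEMO_FLAG_OTHER_A = 1 << 2,  /* any other independent flag */
    DEMO_FLAG_OTHER_B = 1 << 3   /* any other independent flag */
};

/* Returns true when the combination is acceptable: any bits may be
 * OR'ed together, except that PROCESS and THREAD must not both be set. */
static bool demo_flags_valid(int flags)
{
    const int scope = DEMO_FLAG_PROCESS | DEMO_FLAG_THREAD;
    return (flags & scope) != scope;
}
```

Each flag occupies its own bit, which is what makes OR'ing meaningful; the
only rejected case is the one where both scope bits are present.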

> By their values, it looks like policy flags cannot be OR'ed.


> Here's all the policy flags:
> -----
> HWLOC_MEMBIND_DEFAULT = 0, /**< \brief Reset the memory allocation policy to the system default.
> * \hideinitializer */
> HWLOC_MEMBIND_FIRSTTOUCH = 1, /**< \brief Allocate memory on the given nodes, but preferably on the
> node where the first accessor is running.
> * \hideinitializer */
> -----
> I'm not quite sure what "where the first accessor is running" means. Does this mean that the intent is that the memory will be bound to the numa node local to the first thread that touches the memory?

Err, yes. Feel free to rephrase to anything that would be clearer.

> If so, does this happen on a page-by-page basis, or as a whole allocation?


> -----
> HWLOC_MEMBIND_BIND = 2, /**< \brief Allocate memory on the given nodes.
> * \hideinitializer */
> HWLOC_MEMBIND_INTERLEAVE = 3, /**< \brief Allocate memory on the given nodes in a round-robin manner.
> * \hideinitializer */
> -----
> What is the unit of distribution -- is it by page?

Mmm, OS documentation doesn't specify it; it usually only talks
about "round-robin allocation", "interleaved allocation", "striped
allocation", or simply "accessed by many processors, thus distribute the

> If so, is there a way to find out which way it bound?

We can try to benchmark memory accesses, but I don't think we should
want to be too specific, because that'd mean adding yet more policies to
choose and try for the programmer. We can however explain that it's
useful when a given range of memory is accessed by many processors, and
the memory access load should thus be distributed across nodes.
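
To make "distributed across nodes" concrete, here is a toy model of
round-robin placement. It assumes page granularity, which is exactly the
detail the OS documentation leaves unspecified, so treat it as one possible
behaviour rather than what any given OS does. The helper names are
hypothetical and nothing here calls hwloc.

```c
#include <stddef.h>

/* Hypothetical helper: index (into the list of given nodes) of the node
 * that would hold a page, ASSUMING page-granular round-robin placement.
 * Precondition: node_count > 0. */
static size_t demo_interleave_node(size_t page_index, size_t node_count)
{
    return page_index % node_count;
}

/* Counts how many of page_count pages land on target_node under the
 * same assumption. Precondition: node_count > 0. */
static size_t demo_pages_on_node(size_t page_count, size_t node_count,
                                 size_t target_node)
{
    size_t count = 0;
    for (size_t page = 0; page < page_count; page++) {
        if (demo_interleave_node(page, node_count) == target_node)
            count++;
    }
    return count;
}
```

Under this model, 10 pages over 2 nodes put 5 pages on each node, so
threads running anywhere see roughly the same mix of local and remote
accesses.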

> -----
> HWLOC_MEMBIND_REPLICATE = 4, /**< \brief Replicate memory on the given nodes.
> * \hideinitializer */
> -----
> Does this mean that if I allocate 10 pages worth of memory with 2 nodes specified, I'm actually allocating 2x that amount and duplicating it on both nodes?


> I.e., is the memory bound like this:
> node A: 0, 1, 2, ..., 9
> node B: 0, 1, 2, ..., 9
> and that a write to page 0 will physically write to *both* pages?

Actually, it's usually only supported for read-only data.

> What happens with reads? Does the data come from the first node that was specified, and therefore the cost of a read is the cost of getting the data from the first node that was specified?

Each thread accesses its local NUMA node, that's precisely the point
of replicating the data :)

> More specifically, what's the point of REPLICATE? Is it solely for memory hardware fault tolerance (e.g., intel RAS)?

Not at all, it's really for performance reasons.

> What happens if the hardware/OS isn't capable of doing REPLICATE? Will some kind of error be returned?

ENOSYS, as usual (and there is also the support flag for it in the
topology structure). Actually, at the moment only OSF supports it.
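
A caller can handle the ENOSYS case along these lines. The sketch uses
hypothetical stub functions in place of the real binding calls (the
replicate stub always fails with ENOSYS, as it would on most platforms),
and falling back to interleaving is only one possible choice, not
something hwloc does for you.

```c
#include <errno.h>

/* Hypothetical stand-in for a replicate binding request on a platform
 * without support: fails with errno set to ENOSYS. */
static int demo_bind_replicate(void)
{
    errno = ENOSYS;
    return -1;
}

/* Hypothetical stand-in for an interleave binding request that succeeds. */
static int demo_bind_interleave(void)
{
    return 0;
}

/* Tries replication first; if it is unsupported (ENOSYS), falls back to
 * interleaving. Returns 0 on success, -1 on any other error.
 * *used_fallback is set to 1 when the fallback path was taken. */
static int demo_bind_with_fallback(int *used_fallback)
{
    *used_fallback = 0;
    if (demo_bind_replicate() == 0)
        return 0;
    if (errno != ENOSYS)
        return -1;  /* a real failure, not just missing support */
    *used_fallback = 1;
    return demo_bind_interleave();
}
```

Checking errno separates "this policy does not exist here" from a genuine
failure, so the program only degrades when the policy is truly unavailable.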

> -----
> HWLOC_MEMBIND_NEXTTOUCH = 5 /**< \brief On next touch of existing allocated memory, migrate it to the node
> * where the memory reference happened.
> * \hideinitializer */
> -----
> What happens if the memory was not previously bound?

It gets bound.

> Same questions as above with FIRSTTOUCH -- is this on a page-by-page basis, or as an entire allocation?


Thanks for your review, it's really useful to make sure that things
which are obvious to me since I've written the code are properly
documented :)