Hardware Locality Development Mailing List Archives

Subject: [hwloc-devel] questions about memory binding flags
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-01-04 15:57:17


Is it correct to assume that any hwloc_membind_flags_t flags can be OR'ed together except _THREAD and _PROCESS?

By their values, it looks like policy flags cannot be OR'ed. This is probably worth mentioning in the docs (I can do so, but won't commit until the rest of these questions are answered).

Here are all the policy flags:

-----
  HWLOC_MEMBIND_DEFAULT = 0, /**< \brief Reset the memory allocation policy to the system default.
                                         * \hideinitializer */
  HWLOC_MEMBIND_FIRSTTOUCH = 1, /**< \brief Allocate memory on the given nodes, but preferably on the
                                          node where the first accessor is running.
                                         * \hideinitializer */
-----

I'm not quite sure what "where the first accessor is running" means. Is the intent that the memory will be bound to the NUMA node local to the first thread that touches it?

If so, does this happen on a page-by-page basis, or as a whole allocation? Consider this example (assume no race conditions):

 1. allocate 2 pages with the FIRSTTOUCH policy
 2. thread A on node X only touches page 0
 3. later, thread B on node Y touches page 1

Where are pages 0 and 1 bound? Are they bound to X and Y, respectively, or are both bound to X?

...or is the answer OS/system specific? If so, is there a way to find out which way the memory was bound?

-----
  HWLOC_MEMBIND_BIND = 2, /**< \brief Allocate memory on the given nodes.
                                         * \hideinitializer */
  HWLOC_MEMBIND_INTERLEAVE = 3, /**< \brief Allocate memory on the given nodes in a round-robin manner.
                                         * \hideinitializer */
-----

What is the unit of distribution -- is it by page? E.g., if I specify 4 numa nodes and allocate 10 pages, are they bound like this:

node A: 0, 4, 8
node B: 1, 5, 9
node C: 2, 6
node D: 3, 7

Or does it (more-or-less) equally distribute the pages across the 4 nodes in contiguous blocks, like this:

node A: 0, 1, 2
node B: 3, 4, 5
node C: 6, 7
node D: 8, 9

...or is the answer OS/system specific? If so, is there a way to find out which way the memory was bound?

-----
  HWLOC_MEMBIND_REPLICATE = 4, /**< \brief Replicate memory on the given nodes.
                                         * \hideinitializer */
-----

Does this mean that if I allocate 10 pages worth of memory with 2 nodes specified, I'm actually allocating 2x that amount and duplicating it on both nodes? I.e., is the memory bound like this:

node A: 0, 1, 2, ..., 9
node B: 0, 1, 2, ..., 9

and that a write to page 0 will physically write to *both* pages? If so, what's the cost of the write? Is it the time to write to all nodes, or the time to write to the first node that was specified?

What happens with reads? Does the data come from the first node that was specified, and therefore the cost of a read is the cost of getting the data from the first node that was specified?

More specifically, what's the point of REPLICATE? Is it solely for memory hardware fault tolerance (e.g., Intel RAS)?

What happens if the hardware/OS isn't capable of doing REPLICATE? Will some kind of error be returned?

-----
  HWLOC_MEMBIND_NEXTTOUCH = 5 /**< \brief On next touch of existing allocated memory, migrate it to the node
                                         * where the memory reference happened.
                                         * \hideinitializer */
-----

What happens if the memory was not previously bound?

Same questions as above with FIRSTTOUCH: is this on a page-by-page basis, or as an entire allocation? E.g., if I allocate/bind 10 pages, then later set the policy to NEXTTOUCH, and then touch the 4th page, will the entire allocation be moved to the NUMA node that is local to the thread where the touch occurred, or just the 4th page?

Thanks!

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/