
Hardware Locality Development Mailing List Archives


Subject: Re: [hwloc-devel] memory size attributes
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2010-01-16 08:15:05


Just so I understand: are you saying hwloc should track both the total amount of memory *and* the makeup of that amount, broken up by page size? So obj A may have x total memory, split across y 4k pages and z big hugepages (for example)? And then the question becomes how to store this variable-page-size information, right?

I'd say it can be valuable to support key=value pairs on any object so that future object types can be extensible (e.g., vendor PCI devices). But common stuff should be accessible as struct members so that there's no string parsing needed (I'm no doubt just voicing what we all already think). I.e., esoteric stuff can start as key=value strings, but as it matures / becomes popular, it can "graduate" to a struct member.

As for how to store page counts as a function of page size: since we may not want to hard-code page sizes into fields, and I would prefer that they not be strings, how about an array of int[2]s (page size and count)? Or an array of structs (with page-size and count fields)?
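The second option could look something like the following sketch. This is purely illustrative, not the hwloc API: the struct and helper names are made up, and real hwloc attributes would of course live inside hwloc's own object structures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: one entry per distinct page size on the object. */
struct memory_page_type {
    uint64_t size;  /* page size in bytes, e.g. 4096 or 2097152 (2M) */
    uint64_t count; /* number of pages of that size */
};

/* Total memory falls out as the sum over all page types, so it would not
 * need to be stored separately unless we want quick access to it. */
static uint64_t total_memory(const struct memory_page_type *types, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += types[i].size * types[i].count;
    return total;
}
```

An object with 1024 4k pages and 4 2M hugepages would then carry a two-entry array, and total_memory() would return 12582912 bytes.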

-jms
Sent from my PDA. No type good.

----- Original Message -----
From: hwloc-devel-bounces_at_[hidden] <hwloc-devel-bounces_at_[hidden]>
To: Hardware locality development list <hwloc-devel_at_[hidden]>
Sent: Sat Jan 16 07:08:46 2010
Subject: Re: [hwloc-devel] memory size attributes

Brice Goglin wrote:
> Hello,
>
> While cleaning the System/Machine root types, I wondered what we
> actually want to store in memory_kB attributes. It looks obvious for
> Caches and NUMA nodes. But I am not sure about Machines and Systems.
>
> If we have a machine with 2 NUMA nodes, should the machine memory size
> be the sum of the sizes of both NUMA nodes? Or should it be 0 since
> the machine has no memory except in NUMA nodes? Same question for a
> Kerrighed system assembling 2 machines.
>
> Then, if we have 1 Misc object grouping some NUMA nodes that are close
> to each other: Should we store the total memory size of these nodes in
> the Misc object attribute as well? We have the total memory size in the
> NUMA node object (below misc) and in the machine object (above misc),
> why not in Misc too? I am not saying that I want it, I am saying that
> it's not very consistent.
>
> So I wonder if we should just not sum anymore and let the application do
> the math when it actually needs the sum. A quick helper with
> get_next_obj_by_type( ... NODE) would work.
>
> Or we need to document memory size attributes better:
> * NUMA node: set of memory that can be accessed with the same access
> time from other objects
> * machine: set of cache-coherent memory, can be made of multiple NUMA nodes
> * system: set of memory that is virtually accessible, but may not be
> cache-coherent
>
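The "quick helper" suggested above, iterating with get_next_obj_by_type(... NODE), could be sketched against a toy object list. The struct and field names below are stand-ins for illustration only; the real code would walk hwloc objects with hwloc_get_next_obj_by_type() instead of a hand-rolled list.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a NUMA node object; the real hwloc object carries
 * much more, but only the memory size matters for this sketch. */
struct toy_node {
    unsigned long memory_kB;
    struct toy_node *next; /* next object of the same type, playing the
                              role of hwloc_get_next_obj_by_type() */
};

/* Let the application do the math itself: sum memory_kB over all NUMA
 * nodes instead of storing the sum in Machine/System/Misc objects. */
static unsigned long sum_node_memory_kB(const struct toy_node *first)
{
    unsigned long total = 0;
    for (const struct toy_node *n = first; n != NULL; n = n->next)
        total += n->memory_kB;
    return total;
}
```

With such a helper, the Machine/System/Misc objects would not need their own memory_kB sums at all.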

Aside from the memory_kB attribute, I wonder what should be done with
hugepages. I don't think we need to accumulate these at the system level
since multiple machines could well have different hugepage sizes.

And even inside a single machine, it's been pointed out that
architectures support multiple hugepage sizes. So we might have to
support several of them at the same time in the future. They could be
stored as an array of (hugepage size, hugepage count) pairs in the
NUMA node attributes, but I don't really like that.

One way to support future random attributes could be to have an array of
stringified attributes, like foo=bar, dmiboardinfo=bar, ... and
hugepage(2M)=1024. Applications would have to parse them, but it's much
easier for us in the end.
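A lookup over such an array of stringified attributes could be as simple as the following sketch. The attr_lookup() helper is hypothetical, not part of hwloc; it just shows the kind of parsing that would fall on the application side.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Find a "key=value" entry in an attribute array like the one proposed
 * (e.g. "foo=bar", "dmiboardinfo=bar", "hugepage(2M)=1024").
 * Returns a pointer to the value part, or NULL if the key is absent.
 * Hypothetical helper for illustration only. */
static const char *attr_lookup(const char *const *attrs, size_t n,
                               const char *key)
{
    size_t klen = strlen(key);
    for (size_t i = 0; i < n; i++)
        if (strncmp(attrs[i], key, klen) == 0 && attrs[i][klen] == '=')
            return attrs[i] + klen + 1;
    return NULL;
}
```

The application would then call strtoul() or similar on the returned value, e.g. to turn "hugepage(2M)=1024" into the number 1024.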

And if we go this way, aside from stringified hugepage stuff, memory
attributes of node/machine/system would only be the unsigned long
memory_kB field. So we could even put memory_kB back into the main
hwloc_obj structure. Only cache would still have specific attributes
(its depth and maybe data/instruction/unified flag).

Brice

_______________________________________________
hwloc-devel mailing list
hwloc-devel_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel