Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] Cacheline sizes
From: Wheeler, Kyle Bruce (kbwheel_at_[hidden])
Date: 2010-05-25 15:46:09

On May 25, 2010, at 1:00 PM, Brice Goglin wrote:

> Numerous ideas like this were proposed and we're not sure where to stop.
> If we start doing this, people will ask for the processor frequency, the
> number of floating point units per core, the associativity of the cache,
> the type of memory, ... lots of things that are not really related to
> topology but may be useful to some applications. Cache line size isn't
> that bad, but it's borderline, so I don't know if we want it. There are
> many other specific tools to gather such random hardware information,
> merging all of them inside hwloc wouldn't be good.

I can certainly understand that; perhaps a good way of knowing where to draw the line is to clearly define the goals of hwloc and the expected environment. For example, you could say that the purpose of hwloc is *exclusively* to present topology information and allow programs to locate themselves within that topology. With that kind of a limit, though, hwloc already presents too much information: what does cache size have to do with topology? Perhaps it is that detail that appears to open the door to other information. Unless hwloc is targeting heterogeneous environments where you might want to bind your process to a different CPU based on the cache size, that information *seems* superfluous. And that starts down the slippery slope: what hierarchy/object-specific data is sufficiently important to add based on the idea that someone might use that information to inform affinity decisions?

I agree that, inherently, cache line size has nothing to do with topology. But on the other hand, it's particularly useful for parallel shared-memory applications (to avoid false sharing), which are precisely the sort of applications that would be most interested in using hwloc (especially if we're considering a heterogeneous environment where each cache might have a different cache-line size). Obviously, it's easy to just keep going and mine all kinds of random information about hardware, but I would argue that things like floating point unit count or cache associativity have even less to do with topology and are not generally interesting for parallel shared-memory applications.
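
To make the false-sharing point concrete, here is a minimal sketch in C11 of how an application might use a cache line size once it has one. The names ASSUMED_CACHE_LINE, padded_counter, and round_up_to_line are made up for illustration, and the value 64 is only an assumption (it is common on x86-64); it is not queried from hwloc or the OS, which is exactly the gap being discussed.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas (C11) */
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: a compile-time guess at the cache line size in bytes.
 * A real application would want this value at runtime. */
#define ASSUMED_CACHE_LINE 64

/* One counter per thread. Aligning the member forces each element of an
 * array of these onto its own (assumed) cache line, so threads updating
 * neighbouring counters do not invalidate each other's lines. */
struct padded_counter {
    alignas(ASSUMED_CACHE_LINE) uint64_t value;
};

static_assert(sizeof(struct padded_counter) == ASSUMED_CACHE_LINE,
              "each counter should occupy exactly one assumed cache line");

/* When the line size is only known at runtime, per-thread slots can be
 * laid out by hand instead. This rounds nbytes up to the next multiple
 * of line_size, which must be a power of two. */
static size_t round_up_to_line(size_t nbytes, size_t line_size)
{
    return (nbytes + line_size - 1) & ~(line_size - 1);
}
```

With a runtime-provided line size, a program would allocate nthreads * round_up_to_line(sizeof(slot), line_size) bytes from a suitably aligned base and give each thread its own stride. On a machine whose caches disagree on line size, the largest value is the safe choice.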

It would seem that hwloc *really* needs a good definition of its scope and/or audience.

> Talking about caches, one thing we need to think about is Instruction
> caches (we only gather Data and Unified caches on Linux so far).

Why is runtime icache information important? :)

Kyle Wheeler, PhD