
Hardware Locality Development Mailing List Archives


Subject: Re: [hwloc-devel] hwloc-bind syntax
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-12-03 16:03:00


On Dec 3, 2009, at 12:26 PM, Brice Goglin wrote:

> > (shouldn't that say hwloc-bind, not topobind?)
>
> Right :)

Easily fixed -- just done. :-)

> > That would seem useful (slightly shorter than "proc:0.proc:1.proc:4"). I can file a feature request if it's not already supported.
>
> Actually, it would be proc:0 proc:1 proc:4 (space separated).
> hwloc-bind/mask do a logical/cpuset OR of all objects/masks given on the
> command-line.

Ah -- I see from your explanation below that foo.bar.baz is different than foo bar baz.

I haven't looked at the argv parsing -- does it just strcmp each of the argv's, looking for a recognized prefix, and if it finds one, assume that the argv is a specification? If it doesn't find a recognized prefix, does it assume that it's the first argv of the tokens to exec (and therefore stop examining argv)? FWIW, this is pretty much what mpirun does.

Is "--" recognized, too?

(I'm now asking for more detail because I intend to document this stuff properly ;-) )
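If I'm guessing right, the scan works roughly like this -- a toy sketch in Python (the prefix list and the "--" handling are my assumptions, not confirmed hwloc behavior):

```python
# Hypothetical sketch of prefix-based argv scanning (not hwloc's actual code).
# Consume location tokens until the first unrecognized one (or a "--"),
# then treat everything that remains as the command to exec.
PREFIXES = ("system:", "machine:", "node:", "socket:", "core:", "proc:")

def split_args(argv):
    for i, arg in enumerate(argv):
        if arg == "--":
            return argv[:i], argv[i + 1:]   # explicit separator
        if not arg.startswith(PREFIXES):
            return argv[:i], argv[i:]       # first non-location starts the command
    return argv, []                         # nothing but locations
```

So "hwloc-bind core:0 proc:1 ls -l" would split into locations ["core:0", "proc:1"] and command ["ls", "-l"].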

> > 2. What does it mean to "hwloc-bind core:0 ..."? (I asked Samuel this in IM as well, but I didn't understand his answer). *Which* "core 0" does that refer to? For example, an abbreviated version of my lstopo output is as follows (it's a pre-production EX machine -- I can't share all the details -- I 'x'ed out some of the numerical values):
> >
> > -----
> > System(xxxGB)
> > Node#0(xxxGB) + Socket#0 + L3(xxxMB)
> > L2(xxxKB) + L1(xxxKB) + Core#0 + P#0
> > ...
> > Node#1(xxxGB) + Socket#2 + L3(xxxMB)
> > L2(xxxKB) + L1(xxxKB) + Core#0 + P#1
> > ...
> > -----
> >
> > The processors have unique numbers, but the cores do not. Is that a bug?
>
> These are physical/OS indexes, not logical indexes.
>
> hwloc-bind/mask takes logical indexes, so it has nothing to do with the
> above #N. core:1 means "the second Core object" when you read the above
> output from top to bottom.

Hmm. That's very confusing.

FWIW: we went round and round (and round and round and round and ...) in deciding whether to use physical/OS indexing or logical indexing in Open MPI. We finally decided that users only care about logical indexing -- we hid all physical/OS indexing values under the covers.

Hwloc, obviously, is a bit different. More below.
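If I understand the rule, logical indexes come from a top-to-bottom (depth-first) traversal of the topology, independent of the OS/physical numbers shown by lstopo. A toy sketch of that numbering (the tree is made up, and this is my reading of the rule, not hwloc's API):

```python
# Toy topology as (type, os_index, children) tuples -- made-up data, not hwloc
# output. Note the OS core indexes repeat per node, as in my lstopo output above.
tree = ("machine", 0, [
    ("node", 0, [("core", 0, []), ("core", 1, [])]),
    ("node", 1, [("core", 0, []), ("core", 1, [])]),
])

def logical_order(obj, obj_type):
    """OS indexes of all obj_type objects, in top-to-bottom (logical) order."""
    t, os_idx, children = obj
    found = [os_idx] if t == obj_type else []
    for child in children:
        found += logical_order(child, obj_type)
    return found

# logical_order(tree, "core") -> [0, 1, 0, 1]: "core:2" is the first core of
# the second node, even though its OS index is 0.
```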

> > 3. What is the difference between "system" and "machine"?
>
> Machine is a physical machine. System may be different in the case of a
> Single System Image like Kerrighed, vSMP, ... (only Kerrighed is
> supported so far).

Do we have good descriptions for each of the scope names that can be put in the docs? hwloc-mask shows the following names:

system, machine, node, socket, core, proc[essor]

Has anyone contacted Penguin and/or XHPC (and/or any other SSI projects) to see if they care about being supported by hwloc?

--> This is a good point to support my dynamic SSO plugin idea. ;-)

> > 4. What exactly does "index" refer to -- is it a virtual index (e.g., hwloc's numbering of 0-N) or is it the OS's index? I thought we used OS index numbering, but #2 confuses me -- if #2 is just a bug, then perhaps this question is moot. :-)
>
> We use virtual/logical/OS index everywhere, except in the lstopo output
> and in the functions that contain os_index in their prototype.

Hmm - I can't parse that. You seem to be equating logical == virtual == OS indexing in that statement, but you distinctly called OS and logical indexing different in text higher up in this reply...

Regardless, I find this confusing -- I'm quite sure that newbies will also find it confusing. All of hwloc should default to one form of indexing (regardless of whether it's physical/OS or some form of logical/hwloc-imposed indexing) -- and/or be explicit about which kind of indexing is used in every case.

To be clear: it's strange to me that you can't use the numbers in the output from lstopo as arguments to hwloc-bind. I think that this will be quite a common / useful usage pattern: look up your machine's topology with lstopo and then hwloc-bind a command to something that you see in the lstopo output.

At a minimum, I would think that all the CLI commands should default to the same kind of indexing to prevent confusion.

Perhaps hwloc CLI tools should be able to show/accept *both* kinds of indexing...? E.g.:

  lstopo --physical
  lstopo --logical

  hwloc-bind --physical ...
  hwloc-bind --logical ...

> > 5. What exactly is a "cpuset string"? Can some examples be provided?
>
> It's 0 for nothing, ffffffff for 32 procs, 1,,,,,,,,1 for the first
> and the 257th processors. It's a comma-separated list of 32-bit bitmasks.

Ah, ok. To be clear, is it accurate to say that it is one of the following forms:

- a hex number (without leading "0x" -- would "0x" be ignored if it is supplied?)
- a comma-delimited set of 32-bit bitmasks where MSB 0's do not have to be listed
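In other words, I believe each comma-separated field is a 32-bit hex chunk with the most significant chunk first, and an empty field meaning 0. A quick sketch of how I'd decode one (my interpretation of Brice's description, not hwloc's actual code):

```python
def parse_cpuset(s):
    """Decode a cpuset string into the set of OS processor indexes it names.

    Interpretation (assumed): comma-separated 32-bit hex chunks, most
    significant chunk first; an empty chunk stands for 0.
    """
    chunks = s.split(",")
    bits = set()
    for i, chunk in enumerate(chunks):
        value = int(chunk, 16) if chunk else 0
        base = 32 * (len(chunks) - 1 - i)   # bit offset of this chunk
        bits |= {base + b for b in range(32) if value >> b & 1}
    return bits

# parse_cpuset("1,,,,,,,,1") -> {0, 256}: the first and the 257th processors.
```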

> > --> Sidenote: I actually find hwloc's use of the word "cpuset" to be quite confusing because it is *NOT* the same as an OS cpuset.
>
> The structure might be a bit different, but it is conceptually the same
> as the OS cpuset. When bit N is set in a hwloc cpuset, it means we are
> talking about the processor whose *OS-index* is N.

I guess what I find confusing is that Linux's concept of a "cpuset" is a binding term (e.g., it's the set of cpu's assigned to a process and you can't break out of that set). The hwloc docs glossary says:

-----
CPU set: The set of logical processors logically included in an object (if any). This term does *not* have any relation to an operating system "CPU set."
-----

So we're specifically stating in the docs that they're different. And it seems like they *are* different -- yes, they're both "sets of CPUs", but at least the Linux definition of "cpuset" has additional connotations / meaning (I don't know if other OS's define the term "cpuset").

> > 6. "several <depth:index> may be concatenated with `.'..."  Does that mean that this is legal:
> >
> >     core:0.node:2.system:4
> >
> > If so, what exactly does it mean when they overlap?  Is it simply the union of those 3 specifications?
> 
> It means 5th logical system below 3rd logical node below first core. So
> it means nothing when there are no node objects below cores or no
> systems below nodes.

Ahh -- now I see. So it's meant to be a logical descent into the topology: the leftmost item is the highest item in the topology, and each "." item must be a child of the item to its left. Is that correct?

Does it always need to start with system? If not, can you provide an example of something that must be represented with the "." notation and cannot be represented without it?
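If my reading is right, resolution of a "."-separated location might look like this toy sketch (made-up tree; direct-children lookup is assumed -- I don't know whether hwloc searches deeper than direct children):

```python
# Toy topology as (type, name, children) tuples -- made-up data.
machine = ("machine", "m", [
    ("node", "n0", [("core", "c0", []), ("core", "c1", [])]),
    ("node", "n1", [("core", "c2", []), ("core", "c3", [])]),
])

def resolve(obj, spec):
    """Each "."-separated step selects the Nth object of that type below
    the current object, descending left to right."""
    for step in spec.split("."):
        obj_type, index = step.split(":")
        matches = [c for c in obj[2] if c[0] == obj_type]
        obj = matches[int(index)]
    return obj

# resolve(machine, "node:1.core:0") -> the core named "c2": the first core
# below the second node.
```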
-- 
Jeff Squyres
jsquyres_at_[hidden]