Hardware Locality Users' Mailing List Archives

Subject: Re: [hwloc-users] Questions to lstopo and hwloc-bind
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2012-09-14 02:47:40


On 14/09/2012 07:48, Siegmar Gross wrote:
> I have installed hwloc-1.5 on our systems and get the following output
> when I run "lstopo" on a Sun Server M4000 (two quad-core processors with
> two hardware-threads each).
>
> rs0 fd1026 101 lstopo
> Machine (32GB) + NUMANode L#0 (P#1 32GB)
>   Socket L#0
>     Core L#0
>       PU L#0 (P#0)
>       PU L#1 (P#1)
>     Core L#1
>       PU L#2 (P#2)
>       PU L#3 (P#3)
>     Core L#2
>       PU L#4 (P#4)
>       PU L#5 (P#5)
>     Core L#3
>       PU L#6 (P#6)
>       PU L#7 (P#7)
>   Socket L#1
>     Core L#4
>       PU L#8 (P#8)
>       PU L#9 (P#9)
>     Core L#5
>       PU L#10 (P#10)
>       PU L#11 (P#11)
>     Core L#6
>       PU L#12 (P#12)
>       PU L#13 (P#13)
>     Core L#7
>       PU L#14 (P#14)
>       PU L#15 (P#15)
>
> When I run the command on a Sun Ultra 45 with two single-core processors,
> I get the following output.
>
> tyr fd1026 116 lstopo
> Machine (4096MB)
>   NUMANode L#0 (P#2 2048MB) + Socket L#0 + Core L#0 + PU L#0 (P#0)
>   NUMANode L#1 (P#1 2048MB) + Socket L#1 + Core L#1 + PU L#1 (P#1)
>
>
> First question: Why does "lstopo" report two NUMA nodes on the Sun Ultra 45
> and only one NUMA node on the M4000, although both machines are equipped
> with two processors and both run Solaris 10?

Depending on the architecture, you may have one NUMA node containing
multiple processor sockets (old x86 machines, for instance), one NUMA
node per socket (many modern processors), or even multiple NUMA nodes
per socket (some AMD processors). The M4000 output matches the first
model and the Ultra 45 output the second. I am not familiar enough with
SPARC processors to compare, but I would bet that SPARC machines of both
kinds exist.

A Google search turns up links to a patch adding NUMA support for the
Ultra 45 to OpenSolaris, so the second output looks correct.

People also report that the Solaris lgroup utility confirms that the
M4000 is not NUMA, so the first output looks correct as well.

> I get the following error when I try to bind a process to a core
> on the M4000 machine.
>
> rs0 fd1026 104 hwloc-bind socket:0.core:0 -l date
> hwloc_set_cpubind 0x00000003 failed (errno 18 Cross-device link)
> Fri Sep 14 07:37:14 CEST 2012
>
>
> I can use the following command which works for all 16 hardware threads.
>
> rs0 fd1026 105 hwloc-bind pu:0 -l date
> Fri Sep 14 07:38:37 CEST 2012

On Solaris, you cannot bind to an entire core if it contains multiple
hardware threads; you have to bind to a single thread (a PU). When each
core contains a single thread, you're lucky :) because binding to the
core is then the same as binding to its only PU.
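To make this concrete with the M4000 output quoted above: Core L#0 holds PU P#0 and P#1, so its cpuset is 0x00000003, which is exactly the mask in the failed hwloc_set_cpubind message, and it contains two PUs. The toy C sketch below recomputes that mask from the lstopo layout. It assumes contiguous PU numbering with two PUs per core (as lstopo shows for this machine) and uses a plain unsigned long as a stand-in for hwloc_bitmap_t, so core_cpuset() and count_pus() are hypothetical helpers, not hwloc API.

```c
#include <assert.h>
#include <limits.h>

/* Toy model of the M4000 layout printed by lstopo above: PU P# indexes are
 * contiguous and every core holds PUS_PER_CORE of them. A plain unsigned
 * long stands in for hwloc_bitmap_t; these helpers are not hwloc API. */
#define PUS_PER_CORE 2u

/* Cpuset of core number `core`: PUS_PER_CORE consecutive bits. */
static unsigned long core_cpuset(unsigned core)
{
    /* The shifted mask must fit in an unsigned long. */
    assert((core + 1u) * PUS_PER_CORE <= sizeof(unsigned long) * CHAR_BIT);
    unsigned long one_core = (1UL << PUS_PER_CORE) - 1UL; /* 0x3 */
    return one_core << (core * PUS_PER_CORE);
}

/* Number of PUs (set bits) in a cpuset. */
static unsigned count_pus(unsigned long cpuset)
{
    unsigned n = 0;
    while (cpuset != 0UL) {
        cpuset &= cpuset - 1UL; /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

With these helpers, core_cpuset(0) is 0x3 and count_pus() reports 2 PUs, the multi-thread case Solaris refuses, whereas "hwloc-bind pu:0" asks for the one-PU set 0x1.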

> Second question: How can I find out which bindings are allowed when
> I know the output from "lstopo"? I have no idea why I get "errno 18
> Cross-device link" on the M4000.

That's something we need to think about. We were aware of the limitation,
but we hadn't really thought about making users aware of it yet. We have
a function, hwloc_topology_get_support(), that returns some information
about what hwloc supports on the current platform, and it could be
extended. But to be feature-complete, we would need to be able to say
whether:
1) binding works for arbitrary sets of objects (even objects of different
kinds)
2) binding works for a single object of a given type
3) binding works for arbitrary sets of objects of the same type
Solaris always has (2) with type=PU (or type=Core if each core has one
PU) and optionally has (3) for NUMA nodes.
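A purely hypothetical way to encode those three cases, with the Solaris rules from the previous paragraph plugged in, could look like the C sketch below. None of these names exist in hwloc: the bind_capability flags, obj_type, solaris_caps() and can_bind() are illustrative only, and treating sets of NUMA nodes as available is an assumption, since the text says that case is optional on Solaris.

```c
#include <stdbool.h>

/* Hypothetical encoding of the three binding capabilities listed above.
 * None of these names exist in hwloc; they only illustrate the idea. */
enum bind_capability {
    BIND_ANY_SET          = 1 << 0, /* 1) arbitrary sets, mixed object types */
    BIND_SINGLE_OBJECT    = 1 << 1, /* 2) exactly one object of a given type */
    BIND_SET_OF_SAME_TYPE = 1 << 2  /* 3) arbitrary sets of one object type */
};

enum obj_type { OBJ_PU, OBJ_CORE, OBJ_NUMANODE };

/* Per-type capabilities following the Solaris behaviour described above:
 * a single PU always; a single Core only if it has one PU; sets of NUMA
 * nodes optionally (assumed available in this sketch). */
static unsigned solaris_caps(enum obj_type type, unsigned pus_per_core)
{
    switch (type) {
    case OBJ_PU:       return BIND_SINGLE_OBJECT;
    case OBJ_CORE:     return pus_per_core == 1u ? BIND_SINGLE_OBJECT : 0u;
    case OBJ_NUMANODE: return BIND_SET_OF_SAME_TYPE;
    }
    return 0u;
}

/* Would binding to `count` objects of `type` be accepted? */
static bool can_bind(enum obj_type type, unsigned count, unsigned pus_per_core)
{
    unsigned caps = solaris_caps(type, pus_per_core);
    if (caps & (BIND_ANY_SET | BIND_SET_OF_SAME_TYPE))
        return count >= 1u;
    if (caps & BIND_SINGLE_OBJECT)
        return count == 1u;
    return false;
}
```

On the M4000 (two PUs per core), this model accepts one PU but rejects one Core, matching the "hwloc-bind pu:0" success and the "socket:0.core:0" failure quoted above.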

Another solution would be to document that this specific errno (18,
EXDEV, "Cross-device link") means that you should try to bind to
something smaller, most likely a single PU (binding to a PU is always
supported when binding is supported at all).
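A minimal sketch of that "retry with something smaller" strategy, in C. To keep it self-contained it binds through a hypothetical fake_set_cpubind() that mimics the Solaris behaviour described above (one PU accepted, anything larger rejected with EXDEV) and uses a plain unsigned long instead of hwloc_bitmap_t; a real program would call hwloc_set_cpubind() on an hwloc bitmap instead. The fallback keeps only the lowest PU of the requested set.

```c
#include <errno.h>

/* Simulated binding call standing in for hwloc_set_cpubind() on Solaris:
 * it accepts a single PU and fails with EXDEV otherwise. Toy model for
 * illustration only, not real hwloc or Solaris code. */
static int fake_set_cpubind(unsigned long cpuset)
{
    if (cpuset == 0UL) {
        errno = EINVAL;
        return -1;
    }
    if (cpuset & (cpuset - 1UL)) { /* more than one bit set */
        errno = EXDEV;
        return -1;
    }
    return 0;
}

/* Try the requested set first; on EXDEV retry with its lowest PU.
 * Returns the cpuset actually bound, or 0 on failure. */
static unsigned long bind_with_fallback(unsigned long cpuset)
{
    if (fake_set_cpubind(cpuset) == 0)
        return cpuset;
    if (errno != EXDEV)
        return 0UL;                 /* some other error: give up */
    unsigned long lowest = cpuset & (~cpuset + 1UL); /* keep lowest set bit */
    return fake_set_cpubind(lowest) == 0 ? lowest : 0UL;
}
```

Requesting core 0 of the M4000 (0x3) in this model ends up bound to PU P#0 (0x1), while a request that is already a single PU succeeds on the first try.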

Also keep in mind that we recommend calling hwloc_bitmap_singlify() on
the cpuset before binding. It keeps a single PU in the set, which
prevents tasks from migrating from one PU to another within the binding.

The drawback of singlifying, or of falling back to something smaller on
failure, is that you have to distribute tasks manually when several of
them want the same binding. Two tasks bound to a whole dual-thread core
will be distributed well by the OS. Two tasks bound to single threads
within that core require you to make sure they are not both bound to the
same thread.
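One simple way to do that manual distribution, sketched below in C under the same toy assumptions (a plain unsigned long standing in for hwloc_bitmap_t, and a hypothetical helper named pu_for_task()), is to give task number i the (i mod n)-th PU of the shared cpuset, where n is the number of PUs in it.

```c
/* Toy model: several tasks would otherwise share `cpuset` (for example
 * the two PUs of one core, 0x3). Give task number `task` its own PU by
 * picking the (task mod n)-th set bit, n being the number of PUs in the
 * set. A plain unsigned long stands in for hwloc_bitmap_t; pu_for_task()
 * is a hypothetical helper, not hwloc API. Returns 0 for an empty set. */
static unsigned long pu_for_task(unsigned long cpuset, unsigned task)
{
    if (cpuset == 0UL)
        return 0UL;

    unsigned n = 0;                     /* number of PUs in the set */
    for (unsigned long s = cpuset; s != 0UL; s &= s - 1UL)
        n++;

    unsigned long rest = cpuset;
    for (unsigned i = 0; i < task % n; i++)
        rest &= rest - 1UL;             /* drop the lowest remaining PU */

    return rest & (~rest + 1UL);        /* lowest remaining PU */
}
```

With this, tasks 0 and 1 sharing core 0x3 are bound to 0x1 and 0x2 respectively, and a third task wraps around to 0x1 again, so it shares a thread only once every PU is already in use.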

Brice