Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] hwloc-distrib: how to start at lower hiearchy level?
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2010-07-05 01:12:58

On 05/07/2010 02:41, Jirka Hladky wrote:
> Hi all,
> I'm using hwloc-distrib quite often to distribute jobs optimally on NUMA
> boxes. I use it to test the Linux kernel task scheduler by comparing the
> runtime of jobs bound to the best possible CPU configuration (keeping CPU
> cache in mind) with runs without CPU affinity set.
> I just ran into a strange issue on a box with Intel's newest Nehalem CPUs.
> There are 4 sockets, each with 8 physical cores and hyper-threading enabled,
> which gives you 64 OS processors.
> The box has a strange NUMA layout - I will need to check why that is.
> Basically, there are 3 NUMA nodes - one includes 2 sockets, and the other 2
> have one socket each.

Seems strange to me, likely a BIOS bug. Nehalem-EX should always have
one NUMA node per socket from what I understand.

> hwloc-distrib --single 8 will distribute jobs in the following way:
> 3 jobs on NUMANode #0
> 3 jobs on NUMANode #1
> 2 jobs on NUMANode #2
> lstopo 64.pdf
> for A in $(hwloc-distrib --single 8); do taskset ${A} sleep 100 & done
> lstopo --top top.pdf
> hwloc-distrib is in fact doing the right thing, but this is not what I want.
> It's not the best configuration when you consider the CPU cache!

I think it's expected. You have an asymmetric topology: one NUMA node
with 16 cores and two nodes with 8 cores. It's quite a mess for
hwloc-distrib to handle that. And we actually have a trac ticket about
this, but I didn't think it would be that useful :/
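
To illustrate the arithmetic behind the 3/3/2 split reported above, here is a
minimal sketch. It assumes a simplified model in which jobs are shared as
evenly as possible among the objects at the top level, without looking at how
many cores each one contains. The helper name split_evenly is hypothetical,
and this is not hwloc-distrib's actual implementation.

```shell
#!/bin/sh
# Hypothetical helper (not part of hwloc): share NJOBS as evenly as possible
# among NCHILDREN sibling objects, ignoring how many cores each one holds.
split_evenly() {
    njobs=$1
    nchildren=$2
    base=$((njobs / nchildren))    # jobs every child receives
    extra=$((njobs % nchildren))   # leftover jobs, one each for the first children
    out=""
    i=0
    while [ "$i" -lt "$nchildren" ]; do
        if [ "$i" -lt "$extra" ]; then
            n=$((base + 1))
        else
            n=$base
        fi
        out="${out:+$out }$n"
        i=$((i + 1))
    done
    echo "$out"
}

split_evenly 8 3   # three NUMA nodes at the top level: prints "3 3 2"
split_evenly 8 4   # four sockets at the top level:    prints "2 2 2 2"
```

Under this assumption, the 16-core node gets no more jobs than an 8-core node,
whereas starting from the four sockets gives every socket two jobs.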

> I have figured out the following way to tell hwloc-distrib to avoid using
> NUMANodes when computing the CPU distribution:
> lstopo --ignore NUMANode No_NUMA.xml
> for A in $(hwloc-distrib --xml No_NUMA.xml --single 8); do taskset ${A} sleep 100 & done
> lstopo --top fix.pdf
> I'm wondering if there is a better way to make "Socket" the top object.
> Something like:
> hwloc-distrib --ignore NUMANode --single 8
> or
> hwloc-distrib --top_level Socket --single 8
> would be very useful. Is there something like this already? If not, would
> you consider this as an enhancement?

Indeed, such an option would be an easy way to work around problems with
asymmetric topologies. I don't know yet whether --ignore or --top-level
is better. I'll think about it.