
Portable Hardware Locality (hwloc)

  • hwloc v1.11.1rc1 released: new bugfix release.
  • hwloc v1.11.0 published: new feature release.
  • The Best of lstopo published: best lstopo graphical outputs.
  • Network Locality (netloc): new hwloc companion.

The Portable Hardware Locality (hwloc) software package provides a portable abstraction (across OS, versions, architectures, ...) of the hierarchical topology of modern architectures, including NUMA memory nodes, sockets, shared caches, cores and simultaneous multithreading. It also gathers various system attributes such as cache and memory information as well as the locality of I/O devices such as network interfaces, InfiniBand HCAs or GPUs. It primarily aims at helping applications with gathering information about modern computing hardware so as to exploit it accordingly and efficiently.

The democratization of multicore processors and NUMA architectures is spreading complex hardware topologies across the whole server world. Nowadays every single cluster node may contain tens of cores, hierarchical caches, and multiple memory nodes, making its topology far from flat. Such complex and hierarchical topologies have a strong impact on application performance. The developer must take hardware affinities into account when trying to exploit the actual hardware performance. For instance, two tasks that cooperate tightly should probably be placed on cores sharing a cache, whereas two independent memory-intensive tasks are better spread out across different sockets so as to maximize their memory throughput. As described in this paper, OpenMP threads have to be placed according to their affinities and to the hardware characteristics. MPI implementations apply similar techniques while also adapting their communication strategies to the network locality, as described in this paper or this one.
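The two placement rules above can be sketched on a toy machine model. This is only an illustration under stated assumptions: `struct toy_core`, `pick_cache_sibling` and `pick_other_socket` are hypothetical names invented for this sketch and are not part of the hwloc API, and the model assumes each core sits under exactly one shared cache with a machine-wide unique index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified machine model (NOT the hwloc API):
 * every core belongs to one socket and one shared-cache domain.
 * Cache indexes are assumed to be unique across the whole machine. */
struct toy_core {
    int socket; /* index of the socket holding this core */
    int cache;  /* index of the shared cache above this core */
};

/* For a task that cooperates tightly with the task on core `first`:
 * return the index of another core sharing the same cache, or -1. */
int pick_cache_sibling(const struct toy_core *cores, size_t n, size_t first)
{
    assert(cores != NULL && first < n);
    for (size_t i = 0; i < n; i++)
        if (i != first && cores[i].cache == cores[first].cache)
            return (int)i;
    return -1;
}

/* For an independent memory-intensive task: return the index of a core
 * on a different socket than core `first`, or -1 if there is none. */
int pick_other_socket(const struct toy_core *cores, size_t n, size_t first)
{
    assert(cores != NULL && first < n);
    for (size_t i = 0; i < n; i++)
        if (cores[i].socket != cores[first].socket)
            return (int)i;
    return -1;
}
```

On a machine described as two sockets with two cores each, where the cores of a socket share one cache, the first function picks the neighbouring core of the same socket and the second picks a core of the other socket. A real application would obtain the topology from hwloc rather than from a hand-written table.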

Portability and support

hwloc supports the following operating systems:

  • Linux (including old kernels not having sysfs topology information, with knowledge of cgroups, offline CPUs, ScaleMP vSMP, and NumaScale NumaConnect) on all supported hardware, including Intel Xeon Phi.
  • Solaris
  • AIX
  • Darwin / OS X
  • FreeBSD and its variants (such as kFreeBSD/GNU)
  • NetBSD
  • OSF/1 (a.k.a., Tru64)
  • HP-UX
  • Microsoft Windows (either using MinGW or a native Visual Studio solution)
  • IBM BlueGene/Q Compute Node Kernel (CNK)

Additionally, hwloc can detect the locality of PCI devices as well as OpenCL, CUDA and Xeon Phi accelerators, network and InfiniBand interfaces, etc. See the Best of lstopo for more examples of supported platforms.

Since it uses standard Operating System information, hwloc's support is almost always independent from the processor type (x86, powerpc, ia64, ...), and just relies on the Operating System support. Whenever the OS does not support topology information (e.g. some BSDs), hwloc uses an x86-only CPUID-based backend.
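As an illustration of the kind of operating system information involved, recent Linux kernels expose CPU sets in sysfs as text lists such as `0-3,8` (for instance in files like `/sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_list`). The sketch below parses that format into a bit mask. It is a minimal example, not hwloc's own code: `parse_cpulist` is a hypothetical helper, and the limit of 64 CPUs is an assumption made to keep it short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of hwloc): parse a Linux-style CPU list
 * such as "0-3,8" into a 64-bit mask where bit i means CPU i is listed.
 * Returns 0 on success, -1 on malformed input or CPU numbers >= 64. */
int parse_cpulist(const char *s, uint64_t *mask)
{
    assert(s != NULL && mask != NULL);
    uint64_t m = 0;
    while (*s != '\0' && *s != '\n') {
        char *end;
        if (*s < '0' || *s > '9')
            return -1;
        unsigned long lo = strtoul(s, &end, 10);
        unsigned long hi = lo;
        if (*end == '-') {              /* range "lo-hi" */
            s = end + 1;
            if (*s < '0' || *s > '9')
                return -1;
            hi = strtoul(s, &end, 10);
        }
        if (hi < lo || hi >= 64)
            return -1;
        for (unsigned long c = lo; c <= hi; c++)
            m |= (uint64_t)1 << c;
        if (*end == ',')                /* more entries follow */
            end++;
        else if (*end != '\0' && *end != '\n')
            return -1;
        s = end;
    }
    *mask = m;
    return 0;
}
```

The output mask is only written on success, so a caller can keep a previous value when parsing fails. Machines with more than 64 processors would need a wider bitmap type than the single integer assumed here.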

To check whether hwloc works on a particular machine, just try to build it and run lstopo or lstopo-no-graphics. If some things do not look right (e.g. bogus or missing cache information), see Questions and bugs below.

hwloc may display the topology in multiple convenient formats (see v1.11.0 examples and the Best of lstopo). It also offers a powerful programming interface to gather information about the hardware, bind processes, and much more.


More details are available in the Documentation (in both PDF and HTML). The documentation for each version contains examples of outputs and an API example (these links are for v1.11.0).

The materials from several hwloc tutorials are available online.

Getting and using hwloc

The latest hwloc releases are available on the download page. The GIT repository is also accessible for online browsing or checkout.

hwloc is already available as official packages for many Linux distributions (at least Debian, Ubuntu, ArchLinux, Fedora, RHEL), as well as Cygwin and Mac OS X ports.

Perl bindings are available from Bernd Kallies on CPAN.
Python bindings are available from Guy Streeter as Fedora RPM and tarball or within their git tree (html).

The following software already benefits from hwloc or is being ported to it:

How do you pronounce "hwloc"?

When in doubt, say "hardware locality."

Some of the core developers say "H. W. Loke"; others say "H. W. Lock". We've heard several other pronunciations as well. We don't really have a strong preference for how you say it; we chose the name for its Google-ability, not its pronunciation.

But now at least you know how we pronounce it. :-)

Questions and bugs

Questions, comments, and bugs should be sent to the hwloc mailing lists. When reporting a problem on Linux, please attach the /proc + /sys tarball generated by the installed script hwloc-gather-topology when appropriate. On Solaris, send the output of kstat cpu_info; on Darwin or the BSDs, send the output of sysctl hw. Also make sure you run a recent OS (e.g. Linux kernel) and possibly a recent BIOS too, since hwloc gathers topology information from them. Passing --enable-debug to ./configure also enables a lot of helpful debugging information.

Also be sure to see the hwloc wiki and bug tracking system.


If you are looking for general-purpose hwloc citations, please use the following one. This paper (available here) introduces hwloc, its goals and its implementation. It then shows how hwloc may be used by MPI implementations and OpenMP runtime systems as a way to carefully place processes and adapt communication strategies to the underlying hardware.

François Broquedis, Jérôme Clet-Ortega, Stéphanie Moreaud, Nathalie Furmento, Brice Goglin, Guillaume Mercier, Samuel Thibault, and Raymond Namyst. hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications. In Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2010), Pisa, Italia, February 2010. IEEE Computer Society Press.

If you are looking for a citation about I/O device locality and cluster/multi-node support, please use the following one instead. This paper (available here) explains how I/O locality is managed in hwloc, how device details are represented, how hwloc interacts with other libraries, and how multiple nodes such as a cluster can be efficiently managed.

Brice Goglin. Managing the Topology of Heterogeneous Cluster Nodes with Hardware Locality (hwloc). In Proceedings of 2014 International Conference on High Performance Computing & Simulation (HPCS 2014), Bologna, Italy, July 2014.

See also the Open MPI publication list.

History / credits

hwloc is the evolution and merger of the libtopology and Portable Linux Processor Affinity (PLPA) projects. Because of functional and ideological overlap, these two code bases and ideas were merged and released under the name "hwloc" as an Open MPI sub-project. hwloc is now mostly developed by the TADaaM team at Inria (Bordeaux, France).

libtopology was initially developed by the Inria Runtime team-project as a way to discover hardware affinities inside the Marcel threading library. With the advent of multicore machines, this work became interesting for much more than multithreading. So libtopology was extracted from Marcel and became an independent library.

Portability tests are performed thanks to the Inria Continuous Integration platform.