hwloc v2.0.4 released
hwloc v1.11.13 released
Upgrading to v2.0 API: Guide for Porting your Code
The Best of lstopo: Best lstopo graphical outputs
Network Locality (netloc): New hwloc companion
The Portable Hardware Locality (hwloc) software package provides a
portable abstraction (across OS, versions, architectures, ...) of the
hierarchical topology of modern architectures, including NUMA memory
nodes, sockets, shared caches, cores and simultaneous
multithreading. It also gathers various system attributes such as
cache and memory information as well as the locality of I/O devices
such as network interfaces, InfiniBand HCAs or GPUs.
hwloc primarily aims at helping
applications with gathering information about increasingly complex
parallel computing platforms so as to exploit them accordingly and efficiently.
For instance, two tasks that tightly cooperate
should probably be placed onto cores sharing a cache.
However, two independent memory-intensive tasks are better spread out
onto different sockets so as to maximize their memory throughput.
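The two placement rules above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the `core_info` structure and both helper functions are hypothetical and are not part of hwloc, cache identifiers are assumed to be unique across the whole machine, and a real program would obtain the topology from hwloc rather than from a hand-written table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, hand-written description of one core. A real program
 * would query hwloc for this information instead. */
struct core_info {
    int core_id;   /* logical index of the core */
    int cache_id;  /* shared cache above this core (assumed machine-wide unique) */
    int socket_id; /* socket (package) containing this core */
};

/* For two tightly cooperating tasks: find two distinct cores that share
 * a cache. Returns 1 and fills *first and *second on success, 0 otherwise. */
int pick_cache_sharing_pair(const struct core_info *cores, size_t n,
                            int *first, int *second)
{
    assert(cores != NULL && first != NULL && second != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (cores[i].cache_id == cores[j].cache_id) {
                *first = cores[i].core_id;
                *second = cores[j].core_id;
                return 1;
            }
        }
    }
    return 0;
}

/* For two independent memory-intensive tasks: find two cores located on
 * different sockets. Returns 1 and fills *first and *second on success,
 * 0 otherwise (e.g. on a single-socket machine). */
int pick_cross_socket_pair(const struct core_info *cores, size_t n,
                           int *first, int *second)
{
    assert(cores != NULL && first != NULL && second != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (cores[i].socket_id != cores[j].socket_id) {
                *first = cores[i].core_id;
                *second = cores[j].core_id;
                return 1;
            }
        }
    }
    return 0;
}
```

On a toy machine where cores 0 and 1 share one cache on socket 0 and cores 2 and 3 share another cache on socket 1, the first helper selects cores 0 and 1, and the second selects cores 0 and 2.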
As described in this paper,
OpenMP threads have to be placed according to their affinities and to the
underlying hardware.
MPI implementations apply similar techniques while also adapting their
communication strategies to the network locality, as described in
this paper or this one.
hwloc may also help many applications just by providing
a portable CPU and memory binding API
and a reliable way to
find out how many cores and/or hardware threads are available.
Portability and support
hwloc supports the following operating systems:
- Linux (including old kernels not having sysfs topology
information, with knowledge of cgroups, offline CPUs, ScaleMP vSMP,
and NumaScale NumaConnect) on all supported hardware,
including Intel Xeon Phi.
- Solaris, AIX and HP-UX
- NetBSD, FreeBSD and kFreeBSD/GNU
- Darwin / OS X
- Microsoft Windows (either using MinGW, or Cygwin, or a native Visual Studio solution)
- IBM BlueGene/Q Compute Node Kernel (CNK)
Additionally, hwloc can detect the locality of PCI devices as well as OpenCL,
CUDA and Xeon Phi accelerators, and network and InfiniBand interfaces.
See the Best of lstopo for more examples of supported platforms.
Since it uses standard Operating System information, hwloc's support is
almost always independent of the processor type (x86, powerpc, ia64, ...),
and just relies on the Operating System support. Whenever the OS does not
support topology information (e.g. some BSDs), hwloc uses an x86-only
CPUID-based backend.
To check whether hwloc works on a particular machine, just try to build
it and run lstopo or lstopo-no-graphics.
If some things do not look right (e.g. bogus or
missing cache information), see Questions and bugs below.
hwloc may display the topology in multiple convenient formats (see
v2.0.4 examples and the Best of lstopo).
It also offers a powerful programming interface to gather information
about the hardware, bind processes, and much more.
More details are available in the Documentation
(in both PDF and HTML). The documentation for each version contains
example outputs and an API usage example (these links are for v2.0.4).
The materials from several hwloc tutorials are also available online.
Getting and using hwloc
hwloc is open-source, available under the BSD license.
The latest hwloc releases are available on the download page.
The GIT repository is also accessible for online browsing or checkout.
hwloc is already available as official packages for many Linux distributions
(at least Debian/Ubuntu, Fedora/RHEL, SUSE, ArchLinux, Slackware, Gentoo and their derivatives),
as well as NetBSD, FreeBSD, Cygwin, Mac OS X ports, and HP-UX.
It is also available as EasyBuild and Spack packages.
Dedicated bindings also exist for several other languages.
The following software already benefits from hwloc or is being
ported to it:
- MPI implementations and tools
- Runtime systems and compilers
- Parallel scientific libraries and toolkits
- Resource managers and job schedulers
- and even more!
Questions and bugs
Bugs should be reported in the issue tracker.
Opening a new issue automatically displays lots of hints about
how to debug and report issues.
See also the
wiki page about Linux kernel bugs (or BIOS bugs) affecting locality information in hwloc.
Questions may be sent to the users or developers mailing lists.
There is also a #hwloc IRC channel on Freenode (irc.freenode.net).
For a general-purpose hwloc citation, please use the following one.
This paper introduces hwloc, its goals and its implementation.
It then shows how hwloc may be used by MPI implementations and OpenMP
runtime systems as a way to carefully place processes and adapt communication
strategies to the underlying hardware.
François Broquedis, Jérôme Clet-Ortega, Stéphanie Moreaud, Nathalie Furmento, Brice Goglin, Guillaume Mercier, Samuel Thibault, and Raymond Namyst.
hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications.
In Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2010),
Pisa, Italy, February 2010.
IEEE Computer Society Press.
For citing how hwloc deals with new heterogeneous memory hierarchies
(Knights Landing's MCDRAM, high-bandwidth memory (HBM), non-volatile memory (NVDIMM), etc.),
use this paper:
Exposing the Locality of Heterogeneous Memory Architectures to HPC Applications.
In Proceedings of the First ACM International Symposium on Memory Systems (MEMSYS16),
Washington, DC, USA, October 2016.
When discussing the overhead of topology discovery and why XML or synthetic topologies are useful, use this paper:
On the Overhead of Topology Discovery for Locality-aware Scheduling in HPC.
In Proceedings of the 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2017),
St Petersburg, Russia, March 2017.
For citing the memory footprint of hwloc and the new shmem topology API in hwloc 2.0, use this paper:
Memory Footprint of Locality Information on Many-Core Platforms.
In Proceedings of the 6th Workshop on Runtime and Operating Systems for the Many-core Era (ROME 2018), held in conjunction with IPDPS,
Vancouver, BC, Canada, May 2018.
For citing hwloc's I/O device locality and cluster/multi-node support, please use the following one instead.
This paper explains how I/O locality is managed in hwloc, how device details are represented,
how hwloc interacts with other libraries, and how multiple nodes such as a cluster can be efficiently managed.
Managing the Topology of Heterogeneous Cluster Nodes with Hardware Locality (hwloc).
In Proceedings of 2014 International Conference on High Performance Computing & Simulation (HPCS 2014),
Bologna, Italy, July 2014.
For citing hwloc's hierarchical modeling of computing, memory and I/O resources as well as multi-node support,
use this paper:
Towards the Structural Modeling of the Topology of next-generation heterogeneous cluster Nodes with hwloc.
Inria, November 2016.
History / credits
hwloc is the evolution and merger of the libtopology and
Portable Linux Processor Affinity (PLPA) projects.
Because of functional and ideological overlap, these two code bases and ideas
were merged and released under the name "hwloc" as an Open MPI sub-project.
hwloc is now mostly developed by the TADaaM team at Inria (Bordeaux, France).
libtopology was initially developed by the Inria Runtime team-project
as a way to discover hardware affinities inside the Marcel threading library.
With the advent of multicore machines, this work became interesting for much more than multithreading.
So libtopology was extracted from Marcel and became an independent library.
Portability tests are performed thanks to
the Inria Continuous Integration platform.
How do you pronounce "hwloc"?
When in doubt, say "hardware locality."
Some of the core developers say "H. W. Loke"; others say
"H. W. Lock". We've heard several other pronunciations as well. We
don't really have a strong preference for how you say it; we
chose the name for its Google-ability, not its pronunciation.
But now at least you know how we pronounce it. :-)