Portable Hardware Locality (hwloc) Documentation: v1.7.2

I/O Devices

hwloc usually manipulates processing units and memory, but it can also discover I/O devices and report their locality. This is useful for placing I/O-intensive applications on cores near the I/O devices they use.

Enabling and requirements

I/O discovery is disabled by default (except in lstopo) so as not to break legacy applications by adding unexpected I/O objects to the topology. It can be enabled by passing flags such as HWLOC_TOPOLOGY_FLAG_IO_DEVICES to hwloc_topology_set_flags() before loading the topology.
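
For instance, here is a minimal sketch in C (error checking omitted) of enabling I/O discovery before loading the topology:

#include <hwloc.h>

int main(void)
{
  hwloc_topology_t topology;

  hwloc_topology_init(&topology);
  /* Request I/O objects (bridges, PCI devices, OS devices)
   * before the topology is loaded. */
  hwloc_topology_set_flags(topology, HWLOC_TOPOLOGY_FLAG_IO_DEVICES);
  hwloc_topology_load(topology);

  /* ... consult the topology here ... */

  hwloc_topology_destroy(topology);
  return 0;
}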

Note that I/O discovery requires significant help from the operating system. The pciaccess library (the development package is usually libpciaccess-devel or libpciaccess-dev) is needed to detect PCI devices and bridges (libpci/pciutils may be used instead if a GPL dependency is acceptable), and the actual locality of these devices is currently only detected on Linux. Also, some operating systems require privileges for probing PCI devices; see Does hwloc require privileged access? for details.

I/O object hierarchy

When I/O discovery is enabled and supported, some additional objects (types HWLOC_OBJ_BRIDGE, HWLOC_OBJ_PCI_DEVICE and HWLOC_OBJ_OS_DEVICE) are added to the topology as children of the objects they are close to. For instance, if an I/O Hub is connected to a socket, the corresponding hwloc bridge object (and its PCI bridge and device children) is inserted as a child of the corresponding hwloc socket object.

These new objects have neither CPU sets nor node sets (NULL pointers) because they are not directly usable by user applications. Moreover, I/O hierarchies may be highly complex (asymmetric trees of bridges), so I/O objects are placed in specific levels with custom depths. Their lists may still be traversed with regular helpers such as hwloc_get_next_obj_by_type(). However, hwloc offers some dedicated helpers such as hwloc_get_next_pcidev() and hwloc_get_next_osdev() for convenience (see Advanced I/O object traversal helpers).
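
As a sketch of how these helpers may be used (assuming the topology was loaded with I/O discovery enabled as above, and <stdio.h> is included), the following loop prints all PCI device objects:

hwloc_obj_t pcidev = NULL;
/* Walk the list of PCI device objects and print their IDs and bus IDs. */
while ((pcidev = hwloc_get_next_pcidev(topology, pcidev)) != NULL)
  printf("PCI %04x:%04x at %04x:%02x:%02x.%01x\n",
         pcidev->attr->pcidev.vendor_id, pcidev->attr->pcidev.device_id,
         pcidev->attr->pcidev.domain, pcidev->attr->pcidev.bus,
         pcidev->attr->pcidev.dev, pcidev->attr->pcidev.func);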

An I/O hierarchy is organized as follows: a hostbridge object (HWLOC_OBJ_BRIDGE object with upstream type Host and downstream type PCI) is attached below a regular object (usually the entire machine or a NUMA node). There may be multiple hostbridges in the machine, attached to different places, but all I/O devices are below one of them. Each hostbridge contains one or several children, either other bridges (usually PCI to PCI) or PCI devices (HWLOC_OBJ_PCI_DEVICE). The number of bridges between the hostbridge and a PCI device depends on the machine and on the topology flags.
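
Bridge attributes may be inspected to tell hostbridges from PCI-to-PCI bridges and to see which PCI buses a hostbridge covers; a small sketch under the same assumptions as above:

hwloc_obj_t bridge = NULL;
while ((bridge = hwloc_get_next_bridge(topology, bridge)) != NULL) {
  if (bridge->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_HOST)
    /* Hostbridge: report the range of PCI buses below it. */
    printf("hostbridge covering buses %04x:[%02x-%02x]\n",
           bridge->attr->bridge.downstream.pci.domain,
           bridge->attr->bridge.downstream.pci.secondary_bus,
           bridge->attr->bridge.downstream.pci.subordinate_bus);
  else
    /* Otherwise a PCI-to-PCI bridge deeper in the hierarchy. */
    printf("PCI-to-PCI bridge\n");
}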

Software devices

Although each PCI device is uniquely identified by its bus ID (e.g. 0000:01:02.3), the application can hardly find out which PCI device is actually used when manipulating software handles (such as the eth0 network interface, the sda hard drive, or the mlx4_0 OpenFabrics HCA). Therefore hwloc tries to add software devices (HWLOC_OBJ_OS_DEVICE, also known as OS devices) below their PCI objects.

hwloc first tries to discover the corresponding names, e.g. eth0, sda or mlx4_0, from the operating system. However, this ability is currently only available on Linux for some classes of devices.

hwloc then tries to discover software devices through additional I/O components using external libraries. For instance, proprietary graphics drivers do not offer any OS name, but hwloc may still create one OS object per software handle when supported: the opencl and cuda components may add opencl0d0 and cuda0 OS device objects.

Here is a list of OS device objects commonly created by hwloc components when I/O discovery is enabled and supported.

  • Hard disks (HWLOC_OBJ_OSDEV_BLOCK)
    • sda (Linux component)
  • Network interfaces (HWLOC_OBJ_OSDEV_NETWORK)
    • eth0, wlan0, ib0 (Linux component)
  • OpenFabrics HCAs (HWLOC_OBJ_OSDEV_OPENFABRICS)
    • mlx4_0, qib0 (Linux component)
  • GPUs (HWLOC_OBJ_OSDEV_GPU)
    • nvml0 for the first NVML device (NVML component, using the NVIDIA Management Library)
    • :0.0 for the first display (GL component, using the NV-CONTROL X extension library, NVCtrl)
  • Co-Processors (HWLOC_OBJ_OSDEV_COPROC)
    • opencl0d0 for the first device of the first OpenCL platform, opencl1d3 for the fourth device of the second OpenCL platform (OpenCL component)
    • cuda0 for the first NVIDIA CUDA device (CUDA component, using the NVIDIA CUDA Library)
    • mic0 for the first Intel Xeon Phi (MIC) coprocessor (Linux component)
  • DMA engine channel (HWLOC_OBJ_OSDEV_DMA)
    • dma0chan0 (Linux component)

When none of the above strategies is supported and enabled, hwloc cannot place any OS object inside PCI objects. Note that some PCI devices may contain multiple software devices (see the example below).
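
To check which OS devices were actually created on a given machine, one may iterate over them and print their name and type; a sketch under the same assumptions as above:

hwloc_obj_t osdev = NULL;
while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) {
  const char *type;
  switch (osdev->attr->osdev.type) {
  case HWLOC_OBJ_OSDEV_BLOCK:       type = "Block"; break;
  case HWLOC_OBJ_OSDEV_NETWORK:     type = "Network"; break;
  case HWLOC_OBJ_OSDEV_OPENFABRICS: type = "OpenFabrics"; break;
  case HWLOC_OBJ_OSDEV_GPU:         type = "GPU"; break;
  case HWLOC_OBJ_OSDEV_COPROC:      type = "Co-Processor"; break;
  case HWLOC_OBJ_OSDEV_DMA:         type = "DMA"; break;
  default:                          type = "Unknown"; break;
  }
  /* The OS device name is the software handle, e.g. "eth0" or "sda". */
  printf("%s OS device %s\n", type, osdev->name);
}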

See also Interoperability With Other Software for managing these devices without considering them as hwloc objects.

Consulting I/O devices and binding

I/O devices may be consulted by traversing the topology manually (with usual routines such as hwloc_get_obj_by_type()) or by using dedicated helpers (such as hwloc_get_pcidev_by_busid(), see Advanced I/O object traversal helpers).
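
For instance, a PCI device may be looked up directly by its bus ID (a sketch, reusing the bus ID 0000:02:03.0 mentioned below):

/* Look up the PCI device whose bus ID is 0000:02:03.0. */
hwloc_obj_t pcidev = hwloc_get_pcidev_by_busid(topology, 0x0000, 0x02, 0x03, 0x0);
if (pcidev)
  printf("found PCI device %04x:%04x\n",
         pcidev->attr->pcidev.vendor_id, pcidev->attr->pcidev.device_id);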

I/O objects do not actually contain any locality information because their CPU sets and node sets are NULL. Their locality must be retrieved by walking up the object tree (through the parent link) until a non-I/O object is found (see hwloc_get_non_io_ancestor_obj()). This regular object should have non-NULL CPU sets and node sets which describe the processing units and memory that are immediately close to the I/O device. For instance, the path from an OS device to its locality may go across a PCI device parent and one or several bridges, up to a NUMA node with the same locality.
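
Putting this together, here is a hedged sketch of retrieving the locality of the OS device named eth0 (any OS device name works the same way) and binding the current process near it; it assumes the topology loaded as above and <string.h> for strcmp():

hwloc_obj_t osdev = NULL, ancestor;
/* Find the OS device object whose software handle is "eth0". */
while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL)
  if (osdev->name && !strcmp(osdev->name, "eth0"))
    break;
if (osdev != NULL) {
  /* Walk up through PCI devices and bridges to a non-I/O ancestor. */
  ancestor = hwloc_get_non_io_ancestor_obj(topology, osdev);
  if (ancestor != NULL && ancestor->cpuset != NULL)
    /* Bind the current process to the CPUs close to eth0. */
    hwloc_set_cpubind(topology, ancestor->cpuset, 0);
}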

Command-line tools are also aware of I/O devices. lstopo displays the interesting ones by default (passing --no-io disables this).

hwloc-calc and hwloc-bind may manipulate I/O devices specified by PCI bus ID or by OS device name.

  • pci=0000:02:03.0 is replaced by the set of CPUs that are close to the PCI device whose bus ID is given.
  • os=eth0 is replaced by CPUs that are close to the I/O device whose software handle is called eth0.

This enables easy binding of I/O-intensive applications near the device they use.
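
For example (a sketch of typical invocations; myapplication is a placeholder for any program, and option syntax may vary slightly across hwloc versions):

# Print the set of CPUs close to the eth0 network interface.
hwloc-calc os=eth0

# Run myapplication bound to the CPUs close to PCI device 0000:02:03.0.
hwloc-bind pci=0000:02:03.0 -- myapplication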

Examples

The following picture shows a dual-socket dual-core host whose PCI bus is connected to the first socket and NUMA node.

devel09-pci.png

Six interesting PCI devices were discovered. However, hwloc found corresponding software devices (eth0, eth1, sda, mlx4_0, ib0, and ib1) for only four of these physical devices. The other ones (PCI 102b:0532 and PCI 8086:3a20) are an unused IDE controller (no disk attached) and a graphics card (no corresponding software device reported to the user by the operating system).

On the other hand, note that three different software devices were found for the last PCI device (PCI 15b3:634a). Indeed, this OpenFabrics HCA PCI device object contains one OpenFabrics software device (mlx4_0) and two virtual network interface software devices (ib0 and ib1).

PCI link speed is also reported for some bridges and devices because lstopo was privileged when it discovered the topology.

Here is the corresponding textual output:

Machine (24GB)
  NUMANode L#0 (P#0 12GB)
    Socket L#0 + L3 L#0 (8192KB)
      L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1 + PU L#1 (P#2)
    HostBridge
      PCIBridge
        PCI 14e4:163b
          Net "eth0"
        PCI 14e4:163b
          Net "eth1"
      PCIBridge
        PCI 1000:0060
          Block "sda"
      PCIBridge
        PCI 102b:0532
      PCI 8086:3a20
      PCI 15b3:634a
        Net "ib0"
        Net "ib1"
        Net "mlx4_0"
  NUMANode L#1 (P#1 12GB) + Socket L#1 + L3 L#1 (8192KB)
    L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2 + PU L#2 (P#1)
    L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3 + PU L#3 (P#3)