We currently have three GPU-related branches:
(1) an (old) CUDA branch that adds "cuda0", "cuda1", ... devices inside
PCI devices and then puts Core and Memory objects in there to describe
the GPU internals.
(2) a (new) NVML branch that adds "nvml0", "nvml1", ... devices inside
NVIDIA GPU PCI devices (the order can be different in NVML and CUDA).
This is used by batch schedulers to retrieve NVIDIA GPU locality.
(3) a (new) OpenCL branch that adds "opencl0p0", ... devices inside AMD
GPU PCI devices.
I am going to merge the basics of (1), (2) and (3) by the end of the year
so that users can easily retrieve the locality of CUDA/NVML/OpenCL
devices. They'll have functions to convert a device pointer into an hwloc
object, a device index into an object, or a device pointer into a cpuset.
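
To make the intended lookups concrete, here is a toy Python model of the
idea. It is only a sketch and not the hwloc API: the Obj class, find_osdev(),
device_cpuset() and the example machine (two NUMA nodes, two GPUs, made-up
PCI bus IDs) are all hypothetical names invented for illustration. It shows
a device name being resolved to an object, and the object's locality being
taken from the nearest ancestor that has a cpuset.

```python
class Obj:
    """Toy stand-in for a topology object (hypothetical, not the hwloc API)."""

    def __init__(self, kind, name="", cpuset=None):
        self.kind = kind        # "Machine", "NUMANode", "PCIDev" or "OSDev"
        self.name = name
        self.cpuset = cpuset    # I/O objects carry no cpuset of their own
        self.parent = None
        self.children = []

    def add(self, child):
        """Attach a child object and return it."""
        child.parent = self
        self.children.append(child)
        return child


def find_osdev(root, name):
    """Depth-first search for an OS device by name, e.g. "cuda0"."""
    if root.kind == "OSDev" and root.name == name:
        return root
    for child in root.children:
        found = find_osdev(child, name)
        if found is not None:
            return found
    return None


def device_cpuset(dev):
    """Locality of a device: cpuset of the nearest ancestor that has one."""
    obj = dev
    while obj is not None and obj.cpuset is None:
        obj = obj.parent
    return obj.cpuset if obj is not None else None


def build_example():
    """Made-up machine: 2 NUMA nodes with 4 cores each, one GPU per node."""
    machine = Obj("Machine", cpuset=frozenset(range(8)))
    node0 = machine.add(Obj("NUMANode", "node0", frozenset(range(0, 4))))
    node1 = machine.add(Obj("NUMANode", "node1", frozenset(range(4, 8))))
    gpu_a = node0.add(Obj("PCIDev", "0000:02:00.0"))
    gpu_b = node1.add(Obj("PCIDev", "0000:83:00.0"))
    # The same physical GPU may get different indexes in CUDA and NVML.
    gpu_a.add(Obj("OSDev", "cuda0"))
    gpu_a.add(Obj("OSDev", "nvml1"))
    gpu_b.add(Obj("OSDev", "cuda1"))
    gpu_b.add(Obj("OSDev", "nvml0"))
    return machine
```

In this model, "cuda0" and "nvml1" resolve to children of the same PCI
device, so a scheduler that only knows the NVML index and an application
that only knows the CUDA index end up with the same cpuset.
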
The main drawback is that initializing these libraries can be slow
(about 1-2s added to lstopo, since it enables I/O by default) when the
system is poorly configured: NVIDIA puts GPGPU devices in non-persistent
mode by default, and AMD GPGPUs are slower if DISPLAY isn't set to :0.
I will document how to avoid such issues; I'm not sure it's worth
disabling all these plugins by default.
Then we'll talk about the remaining part of (1) (GPU internals). I still
need to see whether we can do something similar with OpenCL, find out
which of the numbers of compute units, SIMD units and SIMD width
actually matter to users, and whether we can report all this in a
reasonably portable way.