We currently have three GPU-related branches (see the sketch after this list):
(1) an (old) CUDA branch that adds "cuda0", "cuda1", ... devices inside
PCI devices and then puts Core and Memory in there to describe the GPU
(2) a (new) NVML branch that adds "nvml0", "nvml1", ... devices inside
NVIDIA GPU PCI devices (the order can be different in NVML and CUDA).
This is used by batch schedulers to retrieve NVIDIA GPU locality.
(3) a (new) OpenCL branch that adds "opencl0p0", ... devices inside AMD
GPU PCI devices.
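
To make the layout concrete, here is a rough sketch of how an application
could walk those OS device objects once the branches are merged. It assumes
the existing I/O API (HWLOC_TOPOLOGY_FLAG_IO_DEVICES and
hwloc_get_next_osdev()) and that the new devices appear under their PCI
device with the names listed above:

    #include <stdio.h>
    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topology;
        hwloc_obj_t osdev = NULL;

        hwloc_topology_init(&topology);
        /* ask for I/O objects so PCI and OS devices are included */
        hwloc_topology_set_flags(topology, HWLOC_TOPOLOGY_FLAG_IO_DEVICES);
        hwloc_topology_load(topology);

        /* cuda0, nvml0, opencl0p0, ... show up as OS devices whose
         * parent is the corresponding PCI device */
        while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) {
            hwloc_obj_t pci = osdev->parent;
            if (pci && pci->type == HWLOC_OBJ_PCI_DEVICE)
                printf("%s under PCI %04x:%02x:%02x.%01x\n", osdev->name,
                       (unsigned) pci->attr->pcidev.domain,
                       (unsigned) pci->attr->pcidev.bus,
                       (unsigned) pci->attr->pcidev.dev,
                       (unsigned) pci->attr->pcidev.func);
        }

        hwloc_topology_destroy(topology);
        return 0;
    }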
I am going to merge the basics of (1), (2) and (3) by the end of the year
so that users can easily retrieve the locality of CUDA/NVML/OpenCL
devices. They'll have functions to convert a device pointer into a hwloc
object, a device index into an object, or a device pointer into a cpuset.
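
For the CUDA case, for instance, the cpuset conversion could look roughly
like this; the helper name follows the style of the existing hwloc/cuda.h
header, but treat the exact prototypes as tentative until the merge:

    #include <hwloc.h>
    #include <hwloc/cuda.h>   /* tentative location of the CUDA helpers */
    #include <cuda.h>

    /* Bind the calling thread near a given CUDA device:
     * hwloc_cuda_get_device_cpuset() fills "set" with the CPUs that
     * are close to the GPU, then we bind to that set. */
    static int bind_near_cuda_device(hwloc_topology_t topology, CUdevice dev)
    {
        hwloc_cpuset_t set = hwloc_bitmap_alloc();
        int err = hwloc_cuda_get_device_cpuset(topology, dev, set);
        if (!err)
            err = hwloc_set_cpubind(topology, set, HWLOC_CPUBIND_THREAD);
        hwloc_bitmap_free(set);
        return err;
    }

The NVML and OpenCL branches would get equivalent helpers (device pointer
or index to object/cpuset).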
The main drawback is that the initialization of these libraries can be
slow (about 1-2s added to lstopo, which enables I/O by default) when they
are poorly configured (NVIDIA puts GPGPU devices in non-persistent mode by
default, and AMD GPGPUs are slower if DISPLAY isn't set to :0). I will
document how to avoid such issues; I am not sure it's worth disabling all
these plugins by default.
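
For instance, the usual workarounds are along these lines (assuming root
access for nvidia-smi and a local X server on :0):

    nvidia-smi -pm 1     # enable persistence mode so the NVIDIA driver stays loaded
    export DISPLAY=:0    # let the AMD OpenCL runtime reach the local X server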
Then we'll talk about the remaining part of (1) (GPU internals). I still
need to see whether we can do something similar with OpenCL, find out
which of these numbers (compute units, SIMD units, SIMD width) actually
matter to users, and whether we can report all of this in a somewhat
portable way.
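
For reference, standard OpenCL only exposes some of these numbers directly
(compute units are standard, SIMD width is vendor-specific), so a query
like the one below is about all we can portably map onto topology objects;
this is plain clGetDeviceInfo, nothing hwloc-specific:

    #include <stdio.h>
    #include <CL/cl.h>

    /* Print the compute-unit count OpenCL reports for one device.
     * SIMD width is only available through vendor extensions, which is
     * part of the portability problem mentioned above. */
    static void print_compute_units(cl_device_id dev)
    {
        cl_uint cu = 0;
        clGetDeviceInfo(dev, CL_DEVICE_MAX_COMPUTE_UNITS,
                        sizeof(cu), &cu, NULL);
        printf("compute units: %u\n", (unsigned) cu);
    }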