I don't have much to add, but +1. :)
On Dec 19, 2012, at 12:58 PM, Brice Goglin wrote:
> We currently have three GPU-related branches:
> (1) an (old) CUDA branch that adds "cuda0", "cuda1", ... devices inside
> PCI devices and then puts Core and Memory in there to describe the GPU.
> (2) a (new) NVML branch that adds "nvml0", "nvml1", ... devices inside
> NVIDIA GPU PCI devices (the order can be different in NVML and CUDA).
> This is used by batch schedulers to retrieve NVIDIA GPU locality.
> (3) a (new) OpenCL branch that adds "opencl0p0", ... devices inside AMD
> GPU PCI devices.
> I am going to merge the basics of (1), (2) and (3) by the end of the year
> so that users can easily retrieve the locality of CUDA/NVML/OpenCL
> devices. They'll have functions to convert a device pointer into an hwloc
> object, a device index into an object, or a device pointer into a cpuset.
> The main drawback of this is that the initialization of these libs can
> be slow (about 1-2s added to lstopo since it enables I/O by default) if
> poorly configured (NVIDIA puts GPGPU devices in non-persistent mode by
> default, and AMD GPGPUs are slower if DISPLAY isn't set to :0). I will
> document how to avoid such issues; I am not sure it's worth disabling
> all these plugins by default.
> Then we'll talk about the remaining part of (1) (GPU internals). I still
> need to see if we can do something similar with OpenCL, find out which
> numbers (compute units, SIMD units, SIMD width) actually matter to
> users, and whether we can report all this in a somewhat portable way.