
Hardware Locality Users' Mailing List Archives


Subject: Re: [hwloc-users] hwloc tutorial material
From: Erik Schnetter (schnetter_at_[hidden])
Date: 2013-01-22 09:26:30


My name is not Kenneth, but I won't forgo the opportunity to describe the
needs of MY application (Cactus)...

Currently, our CUDA functionality is more efficient, but our OpenCL
functionality is more mature. We would like to use hwloc to obtain the
following information for GPUs, as we already do for CPUs:

- number of cores
- number of PUs per core ("hardware threads"), both for choosing a good
number of threads and for deciding how "close" they should be in terms of
the memory they access. (Neither OpenMP nor OpenCL distinguishes between
multi-core threading and SMT.)

- L1 cache size, or L2 cache size if the L1 cache is small
- cache line size (for array padding)
- cache stride (or associativity) for memory allocation

- fastest core / fastest NUMA node from which a GPU can be accessed
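
For the last item, hwloc's existing CUDA runtime helpers already seem to
cover what we need; here is a minimal sketch (assuming hwloc was built with
CUDA support; error handling omitted):

    #include <hwloc.h>
    #include <hwloc/cudart.h>  /* hwloc's CUDA runtime helpers */
    #include <stdio.h>
    #include <stdlib.h>

    /* Print the set of CPUs close to CUDA device 0. */
    int main(void)
    {
      hwloc_topology_t topo;
      hwloc_topology_init(&topo);
      hwloc_topology_load(topo);

      hwloc_cpuset_t near = hwloc_bitmap_alloc();
      hwloc_cudart_get_device_cpuset(topo, 0, near);

      char *s;
      hwloc_bitmap_asprintf(&s, near);
      printf("CPUs near CUDA device 0: %s\n", s);

      free(s);
      hwloc_bitmap_free(near);
      hwloc_topology_destroy(topo);
      return 0;
    }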

To date, we collect some of this information in a "database" with one entry
per system we use. This works well for development, but in the end we need
to collect this information automatically.

-erik

On Tue, Jan 22, 2013 at 7:15 AM, Brice Goglin <Brice.Goglin_at_[hidden]> wrote:

> On 22/01/2013 10:27, Samuel Thibault wrote:
> > Kenneth A. Lloyd wrote on Mon 21 Jan 2013 22:46:37 +0100:
> >> Thanks for making this tutorial available. Using hwloc 1.7, how far down
> >> into, say, NVIDIA cards can the architecture be reflected? Global memory
> >> size? SMX cores? None of the above?
> > None of the above for now. Both are available in the cuda svn branch,
> > however.
> >
>
> Now the question to Kenneth is "what do YOU need?"
>
> I haven't merged the GPU internals into the trunk yet because I'd like to
> see whether they match what we would do with OpenCL and other accelerators
> such as the Xeon Phi.
>
> One thing to keep in mind is that most hwloc/GPU users will use hwloc to
> get locality information, but they will still use CUDA to drive the
> GPU. So they will still be able to use CUDA to get in-depth GPU
> information anyway. The question is then how much CUDA info we want to
> duplicate in hwloc. For instance, hwloc could carry the basic/uniform GPU
> information and let users rely on CUDA for everything CUDA-specific.
> Right now, the basic/uniform part is almost empty (it just contains the
> GPU model name or so).
>
> Also, the CUDA branch creates hwloc objects inside the GPU to describe
> the memory/cores/caches/... Would you use these objects in your
> application? Or would you rather just have a basic GPU attribute
> structure containing the number of SMX units, the memory size, ...? One
> problem with the latter is that it may be hard to define a structure that
> works for all GPUs, even for NVIDIA ones alone. We may need a union of
> structs...
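>
> A hypothetical sketch of what such a union might look like (all names
> invented for illustration, nothing committed):
>
>     union hwloc_gpu_attr_u {                 /* hypothetical, not real API */
>       struct {
>         unsigned smx_count;                  /* number of SMX multiprocessors */
>         unsigned long long global_memory_kb; /* global memory size */
>       } nvidia;
>       struct {
>         unsigned compute_unit_count;         /* number of compute units */
>         unsigned long long global_memory_kb;
>       } amd;
>     };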
>
> I am talking about "your application" above because having lstopo draw
> very nice GPU internals doesn't mean the corresponding hwloc objects are
> useful to real applications.
>
> Brice
>
> _______________________________________________
> hwloc-users mailing list
> hwloc-users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>

-- 
Erik Schnetter <schnetter_at_[hidden]>
http://www.perimeterinstitute.ca/personal/eschnetter/