George, yes. GPUDirect eliminated an extra host-memory buffering step
between the HCA and the GPU, a copy that cost CPU cycles.
I was never very comfortable with the kernel patch it needed, nor with
the patched OFED required to make it all work. Having said that, it did
provide a ~14% improvement in throughput on some of my code. Not bad.
Now come GPUDirect 2.0 (mostly helping GPU-to-GPU transfers across PCIe)
and Unified Virtual Addressing. Both hold great promise, but the real
understanding will come from whitebox analysis and from instrumenting my
app code.
On Wed, 2011-04-13 at 17:21 -0400, George Bosilca wrote:
> On Apr 13, 2011, at 14:48 , Rolf vandeVaart wrote:
> > This work does not depend on GPU Direct. It makes use of the fact that one can malloc memory, register it with IB, and register it with CUDA via the new CUDA 4.0 cuMemHostRegister API. Then one can copy device memory into this memory.
> Wasn't that the point behind GPUDirect? To allow direct memory copy between the GPU and the network card without external intervention?
Kenneth A. Lloyd
CEO - Director of Systems Science
Watt Systems Technologies Inc.