Can we have more details on how this will improve the performance of sending GPU device memory? I fail to see how registering the backend shared memory file with CUDA is supposed to do anything at all, as this memory is internal to Open MPI and is not supposed to be visible at any other level.
On Jul 28, 2011, at 23:52, Rolf vandeVaart wrote:
> DETAILS: In order to improve performance of sending GPU device memory,
> we need to register the host memory with the CUDA framework. These
> changes allow that to happen. These changes are somewhat different
> from what I proposed a while ago and I think a lot cleaner. There is
> a new memory pool flag that indicates whether a piece of memory
> should be registered. This allows us to register the sm memory and
> the pre-posted openib memory.
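
To make sure we are talking about the same thing, here is how I read the proposed flag, as a minimal C sketch. Every name in it (struct pool, POOL_FLAG_GPU_REGISTER, pool_alloc, count_register) is made up for illustration and is not the actual Open MPI mpool interface. The callback is only a stand-in: I assume a real build would call something like cudaHostRegister() on the host buffer at that point, which this sketch does not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag name, not the one used in the actual patch. */
#define POOL_FLAG_GPU_REGISTER 0x1u

/* Callback that registers [base, base + len) with some framework.
 * Returns 0 on success. */
typedef int (*register_fn)(void *base, size_t len);

/* Hypothetical memory pool: a flag word plus a registration hook. */
struct pool {
    unsigned    flags;
    register_fn reg;
};

/* Stand-in for the CUDA registration call (assumption: a real build
 * would page-lock the range here, e.g. with cudaHostRegister()).
 * It only counts the bytes it was asked to register. */
static size_t registered_bytes = 0;

static int count_register(void *base, size_t len)
{
    (void)base;
    registered_bytes += len;
    return 0;
}

/* Allocate from the pool; register the buffer only if the pool asks for it. */
static void *pool_alloc(struct pool *p, size_t len)
{
    assert(p != NULL);

    void *buf = malloc(len);
    if (buf == NULL)
        return NULL;

    if ((p->flags & POOL_FLAG_GPU_REGISTER) && p->reg != NULL) {
        if (p->reg(buf, len) != 0) {
            free(buf);          /* registration failed: give the memory back */
            return NULL;
        }
    }
    return buf;
}
```

With this reading, a pool created with POOL_FLAG_GPU_REGISTER (the sm segment or the pre-posted openib buffers in the proposal) has each allocation passed through the hook, while a pool without the flag behaves exactly as before.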