
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] RFC: Second Try: Add support to send/receive CUDA device memory directly
From: Rolf vandeVaart (rvandevaart_at_[hidden])
Date: 2011-04-22 08:38:31


And here are the final proposed changes. When CUDA support is not configured in, these changes do not change the code at all.

https://bitbucket.org/rolfv/ompi-trunk-cuda-rfc3

Rolf

-----Original Message-----
From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On Behalf Of Jeff Squyres
Sent: Thursday, April 21, 2011 6:19 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] RFC: Second Try: Add support to send/receive CUDA device memory directly

George -- what say you?

On Apr 19, 2011, at 4:54 PM, Rolf vandeVaart wrote:

> Forgot the link...
>
> https://bitbucket.org/rolfv/ompi-trunk-cuda-rfc2
>
>
>
> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On Behalf Of Rolf vandeVaart
> Sent: Tuesday, April 19, 2011 4:45 PM
> To: Open MPI Developers
> Subject: [OMPI devel] RFC: Second Try: Add support to send/receive CUDA device memory directly
>
> WHAT: Second try to add support to send data directly from CUDA device memory via MPI calls.
>
> TIMEOUT: 4/26/2011
>
> DETAILS: Based on all the feedback (thanks to everyone who looked at it), I have whittled down what I hope to accomplish with this RFC. There were suggestions to better modularize the CUDA registration code; since that code is a performance feature, it has been dropped from this RFC and will be investigated separately. This significantly reduces the changes being proposed here. With this RFC, all the changes are isolated to the datatype and convertor code. As mentioned before, the changes mostly boil down to replacing memcpy with cuMemcpy when moving data to or from a CUDA device buffer.
>
> Per suggestions, the choice to disable large-message RDMA now happens on a per-message basis. This is done by adding a flag to the convertor that tells the BTLs an intermediate buffer is needed when dealing with device memory.
>
> As before, this code would be enabled via a configure option. A mostly completed version is viewable on Bitbucket, although I know the configure code is sorely lacking.
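[Editor's note: the RFC does not name the configure option, and the author notes the configure code is unfinished; a hypothetical invocation might look like the following, with the option name and CUDA path purely illustrative.]

```shell
# Hypothetical: build Open MPI with the CUDA device-memory support
# enabled (option name and install path are assumptions, not final).
./configure --with-cuda=/usr/local/cuda
make all install
```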
>
> This is the new list of changed files.
>
> M opal/config/opal_configure_options.m4
> A opal/datatype/opal_datatype_cuda.c
> A opal/datatype/opal_datatype_cuda.h
> M opal/datatype/opal_convertor.h
> M opal/datatype/opal_datatype_copy.c
> M opal/datatype/opal_datatype_unpack.c
> M opal/datatype/Makefile.am
> M opal/datatype/opal_datatype_pack.h
> M opal/datatype/opal_convertor.c
> M opal/datatype/opal_datatype_unpack.h
> M ompi/mca/pml/ob1/pml_ob1_sendreq.h
>
> Thanks,
> Rolf
> This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/