Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29055 - in trunk/ompi/mca: btl btl/smcuda common/cuda pml/ob1
From: Rolf vandeVaart (rvandevaart_at_[hidden])
Date: 2013-08-23 15:53:56


Yes, I agree that the CUDA support is more intrusive and ends up in different areas. The problem is that the changes could not be simply isolated in a BTL.

1. To support the direct movement of GPU buffers, we often utilize copying into host memory and then out of host memory. These copies have to be done utilizing the cuMemcpy() functions rather than the memcpy() function. This is why some changes ended up in the opal datatype area. Must of that copying is driven by the convertor.
2. I added support for doing an asynchronous copy into and out of host buffers. This ended up touching datatype, PML, and BTL layers.
3. A GPU buffer may utilize a different protocol than a HOST buffer within a BTL. This required me to find different ways to direct which PML protocol to use. In addition, it is assume that a BTL either supports RDMA or not. There is no idea of supporting based on the type of buffer one is sending.

Therefore, to leverage much of the existing datatype and PML support, I had to make changes in there. Overall, I agree it is not ideal, but the best I could come up with.

>-----Original Message-----
>From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of George
>Bosilca
>Sent: Friday, August 23, 2013 7:36 AM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29055 - in
>trunk/ompi/mca: btl btl/smcuda common/cuda pml/ob1
>
>Rolf,
>
>On Aug 22, 2013, at 19:24 , Rolf vandeVaart <rvandevaart_at_[hidden]> wrote:
>
>> Hi George:
>>
>> The reason it tainted the PML is because the CUDA IPC support makes use
>of the large message RDMA protocol of the PML layer. The smcuda btl starts
>up, but does not initially support any large message RDMA (RGET,RPUT)
>protocols. Then when a GPU buffer is first accessed, the smcuda btl starts an
>exchange of some control messages with its peer. If they determine that
>they can support CUDA IPC, then the smcuda calls up into the PML layer and
>says it is OK to start using the large message RDMA. This all happens in code
>that is only compiled in if the user asks for CUDA-aware support.
>
>The issue is not that is compiled in only when CUDA support is enabled. The
>problem is that it is a breakage of the separation between PML and BTL.
>Almost all BTL did manage to implement highly optimized protocols under this
>design without having to taint the PML. Very similarly to CUDA I can cite CMA
>and KNEM support in the SM BTL. So I really wonder why the CUDA support is
>so different, at the point where it had to go all over the place (convertor,
>memory pool and PML)?
>
>> The key requirement was I wanted to dynamically add the support for CUDA
>IPC when the user first started accessing GPU buffers rather than during
>MPI_Init.
>
>Moving from BTL flags to endpoint based flags is indeed a good thing. This is
>something that should be done everywhere in the PML code, as it will allow
>the BTL to support different behaviors based on the peer.
>
>George.
>
>> This the best way I could figure out how to accomplish this but I am open to
>other ideas.
>
>
>
>>
>> Thanks,
>> Rolf
>>
>>> -----Original Message-----
>>> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of George
>>> Bosilca
>>> Sent: Thursday, August 22, 2013 11:32 AM
>>> To: devel_at_[hidden]
>>> Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29055 - in
>>> trunk/ompi/mca: btl btl/smcuda common/cuda pml/ob1
>>>
>>> I'm not very keen of seeing BTL modification tainting the PML. I
>>> would have expected support for IPC between GPU must be a BTL-level
>>> decision, no a special path in the PML.
>>>
>>> Is there a reason IPC support cannot be hidden down in the SMCUDA BTL?
>>>
>>> Thanks,
>>> George.
>>>
>>> On Aug 21, 2013, at 23:00 , svn-commit-mailer_at_[hidden] wrote:
>>>
>>>> Author: rolfv (Rolf Vandevaart)
>>>> Date: 2013-08-21 17:00:09 EDT (Wed, 21 Aug 2013) New Revision: 29055
>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/29055
>>>>
>>>> Log:
>>>> Fix support in smcuda btl so it does not blow up when there is no
>>>> CUDA IPC
>>> support between two GPUs. Also make it so CUDA IPC support is added
>>> dynamically.
>>>> Fixes ticket 3531.
>>>>
>>>> Added:
>>>> trunk/ompi/mca/btl/smcuda/README
>>>> Text files modified:
>>>> trunk/ompi/mca/btl/btl.h | 2
>>>> trunk/ompi/mca/btl/smcuda/README | 113
>>> ++++++++++++++++++++++
>>>> trunk/ompi/mca/btl/smcuda/btl_smcuda.c | 104
>>> ++++++++++++++++++++
>>>> trunk/ompi/mca/btl/smcuda/btl_smcuda.h | 28 +++++
>>>> trunk/ompi/mca/btl/smcuda/btl_smcuda_component.c | 200
>>> +++++++++++++++++++++++++++++++++++++++
>>>> trunk/ompi/mca/btl/smcuda/btl_smcuda_endpoint.h | 5 +
>>>> trunk/ompi/mca/common/cuda/common_cuda.c | 29 +++++
>>>> trunk/ompi/mca/common/cuda/common_cuda.h | 3
>>>> trunk/ompi/mca/pml/ob1/pml_ob1.c | 11 ++
>>>> trunk/ompi/mca/pml/ob1/pml_ob1_cuda.c | 42 ++++++++
>>>> trunk/ompi/mca/pml/ob1/pml_ob1_recvreq.c | 6
>>>> 11 files changed, 535 insertions(+), 8 deletions(-)
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> ----------------------------------------------------------------------
>> ------------- This email message is for the sole use of the intended
>> recipient(s) and may contain confidential information. Any
>> unauthorized review, use, disclosure or distribution is prohibited.
>> If you are not the intended recipient, please contact the sender by
>> reply email and destroy all copies of the original message.
>> ----------------------------------------------------------------------
>> ------------- _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>_______________________________________________
>devel mailing list
>devel_at_[hidden]
>http://www.open-mpi.org/mailman/listinfo.cgi/devel