
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] GPUDirect v1 issues
From: Sebastian Rinke (s.rinke_at_[hidden])
Date: 2012-01-17 17:21:56


Thank you very much. I will try setting the environment variable and if
required also use the 4.1 RC2 version.

To clarify things a little for my own understanding: to set up my machine
with GPUDirect v1 I did the following:

* Install RHEL 5.4
* Use the kernel with GPUDirect support
* Use the MLNX OFED stack with GPUDirect support
* Install the CUDA developer driver

Does using CUDA >= 4.0 make any of the above steps redundant?

I.e., is the specific RHEL version, the GPUDirect-enabled kernel, or the
MLNX OFED stack with GPUDirect support no longer needed?

Sebastian.

Rolf vandeVaart wrote:
> I ran your test case against Open MPI 1.4.2 and CUDA 4.1 RC2 and it worked fine. I do not have a machine right now where I can load CUDA 4.0 drivers.
> Any chance you can try CUDA 4.1 RC2? There were some improvements in the support (for one, you no longer need to set an environment variable).
> http://developer.nvidia.com/cuda-toolkit-41
>
> There is also a chance that setting the environment variable as outlined in this link may help you.
> http://forums.nvidia.com/index.php?showtopic=200629
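>
> For illustration, a minimal sketch of setting such a variable before any
> CUDA or MPI calls (the variable name CUDA_NIC_INTEROP=1 is an assumption
> recalled from the CUDA 4.0 GPUDirect notes of that time, not taken from
> the link above, so treat the forum post as authoritative):
>
>     /* Hedged sketch: normally the variable is exported in the shell
>      * before launching with mpirun (export CUDA_NIC_INTEROP=1); it is
>      * set programmatically here only to keep the example self-contained. */
>     #include <stdlib.h>
>     #include <mpi.h>
>
>     int main(int argc, char **argv)
>     {
>         setenv("CUDA_NIC_INTEROP", "1", 1);  /* assumed variable name */
>         MPI_Init(&argc, &argv);
>         /* ... cudaMallocHost() buffers and MPI communication ... */
>         MPI_Finalize();
>         return 0;
>     }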
>
> However, I cannot explain why MVAPICH would work and Open MPI would not.
>
> Rolf
>
>
>> -----Original Message-----
>> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]]
>> On Behalf Of Sebastian Rinke
>> Sent: Tuesday, January 17, 2012 12:08 PM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] GPUDirect v1 issues
>>
>> I use CUDA 4.0 with MVAPICH2 1.5.1p1 and Open MPI 1.4.2.
>>
>> Attached you will find a little test case based on the GPUDirect v1 test
>> case (mpi_pinned.c).
>> In that program the sender splits a message into chunks and sends them
>> separately to the receiver, which posts the corresponding recvs. It is a
>> kind of pipelining.
>>
>> In mpi_pinned.c:141 the offsets into the recv buffer are set.
>> With the correct offsets, i.e. increasing with each chunk, it blocks with
>> Open MPI.
>>
>> Using line 142 instead (offset = 0) works.
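>>
>> Roughly, the pattern looks like this (a minimal sketch of the idea, not
>> the attached mpi_pinned.c itself; chunk count, sizes and tags are made-up
>> example values):
>>
>>     #include <stdio.h>
>>     #include <mpi.h>
>>     #include <cuda_runtime.h>
>>
>>     #define NCHUNKS   8            /* example values only */
>>     #define CHUNKSIZE (1 << 20)    /* bytes per chunk */
>>
>>     int main(int argc, char **argv)
>>     {
>>         int rank;
>>         char *buf;
>>
>>         MPI_Init(&argc, &argv);
>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>         /* buffer pinned for GPUDirect v1 via cudaMallocHost() */
>>         cudaMallocHost((void **)&buf, (size_t)NCHUNKS * CHUNKSIZE);
>>
>>         for (int i = 0; i < NCHUNKS; i++) {
>>             size_t offset = (size_t)i * CHUNKSIZE; /* increasing offsets: hangs */
>>             /* size_t offset = 0; */               /* offset always 0:   works  */
>>
>>             if (rank == 0)
>>                 MPI_Send(buf + offset, CHUNKSIZE, MPI_CHAR, 1, i,
>>                          MPI_COMM_WORLD);
>>             else if (rank == 1)
>>                 MPI_Recv(buf + offset, CHUNKSIZE, MPI_CHAR, 0, i,
>>                          MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>         }
>>
>>         if (rank == 1)
>>             printf("received all %d chunks\n", NCHUNKS);
>>
>>         cudaFreeHost(buf);
>>         MPI_Finalize();
>>         return 0;
>>     }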
>>
>> The tarball attached contains a Makefile where you will have to adjust
>>
>> * CUDA_INC_DIR
>> * CUDA_LIB_DIR
>>
>> Sebastian
>>
>> On Jan 17, 2012, at 4:16 PM, Kenneth A. Lloyd wrote:
>>
>>
>>> Also, which version of MVAPICH2 did you use?
>>>
>>> I've been poring over Rolf's Open MPI CUDA RDMA 3 (using CUDA 4.1 RC2)
>>> vis-à-vis MVAPICH-GPU on a small 3-node cluster. These are wickedly interesting.
>>>
>>> Ken
>>> -----Original Message-----
>>> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_open-mpi.org]
>>> On Behalf Of Rolf vandeVaart
>>> Sent: Tuesday, January 17, 2012 7:54 AM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] GPUDirect v1 issues
>>>
>>> I am not aware of any issues. Can you send me a test program and I
>>> can try it out?
>>> Which version of CUDA are you using?
>>>
>>> Rolf
>>>
>>>
>>>> -----Original Message-----
>>>> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_open-mpi.org]
>>>> On Behalf Of Sebastian Rinke
>>>> Sent: Tuesday, January 17, 2012 8:50 AM
>>>> To: Open MPI Developers
>>>> Subject: [OMPI devel] GPUDirect v1 issues
>>>>
>>>> Dear all,
>>>>
>>>> I'm using GPUDirect v1 with Open MPI 1.4.3 and see blocking
>>>> MPI_SEND/RECV calls hang forever.
>>>>
>>>> With two subsequent MPI_RECVs, it hangs if the recv buffer pointer of
>>>> the second recv points somewhere other than the beginning of the recv
>>>> buffer (previously allocated with cudaMallocHost()).
>>>>
>>>> I tried the same with MVAPICH2 and did not see the problem.
>>>>
>>>> Does anybody know about issues with GPUDirect v1 using Open MPI?
>>>>
>>>> Thanks for your help,
>>>> Sebastian