Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] GPUDirect v1 issues
From: Kenneth Lloyd (kenneth.lloyd_at_[hidden])
Date: 2012-01-18 11:06:52


It is documented in
http://developer.download.nvidia.com/compute/cuda/4_0/docs/GPUDirect_Technology_Overview.pdf

set CUDA_NIC_INTEROP=1
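
For example, one way to make the variable visible to all MPI ranks (assuming
a bash shell and Open MPI's mpirun, whose -x option exports the named
environment variable to the launched processes; the program name below is
just a placeholder):

  export CUDA_NIC_INTEROP=1
  mpirun -np 2 -x CUDA_NIC_INTEROP ./mpi_pinned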

 

 

From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
Behalf Of Sebastian Rinke
Sent: Wednesday, January 18, 2012 8:15 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] GPUDirect v1 issues

 

Setting the environment variable fixed the problem for Open MPI with CUDA
4.0. Thanks!

 

However, I'm wondering why this is not documented in the NVIDIA GPUDirect
package.

 

Sebastian.

 

On Jan 18, 2012, at 1:28 AM, Rolf vandeVaart wrote:

Yes, the step outlined in your second bullet is no longer necessary.

 

Rolf

 

 

From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
Behalf Of Sebastian Rinke
Sent: Tuesday, January 17, 2012 5:22 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] GPUDirect v1 issues

 

Thank you very much. I will try setting the environment variable and if
required also use the 4.1 RC2 version.

Just to clarify things a little for myself: to set up my machine with
GPUDirect v1, I did the following:

* Install RHEL 5.4
* Use the kernel with GPUDirect support
* Use the MLNX OFED stack with GPUDirect support
* Install the CUDA developer driver

Does using CUDA >= 4.0 make any of the above steps redundant?

I.e., is RHEL, the patched kernel, or the MLNX OFED stack with GPUDirect
support no longer needed?

Sebastian.

Rolf vandeVaart wrote:

I ran your test case against Open MPI 1.4.2 and CUDA 4.1 RC2 and it worked
fine. I do not have a machine right now where I can load CUDA 4.0 drivers.
Any chance you can try CUDA 4.1 RC2? There were some improvements in the
support (for one, you no longer need to set an environment variable).
 http://developer.nvidia.com/cuda-toolkit-41
 
There is also a chance that setting the environment variable as outlined in
this link may help you.
http://forums.nvidia.com/index.php?showtopic=200629
 
However, I cannot explain why MVAPICH would work and Open MPI would not.
 
Rolf
 
  

-----Original Message-----
From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]]
On Behalf Of Sebastian Rinke
Sent: Tuesday, January 17, 2012 12:08 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] GPUDirect v1 issues
 
I use CUDA 4.0 with MVAPICH2 1.5.1p1 and Open MPI 1.4.2.
 
Attached you will find a little test case based on the GPUDirect v1 test case
(mpi_pinned.c). In that program the sender splits a message into chunks and
sends them separately to the receiver, which posts the corresponding recvs. It
is a kind of pipelining.
 
In mpi_pinned.c:141 the offsets into the recv buffer are set.
With the correct offsets, i.e. increasing ones, it blocks with Open MPI.

Using line 142 instead (offset = 0) works.
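
For reference, here is a minimal sketch of the pattern described above (not
the attached mpi_pinned.c; the chunk count and size are made up): the buffer
is allocated with cudaMallocHost() and the receiver posts its recvs at
increasing offsets (the case that blocks with Open MPI); offset = 0 works.

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

#define NCHUNKS 4
#define CHUNK   (1 << 20)            /* 1 MiB per chunk, arbitrary */

int main(int argc, char **argv)
{
    int rank, i;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* pinned host buffer, as in the GPUDirect v1 test case */
    cudaMallocHost((void **)&buf, (size_t)NCHUNKS * CHUNK);

    if (rank == 0) {
        for (i = 0; i < NCHUNKS; i++)
            MPI_Send(buf + (size_t)i * CHUNK, CHUNK, MPI_CHAR, 1, i,
                     MPI_COMM_WORLD);
    } else if (rank == 1) {
        for (i = 0; i < NCHUNKS; i++) {
            size_t offset = (size_t)i * CHUNK;  /* increasing offsets: hangs */
            /* size_t offset = 0; */            /* offset = 0 works          */
            MPI_Recv(buf + offset, CHUNK, MPI_CHAR, 0, i, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
        printf("receiver done\n");
    }

    cudaFreeHost(buf);
    MPI_Finalize();
    return 0;
}

Compile with mpicc, adding the CUDA include and lib paths and -lcudart.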
 
The tarball attached contains a Makefile where you will have to adjust
 
* CUDA_INC_DIR
* CUDA_LIB_DIR
 
Sebastian
 
On Jan 17, 2012, at 4:16 PM, Kenneth A. Lloyd wrote:
 
    

Also, which version of MVAPICH2 did you use?
 
I've been poring over Rolf's Open MPI CUDA RDMA 3 (using CUDA 4.1 RC2)
vis-à-vis MVAPICH-GPU on a small 3-node cluster. These are wickedly interesting.
 
Ken
-----Original Message-----
From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_open-mpi.org] On
Behalf Of Rolf vandeVaart
Sent: Tuesday, January 17, 2012 7:54 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] GPUDirect v1 issues
 
I am not aware of any issues. Can you send me a test program and I
can try it out?
Which version of CUDA are you using?
 
Rolf
 
      

-----Original Message-----
From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_open-mpi.org] On
Behalf Of Sebastian Rinke
Sent: Tuesday, January 17, 2012 8:50 AM
To: Open MPI Developers
Subject: [OMPI devel] GPUDirect v1 issues
 
Dear all,
 
I'm using GPUDirect v1 with Open MPI 1.4.3 and see blocking MPI_SEND/RECV
calls hang forever.

For two subsequent MPI_RECVs, it hangs if the recv buffer pointer of the
second recv points somewhere other than the beginning of the recv buffer
(previously allocated with cudaMallocHost()).
 
I tried the same with MVAPICH2 and did not see the problem.
 
Does anybody know about issues with GPUDirect v1 using Open MPI?
 
Thanks for your help,
Sebastian


 

_______________________________________________
devel mailing list
devel_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/devel