Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] anybody tried OMPI with gpudirect?
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2011-03-09 11:25:53


FYI, we finally managed to get GPUDirect to work. The gpudirect patches were
missing from our OFED kernel modules (they are only available for RHEL 5.4
and 5.5), so we had to rebuild the modules for SLES11. Thanks a lot for your help.

Now it works... but it seems to hang when the shared buffer size exceeds
1MB. I don't know if there's a known limitation there.
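In case the >1MB hang is the large-message RDMA path again, Rolf's earlier
workaround of lowering the openib BTL flags may still be worth retrying; a
hedged sketch (the binary name and host list are placeholders):

```shell
# Sketch: disable large-message RDMA in the openib BTL to see whether the
# >1MB hang goes away. Binary name and hosts are placeholders.
mpirun --mca btl_openib_flags 304 -np 2 --host node1,node2 ./gpu_pingpong
```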

Latency from one GPU to another GPU on another node is about 55us (pure
MPI is about 2us). Throughput is 1250MB/s (2300MB/s for pure MPI, 850MB/s
for GPU to GPU without GPUDirect).

Brice
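For reference, the kind of test described in the quoted message below is
roughly the following. This is a minimal sketch, not the actual tarball
program; it assumes CUDA's cudaMallocHost for the pinned buffer and plain
blocking MPI calls between exactly two ranks:

```c
/* Minimal sketch of a ping-pong over a CUDA-pinned host buffer; an
 * illustration only, not the gpudirect tarball's test program. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, peer;
    size_t len = 1 << 20;   /* 1MB, around the size where the hang was seen */
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;        /* assumes exactly two ranks */

    /* Pinned (page-locked) host memory; replacing this with plain malloc()
     * made the program work again, per the report below. */
    if (cudaMallocHost((void **)&buf, len) != cudaSuccess) {
        fprintf(stderr, "cudaMallocHost failed\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    if (rank == 0) {
        MPI_Send(buf, (int)len, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
        MPI_Recv(buf, (int)len, MPI_CHAR, peer, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    } else {
        MPI_Recv(buf, (int)len, MPI_CHAR, peer, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(buf, (int)len, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
    }

    cudaFreeHost(buf);
    MPI_Finalize();
    return 0;
}
```

Run with two ranks across two nodes, e.g. mpirun -np 2 on the two hosts.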

On 28/02/2011 17:30, Rolf vandeVaart wrote:
> Hi Brice:
> Yes, I have tried OMPI 1.5 with gpudirect and it worked for me. You definitely need the patch; without it you will see exactly the hang you described. One thing you could try is disabling the large-message RDMA in OMPI and seeing if that works. That can be done by adjusting the openib BTL flags.
>
> -- mca btl_openib_flags 304
>
> Rolf
>
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Brice Goglin
> Sent: Monday, February 28, 2011 11:16 AM
> To: users_at_[hidden]
> Subject: [OMPI users] anybody tried OMPI with gpudirect?
>
> Hello,
>
> I am trying to play with NVIDIA's gpudirect. The test program shipped with the gpudirect tarball just does a basic MPI ping-pong between two processes that allocate their buffers with cudaMallocHost instead of malloc. It seems to work with Intel MPI, but Open MPI 1.5 hangs in the first MPI_Send. Replacing the CUDA buffer with a normally malloc'ed buffer makes the program work again. I assume that something goes wrong when OMPI tries to register/pin the CUDA buffer in the IB stack (which is what gpudirect seems to be about), but I don't see why Intel MPI would succeed there.
>
> Has anybody ever looked at this?
>
> FWIW, we're using OMPI 1.5, OFED 1.5.2, Intel MPI 4.0.0.28 and SLES11 w/ and w/o the gpudirect patch.
>
> Thanks
> Brice Goglin
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users