Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Program hangs when using OpenMPI and CUDA
From: Rolf vandeVaart (rvandevaart_at_[hidden])
Date: 2011-06-06 10:44:29

Hi Fengguang:

That is odd that you see the problem even when running with the openib flags set as Brice indicated. Just to be extra sure there are no typos in your flag settings, maybe you can verify with the ompi_info command like this:

ompi_info -mca btl_openib_flags 304 -param btl openib | grep btl_openib_flags

When running with the 304 setting, all communications travel through the regular send/receive protocol on IB. The message is broken up into one 12K first fragment, followed by as many 64K fragments as it takes to move the rest of the message.

I will try to find time to reproduce the other 1 MByte issue that Brice reported.


PS: Not sure if you are interested, but in the trunk, you can configure in support for sending and receiving GPU buffers directly. There are still many performance issues to be worked out, but I thought I would mention it.

-----Original Message-----
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Fengguang Song
Sent: Sunday, June 05, 2011 9:54 AM
To: Open MPI Users
Subject: Re: [OMPI users] Program hangs when using OpenMPI and CUDA

Hi Brice,

Thank you! I saw your previous discussion and have actually tried "--mca btl_openib_flags 304".
Unfortunately, it didn't solve the problem. In our case, the MPI buffer is different from the cudaMemcpy buffer, and we do manually copy between them. I'm still trying to figure out how to configure OpenMPI's mca parameters to solve the problem...
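For reference, the staging pattern described here (separate host buffers for CUDA and for MPI, with a manual copy between them) might look like the following sketch. This is illustrative only, not the poster's actual code; the names (d_buf, h_gpu, h_net) are made up, error checking is omitted, and it requires CUDA and MPI to build:

```c
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>
#include <string.h>

/* Hedged sketch of the workaround: the host buffer used with cudaMemcpy
 * is kept separate from the buffer handed to MPI_Send, with an explicit
 * copy between them, so the IB stack never sees the CUDA-pinned memory. */
void send_from_gpu(const void *d_buf, size_t nbytes, int dest, MPI_Comm comm)
{
    char *h_gpu = NULL;
    char *h_net = malloc(nbytes);                 /* plain buffer for MPI */
    cudaMallocHost((void **)&h_gpu, nbytes);      /* pinned buffer for CUDA */

    cudaMemcpy(h_gpu, d_buf, nbytes, cudaMemcpyDeviceToHost);
    memcpy(h_net, h_gpu, nbytes);                 /* the manual copy step */
    MPI_Send(h_net, (int)nbytes, MPI_BYTE, dest, 0, comm);

    cudaFreeHost(h_gpu);
    free(h_net);
}
```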


On Jun 5, 2011, at 2:20 AM, Brice Goglin wrote:

> On 05/06/2011 00:15, Fengguang Song wrote:
>> Hi,
>> I'm confronting a problem when using OpenMPI 1.5.1 on a GPU cluster.
>> My program uses MPI to exchange data between nodes, and uses cudaMemcpyAsync to exchange data between Host and GPU devices within a node.
>> When the MPI message size is less than 1MB, everything works fine.
>> However, when the message size is > 1MB, the program hangs (i.e., an MPI send never reaches its destination based on my trace).
>> The issue may be related to locked-memory contention between OpenMPI and CUDA.
>> Does anyone have the experience to solve the problem? Which MCA
>> parameters should I tune to increase the message size to be > 1MB (to avoid the program hang)? Any help would be appreciated.
>> Thanks,
>> Fengguang
> Hello,
> I may have seen the same problem when testing GPU direct. Do you use
> the same host buffer for copying from/to the GPU and for sending/receiving
> on the network? If so, you need a GPUDirect-enabled kernel and
> Mellanox drivers, but that only helps below 1MB.
> You can work around the problem with one of the following solutions:
> * add --mca btl_openib_flags 304 to force OMPI to always send/recv
> through an intermediate (internal) buffer, but it will decrease
> performance below 1MB too
> * use different host buffers for the GPU and the network and manually
> copy between them
> I never got any reply from NVIDIA/Mellanox/here when I reported this
> problem with GPUDirect and messages larger than 1MB.
> Brice
> _______________________________________________
> users mailing list
> users_at_[hidden]
