Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Bug in MPI_REDUCE in CUDA-aware MPI
From: Peter Zaspel (zaspel_at_[hidden])
Date: 2013-12-02 08:29:07


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Rolf,

OK, I didn't know that. Sorry.

Yes, it would be a pretty important feature in cases where you are doing
reduction operations on many, many entries in parallel. In that setting, each
reduction is not very complex or time-consuming, but potentially hundreds
of thousands of reductions are done at the same time. This is definitely a
point where a CUDA-aware implementation can give some performance
improvements.

I'm curious: Rather complex operations like allgatherv are CUDA-aware,
but a reduction is not. Is there a reason for this? Is there some
documentation on which MPI calls are CUDA-aware and which are not?

Best regards

Peter

On 12/02/2013 02:18 PM, Rolf vandeVaart wrote:
> Thanks for the report. CUDA-aware Open MPI does not currently support doing reduction operations on GPU memory.
> Is this a feature you would be interested in?
>
> Rolf
>
>> -----Original Message-----
>> From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Peter Zaspel
>> Sent: Friday, November 29, 2013 11:24 AM
>> To: users_at_[hidden]
>> Subject: [OMPI users] Bug in MPI_REDUCE in CUDA-aware MPI
>>
>> Hi users list,
>>
>> I would like to report a bug in the CUDA-aware OpenMPI 1.7.3
>> implementation. I'm using CUDA 5.0 and Ubuntu 12.04.
>>
>> Attached you will find an example code file to reproduce the bug.
>> The point is that MPI_Reduce with normal CPU memory works fully, but the
>> use of GPU memory leads to a segfault. (GPU memory is used when
>> USE_GPU is defined.)
>>
>> The segfault looks like this:
>>
>> [peak64g-36:25527] *** Process received signal ***
>> [peak64g-36:25527] Signal: Segmentation fault (11)
>> [peak64g-36:25527] Signal code: Invalid permissions (2)
>> [peak64g-36:25527] Failing at address: 0x600100200
>> [peak64g-36:25527] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7ff2abdb24a0]
>> [peak64g-36:25527] [ 1] /data/zaspel/openmpi-1.7.3_build/lib/libmpi.so.1(+0x7d410) [0x7ff2ac4b9410]
>> [peak64g-36:25527] [ 2] /data/zaspel/openmpi-1.7.3_build/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_reduce_intra_basic_linear+0x371) [0x7ff2a5987531]
>> [peak64g-36:25527] [ 3] /data/zaspel/openmpi-1.7.3_build/lib/libmpi.so.1(MPI_Reduce+0x135) [0x7ff2ac499d55]
>> [peak64g-36:25527] [ 4] /home/zaspel/testMPI/test_reduction() [0x400ca0]
>> [peak64g-36:25527] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7ff2abd9d76d]
>> [peak64g-36:25527] [ 6] /home/zaspel/testMPI/test_reduction() [0x400af9]
>> [peak64g-36:25527] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 25527 on node peak64g-36 exited
>> on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>>
>> Best regards,
>>
>> Peter

- --
Dipl.-Inform. Peter Zaspel
Institut fuer Numerische Simulation, Universitaet Bonn
Wegelerstr.6, 53115 Bonn, Germany
tel: +49 228 73-2748 mailto:zaspel_at_[hidden]
fax: +49 228 73-7527 http://wissrech.ins.uni-bonn.de/people/zaspel.html

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJSnIsjAAoJEKPU5iaGEeWb8P4P/iJBmdEev/jK0wpTkM0Fi1Dt
BXaJjDKUOaNVxrvQXJPtY1g6AZUWphndi26Y5SP4T7JyvF2isHtjwJq6KiCBJ4KW
KYEga3y8m8o1hocqoW465EkVaibo5zHqXcX7yzVGqkWb/1LwZJh9zjrGBhjPoFzT
JwuEaw7rq1DSn9QeQQPB+CnQsCrKuef5MqDQCfNcBFSoifYks32cdj2l5+Ye/Ltx
vaxPi7VeQuWGcPlvAIE4rdgQVjV3IS+1WcxiMSpUoj2D1IgLDveXWdUlRFjxwEu8
gmRxKMAH4A4WfvpppQYGV9h49kim8EZHfVtHf7c+jRRPDJEDLPdmOltkAlfENL5e
GroMx5PFUqWRpBYoFPh51XqBak9uqai3tD/R2YdBITufRC/UvrfIq0nYyKsnOLUc
0VXejoRJRMuRrJbjHJMtT+EZsln0jaoRuNERbikCwlFvkNevSpcHnC+SNIN73KUY
99g+hwtxdk4oIH4W+YmRlzbKPRBxiTTw9VjufIwo0EcFoI9JfiVbFpXGDTZfUu6x
Z088fu3hCA/q5UoXS1NsDHWUywzkrWsnANSQHXIKXK8jMnounX1kGZ7NH1eA3rrF
IX+EqBybTyrbUQb+XDy3cltBeXFiMxTfN0f4KN8yATol7qeSIpxeeYf5NMT/eBn/
uEWxs9hiQW1IYJ4q3F1S
=Wr/G
-----END PGP SIGNATURE-----
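
To make the use case in the message above concrete: a single MPI_Reduce
call with a large count performs one independent reduction per buffer
entry, so each entry's reduction is cheap but there are very many of them.
The following is a minimal plain-C sketch of that element-wise behaviour
for a sum. It makes no MPI or CUDA calls; the function name
model_reduce_sum and the rank-major layout of contrib are assumptions made
for illustration and are not part of Open MPI or of the attached test
program.

```c
#include <stddef.h>

/*
 * Plain-C model of an element-wise sum reduction (no MPI or CUDA calls).
 * contrib is a hypothetical rank-major array: contrib[r * count + i] is
 * entry i as contributed by rank r. result[i] receives the sum of entry i
 * over all ranks, which is what the root would hold after a sum reduction.
 */
void model_reduce_sum(const double *contrib, size_t nranks, size_t count,
                      double *result)
{
    for (size_t i = 0; i < count; ++i) {
        /* Each entry i is reduced independently of every other entry. */
        double acc = 0.0;
        for (size_t r = 0; r < nranks; ++r) {
            acc += contrib[r * count + i];
        }
        result[i] = acc;
    }
}
```

With two ranks contributing {1, 2, 3} and {10, 20, 30}, result holds
{11, 22, 33}. No entry depends on any other, so the outer loop is the part
that could run in parallel when count is in the hundreds of thousands.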