
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Bug in MPI_REDUCE in CUDA-aware MPI
From: Rolf vandeVaart (rvandevaart_at_[hidden])
Date: 2013-12-02 08:48:40


Hi Peter:
The reason for not having reduction support (I believe) was simply the complexity of adding it to the code. I will at least submit a ticket so we can look at it again.

Here is a link to the FAQ entry that lists which APIs are CUDA-aware:
http://www.open-mpi.org/faq/?category=running#mpi-cuda-support
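
In the meantime, a common workaround is to stage the device data through host memory yourself, so that MPI_Reduce only ever sees host pointers. Below is a minimal sketch of that idea; the buffer names and sizes are illustrative, not taken from your attached test case, and error checking is omitted for brevity.

```c
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int n = 1024;          /* illustrative element count */
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Device buffer that some kernel would normally fill. */
    double *d_buf;
    cudaMalloc((void **)&d_buf, n * sizeof(double));
    cudaMemset(d_buf, 0, n * sizeof(double));

    /* Stage: copy device data to a host buffer before the reduction. */
    double *h_send = malloc(n * sizeof(double));
    double *h_recv = malloc(n * sizeof(double));
    cudaMemcpy(h_send, d_buf, n * sizeof(double), cudaMemcpyDeviceToHost);

    /* Reduce on host buffers -- works regardless of CUDA-awareness. */
    MPI_Reduce(h_send, h_recv, n, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    /* Copy the result back to the device on the root rank. */
    if (rank == 0)
        cudaMemcpy(d_buf, h_recv, n * sizeof(double), cudaMemcpyHostToDevice);

    free(h_send);
    free(h_recv);
    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```

This adds two extra copies per reduction, of course, which is exactly the overhead a CUDA-aware MPI_Reduce would avoid.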

Regards,
Rolf

>-----Original Message-----
>From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Peter Zaspel
>Sent: Monday, December 02, 2013 8:29 AM
>To: Open MPI Users
>Subject: Re: [OMPI users] Bug in MPI_REDUCE in CUDA-aware MPI
>
>
>Hi Rolf,
>
>OK, I didn't know that. Sorry.
>
>Yes, it would be a pretty important feature in cases where you are doing
>reduction operations on many, many entries in parallel. Each individual
>reduction is then not very complex or time-consuming, but potentially
>hundreds of thousands of reductions are done at the same time. This is
>definitely a point where a CUDA-aware implementation could give some
>performance improvement.
>
>I'm curious: rather complex operations like allgatherv are CUDA-aware, but a
>reduction is not. Is there a reason for this? Is there any documentation on
>which MPI calls are CUDA-aware and which are not?
>
>Best regards
>
>Peter
>
>
>
>On 12/02/2013 02:18 PM, Rolf vandeVaart wrote:
>> Thanks for the report. CUDA-aware Open MPI does not currently support
>doing reduction operations on GPU memory.
>> Is this a feature you would be interested in?
>>
>> Rolf
>>
>>> -----Original Message-----
>>> From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Peter
>>> Zaspel
>>> Sent: Friday, November 29, 2013 11:24 AM
>>> To: users_at_[hidden]
>>> Subject: [OMPI users] Bug in MPI_REDUCE in CUDA-aware MPI
>>>
>>> Hi users list,
>>>
>>> I would like to report a bug in the CUDA-aware Open MPI 1.7.3
>>> implementation. I am using CUDA 5.0 and Ubuntu 12.04.
>>>
>>> Attached you will find an example code file to reproduce the bug.
>>> The point is that MPI_Reduce with normal CPU memory works fully, but
>>> using GPU memory leads to a segfault. (GPU memory is used when
>>> USE_GPU is defined.)
>>>
>>> The segfault looks like this:
>>>
>>> [peak64g-36:25527] *** Process received signal ***
>>> [peak64g-36:25527] Signal: Segmentation fault (11)
>>> [peak64g-36:25527] Signal code: Invalid permissions (2)
>>> [peak64g-36:25527] Failing at address: 0x600100200
>>> [peak64g-36:25527] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7ff2abdb24a0]
>>> [peak64g-36:25527] [ 1] /data/zaspel/openmpi-1.7.3_build/lib/libmpi.so.1(+0x7d410) [0x7ff2ac4b9410]
>>> [peak64g-36:25527] [ 2] /data/zaspel/openmpi-1.7.3_build/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_reduce_intra_basic_linear+0x371) [0x7ff2a5987531]
>>> [peak64g-36:25527] [ 3] /data/zaspel/openmpi-1.7.3_build/lib/libmpi.so.1(MPI_Reduce+0x135) [0x7ff2ac499d55]
>>> [peak64g-36:25527] [ 4] /home/zaspel/testMPI/test_reduction() [0x400ca0]
>>> [peak64g-36:25527] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7ff2abd9d76d]
>>> [peak64g-36:25527] [ 6] /home/zaspel/testMPI/test_reduction() [0x400af9]
>>> [peak64g-36:25527] *** End of error message ***
>>> --------------------------------------------------------------------------
>>> mpirun noticed that process rank 0 with PID 25527 on node
>>> peak64g-36 exited on signal 11 (Segmentation fault).
>>> --------------------------------------------------------------------------
>>>
>>> Best regards,
>>>
>>> Peter
>> --------------------------------------------------------------------------
>> This email message is for the sole use of the intended recipient(s) and
>> may contain confidential information. Any unauthorized review, use,
>> disclosure or distribution is prohibited. If you are not the intended
>> recipient, please contact the sender by reply email and destroy all
>> copies of the original message.
>> --------------------------------------------------------------------------
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
>--
>Dipl.-Inform. Peter Zaspel
>Institut fuer Numerische Simulation, Universitaet Bonn
>Wegelerstr. 6, 53115 Bonn, Germany
>tel: +49 228 73-2748 mailto:zaspel_at_[hidden]
>fax: +49 228 73-7527 http://wissrech.ins.uni-bonn.de/people/zaspel.html
>