Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Bug in MPI_REDUCE in CUDA-aware MPI
From: Rolf vandeVaart (rvandevaart_at_[hidden])
Date: 2013-12-02 08:18:51


Thanks for the report. CUDA-aware Open MPI does not currently support doing reduction operations on GPU memory.
Is this a feature you would be interested in?
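
Until then, one possible workaround is to stage the GPU buffers through host memory, so that MPI_Reduce only ever sees host pointers. The following is a minimal sketch under assumptions, not taken from the attached test case: the element count N, the buffer names, and the check() helper are hypothetical, and it assumes one GPU per rank with the default device already selected.

```cuda
// Host-staged reduction: copy device data to the host, reduce there,
// then copy the result back to the device on the root rank.
// This is an illustrative sketch, not Open MPI's own implementation.
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1024  /* hypothetical element count */

/* Abort the whole MPI job on any CUDA runtime error. */
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t bytes = N * sizeof(double);

    /* Host buffers used only for staging. */
    double *h_send = (double *)malloc(bytes);
    double *h_recv = (double *)malloc(bytes);
    if (h_send == NULL || h_recv == NULL) {
        fprintf(stderr, "host allocation failed\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    for (int i = 0; i < N; ++i) {
        h_send[i] = 1.0;  /* dummy input data */
    }

    /* Device buffers that hold the data the application works on. */
    double *d_send = NULL;
    double *d_recv = NULL;
    check(cudaMalloc((void **)&d_send, bytes), "cudaMalloc d_send");
    check(cudaMalloc((void **)&d_recv, bytes), "cudaMalloc d_recv");
    check(cudaMemcpy(d_send, h_send, bytes, cudaMemcpyHostToDevice),
          "upload input");

    /* Stage device data to the host before the reduction. */
    check(cudaMemcpy(h_send, d_send, bytes, cudaMemcpyDeviceToHost),
          "stage to host");

    /* MPI_Reduce is called with host pointers only. */
    MPI_Reduce(h_send, h_recv, N, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    /* Only the root rank receives the result; copy it back to the GPU. */
    if (rank == 0) {
        check(cudaMemcpy(d_recv, h_recv, bytes, cudaMemcpyHostToDevice),
              "copy result to device");
        printf("h_recv[0] = %f\n", h_recv[0]);
    }

    cudaFree(d_send);
    cudaFree(d_recv);
    free(h_send);
    free(h_recv);
    MPI_Finalize();
    return 0;
}
```

The extra copies cost bandwidth, but they keep the reduction arithmetic on the CPU, where it is supported. Point-to-point calls such as MPI_Send and MPI_Recv can still be given device pointers directly in a CUDA-aware build.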

Rolf

>-----Original Message-----
>From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Peter Zaspel
>Sent: Friday, November 29, 2013 11:24 AM
>To: users_at_[hidden]
>Subject: [OMPI users] Bug in MPI_REDUCE in CUDA-aware MPI
>
>Hi users list,
>
>I would like to report a bug in the CUDA-aware OpenMPI 1.7.3
>implementation. I'm using CUDA 5.0 and Ubuntu 12.04.
>
>Attached is an example code file that reproduces the bug.
>MPI_Reduce works correctly with ordinary CPU (host) memory, but passing
>GPU memory buffers leads to a segfault. (GPU memory is used when USE_GPU
>is defined.)
>
>The segfault looks like this:
>
>[peak64g-36:25527] *** Process received signal ***
>[peak64g-36:25527] Signal: Segmentation fault (11)
>[peak64g-36:25527] Signal code: Invalid permissions (2)
>[peak64g-36:25527] Failing at address: 0x600100200
>[peak64g-36:25527] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7ff2abdb24a0]
>[peak64g-36:25527] [ 1] /data/zaspel/openmpi-1.7.3_build/lib/libmpi.so.1(+0x7d410) [0x7ff2ac4b9410]
>[peak64g-36:25527] [ 2] /data/zaspel/openmpi-1.7.3_build/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_reduce_intra_basic_linear+0x371) [0x7ff2a5987531]
>[peak64g-36:25527] [ 3] /data/zaspel/openmpi-1.7.3_build/lib/libmpi.so.1(MPI_Reduce+0x135) [0x7ff2ac499d55]
>[peak64g-36:25527] [ 4] /home/zaspel/testMPI/test_reduction() [0x400ca0]
>[peak64g-36:25527] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7ff2abd9d76d]
>[peak64g-36:25527] [ 6] /home/zaspel/testMPI/test_reduction() [0x400af9]
>[peak64g-36:25527] *** End of error message ***
>--------------------------------------------------------------------------
>mpirun noticed that process rank 0 with PID 25527 on node peak64g-36 exited
>on signal 11 (Segmentation fault).
>--------------------------------------------------------------------------
>
>Best regards,
>
>Peter