Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Bug in MPI_REDUCE in CUDA-aware MPI
From: Rolf vandeVaart (rvandevaart_at_[hidden])
Date: 2013-12-02 08:18:51


Thanks for the report. CUDA-aware Open MPI does not currently support reduction operations on GPU memory.
Is this a feature you would be interested in?
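
A common workaround for now is to stage the buffers through host memory
yourself, so that MPI_Reduce only ever sees host pointers. A rough, untested
sketch, assuming a device send buffer d_send of n floats, the reduced result
wanted back on the device in d_recv at the root, rank obtained from
MPI_Comm_rank, and the usual CUDA/MPI headers and setup in place:

    float *h_send = malloc(n * sizeof(float));   /* host staging buffers */
    float *h_recv = malloc(n * sizeof(float));

    /* copy the operands from device to host before the reduction */
    cudaMemcpy(h_send, d_send, n * sizeof(float), cudaMemcpyDeviceToHost);

    MPI_Reduce(h_send, h_recv, n, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);

    /* only the root rank holds the result; copy it back to the device there */
    if (rank == 0)
        cudaMemcpy(d_recv, h_recv, n * sizeof(float), cudaMemcpyHostToDevice);

    free(h_send);
    free(h_recv);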

Rolf

>-----Original Message-----
>From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Peter Zaspel
>Sent: Friday, November 29, 2013 11:24 AM
>To: users_at_[hidden]
>Subject: [OMPI users] Bug in MPI_REDUCE in CUDA-aware MPI
>
>Hi users list,
>
>I would like to report a bug in the CUDA-aware OpenMPI 1.7.3
>implementation. I'm using CUDA 5.0 and Ubuntu 12.04.
>
>Attached you will find an example code file to reproduce the bug.
>The point is that MPI_Reduce with normal CPU memory works fine, but using
>GPU memory leads to a segfault. (GPU memory is used when USE_GPU is
>defined.)
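>
>In essence, the GPU path just passes cudaMalloc'ed buffers straight to
>MPI_Reduce, roughly like this (a sketch of the idea only, not the exact
>attached file):
>
>  float *d_send, *d_recv;           /* device buffers, used when USE_GPU */
>  cudaMalloc((void **)&d_send, N * sizeof(float));
>  cudaMalloc((void **)&d_recv, N * sizeof(float));
>  /* ... fill d_send on the device ... */
>  MPI_Reduce(d_send, d_recv, N, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);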
>
>The segfault looks like this:
>
>[peak64g-36:25527] *** Process received signal ***
>[peak64g-36:25527] Signal: Segmentation fault (11)
>[peak64g-36:25527] Signal code: Invalid permissions (2)
>[peak64g-36:25527] Failing at address: 0x600100200
>[peak64g-36:25527] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7ff2abdb24a0]
>[peak64g-36:25527] [ 1] /data/zaspel/openmpi-1.7.3_build/lib/libmpi.so.1(+0x7d410) [0x7ff2ac4b9410]
>[peak64g-36:25527] [ 2] /data/zaspel/openmpi-1.7.3_build/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_reduce_intra_basic_linear+0x371) [0x7ff2a5987531]
>[peak64g-36:25527] [ 3] /data/zaspel/openmpi-1.7.3_build/lib/libmpi.so.1(MPI_Reduce+0x135) [0x7ff2ac499d55]
>[peak64g-36:25527] [ 4] /home/zaspel/testMPI/test_reduction() [0x400ca0]
>[peak64g-36:25527] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7ff2abd9d76d]
>[peak64g-36:25527] [ 6] /home/zaspel/testMPI/test_reduction() [0x400af9]
>[peak64g-36:25527] *** End of error message ***
>--------------------------------------------------------------------------
>mpirun noticed that process rank 0 with PID 25527 on node peak64g-36 exited
>on signal 11 (Segmentation fault).
>--------------------------------------------------------------------------
>
>Best regards,
>
>Peter