Open MPI User's Mailing List Archives

Subject: [OMPI users] Bug in MPI_REDUCE in CUDA-aware MPI
From: Peter Zaspel (zaspel_at_[hidden])
Date: 2013-11-29 11:23:41


Hi users list,

I would like to report a bug in the CUDA-aware Open MPI 1.7.3
implementation. I'm using CUDA 5.0 on Ubuntu 12.04.

Attached is an example code file that reproduces the bug.
MPI_Reduce works correctly with normal CPU (host) memory, but it
segfaults when GPU memory is used. (GPU memory is used when USE_GPU is
defined.)

The segfault looks like this:

[peak64g-36:25527] *** Process received signal ***
[peak64g-36:25527] Signal: Segmentation fault (11)
[peak64g-36:25527] Signal code: Invalid permissions (2)
[peak64g-36:25527] Failing at address: 0x600100200
[peak64g-36:25527] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7ff2abdb24a0]
[peak64g-36:25527] [ 1] /data/zaspel/openmpi-1.7.3_build/lib/libmpi.so.1(+0x7d410) [0x7ff2ac4b9410]
[peak64g-36:25527] [ 2] /data/zaspel/openmpi-1.7.3_build/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_reduce_intra_basic_linear+0x371) [0x7ff2a5987531]
[peak64g-36:25527] [ 3] /data/zaspel/openmpi-1.7.3_build/lib/libmpi.so.1(MPI_Reduce+0x135) [0x7ff2ac499d55]
[peak64g-36:25527] [ 4] /home/zaspel/testMPI/test_reduction() [0x400ca0]
[peak64g-36:25527] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7ff2abd9d76d]
[peak64g-36:25527] [ 6] /home/zaspel/testMPI/test_reduction() [0x400af9]
[peak64g-36:25527] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 25527 on node peak64g-36
exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Best regards,

Peter