
Subject: [OMPI users] OpenMPI-1.7.3 - cuda support
From: KESTENER Pierre (pierre.kestener_at_[hidden])
Date: 2013-10-30 11:34:14


Hello,

I'm having problems running a simple CUDA-aware MPI application, the one found at
https://github.com/parallel-forall/code-samples/tree/master/posts/cuda-aware-mpi-example

I have changed the symbol ENV_LOCAL_RANK to OMPI_COMM_WORLD_LOCAL_RANK (see the sketch below).
My cluster has 2 K20m GPUs per node, with a QLogic IB stack.
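
For reference, here is a minimal sketch of what the device-selection step looks like after that change (abbreviated from the sample, so details may differ):

    /* Pick a GPU from the launcher-provided local rank, before MPI_Init.
     * OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI's mpirun; the original
     * sample reads the generic ENV_LOCAL_RANK symbol instead. */
    #include <stdlib.h>
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char *argv[])
    {
        const char *lrank = getenv("OMPI_COMM_WORLD_LOCAL_RANK");
        int local_rank = lrank ? atoi(lrank) : 0;

        int num_devices = 0;
        cudaGetDeviceCount(&num_devices);
        /* Bind this process to one of the node's GPUs (2 K20m here). */
        cudaSetDevice(local_rank % num_devices);

        MPI_Init(&argc, &argv);
        /* ... CUDA-aware MPI_Send/MPI_Recv on device pointers ... */
        MPI_Finalize();
        return 0;
    }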

The normal CUDA/MPI application works fine,
but the CUDA-aware MPI app crashes when running 2 MPI processes on the 2 GPUs of the same node;
the error message is:
    Assertion failure at ptl.c:200: nbytes == msglen
I can send the complete backtrace from cuda-gdb if needed.
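
For context, the single-node run is launched along these lines (the application's own options are omitted here):

    mpirun -np 2 ./jacobi_cuda_aware_mpi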

The same app, when running on 2 GPUs on 2 different nodes, gives another error:
    jacobi_cuda_aware_mpi:28280 terminated with signal 11 at PC=2aae9d7c9f78 SP=7fffc06c21f8. Backtrace:
    /gpfslocal/pub/local/lib64/libinfinipath.so.4(+0x8f78)[0x2aae9d7c9f78]
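
For the two-node case the launch places one process per node, roughly (application options again omitted):

    mpirun -np 2 -npernode 1 ./jacobi_cuda_aware_mpi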

Can someone give me hints on where to look to track down this problem?
Thank you.

Pierre Kestener.