Hello,    

I'm having problems running a simple cuda-aware mpi application; the one found at
https://github.com/parallel-forall/code-samples/tree/master/posts/cuda-aware-mpi-example

I have modified symbol ENV_LOCAL_RANK into OMPI_COMM_WORLD_LOCAL_RANK
My cluster has 2 K20m GPUs per node, with QLogic IB stack.

The normal CUDA/MPI application works fine;
 but the cuda-ware mpi app is crashing when using 2 MPI proc over the 2 GPUs of the same node:
the error message is:
    Assertion failure at ptl.c:200: nbytes == msglen
I can send the complete backtrace from cuda-gdb if needed.

The same app when running on 2 GPUs on 2 different nodes give another error:
    jacobi_cuda_aware_mpi:28280 terminated with signal 11 at PC=2aae9d7c9f78 SP=7fffc06c21f8.      Backtrace:
    /gpfslocal/pub/local/lib64/libinfinipath.so.4(+0x8f78)[0x2aae9d7c9f78]


Can someone give me hints where to look to track this problem ?
Thank you.

Pierre Kestener.