I'm having problems running a simple CUDA-aware MPI application; the one found at
I have modified the symbol ENV_LOCAL_RANK to OMPI_COMM_WORLD_LOCAL_RANK.
My cluster has 2 K20m GPUs per node, with a QLogic IB stack.
The normal CUDA/MPI application works fine,
but the CUDA-aware MPI app crashes when running 2 MPI processes on the 2 GPUs of the same node.
The error message is:
Assertion failure at ptl.c:200: nbytes == msglen
I can send the complete backtrace from cuda-gdb if needed.
The same app, when running on 2 GPUs across 2 different nodes, gives a different error:
jacobi_cuda_aware_mpi:28280 terminated with signal 11 at PC=2aae9d7c9f78 SP=7fffc06c21f8. Backtrace:
Can someone give me hints on where to look to track down this problem?