Could anyone tell me the important tuning parameters in openmpi with IB interconnect? I tried setting eager_rdma, min_rdma_size, mpi_leave_pinned parameters from the mpirun command line on 38 nodes cluster (38*2 processors) but in vain. I found simple mpirun with no mca parameters performing better. I conducted test on P2P send/receive with data size of 8MB.
Similarly i patched HPL linpack code with libnbc(non blocking collectives) and found no performance benefits. I went through its patch and found that, its probably not overlapping computation with communication.
Any help in this direction would be appreciated.