I've built Open MPI 1.6.1rc3 with MXM support, but when I try to launch an application using this MTL, it hangs and I can't figure out why.
If I launch it with np below 128, everything works fine, since MXM isn't used. I've tried setting the threshold (mtl_mxm_np) to 0 and launching 2 processes, with the same result: it hangs on startup.
What could be causing this problem?
Here is the command I execute:
-np $NP \
-hostfile hosts_fdr2 \
--mca mtl mxm \
--mca btl ^tcp \
--mca mtl_mxm_np 0 \
-x OMP_NUM_THREADS=$NT \
-x LD_LIBRARY_PATH \
-npernode 16 \
--mca coll_fca_np 0 -mca coll_fca_enable 0 \
./IMB-MPI1 -npmin $NP Allreduce Reduce Barrier Bcast Allgather Allgatherv
I'm running the tests on nodes with Intel Sandy Bridge processors and FDR InfiniBand. Open MPI was configured with the following parameters:
CC=icc CXX=icpc F77=ifort FC=ifort ./configure --prefix=/opt/openmpi/1.6.1rc3/mxm-test --with-mxm=/opt/mellanox/mxm --with-fca=/opt/mellanox/fca --with-knem=/usr/share/knem
I'm using the latest OFED from Mellanox, 1.5.3-3.1.0, on CentOS 6.1 with the default kernel, 2.6.32-131.0.15.
Compilation with the default MXM (1.0.601) failed, so I installed the latest version from Mellanox: 1.1.1227.
Best regards, Pavel Mezentsev.