Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] application with mxm hangs on startup
From: Pavel Mezentsev (pavel.mezentsev_at_[hidden])
Date: 2012-08-22 07:22:44


I tried launching the application on nodes with QDR InfiniBand. The
first attempt, with 2 processes, ran to completion, but the following was
printed to the output:
[1345633953.436676] [b01:2523 :0] mpool.c:99 MXM ERROR Invalid mempool parameter(s)
[1345633953.436676] [b01:2522 :0] mpool.c:99 MXM ERROR Invalid mempool parameter(s)
--------------------------------------------------------------------------
MXM was unable to create an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.

  Error: Invalid parameter

--------------------------------------------------------------------------

The results from this launch didn't differ from the results of the launch
without MXM.

Then I tried launching it with 256 processes, but got the same message
from each process, and then the application crashed. Since then I have been
seeing the same behavior as with FDR: the application hangs at startup.
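
To check that the message really comes from every process, the distinct
host:pid pairs in the saved output can be counted. This is only a minimal
sketch: it assumes the "[timestamp] [host:pid :thread] file:line MXM ERROR"
layout of the two lines quoted above, and the function name and the log file
argument are hypothetical.

```shell
#!/bin/sh
# Count distinct host:pid pairs that logged an MXM ERROR line.
# Assumption: lines look like
#   [1345633953.436676] [b01:2523 :0] mpool.c:99 MXM ERROR ...
# as in the output quoted above; other MXM versions may format them differently.
count_mxm_error_ranks() {
    grep 'MXM ERROR' "$1" \
        | sed -n 's/^\[[0-9.]*\] \[\([^ ]*\) *:[0-9]*\].*/\1/p' \
        | sort -u \
        | wc -l \
        | tr -d ' '
}

# Run only when a log file is given on the command line.
if [ "$#" -gt 0 ]; then
    count_mxm_error_ranks "$1"
fi
```

For the 256-process run the count should be 256 if every process reported
the error.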

Best regards, Pavel Mezentsev.

2012/8/22 Pavel Mezentsev <pavel.mezentsev_at_[hidden]>

> Hello!
>
> I've built Open MPI 1.6.1rc3 with MXM support. But when I try to launch
> an application using this MTL, it hangs and I can't figure out why.
>
> If I launch it with np below 128, everything works fine because MXM
> isn't used. I've tried setting the threshold (mtl_mxm_np) to 0 and
> launching 2 processes, with the same result: it hangs on startup.
> What could be causing this problem?
>
> Here is the command I execute:
> /opt/openmpi/1.6.1/mxm-test/bin/mpirun \
> -np $NP \
> -hostfile hosts_fdr2 \
> --mca mtl mxm \
> --mca btl ^tcp \
> --mca mtl_mxm_np 0 \
> -x OMP_NUM_THREADS=$NT \
> -x LD_LIBRARY_PATH \
> --bind-to-core \
> -npernode 16 \
> --mca coll_fca_np 0 -mca coll_fca_enable 0 \
> ./IMB-MPI1 -npmin $NP Allreduce Reduce Barrier Bcast \
> Allgather Allgatherv
>
> I'm performing the tests on nodes with Intel SB processors and FDR.
> Openmpi was configured with the following parameters:
> CC=icc CXX=icpc F77=ifort FC=ifort ./configure \
> --prefix=/opt/openmpi/1.6.1rc3/mxm-test --with-mxm=/opt/mellanox/mxm \
> --with-fca=/opt/mellanox/fca --with-knem=/usr/share/knem
> I'm using the latest OFED from Mellanox, 1.5.3-3.1.0, on CentOS 6.1 with
> the default kernel, 2.6.32-131.0.15.
> Compilation against the default MXM (1.0.601) failed, so I installed the
> latest version from Mellanox: 1.1.1227.
>
> Best regards, Pavel Mezentsev.
>