
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] application with mxm hangs on startup
From: Mike Dubman (mike.ompi_at_[hidden])
Date: 2012-08-24 05:05:54


Hi,
Could you please download the latest MXM from
http://www.mellanox.com/products/mxm/ and retry?
The MXM version that comes with OFED 1.5.3 was tested with OMPI 1.6.0.

Regards
M

On Wed, Aug 22, 2012 at 2:22 PM, Pavel Mezentsev
<pavel.mezentsev_at_[hidden]> wrote:

> I've tried to launch the application on nodes with QDR InfiniBand. The
> first attempt, with 2 processes, worked, but the following was printed to
> the output:
> [1345633953.436676] [b01:2523 :0] mpool.c:99 MXM ERROR Invalid mempool parameter(s)
> [1345633953.436676] [b01:2522 :0] mpool.c:99 MXM ERROR Invalid mempool parameter(s)
> --------------------------------------------------------------------------
> MXM was unable to create an endpoint. Please make sure that the network link is
> active on the node and the hardware is functioning.
>
> Error: Invalid parameter
>
> --------------------------------------------------------------------------
>
> The results from this launch didn't differ from the results of the launch
> without MXM.
>
> Then I tried to launch it with 256 processes, but got the same message
> from each process, and then the application crashed. Since then I've been
> observing the same behavior as with FDR: the application hangs at
> startup.
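Whether every rank printed the same MXM error can be checked mechanically from the captured job output. A minimal sketch, assuming the output was saved to a file and that each MXM log line starts with the `[timestamp] [host:pid :thread]` prefix seen in the log lines above; `count_mxm_errors` is a hypothetical helper, not part of MXM or Open MPI:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "MXM ERROR" lines per host in captured job output.
# Assumed line format (from the log in this thread):
#   [1345633953.436676] [b01:2523 :0] mpool.c:99 MXM ERROR Invalid mempool parameter(s)
# Reads the files given as arguments, or stdin if none are given.
count_mxm_errors() {
  awk '/MXM ERROR/ {
         sub(/^\[/, "", $2)     # field 2 is "[host:pid"; drop the bracket
         split($2, hp, ":")     # hp[1] = host, hp[2] = pid
         count[hp[1]]++
       }
       END { for (h in count) print h, count[h] }' "$@" | sort
}
```

For example, `count_mxm_errors job.out` prints one `host count` pair per node. Assuming one error line per rank and the `-npernode 16` placement used elsewhere in this thread, a 256-process run in which every rank failed would show 16 hosts with 16 lines each.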
>
> Best regards, Pavel Mezentsev.
>
>
> 2012/8/22 Pavel Mezentsev <pavel.mezentsev_at_[hidden]>
>
>> Hello!
>>
>> I've built Open MPI 1.6.1rc3 with MXM support. But when I try to launch
>> an application using this MTL, it hangs, and I can't figure out why.
>>
>> If I launch it with np below 128, everything works fine, since MXM
>> isn't used. I've tried setting the threshold (mtl_mxm_np) to 0 and
>> launching 2 processes, with the same result: it hangs on startup.
>> What could be causing this problem?
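The threshold described here is a process-count test: the MXM MTL is only used when the job size reaches `mtl_mxm_np`, and the observation that MXM is not used below 128 processes matches a default of 128. The function below is a simplified model of that rule for reasoning about which runs take the MXM path, not Open MPI's actual component code; `mxm_would_be_used` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Simplified model (not Open MPI source code) of the MXM MTL selection rule:
# MXM is considered only when the job size reaches the mtl_mxm_np threshold.
mxm_would_be_used() {
  local np=$1                # number of MPI processes (mpirun -np)
  local threshold=${2:-128}  # value of mtl_mxm_np; 128 matches the observation above
  if [ "$np" -ge "$threshold" ]; then
    echo yes
  else
    echo no
  fi
}
```

`mxm_would_be_used 2 0` prints `yes`: with the threshold at 0 even a 2-process job takes the MXM path, so the 2-process hang implicates MXM itself rather than the job size.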
>>
>> Here is the command I execute:
>> /opt/openmpi/1.6.1/mxm-test/bin/mpirun \
>> -np $NP \
>> -hostfile hosts_fdr2 \
>> --mca mtl mxm \
>> --mca btl ^tcp \
>> --mca mtl_mxm_np 0 \
>> -x OMP_NUM_THREADS=$NT \
>> -x LD_LIBRARY_PATH \
>> --bind-to-core \
>> -npernode 16 \
>> --mca coll_fca_np 0 -mca coll_fca_enable 0 \
>> ./IMB-MPI1 -npmin $NP Allreduce Reduce Barrier Bcast Allgather Allgatherv
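Long `mpirun` invocations are easy to break when they are wrapped for email or a job script: a line break without a trailing backslash turns the tail into a separate command. One way to keep the same options robust is a bash array. The sketch only prints the assembled command (a dry run); the function name `imb_mxm_cmd` and the default of one OpenMP thread are assumptions, while the paths and options are copied from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble the IMB run from this thread as a bash array.
# Prints the command (dry run) instead of executing it.
imb_mxm_cmd() {
  local np=${1:?usage: imb_mxm_cmd NP [NT]}
  local nt=${2:-1}   # OMP_NUM_THREADS; the default of 1 is an assumption
  local cmd=(
    /opt/openmpi/1.6.1/mxm-test/bin/mpirun
    -np "$np"
    -hostfile hosts_fdr2
    --mca mtl mxm           # request the MXM MTL
    --mca btl ^tcp          # exclude the TCP BTL
    --mca mtl_mxm_np 0      # use MXM regardless of job size
    -x OMP_NUM_THREADS="$nt"
    -x LD_LIBRARY_PATH
    --bind-to-core
    -npernode 16
    --mca coll_fca_np 0 -mca coll_fca_enable 0
    ./IMB-MPI1 -npmin "$np" Allreduce Reduce Barrier Bcast Allgather Allgatherv
  )
  printf '%s\n' "${cmd[*]}"
}
```

Replacing the final `printf` with `"${cmd[@]}"` would launch the job instead of printing it; because each option lives on its own array line, re-wrapping the script cannot silently split the argument list.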
>>
>> I'm performing the tests on nodes with Intel Sandy Bridge processors and
>> FDR InfiniBand.
>> Open MPI was configured with the following parameters:
>> CC=icc CXX=icpc F77=ifort FC=ifort ./configure \
>>   --prefix=/opt/openmpi/1.6.1rc3/mxm-test --with-mxm=/opt/mellanox/mxm \
>>   --with-fca=/opt/mellanox/fca --with-knem=/usr/share/knem
>> I'm using the latest OFED from Mellanox (1.5.3-3.1.0) on CentOS 6.1 with
>> the default kernel (2.6.32-131.0.15).
>> Compilation with the default MXM (1.0.601) failed, so I installed the
>> latest version from Mellanox: 1.1.1227.
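With two MXM builds on the system (the default 1.0.601 and the separately installed 1.1.1227), it can help to confirm which `libmxm` the installed MXM component actually resolves at run time, for instance by running `ldd` on the component's shared object with the same `LD_LIBRARY_PATH` that the job exports via `-x LD_LIBRARY_PATH`. A sketch that extracts that path from `ldd` output; the component location `$PREFIX/lib/openmpi/mca_mtl_mxm.so` and the `libmxm` name pattern are assumptions, and `mxm_lib_from_ldd` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldd` output on stdin and print where libmxm resolved.
# Expected ldd line shapes (the library name is an assumption):
#   libmxm.so.0 => /opt/mellanox/mxm/lib/libmxm.so.0 (0x00007f...)
#   libmxm.so.0 => not found
mxm_lib_from_ldd() {
  awk '$1 ~ /^libmxm/ && $2 == "=>" {
         if ($3 == "not") print "not found: " $1
         else print $3
       }'
}

# Example use (PREFIX is the Open MPI install prefix; layout assumed):
#   ldd "$PREFIX/lib/openmpi/mca_mtl_mxm.so" | mxm_lib_from_ldd
```

A resolved path outside the MXM install that Open MPI was built against would suggest the kind of version mismatch the reply at the top of the thread is concerned with.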
>>
>> Best regards, Pavel Mezentsev.
>>
>