Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] btl_openib_cpc_include rdmacm questions
From: Brock Palen (brockp_at_[hidden])
Date: 2011-05-05 16:15:45


Yeah, we have run into more issues, with rdmacm not being available on all of our hosts. So it would be nice to know what we can do to test whether a host supports rdmacm.

Example:

--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host: nyx5067.engin.umich.edu
  Local device: mlx4_0
  Local port: 1
  CPCs attempted: rdmacm
--------------------------------------------------------------------------

This is one of our QDR hosts, on which rdmacm generally works. This code (CRASH) requires rdmacm to avoid a collective hang in MPI_Allreduce().

I looked on this host and found:
[root_at_nyx5067 ~]# rpm -qa | grep rdma
librdmacm-1.0.11-1
librdmacm-1.0.11-1
librdmacm-devel-1.0.11-1
librdmacm-devel-1.0.11-1
librdmacm-utils-1.0.11-1

So all the libraries are installed (I think). Is there a way to verify this? Or to have Open MPI be more verbose about what caused rdmacm to fail as a CPC option?
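
For reference, here is a rough sketch of the kind of per-host check I have in mind. It is only a guess at what rdmacm needs, not something confirmed in this thread: it assumes the rdma_cm and rdma_ucm kernel modules must be loaded, that the /dev/infiniband/rdma_cm device node must exist, and that the IPoIB interface (assumed to be ib0) needs an IPv4 address. The function name check_rdmacm and its arguments are made up for illustration.

```shell
#!/bin/sh
# Sketch only: report whether the pieces rdmacm is assumed to need are present.
# All three checks are assumptions about rdmacm's prerequisites.
# Arguments (all optional, mainly so the checks can be pointed elsewhere):
#   $1  device root          (default /dev)
#   $2  IPoIB interface name (default ib0, an assumption)
#   $3  loaded-modules file  (default /proc/modules)
check_rdmacm() {
    dev_root="${1:-/dev}"
    ipoib_if="${2:-ib0}"
    modules_file="${3:-/proc/modules}"
    rc=0

    # 1. Kernel modules: each loaded module starts a line in /proc/modules.
    for m in rdma_cm rdma_ucm; do
        if grep -q "^$m " "$modules_file" 2>/dev/null; then
            echo "ok: module $m loaded"
        else
            echo "MISSING: module $m not loaded"
            rc=1
        fi
    done

    # 2. Device node normally created when rdma_ucm is loaded.
    if [ -e "$dev_root/infiniband/rdma_cm" ]; then
        echo "ok: device node $dev_root/infiniband/rdma_cm"
    else
        echo "MISSING: device node $dev_root/infiniband/rdma_cm"
        rc=1
    fi

    # 3. IPv4 address on the IPoIB interface.
    if ip -4 addr show dev "$ipoib_if" 2>/dev/null | grep -q 'inet '; then
        echo "ok: $ipoib_if has an IPv4 address"
    else
        echo "MISSING: no IPv4 address on $ipoib_if"
        rc=1
    fi

    return $rc
}
```

Running `check_rdmacm` with no arguments on a node checks the real /dev, ib0 and /proc/modules; a non-zero return means at least one of the assumed prerequisites is missing. For the verbosity side, running the job with `--mca btl_base_verbose 100` may print more about why the openib BTL rejected a CPC, though I have not confirmed how much it says about rdmacm specifically.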

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp_at_[hidden]
(734)936-1985

On May 3, 2011, at 9:42 AM, Dave Love wrote:

> Brock Palen <brockp_at_[hidden]> writes:
>
>> We managed to have another user hit the bug that causes collectives (this time MPI_Bcast() ) to hang on IB that was fixed by setting:
>>
>> btl_openib_cpc_include rdmacm
>
> Could someone explain this? We also have problems with collective hangs
> with openib/mlx4 (specifically in IMB), but not with psm, and I couldn't
> see any relevant issues filed. However, rdmacm isn't an available value
> for that parameter with our 1.4.3 or 1.5.3 installations, only oob (not
> that I understand what these things are...).