Brock Palen <brockp_at_[hidden]> writes:
> We managed to have another user hit the bug that causes collectives (this time MPI_Bcast() ) to hang on IB that was fixed by setting:
>
> btl_openib_cpc_include rdmacm

Could someone explain this? We also have problems with collective hangs
with openib/mlx4 (specifically in IMB), but not with psm, and I couldn't
see any relevant issues filed. However, rdmacm isn't an available value
for that parameter with our 1.4.3 or 1.5.3 installations, only oob (not
that I understand what these things are...).