"Kevin M. Hildebrand" <kevin_at_[hidden]> writes:
> Hi, I'm trying to run an OpenMPI 1.6.5 job across a set of nodes, some
> with Mellanox cards and some with Qlogic cards.
Maybe you shouldn't... (In one cluster I'm blessed with three somewhat
incompatible types of QLogic card plus a set of Mellanox ones, but
they're kept in separate islands, apart from the two different SDR
types.)
> I'm getting errors indicating "At least one pair of MPI processes are unable to reach each other for MPI communications". As far as I can tell all of the nodes are properly configured and able to reach each other, via IP and non-IP connections.
> I've also discovered that even if I turn off the IB transport via "--mca btl tcp,self" I'm still getting the same issue.
> The test works fine if I run it confined to hosts with identical IB cards.
> I'd appreciate some assistance in figuring out what I'm doing wrong.
I assume the QLogic cards are using PSM, which is selected through the
cm PML/MTL layer, so changing the btl list alone won't affect it. You'd
need to force them onto openib with something like --mca mtl ^psm and
make sure the libipathverbs library is available on those nodes. You
probably won't like the resulting performance -- users here noticed
when one set of nodes fell back to openib from PSM.
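
As a rough sketch (untested on your setup; the hostfile name, process
count, and application are placeholders, and the btl list may need
adjusting for your fabric), the launch could look something like:

  # check whether the PSM MTL is built/selectable on the QLogic nodes
  ompi_info --param mtl psm

  # exclude the PSM MTL so all nodes fall back to the openib (verbs) btl
  mpirun --mca mtl ^psm --mca btl openib,sm,self \
         -np 16 --hostfile hosts ./your_mpi_app

The point of excluding the MTL rather than fiddling with the btl list
is that PSM bypasses the btl layer entirely, so only once it's disabled
do the openib/tcp btls come into play on the QLogic side.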