
Subject: [OMPI users] Mixed Mellanox and Qlogic problems
From: David Warren (warren_at_[hidden])
Date: 2011-06-30 17:03:43


I have a cluster with mostly Mellanox ConnectX hardware and a few nodes
with QLogic QLE7340s. After looking through the web, the FAQs, etc., I
built openmpi-1.5.3 with both psm and openib support. If I run within a
single hardware type it is fast and works fine. If I run across both
types without specifying an MTL (e.g. mpirun -np 24 -machinefile dwhosts
--byslot --bind-to-core --mca btl ^tcp ...) it dies with:
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> [n16:9438] Abort before MPI_INIT completed successfully; not able to
> guarantee that all other processes were killed!
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
...
I can make it run by giving it a bad MTL list, e.g. -mca mtl psm,none.
All the processes run after complaining that the MTL component "none"
does not exist. However, the mixed job runs slowly (about 10% slower
than either hardware set alone).
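
For reference, the full workaround invocation looks something like this
(./my_app is a placeholder for my actual executable):

    mpirun -np 24 -machinefile dwhosts --byslot --bind-to-core \
        -mca mtl psm,none -mca btl ^tcp ./my_app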

Pertinent info:
On the Qlogic Nodes:
OFED: QLogic-OFED.SLES11-x86_64.1.5.3.0.22
On the Mellanox Nodes:
OFED-1.5.2.1-20101105-0600

All:
Debian Lenny, kernel 2.6.32.41
OpenSM
limit | grep memorylocked gives unlimited on all nodes.
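(That's the tcsh limit builtin; ulimit -l is the bash equivalent.)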

Configure line:

    ./configure --with-libnuma --with-openib \
        --prefix=/usr/local/openmpi-1.5.3 --with-psm=/usr \
        --enable-btl-openib-failover --enable-openib-connectx-xrc \
        --enable-openib-rdmacm

I thought that with 1.5.3 I was supposed to be able to do this. Am I
just wrong about that? Does anyone see what I am doing wrong?
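
If mixing the PSM MTL and the openib BTL in one job just isn't
supported, I assume the fallback is to force the verbs path on every
node so the QLogic cards skip PSM entirely, something like:

    mpirun -np 24 -machinefile dwhosts --byslot --bind-to-core \
        -mca pml ob1 -mca btl openib,sm,self ./my_app

(again, ./my_app is a placeholder). Is that the right approach, or is
there a way to get the two transports working together in one job?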

Thanks