Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Heterogeneous OpenFabrics hardware
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-01-27 08:15:39

On Jan 26, 2009, at 4:46 PM, Jeff Squyres wrote:

> Note that I did not say that. I specifically stated that OMPI
> failed and it is due to the fact that we are customizing for the
> individual hardware devices. To be clear: this is an OMPI issue.
> I'm asking (at the request of the IWG) if anyone cares about fixing
> it.

I should clarify something in this discussion: Open MPI is *capable*
of running on heterogeneous OpenFabrics hardware (assuming IB <--> IB
and iWARP <--> iWARP, of course -- not IB <--> iWARP) as long as it is
configured to use the same verbs/hardware configuration on all the
devices. Depending on the hardware, Open MPI may not be configured
to run this way by default because it may choose to customize
differently for different HCAs/RNICs.

However, if you manually configure Open MPI to use the same verbs/
hardware configuration values across all the HCAs/RNICs in your
cluster, Open MPI will probably work fine. If Open MPI doesn't work
in this kind of configuration, it may indicate some kind of vendor HCA/
RNIC incompatibility.

Case in point: I regression test "limited heterogeneous" scenarios on
my MPI testing cluster at Cisco every night. Specifically, I have a
variety of different models of Mellanox HCAs and they all interoperate
just fine across 2 air-gapped IB subnets. I don't know if anyone has
tested with wildly different HCAs/RNICs using some lowest-common-
denominator verbs/hardware configuration values (i.e., some set of
values that is supported by all HCAs/RNICs) to see if OMPI works. I
don't immediately see why that wouldn't work, but I haven't tested it.
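
To make "lowest-common-denominator values" concrete, here is a minimal
Python sketch. Everything in it is hypothetical: the device names, the
setting names, and the numbers are illustrative only and are not real
Open MPI parameters or vendor limits. It also assumes every setting is
a "maximum supported" limit, so the smallest value across devices is
the one that all of them can accept.

```python
# Hypothetical per-device limits; names and numbers are illustrative only
# and do not correspond to real Open MPI parameters or vendor values.
device_limits = {
    "hca_a": {"max_inline_data": 128, "mtu": 2048, "max_qp_wr": 16384},
    "hca_b": {"max_inline_data": 64, "mtu": 4096, "max_qp_wr": 8192},
    "hca_c": {"max_inline_data": 256, "mtu": 1024, "max_qp_wr": 4096},
}


def lowest_common_denominator(limits):
    """For each setting present on every device, return the smallest limit.

    Assumes each value is an upper bound, so the minimum is safe everywhere.
    """
    devices = list(limits.values())
    if not devices:
        return {}
    # Keep only the settings that every device reports.
    shared_keys = set(devices[0])
    for dev in devices[1:]:
        shared_keys &= set(dev)
    return {key: min(dev[key] for dev in devices) for key in sorted(shared_keys)}


common = lowest_common_denominator(device_limits)
print(common)
```

The resulting single set of values is what you would then apply
uniformly to every device, instead of letting each one be customized
separately.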

Out of the box, however, Open MPI is not necessarily configured to
have the same verbs/hardware configuration for each device. That is
what may fail by default.

Jeff Squyres
Cisco Systems