Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport types
From: Bill Johnstone (beejstone3_at_[hidden])
Date: 2011-07-08 12:59:35


Hello, and thanks for the reply.

----- Original Message -----
> From: Jeff Squyres <jsquyres_at_[hidden]>
> Sent: Thursday, July 7, 2011 5:14 PM
> Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport types
>
> On Jun 28, 2011, at 1:46 PM, Bill Johnstone wrote:
>
>> I have a heterogeneous network of InfiniBand-equipped hosts which are all connected to the same backbone switch, an older SDR 10 Gb/s unit.
>>
>> One set of nodes uses the Mellanox "ib_mthca" driver, while the other uses the "mlx4" driver.
>>
>> This is on Linux 2.6.32, with Open MPI 1.5.3.
>>
>> When I run Open MPI across these node types, I get an error message of the form:
>>
>> Open MPI detected two different OpenFabrics transport types in the same Infiniband network.
>> Such mixed network trasport configuration is not supported by Open MPI.
>>
>> Local host: compute-chassis-1-node-01
>> Local adapter: mthca0 (vendor 0x5ad, part ID 25208)
>> Local transport type: MCA_BTL_OPENIB_TRANSPORT_UNKNOWN
>
> Wow, that's cool ("UNKNOWN").  Are you using an old version of
> OFED or something?

No, this is a clean local build of the OFED 1.5.3 packages. I don't have the full complement of OFED packages installed, though, since our setup does not use IPoIB, SDP, etc.

ibdiagnet and the other usual diagnostic tools work as expected, and I'm able to do large-scale Open MPI runs just fine, so long as a run doesn't span both Mellanox HCA types.

> Mellanox -- how can this happen?
>
>> Remote host: compute-chassis-3-node-01
>> Remote Adapter: (vendor 0x2c9, part ID 26428)
>> Remote transport type: MCA_BTL_OPENIB_TRANSPORT_IB
>>
>> Two questions:
>>
>> 1. Why is this occurring if both adapters have all the OpenIB software set up?  Is it because Open MPI is trying to use functionality such as ConnectX with the newer hardware, which is incompatible with older hardware, or is it something more mundane?
>
> It's basically a mismatch of IB capabilities -- Open MPI is trying to use
> more advanced features in some nodes and not in others.

I also tried looking in the adapter-specific settings in the .ini file under /etc, but the only difference I found was in MTU, and I think that's configured on the switch.
 
>> 2. How can I use IB amongst these heterogeneous nodes?
>
> Mellanox will need to answer this question...  It might be able to be done, but
> I don't know how offhand.  The first issue is to figure out why you're
> getting TRANSPORT_UNKNOWN on the one node.

OK, please let me know what other things to try or what other info I can provide.
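
In case it helps when collecting this from more node pairs, here is a minimal sketch (not part of Open MPI or OFED) that pulls the local/remote fields out of the error text quoted above, so the transport types each side reported can be compared at a glance. The function names `parse_transport_report` and `transports_match` are hypothetical helpers made up for this illustration, and the parsing assumes the message keeps the "Local ...:" / "Remote ...:" layout shown in this thread.

```python
import re

# Error fields exactly as quoted earlier in this thread.
ERROR_TEXT = """\
Local host: compute-chassis-1-node-01
Local adapter: mthca0 (vendor 0x5ad, part ID 25208)
Local transport type: MCA_BTL_OPENIB_TRANSPORT_UNKNOWN
Remote host: compute-chassis-3-node-01
Remote Adapter: (vendor 0x2c9, part ID 26428)
Remote transport type: MCA_BTL_OPENIB_TRANSPORT_IB
"""

# Matches "<Local|Remote> <host|adapter|transport type>: <value>",
# tolerating leading ">" quote markers and mixed capitalisation.
FIELD_RE = re.compile(
    r"^[>\s]*(Local|Remote)\s+(host|adapter|transport type)\s*:\s*(.*)$",
    re.IGNORECASE,
)


def parse_transport_report(text):
    """Return {'local': {...}, 'remote': {...}} with the fields found in text."""
    report = {"local": {}, "remote": {}}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            side, field, value = match.groups()
            report[side.lower()][field.lower()] = value.strip()
    return report


def transports_match(report):
    """True only if both sides reported a transport type and they are equal."""
    local = report["local"].get("transport type")
    remote = report["remote"].get("transport type")
    return local is not None and local == remote


if __name__ == "__main__":
    report = parse_transport_report(ERROR_TEXT)
    for side in ("local", "remote"):
        print(side, report[side])
    print("transport types match:", transports_match(report))
```

This only restates what the error message already says; it does not query the adapters themselves, so it cannot explain why the mthca side reports TRANSPORT_UNKNOWN.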