
Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport types
From: Yevgeny Kliteynik (kliteyn_at_[hidden])
Date: 2011-07-10 02:48:32


Hi Bill,

On 08-Jul-11 7:59 PM, Bill Johnstone wrote:
> Hello, and thanks for the reply.
>
>
>
> ----- Original Message -----
>> From: Jeff Squyres<jsquyres_at_[hidden]>
>> Sent: Thursday, July 7, 2011 5:14 PM
>> Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport types
>>
>> On Jun 28, 2011, at 1:46 PM, Bill Johnstone wrote:
>>
>>> I have a heterogeneous network of InfiniBand-equipped hosts which are all
>> connected to the same backbone switch, an older SDR 10 Gb/s unit.
>>>
>>> One set of nodes uses the Mellanox "ib_mthca" driver, while the
>> other uses the "mlx4" driver.
>>>
>>> This is on Linux 2.6.32, with Open MPI 1.5.3 .
>>>
>>> When I run Open MPI across these node types, I get an error message of the
>> form:
>>>
>>> Open MPI detected two different OpenFabrics transport types in the same
>> Infiniband network.
>>> Such mixed network trasport configuration is not supported by Open MPI.
>>>
>>> Local host: compute-chassis-1-node-01
>>> Local adapter: mthca0 (vendor 0x5ad, part ID 25208)
>>> Local transport type: MCA_BTL_OPENIB_TRANSPORT_UNKNOWN
>>
>> Wow, that's cool ("UNKNOWN"). Are you using an old version of
>> OFED or something?
>
> No, it's a clean local build of OFED 1.5.3 packages, but I don't have the full complement of OFED packages installed, since our setup does not use IPoIB, SDP, etc.
>
> ibdiagnet and all the usual suspects work as expected, and I'm able to do large-scale Open MPI runs just fine, as long as I don't cross Mellanox HCA types.
>
>
>> Mellanox -- how can this happen?
>>
>>> Remote host: compute-chassis-3-node-01
>>> Remote Adapter: (vendor 0x2c9, part ID 26428)
>>> Remote transport type: MCA_BTL_OPENIB_TRANSPORT_IB
>>>
>>> Two questions:
>>>
>>> 1. Why is this occurring if both adapters have all the OpenIB software set
>> up? Is it because Open MPI is trying to use functionality such as ConnectX with
>> the newer hardware, which is incompatible with older hardware, or is it
>> something more mundane?
>>
>> It's basically a mismatch of IB capabilities -- Open MPI is trying to use
>> more advanced features in some nodes and not in others.
>
> I also tried looking in the adapter-specific settings in the .ini file under /etc, but the only difference I found was in MTU, and I think that's configured on the switch.
>
>>> 2. How can I use IB amongst these heterogeneous nodes?
>>
>> Mellanox will need to answer this question... It might be doable, but
>> I don't know how offhand. The first issue is to figure out why you're
>> getting TRANSPORT_UNKNOWN on the one node.
>
> OK, please let me know what other things to try or what other info I can provide.

I'll check the MCA_BTL_OPENIB_TRANSPORT_UNKNOWN thing and get back to you.
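In the meantime, if you want to see what the verbs layer itself reports for the mthca adapter, a quick standalone check along these lines should work (just a minimal sketch, assuming the libibverbs development headers are installed; Open MPI's transport detection in the openib BTL is more involved than this, so treat it only as a sanity check):

/* check_transport.c -- print the transport type that libibverbs reports
 * for each local RDMA device.  (Hypothetical file name; build with:
 *   gcc check_transport.c -o check_transport -libverbs )
 */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int i, num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);

    if (devs == NULL) {
        fprintf(stderr, "ibv_get_device_list() failed\n");
        return 1;
    }
    for (i = 0; i < num; i++) {
        const char *t;
        switch (devs[i]->transport_type) {
        case IBV_TRANSPORT_IB:    t = "IB";      break;
        case IBV_TRANSPORT_IWARP: t = "iWARP";   break;
        default:                  t = "UNKNOWN"; break;
        }
        printf("%s: %s\n", ibv_get_device_name(devs[i]), t);
    }
    ibv_free_device_list(devs);
    return 0;
}

If that already prints UNKNOWN for mthca0, the problem is in the stack below Open MPI; if it prints IB, then it's something in how the openib BTL classifies the device.
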
One question though, just to make sure we're on the same page: so the jobs do run OK on
the older HCAs, as long as they run *only* on the older HCAs, right?
Please also make sure that the jobs are using only IB, by running them with the "--mca btl openib,self" parameters.
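For example, something like this (host file name, process count and binary are just placeholders):

mpirun --mca btl openib,self --hostfile old_hcas_only -np 16 ./your_app

That rules out the TCP BTL silently being used instead of InfiniBand.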

-- YK
 