Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OPENIB unknown transport errors
From: Tim Miller (btamiller_at_[hidden])
Date: 2014-06-04 12:47:27


Hi,

I'd like to revive this thread, since I am still periodically getting
errors of this type. I have built 1.8.1 with --enable-debug and run with
-mca btl_openib_verbose 10. Unfortunately, this doesn't seem to provide
any additional information that I can make use of. I've attached a dump
of the output under 1.8.1. The key lines are:

--------------------------------------------------------------------------
Open MPI detected two different OpenFabrics transport types in the same
Infiniband network.
Such mixed network transport configuration is not supported by Open MPI.

  Local host: w1
  Local adapter: mlx4_0 (vendor 0x2c9, part ID 26428)
  Local transport type: MCA_BTL_OPENIB_TRANSPORT_IB

  Remote host: w16
  Remote Adapter: (vendor 0x2c9, part ID 26428)
  Remote transport type: MCA_BTL_OPENIB_TRANSPORT_UNKNOWN
-------------------------------------------------------------------------

Note that the vendor and part IDs are the same. If I immediately run on the
same two nodes using MVAPICH2, everything is fine.
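
For completeness, the build and run steps were roughly as follows (the
install prefix and application name here are placeholders, not necessarily
the exact ones I used):

    ./configure --prefix=/opt/openmpi-1.8.1 --enable-debug
    make && make install

    mpirun -np 2 --host w1,w16 \
        -mca btl openib,self \
        -mca btl_openib_verbose 10 ./my_app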

I'm really befuddled by this. Open MPI sees that the two cards are
identical (same vendor and part ID), yet it decides that their transport
types differ (and reports one as unknown). I'm hoping someone with
experience in how the OpenIB BTL works can shed some light on this
problem...
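
One more avenue that may help narrow this down: the verbs layer can be
queried directly on each node and the results compared. ibv_devinfo ships
with libibverbs, and mlx4_0 is the device name from the error output above:

    # run on both w1 and w16 and compare the results
    ibv_devinfo -d mlx4_0 | grep -E 'transport|link_layer'
    # on a plain IB fabric, both nodes should report:
    #   transport:    InfiniBand (0)
    #   link_layer:   InfiniBand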

Tim

On Fri, May 9, 2014 at 7:39 PM, Joshua Ladd <jladd.mlnx_at_[hidden]> wrote:

>
> Just wondering if you've tried the latest stable OMPI, 1.8.1? This might
> be an issue with the OOB. If you have a debug build, you can run with
> -mca btl_openib_verbose 10.
>
> Josh
>
>
> On Fri, May 9, 2014 at 6:26 PM, Joshua Ladd <jladd.mlnx_at_[hidden]> wrote:
>
>> Hi, Tim
>>
>> Run "ibstat" on each host:
>>
>> 1. Make sure the adapters are alive and active.
>>
>> 2. Look at the Link Layer setting for host w34. Does it match host w4's?
>> (Sample ibstat output is below.)
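>>
>> For reference, the fields to check in the ibstat output look like this
>> (sample values only):
>>
>>     CA 'mlx4_0'
>>         Port 1:
>>             State: Active
>>             Physical state: LinkUp
>>             Link layer: InfiniBand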
>>
>>
>> Josh
>>
>>
>> On Fri, May 9, 2014 at 1:18 PM, Tim Miller <btamiller_at_[hidden]> wrote:
>>
>>> Hi All,
>>>
>>> We're using Open MPI 1.7.3 with Mellanox ConnectX InfiniBand adapters,
>>> and periodically our jobs abort at start-up with the following error:
>>>
>>> ===
>>> Open MPI detected two different OpenFabrics transport types in the same
>>> Infiniband network.
>>> Such mixed network transport configuration is not supported by Open MPI.
>>>
>>> Local host: w4
>>> Local adapter: mlx4_0 (vendor 0x2c9, part ID 26428)
>>> Local transport type: MCA_BTL_OPENIB_TRANSPORT_IB
>>>
>>> Remote host: w34
>>> Remote Adapter: (vendor 0x2c9, part ID 26428)
>>> Remote transport type: MCA_BTL_OPENIB_TRANSPORT_UNKNOWN
>>> ===
>>>
>>> I've done a bit of googling and haven't found much. We do not see this
>>> issue when we run with MVAPICH2 on the same sets of nodes.
>>>
>>> Any advice or thoughts would be very welcome, as I am stumped by what
>>> causes this. The nodes are all running Scientific Linux 6 with Mellanox
>>> drivers installed via the SL-provided RPMs.
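>>>
>>> In case the driver stack is relevant, the installed packages can be
>>> compared across nodes with something like this (typical SL6 package
>>> names; adjust as needed):
>>>
>>>     rpm -qa | grep -E 'libibverbs|libmlx4|rdma' | sort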
>>>
>>> Tim
>>>
>>>
>>
>>
>