Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] RDMACM Differences
From: Michael Shuey (shuey_at_[hidden])
Date: 2011-03-03 07:39:27


Alternatively, if OpenMPI is really trying to use both ports, you
could force it to use just one port with --mca btl_openib_if_include
mlx4_0:1 (probably)
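
For example, combined with the benchmark command quoted later in the thread, that suggestion would look like the following (the device:port name `mlx4_0:1` comes from the `ibv_devinfo` output below; adjust it to match your own hardware):

```shell
# Restrict the openib BTL to the InfiniBand port only (device mlx4_0, port 1),
# so Open MPI never tries to pair it with the Ethernet port.
mpiexec --mca btl openib,self \
        --mca btl_openib_if_include mlx4_0:1 \
        -np 2 --hostfile mpihosts \
        /home/jagga/osu-micro-benchmarks-3.3/openmpi/ofed-1.5.2/bin/osu_latency
```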

--
Mike Shuey
On Tue, Mar 1, 2011 at 1:02 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> On Feb 28, 2011, at 12:49 PM, Jagga Soorma wrote:
>
>> -bash-3.2$ mpiexec --mca btl openib,self -mca btl_openib_warn_default_gid_prefix 0 -np 2 --hostfile mpihosts /home/jagga/osu-micro-benchmarks-3.3/openmpi/ofed-1.5.2/bin/osu_latency
>
> Your use of btl_openib_warn_default_gid_prefix may have brought up a subtle issue in Open MPI's verbs support.  More below.
>
>> # OSU MPI Latency Test v3.3
>> # Size            Latency (us)
>> [amber04][[10252,1],1][connect/btl_openib_connect_oob.c:325:qp_connect_all] error modifing QP to RTR errno says Invalid argument
>> [amber04][[10252,1],1][connect/btl_openib_connect_oob.c:815:rml_recv_cb] error in endpoint reply start connect
>
> Looking at this error message and your ibv_devinfo output:
>
>> [root_at_amber03 ~]# ibv_devinfo
>> hca_id:    mlx4_0
>>     transport:            InfiniBand (0)
>>     fw_ver:                2.7.9294
>>     node_guid:            78e7:d103:0021:8884
>>     sys_image_guid:            78e7:d103:0021:8887
>>     vendor_id:            0x02c9
>>     vendor_part_id:            26438
>>     hw_ver:                0xB0
>>     board_id:            HP_0200000003
>>     phys_port_cnt:            2
>>         port:    1
>>             state:            PORT_ACTIVE (4)
>>             max_mtu:        2048 (4)
>>             active_mtu:        2048 (4)
>>             sm_lid:            1
>>             port_lid:        20
>>             port_lmc:        0x00
>>             link_layer:        IB
>>
>>         port:    2
>>             state:            PORT_ACTIVE (4)
>>             max_mtu:        2048 (4)
>>             active_mtu:        1024 (3)
>>             sm_lid:            0
>>             port_lid:        0
>>             port_lmc:        0x00
>>             link_layer:        Ethernet
>
> It looks like you have one HCA port configured as InfiniBand and the other as Ethernet.
>
> I'm wondering if OMPI is not taking the device transport into account and is *only* using the subnet ID to determine reachability (i.e., I'm wondering if we didn't anticipate multiple devices/ports with the same subnet ID but with different transports).  I pointed this out to Mellanox yesterday; I think they're following up on it.
>
> In the meantime, a workaround might be to set a non-default subnet ID on your IB network.  That should allow Open MPI to tell these networks apart without additional help.
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
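
Jeff's hypothesis above, that Open MPI decides reachability from the subnet ID alone and ignores the link-layer transport, can be illustrated with a small sketch (hypothetical Python for illustration only, not Open MPI code; the port records mirror the `ibv_devinfo` output quoted in the thread, where both ports carry the default subnet prefix):

```python
# Hypothetical illustration of the reachability issue discussed above.
# Both ports report the default IB subnet prefix, but only one is InfiniBand.
PORTS = [
    {"device": "mlx4_0", "port": 1,
     "subnet_id": 0xFE80000000000000, "link_layer": "IB"},
    {"device": "mlx4_0", "port": 2,
     "subnet_id": 0xFE80000000000000, "link_layer": "Ethernet"},
]

def reachable_subnet_only(a, b):
    """Suspected buggy check: matching subnet IDs alone imply reachability."""
    return a["subnet_id"] == b["subnet_id"]

def reachable_transport_aware(a, b):
    """Transport-aware check: the link layer must match as well."""
    return (a["subnet_id"] == b["subnet_id"]
            and a["link_layer"] == b["link_layer"])

ib_port, eth_port = PORTS
print(reachable_subnet_only(ib_port, eth_port))      # True: wrongly pairs IB with Ethernet
print(reachable_transport_aware(ib_port, eth_port))  # False: keeps them apart
```

This also shows why the suggested workaround of setting a non-default subnet ID on the IB network helps: once the two networks carry different subnet IDs, even the subnet-only check tells them apart.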