Open MPI User's Mailing List Archives

Subject: [OMPI users] problem running with RoCE over 10GbE
From: Konz, Jeffrey (SSA Solution Centers) (jeffrey.konz_at_[hidden])
Date: 2011-09-30 18:01:21


I encountered a problem when trying to run Open MPI 1.5.4 with RoCE over a 10GbE fabric.

Got this runtime error:

An invalid CPC name was specified via the btl_openib_cpc_include MCA
parameter.

  Local host: atl3-14
  btl_openib_cpc_include value: rdmacm
  Invalid name: rdmacm
  All possible valid names: oob,xoob
--------------------------------------------------------------------------
[atl3-14:07184] mca: base: components_open: component btl / openib open function failed
[atl3-12:09178] mca: base: components_open: component btl / openib open function failed
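
The error text lists the CPC names this build accepts, so a requested name can be compared against that list before launching a job. A minimal sketch, assuming the list has been copied by hand from the message; `cpc_is_valid` is a hypothetical helper, not part of Open MPI:

```shell
# Hypothetical helper (not part of Open MPI): succeeds if the requested CPC
# name ($1) appears in the comma-separated list of valid names ($2).
cpc_is_valid() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Values copied from the error message above.
if cpc_is_valid rdmacm "oob,xoob"; then
    echo "rdmacm is among the valid CPC names"
else
    echo "rdmacm is not among the valid CPC names"
fi
```

Wrapping both strings in commas keeps a partial name such as `oo` from matching `oob`.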

Used these options to mpirun:
  "--mca btl openib,self,sm --mca btl_openib_cpc_include rdmacm -mca btl_openib_if_include mlx4_0:2"

We have a Mellanox LOM with two ports: the first is an IB port, the second is a 10GbE port.
Running over the IB port works fine, as does TCP over the 10GbE port.

Built OpenMPI with this option "--enable-openib-rdmacm".
Our system has OFED 1.5.2 with librdmacm-1.0.13-1

I noticed this output from the configure script:
checking rdma/rdma_cma.h usability... no
checking rdma/rdma_cma.h presence... no
checking for rdma/rdma_cma.h... no
checking whether IBV_LINK_LAYER_ETHERNET is declared... yes
checking if RDMAoE support is enabled... yes
checking for infiniband/driver.h... yes
checking if ConnectX XRC support is enabled... yes
checking if dynamic SL is enabled... no
checking if OpenFabrics RDMACM support is enabled... no

Are we missing a build option or a piece of software?
Config.log and output from "ompi_info --all" attached.
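
The configure lines above report that `rdma/rdma_cma.h` was not found, followed by RDMACM support being disabled. One way to check whether the header is present on the build host is sketched below. The default include directory and the helper name `has_rdmacm_header` are assumptions. So is the idea that the header ships in a separate development package (often named `librdmacm-devel` on RPM-based systems), which has not been confirmed for this installation.

```shell
# Hypothetical helper: report whether rdma/rdma_cma.h exists under an
# include directory ($1, defaulting to /usr/include, which is an assumption).
has_rdmacm_header() {
    incdir="${1:-/usr/include}"
    if [ -f "$incdir/rdma/rdma_cma.h" ]; then
        echo "found: $incdir/rdma/rdma_cma.h"
    else
        echo "missing: $incdir/rdma/rdma_cma.h"
        return 1
    fi
}

# Check the default location; pass another directory if OFED is installed elsewhere.
has_rdmacm_header || echo "configure will probably not find the RDMACM header"
```

If the header turns out to be missing, installing the matching development package and re-running configure would be the natural next step to try.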

% ibv_devinfo
hca_id: mlx4_0
        transport: InfiniBand (0)
        fw_ver: 2.9.1000
        node_guid: 78e7:d103:0021:4464
        sys_image_guid: 78e7:d103:0021:4467
        vendor_id: 0x02c9
        vendor_part_id: 26438
        hw_ver: 0xB0
        board_id: HP_0200000003
        phys_port_cnt: 2
                port: 1
                        state: PORT_ACTIVE (4)
                        max_mtu: 2048 (4)
                        active_mtu: 2048 (4)
                        sm_lid: 34
                        port_lid: 11
                        port_lmc: 0x00
                        link_layer: IB

                port: 2
                        state: PORT_ACTIVE (4)
                        max_mtu: 2048 (4)
                        active_mtu: 1024 (3)
                        sm_lid: 0
                        port_lid: 0
                        port_lmc: 0x00
                        link_layer: Ethernet
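
The port number passed to `btl_openib_if_include` has to match the port whose `link_layer` is Ethernet. A small sketch that condenses `ibv_devinfo`-style output into one line per port; `list_link_layers` is a hypothetical helper, and it assumes the field layout shown above:

```shell
# Hypothetical helper: read ibv_devinfo-style text on stdin and print
# "<hca_id>:<port> <link_layer>" for every port.
list_link_layers() {
    awk '
        $1 == "hca_id:"     { hca = $2 }
        $1 == "port:"       { port = $2 }
        $1 == "link_layer:" { print hca ":" port, $2 }
    '
}

# Usage on a live system: ibv_devinfo | list_link_layers
```

For the output above this would list `mlx4_0:1` as IB and `mlx4_0:2` as Ethernet.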

% /sbin/ifconfig
eth0 Link encap:Ethernet HWaddr 78:E7:D1:21:44:60
          inet addr:16.113.180.147 Bcast:16.113.183.255 Mask:255.255.252.0
          inet6 addr: fe80::7ae7:d1ff:fe21:4460/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:1861763 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1776402 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:712448939 (679.4 MiB) TX bytes:994111004 (948.0 MiB)
          Memory:fb9e0000-fba00000

eth2 Link encap:Ethernet HWaddr 78:E7:D1:21:44:65
          inet addr:10.10.0.147 Bcast:10.10.0.255 Mask:255.255.255.0
          inet6 addr: fe80::78e7:d100:121:4465/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:8519814 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8555715 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:12370127778 (11.5 GiB) TX bytes:12372246315 (11.5 GiB)

ib0 Link encap:InfiniBand HWaddr 80:00:00:4D:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:192.168.0.147 Bcast:192.168.0.255 Mask:255.255.255.0
          inet6 addr: fe80::7ae7:d103:21:4465/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:16384 Metric:1
          RX packets:1989 errors:0 dropped:0 overruns:0 frame:0
          TX packets:208 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:275196 (268.7 KiB) TX bytes:19202 (18.7 KiB)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:42224 errors:0 dropped:0 overruns:0 frame:0
          TX packets:42224 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3115668 (2.9 MiB) TX bytes:3115668 (2.9 MiB)

Thanks,

-Jeff

/**********************************************************/
/* Jeff Konz jeffrey.konz_at_[hidden] */
/* Solutions Architect HPC Benchmarking */
/* Americas Shared Solutions Architecture (SSA) */
/* Hewlett-Packard Company */
/* Office: 248-491-7480 Mobile: 248-345-6857 */
/**********************************************************/