Open MPI User's Mailing List Archives

Subject: [OMPI users] OpenIB Error in ibv_create_srq
From: Allen Barnett (allen_at_[hidden])
Date: 2010-07-30 12:21:25


Hi: A customer is attempting to run our Open MPI 1.4.2-based application
on a cluster of machines running RHEL4 with the standard OFED stack. The
HCAs are identified as:

03:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1)
04:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1)

ibv_devinfo says that one port on the HCAs is active but the other is
down:

hca_id: mthca0
        fw_ver: 3.0.2
        node_guid: 0006:6a00:9800:4c78
        sys_image_guid: 0006:6a00:9800:4c78
        vendor_id: 0x066a
        vendor_part_id: 23108
        hw_ver: 0xA1
        phys_port_cnt: 2
                port: 1
                        state: active (4)
                        max_mtu: 2048 (4)
                        active_mtu: 2048 (4)
                        sm_lid: 1
                        port_lid: 26
                        port_lmc: 0x00

                port: 2
                        state: down (1)
                        max_mtu: 2048 (4)
                        active_mtu: 512 (2)
                        sm_lid: 0
                        port_lid: 0
                        port_lmc: 0x00
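
For anyone who needs to check port state across many nodes, the relevant fields can be pulled out of this output with a short script. This is only a minimal sketch: it assumes the text is laid out like the ibv_devinfo listing above (other OFED versions may print the state differently), the sample text is a trimmed copy of that listing, and the function name port_states is made up for illustration.

```python
import re

# Trimmed copy of the ibv_devinfo listing quoted above (assumed layout).
SAMPLE = """\
hca_id: mthca0
        phys_port_cnt: 2
                port: 1
                        state: active (4)
                        port_lid: 26
                port: 2
                        state: down (1)
                        port_lid: 0
"""


def port_states(devinfo_text):
    """Return {(hca_id, port_number): state} from ibv_devinfo-style text.

    Hypothetical helper: it relies only on the 'hca_id:', 'port:' and
    'state:' labels appearing in that order, as in the listing above.
    """
    states = {}
    hca = None
    port = None
    for raw in devinfo_text.splitlines():
        line = raw.strip()
        m = re.match(r"hca_id:\s*(\S+)", line)
        if m:
            hca, port = m.group(1), None
            continue
        # 'port_lid:' and 'phys_port_cnt:' do not match this pattern.
        m = re.match(r"port:\s*(\d+)", line)
        if m:
            port = int(m.group(1))
            continue
        m = re.match(r"state:\s*(\w+)", line)
        if m and hca is not None and port is not None:
            states[(hca, port)] = m.group(1)
    return states


if __name__ == "__main__":
    for (hca, port), state in sorted(port_states(SAMPLE).items()):
        print(f"{hca} port {port}: {state}")
```

Feeding it the captured output from each node gives a quick table of which ports are up; it says nothing about whether the device supports shared receive queues.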

When the OMPI application is run, it prints the error message:

--------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to initialize while trying to
create an internal queue. This typically indicates a failed
OpenFabrics installation, faulty hardware, or that Open MPI is
attempting to use a feature that is not supported on your hardware
(i.e., is a shared receive queue specified in the
btl_openib_receive_queues MCA parameter with a device that does not
support it?). The failure occured here:

  Local host: machine001.lan
  OMPI source: /software/openmpi-1.4.2/ompi/mca/btl/openib/btl_openib.c:250
  Function: ibv_create_srq()
  Error: Invalid argument (errno=22)
  Device: mthca0

You may need to consult with your system administrator to get this
problem fixed.
--------------------------------------------------------------------

The full log of a run with "btl_openib_verbose 1" is attached. My
application appears to run to completion, but I can't tell if it's just
running on TCP and not using the IB hardware.

I would appreciate any suggestions on how to proceed to fix this error.

Thanks,
Allen

-- 
Allen Barnett
Transpire, Inc
E-Mail: allen_at_[hidden]