Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenIB Error in ibv_create_srq
From: Terry Dontje (terry.dontje_at_[hidden])
Date: 2010-08-02 06:47:24


My guess, from the message below saying "(openib) BTL failed to
initialize", is that the code is probably running over TCP. To prove
this conclusively, restrict the job to the openib, sm and self BTLs,
which eliminates the tcp BTL. To do that, add the following to the
mpirun command line: "-mca btl openib,sm,self". I believe that with
that specification the code will abort rather than run to completion.
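
As a rough sketch of the command line described above, the following
Python helper assembles an mpirun invocation restricted to the openib,
sm and self BTLs. The helper name, the process count and the "./my_app"
path are placeholders chosen for illustration; they do not come from the
original report.

```python
import shlex


def build_mpirun_cmd(app, nprocs, btls=("openib", "sm", "self")):
    """Return an mpirun argv list that restricts Open MPI to the given BTLs.

    `app` and `nprocs` are hypothetical placeholders for the real
    application path and process count.
    """
    if not btls:
        raise ValueError("at least one BTL must be listed")
    # "-mca btl a,b,c" limits the byte transfer layers Open MPI may select.
    return ["mpirun", "-np", str(nprocs), "-mca", "btl", ",".join(btls), app]


cmd = build_mpirun_cmd("./my_app", 4)
print(shlex.join(cmd))
```

With tcp left out of the list, a job that cannot bring up openib between
nodes has no other inter-node transport to fall back to, so a failure to
start would suggest that the earlier runs were going over TCP.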

What version of the OFED stack are you using? I wonder whether SRQ
(shared receive queue) is supported on your system.
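
One way to look into the SRQ question is to check the "max_srq" field
that "ibv_devinfo -v" reports for each device. The sketch below parses
such output in Python. The function name and the sample text are
illustrative only; in particular, the max_srq values in the sample are
made up and are not taken from the customer's machines.

```python
import re


def max_srq_by_device(devinfo_text):
    """Map each hca_id to its reported max_srq (None if the field is absent)."""
    result = {}
    current = None
    for line in devinfo_text.splitlines():
        m = re.match(r"\s*hca_id:\s*(\S+)", line)
        if m:
            current = m.group(1)
            result[current] = None
            continue
        # Matches "max_srq:" only, not "max_srq_wr:" or "max_srq_sge:".
        m = re.match(r"\s*max_srq:\s*(\d+)", line)
        if m and current is not None:
            result[current] = int(m.group(1))
    return result


# Hypothetical excerpt of "ibv_devinfo -v" output; values are invented.
sample = """\
hca_id: mthca0
    fw_ver: 3.0.2
    max_srq: 0
    max_srq_wr: 0
hca_id: example1
    max_srq: 960
"""

print(max_srq_by_device(sample))
```

A value of 0, or a missing field, would hint that the device or its
firmware does not offer shared receive queues, though that should be
confirmed against the OFED documentation for the installed version.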

--td

Allen Barnett wrote:
> Hi: A customer is attempting to run our OpenMPI 1.4.2-based application
> on a cluster of machines running RHEL4 with the standard OFED stack. The
> HCAs are identified as:
>
> 03:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1)
> 04:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1)
>
> ibv_devinfo says that one port on the HCAs is active but the other is
> down:
>
> hca_id: mthca0
>     fw_ver: 3.0.2
>     node_guid: 0006:6a00:9800:4c78
>     sys_image_guid: 0006:6a00:9800:4c78
>     vendor_id: 0x066a
>     vendor_part_id: 23108
>     hw_ver: 0xA1
>     phys_port_cnt: 2
>         port: 1
>             state: active (4)
>             max_mtu: 2048 (4)
>             active_mtu: 2048 (4)
>             sm_lid: 1
>             port_lid: 26
>             port_lmc: 0x00
>
>         port: 2
>             state: down (1)
>             max_mtu: 2048 (4)
>             active_mtu: 512 (2)
>             sm_lid: 0
>             port_lid: 0
>             port_lmc: 0x00
>
>
> When the OMPI application is run, it prints the error message:
>
> --------------------------------------------------------------------
> The OpenFabrics (openib) BTL failed to initialize while trying to
> create an internal queue. This typically indicates a failed
> OpenFabrics installation, faulty hardware, or that Open MPI is
> attempting to use a feature that is not supported on your hardware
> (i.e., is a shared receive queue specified in the
> btl_openib_receive_queues MCA parameter with a device that does not
> support it?). The failure occured here:
>
> Local host: machine001.lan
> OMPI source: /software/openmpi-1.4.2/ompi/mca/btl/openib/btl_openib.c:250
> Function: ibv_create_srq()
> Error: Invalid argument (errno=22)
> Device: mthca0
>
> You may need to consult with your system administrator to get this
> problem fixed.
> --------------------------------------------------------------------
>
> The full log of a run with "btl_openib_verbose 1" is attached. My
> application appears to run to completion, but I can't tell if it's just
> running on TCP and not using the IB hardware.
>
> I would appreciate any suggestions on how to proceed to fix this error.
>
> Thanks,
> Allen
>
>

-- 
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden]
