Matt Hughes wrote:
> 2009/2/26 Brett Pemberton <brett_at_[hidden]>:
>> [[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org
>> to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status
>> number 12 for wr_id 38996224 opcode 0 qp_idx 0
>
> What OS are you using?
CentOS 5
> I've seen this error and many other Infiniband
> related errors on RedHat enterprise linux 4 update 4, with ConnectX
> cards and various versions of OFED, up to version 1.3. Depending on
> the MCA parameters, I also see hangs often enough to make native
> Infiniband unusable on this OS.
>
I'd appreciate some advice on whether I'm using OFED correctly.
I'm running OFED 1.4, but only the userland libraries, not the kernel modules.
Is this a bad idea?
Basically, I recompile the OFED src.rpms for:
dapl, libibcm, libibcommon, libibmad, libibumad, libibverbs, libmthca,
librdmacm, libsdp, mstflint
and install them onto CentOS, upgrading the in-distro versions.
Should I also be compiling ofa_kernel?
Could this be causing problems?
As explained off-list, I'm running the most recent firmware for my
cards, although the release is quite old:
hca_id: mthca0
        fw_ver:                 1.2.0
        node_guid:              0002:c902:0024:3c6c
        sys_image_guid:         0002:c902:0024:3c6f
        vendor_id:              0x02c9
        vendor_part_id:         25204
        hw_ver:                 0xA0
        board_id:               MT_03B0140001
        phys_port_cnt:          1
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        2048 (4)
                        active_mtu:     2048 (4)
                        sm_lid:         1
                        port_lid:       34
                        port_lmc:       0x00
cheers,
/ Brett
--
Brett Pemberton - VPAC Senior Systems Administrator
http://www.vpac.org/ - (03) 9925 4899