Matt Hughes wrote:
> 2009/2/26 Brett Pemberton <brett_at_[hidden]>:
>> [[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org
>> to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status
>> number 12 for wr_id 38996224 opcode 0 qp_idx 0
> What OS are you using? I've seen this error and many other Infiniband
> related errors on RedHat enterprise linux 4 update 4, with ConnectX
> cards and various versions of OFED, up to version 1.3. Depending on
> the MCA parameters, I also see hangs often enough to make native
> Infiniband unusable on this OS.
I'd appreciate some advice on whether I'm using OFED correctly.
I'm running OFED 1.4, but only the userland components, not the kernel modules.
Is this a bad idea?
Basically, I recompile the OFED src.rpms for:
dapl, libibcm, libibcommon, libibmad, libibumad, libibverbs, libmthca,
librdmacm, libsdp, mstflint
and install them on CentOS, replacing the versions shipped with the distribution.
Should I also be compiling ofa_kernel?
Could this be causing the problems?
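One check that applies when pairing OFED userland libraries with the distribution's kernel modules is the uverbs ABI version. The assumption here is that libibverbs reads this value from /sys/class/infiniband_verbs/abi_version at startup and refuses to run if it falls outside the range the library was built for; verify that against the libibverbs source you compiled. A minimal Python sketch of the same comparison, where the supported range is a hypothetical input you would take from that source:

```python
from pathlib import Path

# Assumed location of the kernel's uverbs ABI version, relative to sysfs.
ABI_FILE = "class/infiniband_verbs/abi_version"


def read_kernel_uverbs_abi(sysfs_root: str = "/sys") -> int:
    """Return the uverbs ABI version exported by the running kernel."""
    return int(Path(sysfs_root, ABI_FILE).read_text().strip())


def abi_compatible(kernel_abi: int, min_abi: int, max_abi: int) -> bool:
    """True if the kernel ABI lies within the userland library's supported range.

    min_abi and max_abi are hypothetical inputs: read them from the
    libibverbs source you actually built rather than trusting any default.
    """
    return min_abi <= kernel_abi <= max_abi
```

On a node, `read_kernel_uverbs_abi()` with the default root reads the live value. This only catches a gross ABI mismatch; it says nothing about subtler incompatibilities between userland drivers such as libmthca and the kernel modules.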
As explained off-list, I'm running the most recent firmware for my
cards, although the release is quite old:
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
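The lines above come from the verbs device query; the firmware version and port state can also be read straight from sysfs without any userland tools. A minimal Python sketch, assuming the standard /sys/class/infiniband/<device> layout; the device name passed in (for example "mthca0") is a placeholder:

```python
from pathlib import Path


def hca_summary(dev: str, port: int = 1, sysfs_root: str = "/sys") -> dict:
    """Collect the firmware version and port state of one HCA from sysfs.

    dev is a device name as listed under /sys/class/infiniband; any name
    used with this function is an example, not taken from the original post.
    """
    base = Path(sysfs_root, "class", "infiniband", dev)
    port_dir = base / "ports" / str(port)

    def read(p: Path) -> str:
        # sysfs attributes end with a newline; strip it.
        return p.read_text().strip()

    return {
        "fw_ver": read(base / "fw_ver"),
        # The kernel reports the state as e.g. "4: ACTIVE".
        "state": read(port_dir / "state"),
    }
```

Comparing `fw_ver` across nodes is a quick way to confirm every card really runs the same firmware release.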
Brett Pemberton - VPAC Senior Systems Administrator
http://www.vpac.org/ - (03) 9925 4899