On Jan 7, 2009, at 6:28 PM, Biagio Lucini wrote:
> [[5963,1],13][btl_openib_component.c:2893:handle_wc] from node24 to:
> node11 error polling LP CQ with status RECEIVER NOT READY RETRY
> EXCEEDED ERROR status number 13 for wr_id 37779456 opcode 0 qp_idx 0
Ah! If we're dealing with an RNR retry exceeded error, this is *usually* a
physical layer problem on the IB fabric.
Have you run a complete set of layer 0 / physical diagnostics on the
fabric to confirm that it is working properly?
--
Jeff Squyres
Cisco Systems