Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openib RETRY EXCEEDED ERROR
From: Bogdan Costescu (Bogdan.Costescu_at_[hidden])
Date: 2009-02-27 05:34:10


Brett Pemberton <brett_at_[hidden]> wrote:

> [[1176,1],0][btl_openib_component.c:2905:handle_wc] from
> tango092.vpac.org to: tango090 error polling LP CQ with status RETRY
> EXCEEDED ERROR status number 12 for wr_id 38996224 opcode 0 qp_idx 0

I've seen this error with Mellanox ConnectX cards and OFED 1.2.x with
all versions of Open MPI that I have tried (1.2.x and pre-1.3) and some
MVAPICH versions, from which I have concluded that the problem lies in
the lower levels (OFED or IB card firmware). Indeed, after the
installation of OFED 1.3.x and possibly a firmware update (I'm not
sure about the firmware, as I don't administer that cluster), these
errors disappeared.

-- 
Bogdan Costescu
IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.costescu_at_[hidden]