
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] openib RETRY EXCEEDED ERROR
From: Matt Hughes (matt.c.hughes+ompi_at_[hidden])
Date: 2009-02-27 11:54:12


2009/2/26 Brett Pemberton <brett_at_[hidden]>:
> [[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org
> to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status
> number 12 for wr_id 38996224 opcode 0 qp_idx 0

What OS are you using? I've seen this error, and many other
InfiniBand-related errors, on Red Hat Enterprise Linux 4 Update 4 with
ConnectX cards and various versions of OFED up to 1.3. Depending on
the MCA parameters, I also see hangs often enough to make native
InfiniBand unusable on this OS.

However, the openib BTL works just fine on the same hardware and the
same OFED/Open MPI stack under CentOS 4.6. I suspect something in the
kernel is contributing to these problems, but I haven't had a chance
to test the CentOS 4.6 kernel on RHEL 4 Update 4.

mch