Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problems with "error polling LP CQ with status RNR"
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-05-14 09:24:08


On May 13, 2009, at 4:55 PM, Åke Sandgren wrote:

> I'm having problems getting the "error polling LP CQ with status
> RNR..." error on an otherwise completely empty system.
> There are no errors visible in the error counters in any of the HCAs or
> switches or anywhere else.
>
> I'm running OMPI 1.3.2 built with PathScale 3.2.
>
> If I add -mca btl 'ofud,self,sm', the same code works OK.
>

Interesting. I have only done very limited testing with ofud; are you
saying that you get these errors if you use "--mca btl openib,sm,self"?

> It usually only shows up on runs with nodes=16:ppn=8 or larger; 8x8
> works OK.
>
> This might very well be a PathScale problem, since the problem goes
> away when running with the debug version of OMPI 1.3.2.
>
> Complete error is:
> error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR
> status number 13 for wr_id 465284992 opcode -1 vendor error 135 qp_idx 0
>
> Any ideas where in the OMPI code I should start reducing optimization
> levels to pinpoint this?
>

Do you have a simple reproducer test case, perchance?

-- 
Jeff Squyres
Cisco Systems