RNR (receiver not ready) means that on the receive side MPI has no
buffers available to accept the incoming data.
It may point to a broken configuration in MPI/ofud or a credit leak in
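
To make the failure mode concrete, here is a toy model in Python. It is only a sketch of the general idea (a sender retrying against a receiver that has run out of posted buffers), not Open MPI or verbs code. The names `Receiver`, `send` and `RNR_RETRY_LIMIT`, and the retry budget of 7, are assumptions made up for the illustration.

```python
from collections import deque

RNR_RETRY_LIMIT = 7  # hypothetical retry budget for this toy model


class Receiver:
    """Toy receiver that can accept a message only if a buffer is posted."""

    def __init__(self, posted_buffers):
        self.free_buffers = posted_buffers
        self.inbox = deque()

    def deliver(self, msg):
        # Return False (an "RNR" response in this model) when no buffer is posted.
        if self.free_buffers == 0:
            return False
        self.free_buffers -= 1
        self.inbox.append(msg)
        return True

    def repost(self, count=1):
        # The application consumes messages and posts the buffers again.
        for _ in range(min(count, len(self.inbox))):
            self.inbox.popleft()
            self.free_buffers += 1


def send(receiver, msg, retry_limit=RNR_RETRY_LIMIT):
    """Try to deliver msg; give up after retry_limit retries that all hit RNR."""
    for _ in range(retry_limit + 1):
        if receiver.deliver(msg):
            return "SUCCESS"
    return "RNR_RETRY_EXCEEDED"
```

In this model, `Receiver(2)` accepts two messages and a third `send` comes back as `"RNR_RETRY_EXCEEDED"` unless `repost()` is called first. If the sender believes the receiver has more free buffers than it really has, it keeps sending into that state.
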
Åke Sandgren wrote:
> I'm having a problem with getting the "error polling LP CQ with status
> RNR..." on an otherwise completely empty system.
> There are no errors visible in the error counters in any of the HCAs or
> switches or anywhere else.
> I'm running OMPI 1.3.2 built with pathscale 3.2
> If I add -mca btl 'ofud,self,sm' the same code works ok.
> It usually only shows up on runs with nodes=16:ppn=8 or higher, i.e. 8x8
> works ok.
> This might very well be a pathscale problem since when running with the
> debug version of ompi 1.3.2 the problem goes away.
> Complete error is:
> error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR
> status number 13 for wr_id 465284992 opcode -1 vendor error 135 qp_idx
> Any ideas as to where in the ompi code I should start reducing optimization
> levels to pinpoint this?
> I'll try some more tests tomorrow with a hopefully fresh mind...