I'm having a problem where I get the "error polling LP CQ with status
RNR..." error on an otherwise completely empty system.
There are no errors visible in the error counters in any of the HCAs or
switches or anywhere else.
I'm running OMPI 1.3.2 built with PathScale 3.2.
If I add -mca btl 'ofud,self,sm' the same code works OK.
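
For reference, a rough sketch of the kind of command line I mean. The
process count (128, i.e. 16 nodes x 8 ppn) and the binary name ./my_app
are placeholders for illustration, not my actual job. The script only
prints the command so it can be checked before being put in a job script.

```shell
#!/bin/sh
# Placeholder values, assumed for illustration only.
NP=128                 # 16 nodes x 8 processes per node
APP=./my_app           # hypothetical MPI binary
BTL='ofud,self,sm'     # BTL list that avoids the failing run

# Assemble the mpirun command line with the BTL selection.
CMD="mpirun -np $NP -mca btl $BTL $APP"

# Print it rather than run it; paste the output into the job script.
echo "$CMD"
```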
It usually only shows up on runs with nodes=16:ppn=8 or higher, i.e. 8x8
This might very well be a PathScale problem, since the problem goes
away when running with the debug version of OMPI 1.3.2.
The complete error is:
error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR
status number 13 for wr_id 465284992 opcode -1 vendor error 135 qp_idx
Any ideas as to where in the OMPI code I should start reducing
optimization levels to pinpoint this?
I'll try some more tests tomorrow with a hopefully fresh mind...
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: ake_at_[hidden] Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se