For the web archives: the user posted a similar question on the
OpenFabrics list, and it was answered there by someone from QLogic.
On Jun 26, 2009, at 9:46 PM, Nifty Tom Mitchell wrote:
> On Thu, Jun 25, 2009 at 10:29:39AM -0700, D'Auria, Raffaella wrote:
> > Dear All,
> > I have been encountering a fatal error of type "error polling LP CQ
> > with RETRY EXCEEDED ERROR status number 12" whenever I try to run an
> > MPI code (see below) that performs an AlltoAll call.
> > We are running the OpenMPI 1.3.2 stack on top of OFED 1.4.1.
> > Our cluster is composed of mostly Mellanox HCAs (MT_03B0140001) and
> > some QLogic (InfiniPath_QLE724) cards.
> > The problem manifests itself as soon as the size of the vector whose
> > components are being swapped between processes with the all-to-all
> > call is equal to or larger than 68MB.
> > Please note that I have this problem only when at least one of the
> > computational nodes in the host list of mpiexec is a node with a
> > QLogic InfiniPath_QLE724 card.
> Look at btl flags....
> It is possible that the InfiniPath_QLE7240 fast transport path for
> MPI is not connecting to the Mellanox HCA. The default fast path for
> cards like the QLE7240 uses the PSM library, which the Mellanox HCAs
> do not support.
> The mpirun man page hints at this but does not divulge what a btl is
> or how to explore the Modular Component Architecture (MCA).
> T o m M i t c h e l l
> Found me a new hat, now what?
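The remedy implied by the reply is to steer both card types onto the common verbs-based openib BTL instead of letting the QLogic nodes take their PSM fast path. A minimal sketch, assuming the Open MPI 1.3.x component names (`ob1`, `openib`); `./alltoall_test` is a hypothetical stand-in for the user's program:

```shell
# Inspect the installed point-to-point components: look for the PSM MTL
# ("psm") and the verbs BTL ("openib") in the list of MCA components.
ompi_info | grep -E 'mtl|btl'

# Force the ob1 PML with the openib/sm/self BTLs so that the QLE7240 nodes
# talk verbs like the Mellanox nodes, rather than defaulting to PSM.
mpirun --mca pml ob1 --mca btl openib,sm,self -np 16 ./alltoall_test
```

Selecting `pml ob1` matters because otherwise Open MPI may pick the `cm` PML on PSM-capable nodes, which bypasses the BTL list entirely.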