Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openib RETRY EXCEEDED ERROR
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-03-04 15:00:57


On Mar 1, 2009, at 7:24 PM, Brett Pemberton wrote:

> I'd appreciate some advice on whether I'm using OFED correctly.
>
> I'm running OFED 1.4, however not the kernel modules, just userland.
> Is this a bad idea?
>

I believe so. I'm not a kernel guy, but I've always used the userland
bits matched with the corresponding kernel bits. If nothing else,
getting them to match would eliminate one possible source of errors.

> Basically, I recompile the ofed src.rpms for:
>
> dapl, libibcm, libibcommon, libibmad, libibumad, libibverbs, libmthca,
> librdmacm, libsdp, mstflint
>
> And install onto CentOS, upgrading the in-distro versions.
> Should I also be compiling ofa_kernel?
> Could this be causing problems?
>

...could be? I don't really know. That would be a better question
for the general_at_[hidden] list.

> As explained off-list, I'm running the most recent firmware for my
> cards, although the release is quite old:
>
> hca_id: mthca0
> fw_ver: 1.2.0
>

I *believe* that's fairly ancient. You might want to check the
Mellanox support web site and see if there's anything more recent for
your HCA.
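
If you want to check the firmware version on more than one node, here is
a rough sketch in Python that pulls hca_id and fw_ver out of text shaped
like the snippet quoted above. It assumes the text came from something
like ibv_devinfo; the function name parse_hca_firmware and the sample
string are made up for illustration, and real output may need a looser
pattern.

```python
import re


def parse_hca_firmware(text):
    """Map each hca_id to the fw_ver that follows it (hypothetical helper)."""
    versions = {}
    current_hca = None
    for line in text.splitlines():
        # Accept optional leading whitespace, then "key: value".
        match = re.match(r"\s*(hca_id|fw_ver):\s*(\S+)", line)
        if not match:
            continue
        key, value = match.groups()
        if key == "hca_id":
            current_hca = value
        elif current_hca is not None:
            versions[current_hca] = value
    return versions


# Sample text copied from the fields quoted in this thread.
sample = "hca_id: mthca0\n    fw_ver: 1.2.0\n"
print(parse_hca_firmware(sample))
```

Comparing the resulting version string against whatever the vendor
currently lists is still a manual step; this only collects the values.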

-- 
Jeff Squyres
Cisco Systems