Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] QLogic HCA random crash after prolonged use
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-04-24 12:32:10


On Apr 24, 2013, at 8:58 AM, Dave Love <d.love_at_[hidden]> wrote:

> "Elken, Tom" <tom.elken_at_[hidden]> writes:
>
>>> I have seen it recommended to use psm instead of openib for QLogic cards.
>> [Tom]
>> Yes. PSM will perform better and be more stable when running OpenMPI
>> than using verbs.
>
> But unfortunately you won't be able to checkpoint.

True, but remember that OMPI no longer supports checkpoint/restart after the 1.6 series, pending a new supporter coming along.

>
>> Intel acquired the InfiniBand assets of QLogic
>> about a year ago. These SDR HCAs are no longer supported, but should
>> still work.
>
> Do you mean they should work with the latest infinipath libraries
> (despite what was said or implied in the notes for the last version I got
> from QLogic), or possibly with what's in RHEL? I thought I'd actually tried
> and failed with later stuff, but may just have gone by the release notes.
>
>> You can get the driver (ib_qib) and PSM library from OFED 1.5.4.1 or
>> the current release OFED 3.5.
>
> I wonder if there's a version of the driver that's known to work in a
> current RHEL5 system with QLE7140. We get frequent qib-related kernel
> panics from a vanilla RHEL5.9 kernel -- after running OK under test for
> a few weeks, and nothing relevant appearing to have changed to cause
> it... (There's a trace on the redhat bugzilla with qib in the issue
> title, for what it's worth.) I'm currently reverting to old stuff.
>
> It's good if Infinipath-land is taking an interest in OMPI again, and
> that the libraries are now under a free licence.