Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] QLogic HCA random crash after prolonged use
From: Dave Love (d.love_at_[hidden])
Date: 2013-04-25 12:11:30

Ralph Castain <rhc_at_[hidden]> writes:

> On Apr 24, 2013, at 8:58 AM, Dave Love <d.love_at_[hidden]> wrote:
>> "Elken, Tom" <tom.elken_at_[hidden]> writes:
>>>> I have seen it recommended to use psm instead of openib for QLogic cards.
>>> [Tom]
>>> Yes. PSM will perform better and be more stable when running OpenMPI
>>> than using verbs.
>> But unfortunately you won't be able to checkpoint.
> True - yet remember that OMPI no longer supports checkpoint/restart
> after the 1.6 series, pending a new supporter coming along.

As far as I can tell, lack of PSM checkpointing isn't specific to OMPI,
and I know people have resorted to verbs to get it.

Dropped CR is definitely a reason not to use OMPI past 1.6. [By the
way, the release notes are confusing: they say that DMTCP is supported
but CR is dropped.] I'd have hoped a vendor who needs to support CR would
contribute, but I suppose changes just become proprietary if they move
the base past 1.6 :-(.

For general information, what makes the CR support difficult to maintain
-- is it just a question of effort?