Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] QLogic HCA random crash after prolonged use
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-04-25 12:19:27


On Apr 25, 2013, at 9:11 AM, Dave Love <d.love_at_[hidden]> wrote:

> Ralph Castain <rhc_at_[hidden]> writes:
>
>> On Apr 24, 2013, at 8:58 AM, Dave Love <d.love_at_[hidden]> wrote:
>>
>>> "Elken, Tom" <tom.elken_at_[hidden]> writes:
>>>
>>>>> I have seen it recommended to use psm instead of openib for QLogic cards.
>>>> [Tom]
>>>> Yes. PSM will perform better and be more stable when running OpenMPI
>>>> than using verbs.
>>>
>>> But unfortunately you won't be able to checkpoint.
>>
>> True - yet remember that OMPI no longer supports checkpoint/restart
>> after the 1.6 series, pending a new supporter coming along.
>
> As far as I can tell, lack of PSM checkpointing isn't specific to OMPI,
> and I know people have resorted to verbs to get it.
>
> Dropped CR is definitely a reason not to use OMPI past 1.6. [By the way,
> the release notes are confusing, saying that DMTCP is supported, but CR
> is dropped.] I'd have hoped a vendor who needs to support CR would
> contribute, but I suppose changes just become proprietary if they move
> the base past 1.6 :-(.

Not necessarily.

>
> For general information, what makes the CR support difficult to maintain
> -- is it just a question of effort?

Largely a lack of interest. Very few people (a handful around the world) use it, and it is hard to justify putting in the effort for so small a user group. The person who did the work did so as part of his PhD thesis - he maintained it for a couple of years while doing a post-doc, but has now joined the "real world" and no longer has time. None of the other developers are employed by someone who cares.
