Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] openib btl - fatal errors don't abort the job
From: Shamis, Pavel (shamisp_at_[hidden])
Date: 2010-09-07 16:32:22


On Sep 3, 2010, at 8:14 AM, Jeff Squyres wrote:

> On Sep 1, 2010, at 4:47 PM, Steve Wise wrote:
>
>> I was wondering what the logic is behind allowing an MPI job to continue in the presence of a fatal qp error?
>
> It's a feature...?

The idea was that in the near future we would be able to recover from this kind of error (reopen the QP, etc.).
But the feature was never implemented for OMPI.
(BTW, I'm not sure that is still true, since Sun/Oracle pushed some code that is supposed to handle such cases...)

So maybe it is worth handling it like the device-fatal case: abort everything.

Pasha