Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] if btl->add_procs() fails...?
From: Terry Dontje (Terry.Dontje_at_[hidden])
Date: 2008-08-02 12:46:22


Jeff Squyres wrote:
> On Aug 1, 2008, at 11:39 PM, Brian Barrett wrote:
>
>> My thought is that if add_procs fails, then that BTL should be
>> removed (as if init failed) and things should continue on. If that
>> BTL was the only way to reach another process, we'll catch that later
>> and abort.
>>
>> There are always going to be errors that can't be detected until the
>> device is actually used, so I think that add_procs errors should be
>> treated the same as init errors. The error_cb is a red herring, as
>> that's supposed to be used in situations where an error can't
>> directly be returned to the upper layers (like the progress
>> function). In this case, we can directly return an error, so we
>> should do so (and I believe we do, it's the BML/PML that's the problem).
>
> So if add_procs() fails, do you think that the BML/PML should finalize
> the module? That looks like an easy change to make.
>
> Second, if there are no other successfully-add_proc()'ed modules from
> that component, should the BTL's progress function be removed from the
> list of progress functions? The real question is: if a module
> add_procs() fails, do we mandate that it still must be safe to call
> the component's progress function? I think you're saying "yes", but
> just wanted to be sure. I don't know offhand how a component's
> progress function is added to the list (can't check ATM), so I'd have
> to dig into that a bit.
>
I am curious how all of the above affects client/server or spawned
jobs. If you finalize a BTL and later connect to a process that would
use that BTL, would the BTL reinitialize itself?
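
To make the scenario concrete, here is a minimal sketch of the policy
Brian describes: finalize any module whose add_procs() fails, carry on,
and only later check whether every peer is still reachable. This is not
the actual BML/PML code. Every name in it (toy_btl_t, toy_add_procs_all,
toy_first_unreachable, and so on) is hypothetical, and it assumes a
module simply records which peers it can reach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PROCS 8

/* Hypothetical stand-in for a BTL module; NOT the real Open MPI struct. */
typedef struct toy_btl {
    const char *name;
    bool active;                  /* false once the module is finalized */
    bool reaches[TOY_MAX_PROCS];  /* reaches[p]: can this module reach peer p? */
    int  (*add_procs)(struct toy_btl *self, size_t nprocs);
    void (*finalize)(struct toy_btl *self);
} toy_btl_t;

/* Toy add_procs implementations: one always succeeds, one always fails. */
static int toy_add_procs_ok(toy_btl_t *self, size_t nprocs)
{
    (void)self; (void)nprocs;
    return 0;
}

static int toy_add_procs_fail(toy_btl_t *self, size_t nprocs)
{
    (void)self; (void)nprocs;
    return -1;
}

/* Toy finalize: just mark the module as gone. */
static void toy_finalize(toy_btl_t *self)
{
    self->active = false;
}

/* Call add_procs on every active module. A module that fails is
 * finalized (treated like an init failure) and skipped from then on.
 * Returns the number of modules that are still active. */
static size_t toy_add_procs_all(toy_btl_t *btls, size_t nbtls, size_t nprocs)
{
    size_t alive = 0;
    assert(nprocs <= TOY_MAX_PROCS);
    for (size_t i = 0; i < nbtls; ++i) {
        if (!btls[i].active) {
            continue;
        }
        if (btls[i].add_procs(&btls[i], nprocs) != 0) {
            btls[i].finalize(&btls[i]);
            continue;
        }
        ++alive;
    }
    return alive;
}

/* The "catch that later" step: return the index of the first peer that
 * no active module can reach, or -1 if every peer is reachable. */
static int toy_first_unreachable(const toy_btl_t *btls, size_t nbtls,
                                 size_t nprocs)
{
    for (size_t p = 0; p < nprocs; ++p) {
        bool reachable = false;
        for (size_t i = 0; i < nbtls; ++i) {
            if (btls[i].active && btls[i].reaches[p]) {
                reachable = true;
                break;
            }
        }
        if (!reachable) {
            return (int)p;
        }
    }
    return -1;
}
```

In this toy model a finalized module stays inactive, so a peer that
shows up later through connect/accept or spawn and is reachable only via
that module would be reported as unreachable. Whether the real code
should reinitialize the BTL at that point is exactly what I am asking.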

--td