Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] BTL add procs errors
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-05-28 09:49:59


On May 28, 2010, at 9:32 AM, Jeff Squyres wrote:

>> So please, fix the bug first; then, if you want that "automatic failover to
>> TCP" feature, develop it. Add a parameter for an optional notification, or
>> an abort, or whatever you want. But it doesn't exist today. It just doesn't
>> work, with any BTL. Errors reported by BTLs are all fatal.
>
> Understood, and I agreed that the bug should be fixed. Patches would be welcome. :-)

I should clarify rather than be flip:

1. I agree: the bug should be fixed. Clearly, we should never crash.

2. After the bug is fixed, there is clearly a policy choice: some people may want to fall back to a different transport if a given BTL is unavailable; others may want to abort. Once the bug is fixed, making that choice configurable seems like a pretty straightforward thing to add.

3. Ralph's point of using the notifier to indicate that an error occurred is a good one -- the notifier should be used to send an alert if IB is borked (for example) regardless of whether the job will simply select another BTL or abort. This is also pretty straightforward to add.
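Points 2 and 3 together describe a small error-handling policy for a failed BTL add_procs. A minimal sketch, assuming a per-job policy switch; every name here (btl_error_policy_t, handle_btl_add_procs_error, notify_btl_failure) is hypothetical and is not Open MPI's actual BTL or notifier interface. The notifier alert fires on every failure, and only the policy decides whether the job continues on the remaining BTLs or aborts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical policy: what to do when a BTL fails to add procs.
 * Illustrative names only; not Open MPI's real API. */
typedef enum { ON_BTL_ERROR_FAILOVER, ON_BTL_ERROR_ABORT } btl_error_policy_t;
typedef enum { RESULT_CONTINUE, RESULT_ABORT } btl_error_result_t;

/* Hypothetical notifier hook: counts and reports every failure. */
static int notifications_sent = 0;

static void notify_btl_failure(const char *btl_name)
{
    notifications_sent++;
    fprintf(stderr, "notifier: BTL '%s' failed to add procs\n", btl_name);
}

/* Handle a failed add_procs on one BTL without crashing.
 * remaining_btls: how many other BTLs can still reach the peers. */
static btl_error_result_t handle_btl_add_procs_error(const char *btl_name,
                                                     btl_error_policy_t policy,
                                                     bool *btl_enabled,
                                                     int remaining_btls)
{
    assert(btl_enabled != NULL);

    /* Point 3: alert regardless of what happens next. */
    notify_btl_failure(btl_name);

    /* Never use the failed BTL again. */
    *btl_enabled = false;

    /* Point 2: fail over only if asked to and another BTL exists. */
    if (policy == ON_BTL_ERROR_FAILOVER && remaining_btls > 0) {
        return RESULT_CONTINUE;
    }
    return RESULT_ABORT;
}
```

With the failover policy and at least one other usable BTL (e.g. TCP), the job continues; with the abort policy, or when nothing else can reach the peers, the caller gets RESULT_ABORT and can shut the job down cleanly instead of crashing.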

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/