Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] BTL add procs errors
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-05-27 15:19:25


On May 27, 2010, at 10:32 AM, Sylvain Jeaugey wrote:

> That's pretty much my first proposition: abort when an error arises,
> because if we don't, we'll crash soon afterwards. That's my original
> concern and this should really be fixed.
>
> Now, if you want to fix the openib BTL so that an error in IB results in
> an elegant fallback on TCP (elegant = notified ;-)), then hooray.

You're specifically referring to the point where the openib BTL sets the reachable bit, but then later decides "oops, an error occurred" and returns != OMPI_SUCCESS, with the assumption that the openib BTL is not called again.

Right?

If so, then yes, that's a bug. The openib btl should be fixed to unset the reachable bit(s) that it just set before returning the error.

Or the BML could assume that a != OMPI_SUCCESS return code means that the reachable bits it got back are invalid and should be ignored.

I'd lean towards the former.
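
To make the former option concrete, here is a rough, self-contained sketch of the idea: remember which reachable bits this call set, and clear exactly those before returning an error. This is not the openib BTL code. Every name in it (sketch_bitmap_t, sketch_setup_peer, sketch_add_procs, the SKETCH_* constants, and the fail_at parameter used to simulate a failure) is hypothetical, and the 64-bit mask only stands in for the real reachability bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes, standing in for OMPI_SUCCESS / an error code. */
#define SKETCH_SUCCESS   0
#define SKETCH_ERROR     (-1)
#define SKETCH_MAX_PROCS 64

/* Hypothetical stand-in for the reachability bitmap: one bit per peer. */
typedef struct {
    uint64_t bits;
} sketch_bitmap_t;

static int sketch_bitmap_is_set(const sketch_bitmap_t *bm, size_t i)
{
    return (int)((bm->bits >> i) & 1u);
}

/* Hypothetical per-peer setup; fails for the peer index given in fail_at. */
static int sketch_setup_peer(size_t peer, size_t fail_at)
{
    return (peer == fail_at) ? SKETCH_ERROR : SKETCH_SUCCESS;
}

/*
 * Mark peers as reachable one by one. If setup fails part-way through,
 * clear only the bits that this call set, then return the error, so the
 * caller never sees reachable bits that belong to a failed add_procs.
 */
int sketch_add_procs(size_t nprocs, size_t fail_at, sketch_bitmap_t *reachable)
{
    uint64_t set_here = 0; /* bits set by this call only */

    assert(nprocs <= SKETCH_MAX_PROCS);

    for (size_t i = 0; i < nprocs; ++i) {
        if (sketch_setup_peer(i, fail_at) != SKETCH_SUCCESS) {
            reachable->bits &= ~set_here; /* roll back our own bits */
            return SKETCH_ERROR;
        }
        if (!sketch_bitmap_is_set(reachable, i)) {
            reachable->bits |= (UINT64_C(1) << i);
            set_here |= (UINT64_C(1) << i);
        }
    }
    return SKETCH_SUCCESS;
}
```

Calling sketch_add_procs(4, 2, &bm) on an empty bitmap returns SKETCH_ERROR and leaves bm empty, so the caller cannot mistake a half-initialized peer for a reachable one. Bits that were already set before the call are left alone.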

Can you file a bug and submit a patch?

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/