Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: Linuxes shipping libibverbs
From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2008-05-21 17:02:53


On Wed, 21 May 2008, Jeff Squyres wrote:

>> I'm only concerned about the case where there's an IB card, the user
>> expects the IB card to be used, and the IB card isn't used.
>
> Can you put in a site wide
>
> btl = ^tcp
>
> to avoid the problem? If the IB card fails, then you'll get
> unreachable MPI errors.

And how many users are going to figure that one out before complaining
loudly? That's what LANL did (probably still does) and it worked great
there, but that doesn't mean that others will figure that out (after all,
not everyone has an OMPI developer on staff...).
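For readers unfamiliar with the site-wide setting quoted above, here is a minimal sketch of what it could look like. It assumes a default install in which Open MPI reads a system-wide MCA parameter file at $prefix/etc/openmpi-mca-params.conf; the exact prefix depends on how Open MPI was installed at a given site.

```ini
# $prefix/etc/openmpi-mca-params.conf  (path assumes a default install layout)
# Exclude the TCP BTL for every job on this system. If the IB card cannot be
# used, peers become unreachable and the job fails instead of silently
# falling back to TCP.
btl = ^tcp
```

The same parameter can also be set per user in ~/.openmpi/mca-params.conf or per run with `mpirun --mca btl ^tcp`, which is the kind of knowledge the reply above argues many users will not have.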

>> If the changes don't silence a warning in that situation, I'm fine with
>> whatever you do. But does ibv_get_device_list return an HCA when the port
>> is down (because the SM failed and the machine rebooted since that time)?
>
> Yes.

If this is true (for some reason I thought it wasn't), then I think we'd
actually be ok with your proposal, but you're right, you'd need something
new in the IB btl. I'm not concerned about the dual rail issue -- if
you're smart enough to configure dual rail IB, you're smart enough to
figure out OMPI MCA params. I'm not sure the same is true for a simple IB
setup delivered by a white-box vendor that barely works on a good day (and,
unfortunately, there seems to be evidence that these exist).

Brian
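
The distinction discussed in this message (no HCA at all, versus an HCA whose ports are all down) can be sketched as a three-way classification. This is only an illustration, not code from the IB BTL: the types `hca_info`, `port_state`, and `ib_status`, and the functions `classify_ib` and `should_warn`, are hypothetical names invented here. In real code the per-port state would come from libibverbs, for example by calling `ibv_query_port` on each device returned by `ibv_get_device_list` and checking for `IBV_PORT_ACTIVE`.

```c
#include <stddef.h>

/* Hypothetical stand-in for the port state libibverbs would report. */
enum port_state { PORT_DOWN = 0, PORT_ACTIVE = 1 };

/* Hypothetical summary of one HCA and the state of each of its ports. */
struct hca_info {
    const char *name;
    size_t num_ports;
    const enum port_state *ports;
};

enum ib_status {
    IB_NO_DEVICES,      /* no HCA found: staying silent is reasonable      */
    IB_NO_ACTIVE_PORTS, /* HCA present but unusable: user expects IB       */
    IB_USABLE           /* at least one active port                        */
};

/* Classify the node based on the HCAs found and their port states. */
enum ib_status classify_ib(const struct hca_info *hcas, size_t num_hcas)
{
    if (num_hcas == 0) {
        return IB_NO_DEVICES;
    }
    for (size_t i = 0; i < num_hcas; i++) {
        for (size_t p = 0; p < hcas[i].num_ports; p++) {
            if (hcas[i].ports[p] == PORT_ACTIVE) {
                return IB_USABLE;
            }
        }
    }
    return IB_NO_ACTIVE_PORTS;
}

/* Warn only when hardware is present but none of it can carry traffic. */
int should_warn(enum ib_status status)
{
    return status == IB_NO_ACTIVE_PORTS;
}
```

Under this sketch, a machine with no IB hardware produces no warning, while a machine whose HCA is present but whose ports are down (the failed-SM case raised above) still does.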