On Wed, 21 May 2008, Jeff Squyres wrote:
>> I'm only concerned about the case where there's an IB card, the user
>> expects the IB card to be used, and the IB card isn't used.
> Can you put in a site-wide
> btl = ^tcp
> to avoid the problem? If the IB card fails, then you'll get
> unreachable MPI errors.
And how many users are going to figure that one out before complaining
loudly? That's what LANL did (and probably still does), and it worked
great there, but that doesn't mean others will figure it out (after all,
not everyone has an OMPI developer on staff...).
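For reference, the site-wide setting would look roughly like this. This is a minimal sketch that assumes a standard install, where Open MPI reads system-wide MCA parameters from `$prefix/etc/openmpi-mca-params.conf` (`$prefix` being the installation prefix):

```
# $prefix/etc/openmpi-mca-params.conf  (assumed default location)
# "^tcp" means "use every available BTL except tcp".
# If the IB BTL then fails to come up, peers become unreachable and
# MPI reports an error instead of silently running over TCP.
btl = ^tcp
```

The same parameter can also be set per job with `mpirun --mca btl ^tcp` or through the `OMPI_MCA_btl` environment variable. Either way, someone has to know to set it.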
>> If the
>> changes don't silence a warning in that situation, I'm fine with
>> [...] you do.
> But does ibv_get_device_list return an HCA when the port is
> [...] (because the SM failed and the machine rebooted since that time)?
If that's true (for some reason I thought it wasn't), then I think we'd
actually be OK with your proposal, but you're right that it would need
something new in the IB BTL. I'm not concerned about the dual-rail
issue: if you're smart enough to configure dual-rail IB, you're smart
enough to figure out OMPI MCA params. I'm not sure the same is true for
a simple IB setup delivered by a white-box vendor that barely works on a
good day (and unfortunately, there seems to be evidence that these
exist).