Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: Linuxes shipping libibverbs
From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2008-05-21 15:47:06


Then we disagree on a core point. I believe that users should never have
something unexpected happen silently (like falling back to TCP from a
high-speed interconnect because of a NIC reset / software issue). You clearly
don't feel this way. I don't really work on the project, but do have lots
of experience being yelled at by users when something unexpected happens.

I guarantee you we'll see a report of poor IB / application performance
because of the silent fallback to TCP. There's a reason that error
message was put in. I don't get a vote anymore, so do whatever you think
is best.

Brian

On Wed, 21 May 2008, Jeff Squyres wrote:

> One thing I should clarify -- the ibverbs error message from my
> previous mail is a red herring. libibverbs prints that message on
> systems where the kernel portions of the OFED stack are not installed
> (such as the quick-n-dirty test that I did before -- all I did was
> install libibverbs without the corresponding kernel stuff). I
> installed the whole OFED stack on a machine with no verbs-capable
> hardware and verified that the libibverbs message does *not* appear
> when the kernel bits are properly installed and running.
>
> So we're only talking about the Open MPI warning message here. More
> below.
>
>
>
> On May 21, 2008, at 12:17 PM, Brian W. Barrett wrote:
>
>>> 2. An out-of-the-box "mpirun a.out" will print warning messages in
>>> perfectly valid/good configurations (no verbs-capable hardware, but
>>> just happen to have libibverbs installed). This is a Big Deal.
>>
>> Which is easily solved with a better error message, as Pasha
>> suggested.
>
> I guess this is where we disagree: I don't believe that the issue is
> solved by making a "better" message. Specifically: this is the first
> case where we're saying "if you run with a valid configuration, you're
> going to get a warning message and you have to do something extra to
> turn it off."
>
> That just seems darn weird to me, especially when other MPIs don't do
> the same thing. Come to think of it, I can't think of many other
> software packages that do that.
>
>>> In short: I think it's no longer safe to assume that machines with
>>> libibverbs installed must also have verbs-capable hardware.
>>
>> But here's the real problem -- with our current selection logic, a user
>> with libibverbs but no IB cards gets an error message saying "hey, we need
>> you to set this flag to make this error go away" (or would, per Pasha's
>> suggestion). A user with a busted IB stack on a node (which we still saw
>> pretty often at LANL) starts using TCP and their application runs like a
>> dog.
>>
>> I guess it's a matter of how often you see errors in the IB stack that
>> cause NIC initialization to fail. The machines I tend to use still
>> exhibit this problem pretty often, but it's possible I just work on bad
>> hardware more often than is usual in the wild.
>
> I guess this is the central issue: what *is* the common case? Which
> set of users should be forced to do something different?
>
> I'm claiming that now that the Linux distros are shipping libibverbs,
> the number of users who have the openib BTL installed but do not have
> verbs-capable hardware will be *much* larger than those with
> verbs-capable hardware. Hence, I think the pain point should be for the
> smaller group (those with verbs-capable hardware): set an MCA param if
> you want to see the warning message.
>
> (we can debate the default value for the BTL-wide base param later --
> let's first just debate the *concept* as specific to the openib BTL)
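
The trade-off being debated can be made concrete with a small sketch. It is not Open MPI's selection logic: the function name, the `warn_unused` flag (a stand-in for the MCA param under discussion), and the three node states are all hypothetical, chosen only to show which users each default affects.

```python
from enum import Enum
from typing import Tuple


class Node(Enum):
    """Hypothetical node states from the thread (not Open MPI types)."""
    NO_HARDWARE = "libibverbs installed, no verbs-capable hardware"
    BROKEN_STACK = "hardware present, but initialization failed"
    WORKING = "hardware present and usable"


def openib_outcome(node: Node, warn_unused: bool) -> Tuple[str, bool]:
    """Return (transport used, whether a warning is printed).

    warn_unused is a made-up stand-in for the MCA param being debated.
    """
    if node is Node.WORKING:
        return ("openib", False)
    # Without distinct errors from libibverbs, the two failure states
    # look identical here, so both fall back to TCP the same way.
    return ("tcp", warn_unused)


for default in (True, False):
    for node in Node:
        transport, warned = openib_outcome(node, default)
        print(f"warn_unused={default!s:5} {node.name:12} -> {transport:6} warning={warned}")
```

With `warn_unused=True`, the `NO_HARDWARE` case prints a warning for a perfectly valid configuration. With `warn_unused=False`, the `BROKEN_STACK` case falls back to TCP with no warning at all. Those are the two outcomes that each side of this thread objects to.
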
>
>> It would be great if libibverbs could return two different error messages
>> - one for "there's no IB card in this machine" and one for "there's an IB
>> card here, but we can't initialize it". I think that would make this
>> argument go away. Open MPI could probably mimic that behavior by parsing
>> the PCI tables, but that sounds ... painful.
>
> Yes, this capability in libibverbs would be good. Parsing the PCI
> tables doesn't sound like our role.
>
> I'll ask the libibverbs authors about it...
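
For a sense of what "parsing the PCI tables" might involve on Linux, here is a rough sketch that scans sysfs for devices whose PCI class code marks them as InfiniBand controllers. It is an illustration only, not something Open MPI or libibverbs is known to do: the function name and the `sysfs_root` parameter are invented, and the class-code check is an assumption that would miss iWARP adapters, which typically enumerate as Ethernet controllers.

```python
from pathlib import Path

# PCI class/subclass pairs listed as InfiniBand in the pci.ids database:
# 0x0207 (network controller) and 0x0c06 (serial bus controller).
INFINIBAND_CLASSES = {0x0207, 0x0C06}


def find_infiniband_pci_devices(sysfs_root="/sys/bus/pci/devices"):
    """Return PCI addresses whose class code looks like InfiniBand.

    Hypothetical helper: sysfs_root is a parameter so the scan can be
    pointed at a fake tree for testing.
    """
    found = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return found  # not Linux, or sysfs not mounted
    for dev in sorted(root.iterdir()):
        try:
            # The "class" file holds e.g. "0x020700": class, subclass, prog-if.
            code = int((dev / "class").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # unreadable or malformed entry; skip it
        if (code >> 8) in INFINIBAND_CLASSES:
            found.append(dev.name)
    return found


if __name__ == "__main__":
    devices = find_infiniband_pci_devices()
    print(devices if devices else "no InfiniBand-class PCI devices found")
```

Even if such a scan found a device, it would only show that hardware is present. Deciding that initialization failed would still depend on what libibverbs reports.
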
>
>> I guess the root of my concern is that unexpected behavior with no
>> explanation is (in my mind) the most dangerous case and the one we should
>> address by default. And turning this error message off is going to cause
>> unexpected behavior without explanation.
>
>
> But more information is available, and subject to normal
> troubleshooting techniques. And if you're in an environment where you
> *do* want to use verbs-capable hardware, then setting the MCA param
> seems perfectly acceptable to me. IIRC, LANL sets a whole pile of MCA
> params in the top-level openmpi-mca-params.conf file that are specific
> to their environment (right?). If that's true, what's one more param?
>
> Heck, the OMPI installed by OFED (which is what most users of
> verbs-capable hardware run) can set an MCA param in
> openmpi-mca-params.conf by default. That would solve the issue for 98%
> of the IB/iWARP users out there. Those who compile from source would
> need to do it manually.
>
> I agree that this is less than perfect. My main point is that I
> really don't like the idea that "mpirun a.out" will result in warning
> messages for perfectly valid configurations.
>
>