Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: Linuxes shipping libibverbs
From: Pavel Shamis (Pasha) (pasha_at_[hidden])
Date: 2008-05-21 12:41:58


As far as I know, only the OpenIB kernel drivers are installed by default
with the distribution. The user-level pieces, libibverbs and the other
OpenIB packages, are not installed by default; the user has to go to the
package manager and explicitly select libibverbs. So if a user decided to
install libibverbs, he had a reason for it, and I think it is OK to show
this warning.

Pasha.

Jeff Squyres wrote:
> On May 21, 2008, at 11:14 AM, Brian W. Barrett wrote:
>
>
>> I think having a parameter to turn off the warning is a great idea. So
>> great, in fact, that it already exists in the trunk and v1.2 :)! Setting
>> the default value for the btl_base_warn_component_unused flag from 1 to 0
>> will have the desired effect.
>>
>
> Ah, ok. I either didn't know about this flag or forgot about it. :-)
>
> I just tested this myself and see that there are actually *two* error
> messages (on a machine where I installed libibverbs, but with no
> OpenFabrics hardware, with OMPI 1.2.6):
>
> % mpirun -np 1 hello
> libibverbs: Fatal: couldn't read uverbs ABI version.
> --------------------------------------------------------------------------
> [0,1,0]: OpenIB on host eddie.osl.iu.edu was unable to find any HCAs.
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
>
> So the MCA param takes care of the OMPI message; I'll contact the
> libibverbs authors about their message.
>
>
>> I'm not sure I agree with setting the default to 0, however. The warning
>> has proven extremely useful for diagnosing that IB (or less often GM or
>> MX) isn't properly configured on a compute node due to some random error.
>> It's trivially easy for any packaging group to have the line
>>
>> btl_base_warn_component_unused = 0
>>
>> added to $prefix/etc/openmpi-mca-params.conf during the install phase of
>> the package build (indeed, our simple build scripts at LANL used to do
>> this on a regular basis due to our need to tweak the OOB to keep IPoIB
>> happier at scale).
>>
>> I think keeping the Debian guys happy is a good thing. Giving them an
>> easy way to turn off silly warnings is a good thing. Removing a known
>> useful warning to help them doesn't seem like a good thing.
>>
>
> I guess that this is what I am torn about. Yes, it's a useful message
> -- in some cases. But now that libibverbs is shipping in Debian and
> other Linuxes, the number of machines out there with verbs-capable
> hardware is far, far smaller than the number of machines without
> verbs-capable hardware. Specifically:
>
> 1. The number of cases where seeing the message by default is *not*
> useful is now potentially [much] larger than the number of cases where
> the default message is useful.
>
> 2. An out-of-the-box "mpirun a.out" will print warning messages in
> perfectly valid/good configurations (no verbs-capable hardware, but
> just happen to have libibverbs installed). This is a Big Deal.
>
> 3. Problems with HCA hardware and/or verbs stack are uncommon
> (nowadays). I'd be ok asking someone to enable a debug flag to get
> more information on configuration problems or hardware faults.
>
> Shouldn't we be optimizing for the common case?
>
> In short: I think it's no longer safe to assume that machines with
> libibverbs installed must also have verbs-capable hardware.
>
>
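
For reference, here is a minimal sketch of how a packager or user might
turn off the OMPI warning discussed above. The temporary directory standing
in for $prefix is an assumption for illustration only; a real package build
would use its own install prefix. The environment-variable and mpirun forms
are the usual per-user and per-run ways of setting an MCA parameter. None of
this silences the separate libibverbs message.

```shell
# Stand-in for the real install prefix (hypothetical; defaults to a temp dir).
prefix="${prefix:-$(mktemp -d)}"
mkdir -p "$prefix/etc"
conf="$prefix/etc/openmpi-mca-params.conf"

# System-wide: add the setting once during the package install phase.
if ! grep -q '^btl_base_warn_component_unused' "$conf" 2>/dev/null; then
    echo 'btl_base_warn_component_unused = 0' >> "$conf"
fi

# Per-user alternative: set the parameter through the environment.
export OMPI_MCA_btl_base_warn_component_unused=0

# Per-run alternative (commented out because it needs an Open MPI install):
# mpirun --mca btl_base_warn_component_unused 0 -np 1 hello
```

Someone debugging a misconfigured IB node can flip the value back to 1 in
any of the three places to get the warning again.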