Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: Linuxes shipping libibverbs
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-05-21 15:38:10


One thing I should clarify -- the ibverbs error message from my
previous mail is a red herring. libibverbs prints that message on
systems where the kernel portions of the OFED stack are not installed
(such as the quick-n-dirty test that I did before -- all I did was
install libibverbs without the corresponding kernel stuff). I
installed the whole OFED stack on a machine with no verbs-capable
hardware and verified that the libibverbs message does *not* appear
when the kernel bits are properly installed and running.

So we're only talking about the Open MPI warning message here. More
below.

On May 21, 2008, at 12:17 PM, Brian W. Barrett wrote:

>> 2. An out-of-the-box "mpirun a.out" will print warning messages in
>> perfectly valid/good configurations (no verbs-capable hardware, but
>> just happen to have libibverbs installed). This is a Big Deal.
>
> Which is easily solved with a better error message, as Pasha
> suggested.

I guess this is where we disagree: I don't believe that the issue is
solved by making a "better" message. Specifically: this is the first
case where we're saying "if you run with a valid configuration, you're
going to get a warning message and you have to do something extra to
turn it off."

That just seems darn weird to me, especially when other MPIs don't do
the same thing. Come to think of it, I can't think of many other
software packages that do that.

>> In short: I think it's no longer safe to assume that machines with
>> libibverbs installed must also have verbs-capable hardware.
>
> But here's the real problem -- with our current selection logic, a user
> with libibverbs but no IB cards gets an error message saying "hey, we need
> you to set this flag to make this error go away" (or would, per Pasha's
> suggestion). A user with a busted IB stack on a node (which we still saw
> pretty often at LANL) starts using TCP and their application runs like a
> dog.
>
> I guess it's a matter of how often you see errors in the IB stack that
> cause NIC initialization to fail. The machines I tend to use still
> exhibit this problem pretty often, but it's possible I just work on bad
> hardware more often than is usual in the wild.

I guess this is the central issue: what *is* the common case? Which
set of users should be forced to do something different?

I'm claiming that now that the Linux distros are shipping libibverbs,
the number of users who have the openib BTL installed but do not have
verbs-capable hardware will be *much* larger than those with verbs-
capable hardware. Hence, I think the pain point should be for the
smaller group (those with verbs-capable hardware): set an MCA param if
you want to see the warning message.

(we can debate the default value for the BTL-wide base param later --
let's first just debate the *concept* as specific to the openib BTL)
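
To make the concept concrete, here is a minimal Python sketch of the
policy I'm proposing. It is not Open MPI code: the function name
should_warn_unused_openib and the flag warn_if_unused are hypothetical
stand-ins for the MCA param under discussion, and the "off" default
reflects this proposal only.

```python
def should_warn_unused_openib(component_built: bool,
                              usable_devices: int,
                              warn_if_unused: bool = False) -> bool:
    """Decide whether mpirun prints the 'openib found nothing to use' warning.

    warn_if_unused is a hypothetical stand-in for the MCA param; under this
    proposal it defaults to off so hosts without verbs hardware stay quiet.
    """
    if not component_built:
        return False          # openib BTL not installed: nothing to warn about
    if usable_devices > 0:
        return False          # the BTL found a device it can use
    return warn_if_unused     # built but unused: warn only if the site opted in
```

Sites with verbs-capable hardware would flip the flag on once; everyone
else gets a silent "mpirun a.out".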

> It would be great if libibverbs could return two different error messages
> -- one for "there's no IB card in this machine" and one for "there's an IB
> card here, but we can't initialize it". I think that would make this
> argument go away. Open MPI could probably mimic that behavior by parsing
> the PCI tables, but that sounds ... painful.

Yes, this capability in libibverbs would be good. Parsing the PCI
tables doesn't sound like our role.

I'll ask the libibverbs authors about it...
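
For illustration, here is roughly what a three-way answer would let us
do. This is a Python sketch of the idea only: libibverbs does not report
this distinction today, and VerbsStatus, classify, and should_warn are
hypothetical names, not part of any real API.

```python
from enum import Enum


class VerbsStatus(Enum):
    NO_HARDWARE = "no verbs-capable device present"
    INIT_FAILED = "device present but initialization failed"
    READY = "device present and initialized"


def classify(devices_present: int, devices_initialized: int) -> VerbsStatus:
    """Map hypothetical device counts onto the three cases being discussed."""
    if devices_present == 0:
        return VerbsStatus.NO_HARDWARE
    if devices_initialized == 0:
        return VerbsStatus.INIT_FAILED
    return VerbsStatus.READY


def should_warn(status: VerbsStatus) -> bool:
    # Only a broken stack deserves a warning; a host with no hardware is a
    # perfectly valid configuration and stays silent.
    return status is VerbsStatus.INIT_FAILED
```

With that information, no MCA param would be needed to pick the default:
the busted-stack case warns and the no-hardware case does not.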

> I guess the root of my concern is that unexpected behavior with no
> explanation is (in my mind) the most dangerous case and the one we should
> address by default. And turning this error message off is going to cause
> unexpected behavior without explanation.

But more information is available, and subject to normal
troubleshooting techniques. And if you're in an environment where you
*do* want to use verbs-capable hardware, then setting the MCA param
seems perfectly acceptable to me. IIRC, LANL sets a whole pile of MCA
params in the top-level openmpi-mca-params.conf file that are specific
to their environment (right?). If that's true, what's one more param?

Heck, the OMPI installed by OFED (which is what most users of
verbs-capable hardware run) can set an MCA param in
openmpi-mca-params.conf by default. That would solve the issue for 98%
of the IB/iWARP users out there. Those who compile from source would
need to do it manually.

I agree that this is less than perfect. My main point is that I
really don't like the idea that "mpirun a.out" will result in warning
messages for perfectly valid configurations.

-- 
Jeff Squyres
Cisco Systems