
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] RFC: Linuxes shipping libibverbs
From: Pavel Shamis (Pasha) (pasha_at_[hidden])
Date: 2008-05-22 11:53:38


>
>
> I'm not sure I follow this logic -- can you explain more?
Sure
>
> Why does this only apply to binary distribution? If libibverbs is
> installed by default, then OMPI will still build the openib BTL (and
> therefore warn if it's not used). Granted, some distros will only
> install libibverbs if either explicitly or implicitly requested (e.g.,
> via dependency). What if some other dependency pulls in libibverbs,
> even if OMPI was built from a source tarball?
My point is that it is not correct to say that libibverbs is installed by
default on all Linuxes and that all users who install OMPI will see
this problem.
It is possible that libibverbs may be installed implicitly as a
dependency package. But usually (not always) it will be installed as
part of some native IB (openib-only!) application.

If a user decides to upgrade from his OMPI + libibverbs rpm/deb package
install, he will need to do a lot of other "annoying" steps, like
downloading the source code, installing all the required *-dev.rpm
packages, and compiling. And I guess that disabling the default warning
messages will be the simplest step along the way :-)

I'm not saying that the current solution is the best one.
But I would like to find something better than disabling the warning by
default for openib only.

>
> Let me ask another question: is it common to have the verbs stack /
> hardware so hosed up that ibv_get_device_list() returns an empty list
> when there really is a device there? My assumption is that this is
> quite uncommon; that ibv_get_device_list() will usually return that
> there *are* devices and errors show up later during initialization,
> etc. Never say "never", of course; I'm sure that there are degenerate
> corner cases where a badly hosed device will cause
> ibv_get_device_list() to return an empty list -- but I'm assuming that
> those cases are very few and far between.
I cannot say that this is a very uncommon case.
For example:
1. The driver doesn't support the HCA. If I remember correctly, RH40 by
default doesn't support the ConnectX HCA, so the device list will be empty.
This is a very exotic case.
2. The driver version doesn't correspond to the FW version.
3. The FW is broken.
4. The driver is broken and failed to start. This one is not very exotic:
sometimes a user makes some modification (upgrade, install, etc.) and it
breaks the driver.

> In such cases, the ibv_devinfo(1) and ibv_devices(1) commands would
> show the same error.
Yep, these utilities will show the same error.

We can cover cases 1-3 pretty simply. The OpenIB driver creates
"/dev/infiniband" during its startup, so if /dev/infiniband exists and
ibv_get_device_list() returns an empty list, we can print a warning.
I don't know how we can cover case 4 :-(
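The check proposed for cases 1-3 could look roughly like the following C sketch. It is a minimal illustration under stated assumptions, not Open MPI code: the helper names (dir_exists, should_warn_no_devices, report) are hypothetical, the "/dev/infiniband" path is taken from the description above, and the device count is passed in rather than obtained from libibverbs so that the sketch builds without linking it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Directory that, per the heuristic above, appears once the IB driver has
 * started. Treating its presence as "driver is running" is an assumption. */
#define IB_DEV_DIR "/dev/infiniband"

/* Return 1 if `path` exists and is a directory, 0 otherwise. */
int dir_exists(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Hypothetical decision helper (not an Open MPI function).
 *   driver_started: the IB device directory exists
 *   num_devices:    number of devices reported by the verbs library
 * Warn only when the driver seems to be up but no devices are listed
 * (cases 1-3). If the directory is missing (no IB stack at all, or case 4,
 * a driver that failed to start) the function stays silent. */
int should_warn_no_devices(int driver_started, int num_devices)
{
    return driver_started && num_devices == 0;
}

/* In a real build, num_devices would come from libibverbs, e.g.:
 *
 *   int num = 0;
 *   struct ibv_device **list = ibv_get_device_list(&num);
 *   if (list != NULL)
 *       ibv_free_device_list(list);
 *
 * Here it is a parameter so the sketch compiles without libibverbs. */
void report(int num_devices)
{
    if (should_warn_no_devices(dir_exists(IB_DEV_DIR), num_devices)) {
        fprintf(stderr,
                "WARNING: %s exists but no InfiniBand devices were found.\n"
                "The driver may not support this HCA, or the driver and\n"
                "firmware may be mismatched or broken.\n",
                IB_DEV_DIR);
    }
}
```

Because the check keys off the directory's presence, it cannot tell "no IB stack installed" apart from case 4 (driver failed to start); both simply produce no warning.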

BTW, I think that this problem is relevant for all BTLs, not only openib,
and maybe we need to look for some global solution.

Pasha.