
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] RFC: Linuxes shipping libibverbs
From: Terry Dontje (Terry.Dontje_at_[hidden])
Date: 2008-05-22 08:11:54

Jeff Squyres wrote:
> On May 22, 2008, at 6:50 AM, Terry Dontje wrote:
>>> Brian and I chatted a bit about this off-list, and I think we're in
>>> agreement now:
>>> - do not change the default value or meaning of
>>> btl_base_warn_component_unused.
>>> - major point of confusion: the openib BTL is actually fairly unique
>>> in that it can (and does) tell the difference between "there are no
>>> devices present" and "there are devices, but something went wrong".
>>> Other BTLs have network interfaces that can't tell the difference and
>>> can *only* call the no_nics function, regardless of whether there are
>>> no relevant network interfaces or some error occurred during
>>> initialization.
>>> - so a reasonable solution would be an openib-BTL-specific mechanism
>>> that does not call the no_nics function (which displays the
>>> btl_base_warn_component_unused warning) when no verbs-capable devices
>>> are found, because mainline Linuxes are starting to ship libibverbs.
>>> The specific mechanism is TBD; it is likely to be an openib MCA param.
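The distinction described above (no devices vs. devices that failed to initialize) can be sketched in C. Everything here is hypothetical: the `probe_result_t` enum, the `report_unused()` helper, and the `openib_warn_no_devices` flag (standing in for the still-TBD openib MCA param) are illustrative names, not actual Open MPI code. The sketch only shows the decision logic. The existing global switch keeps its meaning, and the "no devices" case gets its own opt-in.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the openib BTL's device probe. */
typedef enum {
    PROBE_NO_DEVICES,  /* libibverbs present, but no verbs-capable devices */
    PROBE_INIT_FAILED, /* devices present, but something went wrong */
    PROBE_OK           /* devices present and usable */
} probe_result_t;

/*
 * Decide whether to take the generic no_nics path (the "component unused"
 * warning). warn_unused mirrors the existing global parameter, whose default
 * and meaning are left unchanged; openib_warn_no_devices is a hypothetical
 * openib-specific parameter for the "distro shipped libibverbs, but there is
 * no hardware" case.
 */
bool report_unused(probe_result_t r, bool warn_unused,
                   bool openib_warn_no_devices)
{
    /* Usable devices, or the user disabled the warning globally. */
    if (r == PROBE_OK || !warn_unused) {
        return false;
    }
    /* No hardware at all: stay quiet unless explicitly requested. */
    if (r == PROBE_NO_DEVICES) {
        return openib_warn_no_devices;
    }
    /* Devices exist but initialization failed: worth reporting. */
    return true;
}
```

Other BTLs, which cannot tell these cases apart, would keep calling no_nics unconditionally. That is why the thread treats this as an openib-specific mechanism rather than a general one.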
>> So, if you are delivering something similar to a BTL for Myrinet, you
>> will see the message, but the belief is that this is necessary because
>> the device's error reporting isn't granular enough to be confident
>> whether the user wants the device to be used?
> The major difference here is that libmyriexpress is not being included
> in mainline Linux distributions. Specifically: if you can find/use
> libmyriexpress, it's likely because you have that hardware. The same
> *used* to be true for libibverbs, but is no longer true because Linux
> distros are now shipping it (e.g., the Debian distribution pulls in
> libibverbs when you install Open MPI).
Ok, but there are distributions that do include the Myrinet BTL/MTL (i.e.,
CT). Though I agree that, for the most part, if you have libmyriexpress you
will probably have an operable Myrinet interface. I am curious how many
other BTLs a distribution might end up delivering that could run into this
reporting issue. My point is: could this be worth something more general,
instead of a one-off for IB?

From my point of view, btl_base_warn_component_unused coupled with "-mca
btl ^mlfbtl" works for me. However, the fact that the IB vendors/community
(i.e., Cisco) are solving this for their favorite interface makes me pause
for a moment.
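For reference, here is a minimal sketch of setting that combination site-wide rather than on each mpirun command line. It assumes the parameter is registered as `btl_base_warn_component_unused` and that Open MPI reads system defaults from `$prefix/etc/openmpi-mca-params.conf`. `mlfbtl` is simply the component name used above; substitute whichever BTL a site wants excluded.

```
# $prefix/etc/openmpi-mca-params.conf: system-wide MCA defaults
# Roughly equivalent to:
#   mpirun --mca btl_base_warn_component_unused 0 --mca btl ^mlfbtl ...

# Do not warn about BTL components that were found but not used
btl_base_warn_component_unused = 0

# Use every BTL except the named one ("^" negates the list)
btl = ^mlfbtl
```

A distribution can ship such a file with its packages, which fits the "turn the warning off by default" approach described later in this message.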
>> Won't udapl have a similar issue here or does it not get built by
>> default when OFED is built?
> We decided that under Linux, the udapl BTL does not get built by
> default (even if it could) because then an "mpirun a.out" by default
> would use both UDAPL and verbs, which is undesirable for several
> reasons. There's Linux-specific logic to this effect in
> config/ompi_check_udapl.m4.
Ok, that makes sense.
>> FWIW, our distribution actually turns off
>> btl_base_warn_component_unused because it seemed that, in the majority
>> of our cases, users would see false-positive instances of the message.
> Is the UDAPL library shipped in Solaris by default? If so, then
> you're likely in exactly the same kind of situation that I'm
> describing. The same will be true if Solaris ends up shipping
> libibverbs by default.
Yes, the UDAPL library is shipped in Solaris by default, which is why we
turn off btl_base_warn_component_unused. And yes, I suspect that once
Solaris starts delivering libibverbs, we (Sun) will need to figure out how
to handle having both the udapl and openib BTLs being