I second this; it's been an annoyance here at LLNL, even for OFED v1.1,
which they prefix into /usr.
Jeff Squyres wrote:
> I just upgraded my Cisco MPI development cluster to OFED 1.2 over the
> weekend. This morning, I discovered a fun situation with regards to
> WHAT: We propose adding a check to the udapl configury that disables
> automatically building the udapl BTL when on Linux/OFED. --with-udapl
> can be specified to override the check and do the normal udapl
> configury stuff.
> WHY: The udapl BTL is built by default on OFED 1.2 clusters (because
> the UDAPL libraries are in /lib), but the /etc/dat.conf file that
> ships in OFED 1.2 is broken such that the UDAPL BTL will emit
> warnings upon init.
> WHERE: config/ompi_check_udapl.m4
> WHEN: ASAP -- I want this for v1.2.4 because it affects all OFED 1.2 users
> TIMEOUT: Thursday COB (because I think Brian's out today?)
> Short version:
> Terry, George, and Jeff propose to add a check into
> ompi_check_udapl.m4 that will disable building the udapl BTL by
> default when on Linux. You can specify --with-udapl when on Linux to
> force the normal check-for-headers-and-libraries udapl configure
> stuff. When not on Linux (e.g., Solaris), the normal
> check-for-headers-and-libraries configure stuff will always happen.
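
To make the proposed default concrete, here is a minimal Python sketch of the decision described in the short version. It is not the actual m4 in config/ompi_check_udapl.m4; the function name and the `host_os` / `with_udapl` parameters are hypothetical stand-ins for whatever configure really tests, and it follows the "when on Linux" wording rather than trying to detect OFED specifically.

```python
def should_run_udapl_checks(host_os: str, with_udapl: bool) -> bool:
    """Decide whether configure runs the normal uDAPL header/library checks.

    host_os    -- hypothetical: the host OS string as configure sees it
    with_udapl -- hypothetical: True if --with-udapl was given
    """
    if with_udapl:
        # An explicit --with-udapl always forces the normal checks.
        return True
    # Without the flag, skip uDAPL on Linux; everywhere else
    # (e.g., Solaris) the normal checks still happen.
    return not host_os.lower().startswith("linux")


if __name__ == "__main__":
    for os_name, flag in [("linux-gnu", False), ("linux-gnu", True),
                          ("solaris2.10", False)]:
        print(os_name, flag, should_run_udapl_checks(os_name, flag))
```

Under these assumptions, a plain configure on Linux skips the udapl BTL, while Solaris and any build with --with-udapl behave as before.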
> Long version:
> Since OFED 1.2 [by default] installs into /usr, Open MPI's configure
> script finds the header files/libraries for both verbs and uDAPL, and
> therefore builds both the openib and udapl BTLs. Keep in mind that
> on Linux/OFED, uDAPL is implemented as a layer on top of verbs, so it
> is not the "preferred" transport to use -- we want to use verbs
> (i.e., the openib BTL).
> After some poking around (and checking with George/Galen), we found
> that the BTL exclusivity parameter in the openib BTL is set to
> MCA_BTL_EXCLUSIVITY_DEFAULT; the udapl BTL sets it to
> (MCA_BTL_EXCLUSIVITY_DEFAULT-10). So that's good -- if Open MPI
> loads both BTLs, it's going to effectively ignore the udapl BTL
> (after initializing it) and use the openib BTL -- which is what we want.
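
The exclusivity behavior above can be illustrated with a small Python sketch. This is a simplification, not Open MPI's real selection code: the numeric value of `EXCLUSIVITY_DEFAULT` is a placeholder, `select_btls` is a hypothetical helper, and it assumes that only the BTLs with the highest exclusivity among those that initialized are used.

```python
from typing import Dict, List

# Placeholder value; only the relative ordering matters for this sketch.
EXCLUSIVITY_DEFAULT = 1024


def select_btls(initialized: Dict[str, int]) -> List[str]:
    """Return the names of the BTLs that would actually be used.

    `initialized` maps BTL name -> exclusivity for every BTL that
    loaded and initialized. BTLs below the highest exclusivity are
    ignored (simplifying assumption).
    """
    if not initialized:
        return []
    top = max(initialized.values())
    return sorted(name for name, excl in initialized.items() if excl == top)


if __name__ == "__main__":
    btls = {
        "openib": EXCLUSIVITY_DEFAULT,
        "udapl": EXCLUSIVITY_DEFAULT - 10,
    }
    print(select_btls(btls))
```

Note that udapl still has to be initialized before it can be passed over, which is why its init-time warnings show up even though it is never used for traffic.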
> The problem is that OFED 1.2 ships with an /etc/dat.conf that is
> effectively broken (dat.conf is the text config file for DAT/DAPL).
> The udapl BTL attempts to open all DAPL providers, but with the default
> dat.conf in OFED 1.2, some or all of them will fail (and the UDAPL
> BTL will print a warning for each failure).
> On Solaris, where UDAPL *is* the high performance network, if there
> are any problems with dat.conf, users will want to know -- they will
> want to see the warnings from the UDAPL BTL. But on Linux, you
> likely don't care about these warnings because you don't care about
> UDAPL anyway (because you almost certainly want to be using the
> openib/verbs BTL).
> Terry, George, and I went through a bunch of different possible
> scenarios to fix this dichotomy, and concluded that the one that was
> the least evil was simply to disable building the udapl BTL on Linux
> by default -- you can override this default by specifying
> --with-udapl on the configure command line. This solution has the
> following benefits:
> 1. Most importantly, the default configure/build/run on Solaris and
> Linux/OFED clusters works -- it follows the Law of Least Astonishment.
> 2. Avoids schizophrenia in the UDAPL BTL trying to divine when a
> user would care about the warning messages or not.
> If anyone *wants* the UDAPL BTL built on Linux, they'll likely
> disagree that we follow the Law of Least Astonishment, but I suspect
> that that is a fairly small group of people. We'll add something to
> the FAQ about this issue so that at least the solution is a simple
> Google search away.