I have been working on adding/clarifying support for several environments
and have encountered a problem that appears to be fairly common out there:
namely, machines that have - over the course of history or for specific
reasons - installed libraries to support multiple environments. For example,
I can readily find machines that are running TM, but also have LSF and SLURM
libraries installed (although those environments are not "active" - the
libraries in some cases are old and stale, usually present because either
someone wanted to look at them or they are left over from an old installation).

The problem is that our Open MPI build system automatically detects the
presence of those libraries, builds the corresponding components, and then
links those libraries into our system. Unfortunately, this causes two
problems:
1. We wind up building and loading a bunch of components that we cannot use,
which increases our memory footprint; and
2. Not every component in every framework calls a library function to
determine if that environment is actually active. Hence, our selection logic
can sometimes get confused due to conflicting priorities, resulting in the
selection of components that cause the system to crash.
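
To make the second problem concrete, here is a rough sketch of how a priority-based selection can go wrong. This is a toy model in Python, not our actual selection code: the Component class, the select() function, and the priority values are all made up for illustration. The only point is that a component with no "is this environment active?" check can outrank the component for the environment that is really running.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Component:
    name: str
    priority: int  # hypothetical values, chosen only for this example
    # None models a component that never asks its library whether the
    # environment is actually running.
    is_active: Optional[Callable[[], bool]] = None

    def available(self) -> bool:
        # Without a check, "the library was linked" is treated as "usable".
        return True if self.is_active is None else self.is_active()


def select(components: List[Component]) -> Component:
    # Pick the highest-priority component among those claiming availability.
    candidates = [c for c in components if c.available()]
    return max(candidates, key=lambda c: c.priority)


built = [
    Component("tm", priority=50, is_active=lambda: True),    # really in use
    Component("slurm", priority=75),                          # stale library, no check
    Component("lsf", priority=60, is_active=lambda: False),  # checks and declines
]

chosen = select(built)
print(chosen.name)  # slurm
```

In this toy setup the stale "slurm" component wins even though TM is the environment that is actually running, which is the kind of mis-selection described above.
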
A couple of solutions come immediately to mind:
1. The most obvious one (to me, at least) is to require that people provide
"--with-xx" when they build the system. Instead of automatically detecting
an include file and library, and then deciding that the existence of those
files dictates that we build support for that environment, we would only
build support for those environments that the builder specifies, and error
out of the build process if multiple conflicting environments are specified.
This raises the issue of what to do with rsh, but I think we can handle that
one by simply building it wherever possible.
2. We could laboriously go through all the components and ensure that they
check in their selection logic to see if that environment is active. This
still causes libraries to be loaded for nothing, but keeps the automatic
nature of the build system. We would have to deal with those environments
that may not have a "safe" function we can call to see if they are "alive",
or have old/stale libraries that may have differing behavior in their APIs,
but perhaps those are few enough to not be a big problem.
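
For discussion purposes, the two options could be sketched roughly as follows. This is only a Python model of the decision logic, not configure or component code. The names resolve_build and select_active, the grouping of TM/LSF/SLURM as mutually conflicting, and the priority values are assumptions made up for the example.

```python
from typing import Callable, Dict, Set

# Assumption: these environments are treated as conflicting with each other.
RESOURCE_MANAGERS: Set[str] = {"tm", "lsf", "slurm"}
# rsh is built wherever it is possible to build it.
ALWAYS_BUILT: Set[str] = {"rsh"}


def resolve_build(requested: Set[str], detected: Set[str]) -> Set[str]:
    """Option 1: build only what the builder asked for via --with-xx."""
    conflicts = sorted(requested & RESOURCE_MANAGERS)
    if len(conflicts) > 1:
        raise ValueError("conflicting environments: " + ", ".join(conflicts))
    # Libraries that were detected but not requested are deliberately ignored.
    return requested | (ALWAYS_BUILT & detected)


def select_active(priorities: Dict[str, int],
                  probes: Dict[str, Callable[[], bool]]) -> str:
    """Option 2: a component competes only if its environment is alive."""
    live = [name for name in priorities if probes[name]()]
    if not live:
        raise RuntimeError("no active environment found")
    return max(live, key=priorities.__getitem__)


found_on_machine = {"tm", "lsf", "slurm", "rsh"}
print(sorted(resolve_build({"tm"}, found_on_machine)))  # ['rsh', 'tm']

winner = select_active(
    {"tm": 50, "slurm": 75},
    {"tm": lambda: True, "slurm": lambda: False},
)
print(winner)  # tm
```

Option 1 never builds the stale components at all, while option 2 still builds and loads them but keeps them out of the selection. Option 2 also depends on every environment having a probe that is safe to call, which is the open question noted above.
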
Any thoughts on this? It seems like we should solve this as it is becoming
more prevalent (at least on the machines I test on).