On Jul 10, 2007, at 6:07 AM, Bogdan Costescu wrote:
>> For example, I can readily find machines that are running TM, but
>> also have LSF and SLURM libraries installed (although those
>> environments are not "active" - the libraries in some cases are old
>> and stale, usually present because either someone wanted to look at
>> them or represent an old installation).
> Whatever the outcome of this discussion is, please keep in mind that
> this represents an exception rather than the rule. So the common cases
> of no batch environment or one batch environment installed should work
> as effortlessly as possible. Furthermore, keep in mind that there are
> lots of people who don't compile Open MPI themselves, but rely on
> packages compiled by others (Linux distributions, most likely) - so
> don't make life harder for those who produce these packages.
FWIW, this is exactly the reason that we have the "auto as much as
possible" behavior today; back in LAM/MPI, we had the problem that
[many] users would say "I built LAM, but it doesn't support ABC, even
though your manual says that it does! LAM's a piece of junk!" The
sad fact is that most people assume that "./configure && make
install" will do all the Right magic for their system; efforts at
education seemed to fail. So we took the path of least resistance
and assumed that if we can find it on your system, we should use it.
Specifically: it was more of a support issue than anything else.
>> 1. ... we would only build support for those environments that the
>> builder specifies, and error out of the build process if multiple
>> conflicting environments are specified.
> I think that Ralf's suggestion (auto unless forced) is better, as it gives:
> - a better chance of finding the environments for people who don't
> have too much experience with building Open MPI or hate to RTFM
> - control over what is built or not for people who know what they
> are doing
>> This raises the issue of what to do with rsh, but I think we can
>> handle that one by simply building it wherever possible.
> I've been meaning to ask this for some time: is it possible to get rid
> of rsh support when building/running in an environment where rsh is
> not used (like a TM-based one) ? I'm not trying to achieve security by
> doing this (after all, a user can build a separate copy of Open MPI
> with rsh support), but just to make sure that the programs that I
> build are either using the "blessed" start-up mechanism or error out.
Does either of these work for you?
1. Use the --enable-mca-no-build option as I discussed in a mail a
few minutes ago.
2. Remove the "mca_pls_rsh.*" files in $prefix/lib/openmpi.
>> 2. We could laboriously go through all the components and ensure
>> that they check in their selection logic to see if that environment
>> is active.
> I might be missing something in the design of batch systems or
> software in general, but how do you decide that an environment is
> active or not ?
Most batch systems today set a sentinel environment variable that we
can check for.
> Can a library check if it's being used in a program ?
> Or if that program actually runs ? And if a configuration file exists,
> does it mean that the environment is actually active ?
We do not generally assume that the presence of a plugin means that
that plugin can run in the current environment. I thought that all
framework selection logic was adapted to this philosophy, but
apparently Ralph is indicating that some do not. :-)
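
To make the sentinel-variable idea concrete, here is a rough sketch in Python (not Open MPI code). The variable names are ones these batch systems commonly export inside a job; which variables each Open MPI component actually tests is an assumption here, and the SENTINELS table and active_environments() helper are hypothetical names used only for illustration.

```python
import os

# Assumed sentinel variables per batch environment. Illustrative only;
# this is not the list that Open MPI's components actually check.
SENTINELS = {
    "tm": ("PBS_ENVIRONMENT", "PBS_JOBID"),
    "slurm": ("SLURM_JOBID",),
    "lsf": ("LSB_JOBID",),
}


def active_environments(env=None):
    """Return the batch environments whose sentinel variables are all set.

    Installed-but-stale libraries never set these variables, so they are
    not reported as active.
    """
    env = os.environ if env is None else env
    return [
        name
        for name, variables in SENTINELS.items()
        if all(env.get(var) for var in variables)
    ]


if __name__ == "__main__":
    print(active_environments())
```

On a machine that merely has the LSF and SLURM libraries installed, but where the job was started under TM, only "tm" would be reported.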
> How to deal
> with the case where there are several versions of the same batch
> system installed, all using the same configuration files and therefore
> being ready to run ?
We assume that Open MPI was built by compiling/linking against the Right
version. There's not much else we can do if you build against the
wrong one.
> And how about the case where there is a machine
> reserved for compilations, where libraries are made available but
> there is no batch system active ?
That's what the compile-time vs. run-time detection and selection is
supposed to be for. The presence of an OMPI component at run-time is
not supposed to mean that it can run; it's supposed to be queried and
the component can do whatever checks it wants to see if it is
supposed to run, and then report "Yes, I can run" / "No, I cannot
run" back to Open MPI.
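
As a rough illustration of that query step, here is a minimal sketch in Python rather than the real C component interface. The Component type, the select() helper, the priority numbers, and the idea of picking the highest-priority runnable component are all assumptions made for the example; only the "ask each component whether it can run" part comes from the description above.

```python
from typing import Callable, Mapping, NamedTuple, Optional, Sequence


class Component(NamedTuple):
    """Hypothetical stand-in for a launcher component."""
    name: str
    priority: int  # assumed tie-breaker; not taken from this thread
    query: Callable[[Mapping[str, str]], bool]  # "Yes, I can run" / "No"


def select(components: Sequence[Component],
           env: Mapping[str, str]) -> Optional[Component]:
    """Ask every component whether it can run; pick the best that said yes."""
    runnable = [c for c in components if c.query(env)]
    return max(runnable, key=lambda c: c.priority, default=None)


# All three components are "present", but presence alone decides nothing.
COMPONENTS = [
    Component("tm", 75, lambda env: "PBS_JOBID" in env),
    Component("slurm", 75, lambda env: "SLURM_JOBID" in env),
    Component("rsh", 10, lambda env: True),  # always willing to run
]

if __name__ == "__main__":
    chosen = select(COMPONENTS, {"PBS_JOBID": "123.headnode"})
    print(chosen.name if chosen else "no component can run")
```

On the compile-only machine from the question, no sentinel is set, so the tm and slurm components answer "No" and only rsh remains.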