
Open MPI Development Mailing List Archives



From: Ralph H Castain (rhc_at_[hidden])
Date: 2007-07-10 13:26:31

I think that is quite accurate and would be helpful in resolving the

On 7/10/07 10:32 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:

> Point taken.
> Is this an accurate summary?
> 1. "Best practices" should be documented, to include sysadmins
> specifically itemizing what components should be used on their
> systems (e.g., in an environment variable or the system-wide MCA
> parameters file).
> 2. It may be useful to have some high-level parameters to specify a
> specific run-time environment, since ORTE has multiple, related
> frameworks (e.g., RAS and PLS). E.g., "orte_base_launcher=tm", or
> somesuch.
> On Jul 10, 2007, at 9:08 AM, Ralph H Castain wrote:
>> Actually, I was talking specifically about configuration at build
>> time. I realize there are trade-offs here, and suspect we can find a
>> common ground.
>> The problem with using the options Jeff described is that they require
>> knowledge on the part of the builder as to what environments have had
>> their include files/libraries installed on the file system of this
>> particular machine. And unfortunately, not every component is protected
>> by these "sentinel" variables, nor does it appear possible to do so in
>> a "guaranteed safe" manner.
>> Note that I didn't say "installed on their machine". In most cases,
>> these alternative environments are not currently installed at all - they
>> are stale files, or were placed on the file system by someone that
>> wanted to look at their documentation, or whatever. The problem is that
>> Open MPI blindly picks them up and attempts to use them, with sometimes
>> disastrous and frequently unpredictable results.
>> Hence, the user can be "astonished" to find that an application that
>> worked perfectly yesterday suddenly segfaults today - because someone
>> decided one day, for example, to un-tar the bproc files in a public
>> place where we pick them up, and then someone else (perhaps a sys admin
>> or the user themselves) at some later time rebuilt Open MPI to bring in
>> an update.
>> Now imagine being a software provider who gets the call about a problem
>> with Open MPI and has to figure out what the heck happened....
>> My suggested solution may not be the best, which is why I put it out
>> there for discussion. One alternative might be for us to instruct sys
>> admins to put MCA params in their default param file that force
>> selection of the proper components for each framework. Thus, someone
>> with an lsf system would enter: pls=lsf ras=lsf sds=lsf in their config
>> file to ensure that only lsf was used.
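
[Editor's sketch] The per-framework selection described above can be illustrated with a small Python helper. This is only a sketch: the helper names are hypothetical, and it assumes the usual `name = value` syntax of the MCA parameter file and the `OMPI_MCA_` prefix for the equivalent environment variables. The framework names (pls, ras, sds) are the ones Ralph lists.

```python
def mca_param_file_lines(selection):
    """Render {framework: component} as 'name = value' lines.

    Assumes the MCA parameter file takes one 'name = value' pair per line.
    """
    return [f"{framework} = {component}" for framework, component in selection.items()]


def mca_env_vars(selection):
    """Same selection expressed as environment variables (OMPI_MCA_<name>)."""
    return {f"OMPI_MCA_{framework}": component for framework, component in selection.items()}


# The lsf-only example from the mail: force lsf in every listed framework.
lsf_only = {"pls": "lsf", "ras": "lsf", "sds": "lsf"}

if __name__ == "__main__":
    print("\n".join(mca_param_file_lines(lsf_only)))
```

A sys admin would paste the printed lines into the system-wide parameter file; which frameworks need an entry depends on the installed Open MPI version, which is exactly the maintenance cost discussed next.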
>> The negative to that approach is that we would have to warn everyone
>> any time that list changed (e.g., a new component for a new framework).
>> Another option to help that problem, of course, would be to set one mca
>> param (say something like "enviro=lsf") that we would use internal to
>> Open MPI to set the individual components correctly - i.e., we would
>> hold the list of relevant frameworks internally since (hopefully) we
>> know what they should be for a given environment.
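
[Editor's sketch] The single-parameter idea above might look roughly like the following. Everything here is hypothetical: "enviro" is only the name floated in this mail, not an existing MCA parameter, and the table contains just the lsf row that the mail itself gives. Letting an explicit per-framework setting override the expansion is a design assumption of this sketch, not something proposed in the thread.

```python
# Hypothetical internal table: environment name -> component forced per framework.
# Only the lsf row comes from the mail; a real table would be maintained inside
# Open MPI so that users need not track framework changes.
ENVIRO_COMPONENTS = {
    "lsf": {"pls": "lsf", "ras": "lsf", "sds": "lsf"},
}


def expand_enviro(params):
    """Expand a single 'enviro' setting into per-framework selections.

    Explicit per-framework entries already present in params are kept.
    """
    expanded = dict(params)
    enviro = expanded.pop("enviro", None)
    if enviro is None:
        return expanded
    if enviro not in ENVIRO_COMPONENTS:
        raise ValueError(f"unknown environment: {enviro!r}")
    for framework, component in ENVIRO_COMPONENTS[enviro].items():
        expanded.setdefault(framework, component)
    return expanded


if __name__ == "__main__":
    print(expand_enviro({"enviro": "lsf"}))
```

Because the table lives in one place, adding a framework means updating one row rather than notifying every site to edit its parameter file.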
>> Anyway, I'm glad people are looking at this and suggesting solutions.
>> It is a problem that seems to be biting us recently and may become a
>> bigger issue as the user community grows.
>> Ralph
>> On 7/10/07 6:12 AM, "Bogdan Costescu"
>> <Bogdan.Costescu_at_[hidden]> wrote:
>>> On Tue, 10 Jul 2007, Jeff Squyres wrote:
>>>> Do either of these work for you?
>>> Will report back in a bit, I'm now in the middle of an OS upgrade on
>>> the cluster.
>>> But my question was more like: is this a configuration that should
>>> theoretically work? Or in other words, are there known dependencies
>>> on rsh that would make a rsh-less build not work or work with reduced
>>> functionality?
>>>> Most batch systems today set a sentinel environment variable that we
>>>> check for.
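
[Editor's sketch] Run-time detection through sentinel environment variables can be sketched as below. The variable names are assumptions about what the batch systems commonly export inside a job (PBS_ENVIRONMENT and PBS_JOBID for Torque/PBS, SLURM_JOBID for SLURM, LSB_JOBID for LSF); the thread does not name them, and this is not Open MPI's actual selection code.

```python
import os

# Assumed sentinel variables per environment; all listed variables must be set.
SENTINELS = {
    "tm": ("PBS_ENVIRONMENT", "PBS_JOBID"),
    "slurm": ("SLURM_JOBID",),
    "lsf": ("LSB_JOBID",),
}


def detect_environments(env=None):
    """Return the names of environments whose sentinel variables are all present."""
    env = os.environ if env is None else env
    return [
        name
        for name, variables in SENTINELS.items()
        if all(variable in env for variable in variables)
    ]


if __name__ == "__main__":
    print(detect_environments())
```

Note that this only answers "am I inside a job of this batch system right now?"; it says nothing about which headers or libraries were found at build time, which is the distinction drawn in the reply that follows.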
>>> I think that we talk about slightly different things - my impression
>>> was that the OP was asking about detection at config time, while your
>>> statements make perfect sense to me if they are relative to detection
>>> at run-time. If the OP was indeed asking about run-time detection,
>>> then I apologize for the time you wasted on reading and replying to
>>> my questions...
>>>> That's what the compile-time vs. run-time detection and selection is
>>>> supposed to be for.
>>> Yes, I understand that, it's the same type of mechanism as in LAM/MPI,
>>> which is not that foreign to me ;-)