
Open MPI User's Mailing List Archives


From: Ralph H Castain (rhc_at_[hidden])
Date: 2006-08-30 08:53:30


On 8/30/06 6:40 AM, "Michael Kluskens" <mklus_at_[hidden]> wrote:

> I suspect that the problem is not that LSF does not copy the
> environment over but that Open MPI is accessing the other nodes not
> using LSF's method. Below is a related message by you that I have
> not tried to figure out yet, I was hoping for pointers by those
> people that use LSF:
>
> On Jul 18, 2006, at 8:18 AM, Jeff Squyres (jsquyres) wrote:
>
>> If you use the LSF drop-in replacement for rsh (lsgrun), you should be
>> ok because it will use LSF's native job-launching mechanisms behind
>> the scenes (and therefore can use LSF's native job-termination
>> mechanisms when necessary).
>
> If this turns out to be all that is needed then is it possible for
> OpenMPI to autodetect when it is running under LSF and then use
> lsgrun instead of rsh/ssh?

You would need to do the following:

1. Add an LSF component to the RAS framework. There is currently one there
called "lsf_bproc", but it is defunct - we don't use it. You would need one
that reads the environment to find the assigned nodes and puts them on the
registry. You can look at any of the other RAS components to see the
essential steps.

2. Add an LSF component to the PLS framework. You *might* be able to just
copy the rsh/ssh launcher and substitute "lsgrun" for the ssh program in
the launch part. However, there are several other key functions that you
would need to alter to use LSF's job-termination mechanisms. Hence, a
straight copy of the rsh/ssh launcher won't be enough - you'll either need
something more LSF-specific, or you'll have to rely on our job-termination
mechanisms (and not LSF's).

3. In both of those components, you'll need a configure.m4 that checks for
an LSF-specific include file, so that configure can decide whether it's okay
to build that component. I'm no m4 expert, but you can get help here if
necessary.

Ralph

>
> Michael
>
> On Aug 29, 2006, at 7:01 PM, Jeff Squyres wrote:
>
>> That's somewhat odd. I have very little experience with LSF, but I'm
>> surprised that they don't copy the environment over (others do).
>>
>> None of us have LSF, unfortunately, so we haven't done any work to
>> try to make OMPI work on it.
>>
>>
>> On 8/25/06 10:14 AM, "Michael Kluskens" <mklus_at_[hidden]> wrote:
>>
>>> Is there anyone running OpenMPI on a machine with the LSF batch
>>> queueing system?
>>>
>>> Last time I attempted this I discovered that PATH and LD_LIBRARY_PATH
>>> were not making it to the client nodes. I could force PATH to work
>>> using an OpenMPI option but I could not even force LD_LIBRARY_PATH
>>> over to the client nodes. I'd rather fix both and all other
>>> environmental variables with one fix so my test case is simply to use
>>> openmpi to run hostname.
>>>
>>> Before I started on this again I'd like to know if anyone has made
>>> more progress than I have.