Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem launching onto Bourne shell
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-10-09 14:45:26


FWIW, the fix has been pushed into the trunk, 1.2.8, and 1.3 SVN
branches. So I'll probably take down the hg tree (we use those as
temporary branches).

On Oct 9, 2008, at 2:32 PM, Hahn Kim wrote:

> Hi,
>
> Thanks for providing a fix, and sorry for the delayed response. Once I
> found out about -x, I got busy working on the rest of our code, so I
> haven't had time to try out the fix. I'll take a look at it as soon as
> I can and will let you know how it works out.
>
> Hahn
>
> On Oct 7, 2008, at 5:41 PM, Jeff Squyres wrote:
>
>> On Oct 7, 2008, at 4:19 PM, Hahn Kim wrote:
>>
>>>> you probably want to set the LD_LIBRARY_PATH (and PATH, likely, and
>>>> possibly others, such as that LICENSE key, etc.) regardless of
>>>> whether it's an interactive or non-interactive login.
>>>
>>> Right, that's exactly what I want to do. I was hoping that mpirun
>>> would run .profile as the FAQ page stated, but the -x fix works for
>>> now.
>>
>> If you're using Bash, it should be running .bashrc. But it looks like
>> you did identify a bug: we're *not* running .profile. I have a
>> Mercurial branch up with a fix if you want to give it a spin:
>>
>> http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/sh-profile-fixes/
>>
>>> I just realized that I'm using .bash_profile on the x86 and need to
>>> move its contents into .bashrc and call .bashrc from .bash_profile,
>>> since eventually I will also be launching MPI jobs onto other x86
>>> processors.
>>>
>>> Thanks to everyone for their help.
>>>
>>> Hahn
>>>
>>> On Oct 7, 2008, at 2:16 PM, Jeff Squyres wrote:
>>>
>>>> On Oct 7, 2008, at 12:48 PM, Hahn Kim wrote:
>>>>
>>>>> Regarding 1., we're actually using 1.2.5. We started using Open MPI
>>>>> last winter and just stuck with it. For now, using the -x flag with
>>>>> mpirun works. If this really is a bug in 1.2.7, then I think we'll
>>>>> stick with 1.2.5 for now, then upgrade later when it's fixed.
>>>>
>>>> It looks like this behavior has been the same throughout the entire
>>>> 1.2 series.
>>>>
>>>>> Regarding 2., are you saying I should run the commands you suggest
>>>>> from the x86 node running bash, so that ssh logs into the Cell node
>>>>> running Bourne?
>>>>
>>>> I'm saying that if "ssh othernode env" gives different answers than
>>>> "ssh othernode" followed by "env", then your .bashrc or .profile or
>>>> whatever is dumping out early depending on whether you have an
>>>> interactive login or not. This is the real cause of the error -- you
>>>> probably want to set the LD_LIBRARY_PATH (and PATH, likely, and
>>>> possibly others, such as that LICENSE key, etc.) regardless of
>>>> whether it's an interactive or non-interactive login.
>>>>
>>>>>
>>>>> When I run "ssh othernode env" from the x86 node, I get the
>>>>> following vanilla environment:
>>>>>
>>>>> USER=ha17646
>>>>> HOME=/home/ha17646
>>>>> LOGNAME=ha17646
>>>>> SHELL=/bin/sh
>>>>> PWD=/home/ha17646
>>>>>
>>>>> When I run "ssh othernode" from the x86 node, then run "env" on the
>>>>> Cell, I get the following:
>>>>>
>>>>> USER=ha17646
>>>>> LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>>> HOME=/home/ha17646
>>>>> MCS_LICENSE_PATH=/opt/MultiCorePlus/mcf.key
>>>>> LOGNAME=ha17646
>>>>> TERM=xterm-color
>>>>> PATH=/usr/local/bin:/usr/bin:/sbin:/bin:/tools/openmpi-1.2.5/bin:/tools/cmake-2.4.7/bin:/tools
>>>>> SHELL=/bin/sh
>>>>> PWD=/home/ha17646
>>>>> TZ=EST5EDT
>>>>>
>>>>> Hahn
>>>>>
>>>>> On Oct 7, 2008, at 12:07 PM, Jeff Squyres wrote:
>>>>>
>>>>>> Ralph and I just talked about this a bit:
>>>>>>
>>>>>> 1. In all released versions of OMPI, we *do* source the .profile file
>>>>>> on the target node if it exists (because vanilla Bourne shells do not
>>>>>> source anything on remote nodes -- Bash does, though, per the FAQ).
>>>>>> However, looking in 1.2.7, it looks like it might not be executing
>>>>>> that code -- there *may* be a bug in this area. We're checking
>>>>>> into it.
>>>>>>
>>>>>> 2. You might want to check your configuration to see if your .bashrc
>>>>>> is dumping out early because it's a non-interactive shell. Check the
>>>>>> output of:
>>>>>>
>>>>>> ssh othernode env
>>>>>> vs.
>>>>>> ssh othernode
>>>>>> env
>>>>>>
>>>>>> (i.e., a non-interactive running of "env" vs. an interactive login
>>>>>> and running "env")
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Oct 7, 2008, at 8:53 AM, Ralph Castain wrote:
>>>>>>
>>>>>>> I am unaware of anything in the code that would "source .profile"
>>>>>>> for you. I believe the FAQ page is in error here.
>>>>>>>
>>>>>>> Ralph
>>>>>>>
>>>>>>> On Oct 6, 2008, at 7:47 PM, Hahn Kim wrote:
>>>>>>>
>>>>>>>> Great, that worked, thanks! However, it still concerns me that the
>>>>>>>> FAQ page says that mpirun will execute .profile, which doesn't seem
>>>>>>>> to work for me. Are there any configuration issues that could
>>>>>>>> possibly be preventing mpirun from doing this? It would certainly
>>>>>>>> be more convenient if I could maintain my environment in a
>>>>>>>> single .profile file instead of adding what could potentially be a
>>>>>>>> lot of -x arguments to my mpirun command.
>>>>>>>>
>>>>>>>> Hahn
>>>>>>>>
>>>>>>>> On Oct 6, 2008, at 5:44 PM, Aurélien Bouteiller wrote:
>>>>>>>>
>>>>>>>>> You can forward your local env with mpirun -x LD_LIBRARY_PATH. As
>>>>>>>>> an alternative, you can set specific values with mpirun -x
>>>>>>>>> LD_LIBRARY_PATH=/some/where:/some/where/else . More information
>>>>>>>>> is available with mpirun --help (or man mpirun).
>>>>>>>>>
>>>>>>>>> Aurelien
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Oct 6, 2008, at 4:06 PM, Hahn Kim wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm having difficulty launching an Open MPI job onto a machine
>>>>>>>>>> that is running the Bourne shell.
>>>>>>>>>>
>>>>>>>>>> Here's my basic setup. I have two machines: one is an x86-based
>>>>>>>>>> machine running bash and the other is a Cell-based machine running
>>>>>>>>>> Bourne shell. I'm running mpirun from the x86 machine, which
>>>>>>>>>> launches a C++ MPI application onto the Cell machine. I get the
>>>>>>>>>> following error:
>>>>>>>>>>
>>>>>>>>>> error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory
>>>>>>>>>>
>>>>>>>>>> The basic problem is that LD_LIBRARY_PATH needs to be set to the
>>>>>>>>>> directory that contains libstdc++.so.6 for the Cell. I set the
>>>>>>>>>> following line in .profile:
>>>>>>>>>>
>>>>>>>>>> export LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>>>>>>>>
>>>>>>>>>> which is the path to the PPC libraries for Cell.
>>>>>>>>>>
>>>>>>>>>> Now if I log directly into the Cell machine and run the program
>>>>>>>>>> directly from the command line, I don't get the above error. But
>>>>>>>>>> mpirun still fails, even after setting LD_LIBRARY_PATH
>>>>>>>>>> in .profile.
>>>>>>>>>>
>>>>>>>>>> As a sanity check, I ran the following command from the x86
>>>>>>>>>> machine:
>>>>>>>>>>
>>>>>>>>>> mpirun -np 1 --host cab0 env
>>>>>>>>>>
>>>>>>>>>> which, among other things, shows me the following value:
>>>>>>>>>>
>>>>>>>>>> LD_LIBRARY_PATH=/tools/openmpi-1.2.5/lib:
>>>>>>>>>>
>>>>>>>>>> If I log into the Cell machine and run env directly from the
>>>>>>>>>> command line, I get the following value:
>>>>>>>>>>
>>>>>>>>>> LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>>>>>>>>
>>>>>>>>>> So it appears that .profile gets sourced when I log in but not
>>>>>>>>>> when mpirun runs.
>>>>>>>>>>
>>>>>>>>>> However, according to the Open MPI FAQ
>>>>>>>>>> (http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path),
>>>>>>>>>> mpirun is supposed to directly call .profile since Bourne shell
>>>>>>>>>> doesn't automatically call it for non-interactive shells.
>>>>>>>>>>
>>>>>>>>>> Does anyone have any insight as to why my environment isn't being
>>>>>>>>>> set properly? Thanks!
>>>>>>>>>>
>>>>>>>>>> Hahn
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Hahn Kim, hgk_at_[hidden]
>>>>>>>>>> MIT Lincoln Laboratory
>>>>>>>>>> 244 Wood St., Lexington, MA 02420
>>>>>>>>>> Tel: 781-981-0940, Fax: 781-981-5255
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> users mailing list
>>>>>>>>>> users_at_[hidden]
>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> * Dr. Aurélien Bouteiller
>>>>>>>>> * Sr. Research Associate at Innovative Computing Laboratory
>>>>>>>>> * University of Tennessee
>>>>>>>>> * 1122 Volunteer Boulevard, suite 350
>>>>>>>>> * Knoxville, TN 37996
>>>>>>>>> * 865 974 6321
>>>>>>>>
>>>>>>>> --
>>>>>>>> Hahn Kim
>>>>>>>> MIT Lincoln Laboratory Phone: (781) 981-0940
>>>>>>>> 244 Wood Street, S2-252 Fax: (781) 981-5255
>>>>>>>> Lexington, MA 02420 E-mail: hgk_at_[hidden]
>>>>>>
>>>>>> --
>>>>>> Jeff Squyres
>>>>>> Cisco Systems

-- 
Jeff Squyres
Cisco Systems
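
Note on the check suggested earlier in this thread: comparing the output of "ssh othernode env" with "env" run from an interactive login can be scripted. What follows is only a minimal sketch and is not part of Open MPI. The function name missing_vars and the file names are hypothetical, "othernode" is the placeholder host name used in the thread, and the sketch assumes that mktemp is available and that every variable's value fits on a single line.

```shell
#!/bin/sh
# Print the names of variables that appear in the second env dump but not
# in the first, e.g. variables that are only set for interactive logins.
missing_vars() {
    first=$(mktemp) || return 1
    second=$(mktemp) || { rm -f "$first"; return 1; }
    # Keep only the variable names (text before the first "="), sorted.
    cut -d= -f1 "$1" | LC_ALL=C sort -u > "$first"
    cut -d= -f1 "$2" | LC_ALL=C sort -u > "$second"
    # comm -13 prints only the lines that are unique to the second file.
    LC_ALL=C comm -13 "$first" "$second"
    rm -f "$first" "$second"
}

# Hypothetical usage (file names are placeholders):
#   ssh othernode env > noninteractive.txt
#   ssh othernode          # then run "env > interactive.txt" and copy it back
#   missing_vars noninteractive.txt interactive.txt
```

Any name it prints (LD_LIBRARY_PATH in the case discussed here) is set only on the interactive path, so it either needs to move to a startup file that non-interactive shells also read or be forwarded with mpirun -x.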