Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem launching onto Bourne shell
From: Mostyn Lewis (Mostyn.Lewis_at_[hidden])
Date: 2008-10-16 19:05:32


Jeff,

You broke my ksh (and, I expect, something else too)
Today's SVN 1.4a1r19757
orte/mca/plm/rsh/plm_rsh_module.c
line 471:
         tmp = opal_argv_split("( test ! -r ./.profile || . ./.profile;", ' ');
                                ^
                                ARGHH
No (
         tmp = opal_argv_split(" test ! -r ./.profile || . ./.profile;", ' ');
and all is well again :)
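
A minimal sketch of what I assume goes wrong: once those tokens reach
the remote shell, the opening "(" arrives without its matching ")",
which any Bourne-family shell rejects:

         # hypothetical reproduction; the exact message varies by ksh version
         $ ksh -c '( test ! -r ./.profile || . ./.profile;'
         ksh: syntax error: `(' unmatched

With the "(" removed it is a plain command list, which parses fine.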

Regards,
Mostyn

On Thu, 9 Oct 2008, Jeff Squyres wrote:

> FWIW, the fix has been pushed into the trunk, 1.2.8, and 1.3 SVN branches.
> So I'll probably take down the hg tree (we use those as temporary branches).
>
> On Oct 9, 2008, at 2:32 PM, Hahn Kim wrote:
>
>> Hi,
>>
>> Thanks for providing a fix, sorry for the delay in response. Once I found
>> out about -x, I've been busy working on the rest of our code, so I haven't
>> had the time to try out the fix. I'll take a look at it as soon as I can and
>> will let you know how it works out.
>>
>> Hahn
>>
>> On Oct 7, 2008, at 5:41 PM, Jeff Squyres wrote:
>>
>>> On Oct 7, 2008, at 4:19 PM, Hahn Kim wrote:
>>>
>>>>> you probably want to set the LD_LIBRARY_PATH (and PATH, likely, and
>>>>> possibly others, such as that LICENSE key, etc.) regardless of
>>>>> whether it's an interactive or non-interactive login.
>>>>
>>>> Right, that's exactly what I want to do. I was hoping that mpirun
>>>> would run .profile as the FAQ page stated, but the -x fix works for
>>>> now.
>>>
>>> If you're using Bash, it should be running .bashrc. But it looks like
>>> you did identify a bug: we're *not* running .profile. I have a
>>> Mercurial branch up with a fix if you want to give it a spin:
>>>
>>> http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/sh-profile-fixes/
>>>
>>>> I just realized that I'm using .bash_profile on the x86 and need to
>>>> move its contents into .bashrc and call .bashrc from .bash_profile,
>>>> since eventually I will also be launching MPI jobs onto other x86
>>>> processors.
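>>>>
>>>> (A sketch of the usual pattern, for reference:
>>>>
>>>>    # ~/.bash_profile: defer everything to ~/.bashrc
>>>>    [ -r ~/.bashrc ] && . ~/.bashrc
>>>>
>>>> with the actual environment settings kept in ~/.bashrc.)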
>>>>
>>>> Thanks to everyone for their help.
>>>>
>>>> Hahn
>>>>
>>>> On Oct 7, 2008, at 2:16 PM, Jeff Squyres wrote:
>>>>
>>>>> On Oct 7, 2008, at 12:48 PM, Hahn Kim wrote:
>>>>>
>>>>>> Regarding 1., we're actually using 1.2.5. We started using Open MPI
>>>>>> last winter and just stuck with it. For now, using the -x flag with
>>>>>> mpirun works. If this really is a bug in 1.2.7, then I think we'll
>>>>>> stick with 1.2.5 for now, then upgrade later when it's fixed.
>>>>>
>>>>> It looks like this behavior has been the same throughout the entire
>>>>> 1.2 series.
>>>>>
>>>>>> Regarding 2., are you saying I should run the commands you suggest
>>>>>> from the x86 node running bash, so that ssh logs into the Cell node
>>>>>> running Bourne?
>>>>>
>>>>> I'm saying that if "ssh othernode env" gives different answers than
>>>>> "ssh othernode"/"env", then your .bashrc or .profile or whatever is
>>>>> dumping out early depending on whether you have an interactive login
>>>>> or not. This is the real cause of the error -- you probably want to
>>>>> set the LD_LIBRARY_PATH (and PATH, likely, and possibly others, such
>>>>> as that LICENSE key, etc.) regardless of whether it's an interactive
>>>>> or non-interactive login.
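>>>>>
>>>>> For illustration (a common pattern, not a quote from your actual
>>>>> file): many stock .bashrc files begin with a guard like
>>>>>
>>>>>    # if not running interactively, stop here
>>>>>    [ -z "$PS1" ] && return
>>>>>
>>>>> Anything set below that line never takes effect for "ssh node cmd".
>>>>> Moving the export above the guard makes it apply either way:
>>>>>
>>>>>    export LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>>>    [ -z "$PS1" ] && return   # interactive-only settings follow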
>>>>>
>>>>>>
>>>>>> When I run "ssh othernode env" from the x86 node, I get the
>>>>>> following vanilla environment:
>>>>>>
>>>>>> USER=ha17646
>>>>>> HOME=/home/ha17646
>>>>>> LOGNAME=ha17646
>>>>>> SHELL=/bin/sh
>>>>>> PWD=/home/ha17646
>>>>>>
>>>>>> When I run "ssh othernode" from the x86 node, then run "env" on the
>>>>>> Cell, I get the following:
>>>>>>
>>>>>> USER=ha17646
>>>>>> LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>>>> HOME=/home/ha17646
>>>>>> MCS_LICENSE_PATH=/opt/MultiCorePlus/mcf.key
>>>>>> LOGNAME=ha17646
>>>>>> TERM=xterm-color
>>>>>> PATH=/usr/local/bin:/usr/bin:/sbin:/bin:/tools/openmpi-1.2.5/bin:/tools/cmake-2.4.7/bin:/tools
>>>>>> SHELL=/bin/sh
>>>>>> PWD=/home/ha17646
>>>>>> TZ=EST5EDT
>>>>>>
>>>>>> Hahn
>>>>>>
>>>>>> On Oct 7, 2008, at 12:07 PM, Jeff Squyres wrote:
>>>>>>
>>>>>>> Ralph and I just talked about this a bit:
>>>>>>>
>>>>>>> 1. In all released versions of OMPI, we *do* source the .profile
>>>>>>> file on the target node if it exists (because vanilla Bourne shells
>>>>>>> do not source anything on remote nodes -- Bash does, though, per the
>>>>>>> FAQ); a sketch of the remote command appears below. However, looking
>>>>>>> in 1.2.7, it looks like that code might not be executing -- there
>>>>>>> *may* be a bug in this area. We're checking into it.
>>>>>>>
>>>>>>> 2. You might want to check your configuration to see if
>>>>>>> your .bashrc
>>>>>>> is dumping out early because it's a non-interactive shell. Check
>>>>>>> the
>>>>>>> output of:
>>>>>>>
>>>>>>> ssh othernode env
>>>>>>> vs.
>>>>>>> ssh othernode
>>>>>>> env
>>>>>>>
>>>>>>> (i.e., a non-interactive running of "env" vs. an interactive login
>>>>>>> and
>>>>>>> running "env")
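>>>>>>>
>>>>>>> (The sketch promised in point 1: the rsh launcher builds a remote
>>>>>>> command of roughly this shape -- the exact line varies by version:
>>>>>>>
>>>>>>>    test ! -r ./.profile || . ./.profile; <orted and its arguments>
>>>>>>>
>>>>>>> i.e., source .profile first if it is readable, then start the
>>>>>>> daemon.)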
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Oct 7, 2008, at 8:53 AM, Ralph Castain wrote:
>>>>>>>
>>>>>>>> I am unaware of anything in the code that would "source .profile"
>>>>>>>> for you. I believe the FAQ page is in error here.
>>>>>>>>
>>>>>>>> Ralph
>>>>>>>>
>>>>>>>> On Oct 6, 2008, at 7:47 PM, Hahn Kim wrote:
>>>>>>>>
>>>>>>>>> Great, that worked, thanks! However, it still concerns me that
>>>>>>>>> the
>>>>>>>>> FAQ page says that mpirun will execute .profile, which doesn't
>>>>>>>>> seem
>>>>>>>>> to work for me. Are there any configuration issues that could
>>>>>>>>> possibly be preventing mpirun from doing this? It would
>>>>>>>>> certainly
>>>>>>>>> be more convenient if I could maintain my environment in a
>>>>>>>>> single .profile file instead of adding what could potentially
>>>>>>>>> be a
>>>>>>>>> lot of -x arguments to my mpirun command.
>>>>>>>>>
>>>>>>>>> Hahn
>>>>>>>>>
>>>>>>>>> On Oct 6, 2008, at 5:44 PM, Aurélien Bouteiller wrote:
>>>>>>>>>
>>>>>>>>>> You can forward your local env with mpirun -x
>>>>>>>>>> LD_LIBRARY_PATH. As an alternative, you can set specific values
>>>>>>>>>> with mpirun -x
>>>>>>>>>> LD_LIBRARY_PATH=/some/where:/some/where/else. More information
>>>>>>>>>> with mpirun --help (or man mpirun).
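>>>>>>>>>>
>>>>>>>>>> For instance (a sketch: the executable name is hypothetical; the
>>>>>>>>>> path and host are the ones from your message):
>>>>>>>>>>
>>>>>>>>>>    mpirun -np 1 --host cab0 \
>>>>>>>>>>      -x LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32 \
>>>>>>>>>>      ./my_mpi_app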
>>>>>>>>>>
>>>>>>>>>> Aurelien
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Le 6 oct. 08 à 16:06, Hahn Kim a écrit :
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I'm having difficulty launching an Open MPI job onto a machine
>>>>>>>>>>> that
>>>>>>>>>>> is running the Bourne shell.
>>>>>>>>>>>
>>>>>>>>>>> Here's my basic setup. I have two machines, one is an x86-
>>>>>>>>>>> based
>>>>>>>>>>> machine running bash and the other is a Cell-based machine
>>>>>>>>>>> running
>>>>>>>>>>> Bourne shell. I'm running mpirun from the x86 machine, which
>>>>>>>>>>> launches a C++ MPI application onto the Cell machine. I get
>>>>>>>>>>> the
>>>>>>>>>>> following error:
>>>>>>>>>>>
>>>>>>>>>>> error while loading shared libraries: libstdc++.so.6: cannot
>>>>>>>>>>> open
>>>>>>>>>>> shared object file: No such file or directory
>>>>>>>>>>>
>>>>>>>>>>> The basic problem is that LD_LIBRARY_PATH needs to be set to
>>>>>>>>>>> the
>>>>>>>>>>> directory that contains libstdc++.so.6 for the Cell. I set the
>>>>>>>>>>> following line in .profile:
>>>>>>>>>>>
>>>>>>>>>>> export LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>>>>>>>>>
>>>>>>>>>>> which is the path to the PPC libraries for Cell.
>>>>>>>>>>>
>>>>>>>>>>> Now if I log directly into the Cell machine and run the program
>>>>>>>>>>> directly from the command line, I don't get the above error.
>>>>>>>>>>> But
>>>>>>>>>>> mpirun still fails, even after setting LD_LIBRARY_PATH
>>>>>>>>>>> in .profile.
>>>>>>>>>>>
>>>>>>>>>>> As a sanity check, I did the following. I ran the following
>>>>>>>>>>> command
>>>>>>>>>>> from the x86 machine:
>>>>>>>>>>>
>>>>>>>>>>> mpirun -np 1 --host cab0 env
>>>>>>>>>>>
>>>>>>>>>>> which, among others things, shows me the following value:
>>>>>>>>>>>
>>>>>>>>>>> LD_LIBRARY_PATH=/tools/openmpi-1.2.5/lib:
>>>>>>>>>>>
>>>>>>>>>>> If I log into the Cell machine and run env directly from the
>>>>>>>>>>> command
>>>>>>>>>>> line, I get the following value:
>>>>>>>>>>>
>>>>>>>>>>> LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>>>>>>>>>
>>>>>>>>>>> So it appears that .profile gets sourced when I log in but not
>>>>>>>>>>> when
>>>>>>>>>>> mpirun runs.
>>>>>>>>>>>
>>>>>>>>>>> However, according to the OpenMPI FAQ
>>>>>>>>>>> (http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
>>>>>>>>>>> ), mpirun is supposed to directly call .profile since Bourne
>>>>>>>>>>> shell
>>>>>>>>>>> doesn't automatically call it for non-interactive shells.
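>>>>>>>>>>>
>>>>>>>>>>> (A quick way to confirm this, as a sketch:
>>>>>>>>>>>
>>>>>>>>>>>    ssh cab0 'echo $LD_LIBRARY_PATH'
>>>>>>>>>>>    ssh cab0 '. ./.profile; echo $LD_LIBRARY_PATH'
>>>>>>>>>>>
>>>>>>>>>>> If only the second prints the Cell path, .profile itself is fine
>>>>>>>>>>> and simply isn't sourced for non-interactive logins.)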
>>>>>>>>>>>
>>>>>>>>>>> Does anyone have any insight as to why my environment isn't
>>>>>>>>>>> being
>>>>>>>>>>> set properly? Thanks!
>>>>>>>>>>>
>>>>>>>>>>> Hahn
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Hahn Kim, hgk_at_[hidden]
>>>>>>>>>>> MIT Lincoln Laboratory
>>>>>>>>>>> 244 Wood St., Lexington, MA 02420
>>>>>>>>>>> Tel: 781-981-0940, Fax: 781-981-5255
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> * Dr. Aurélien Bouteiller
>>>>>>>>>> * Sr. Research Associate at Innovative Computing Laboratory
>>>>>>>>>> * University of Tennessee
>>>>>>>>>> * 1122 Volunteer Boulevard, suite 350
>>>>>>>>>> * Knoxville, TN 37996
>>>>>>>>>> * 865 974 6321
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Hahn Kim
>>>>>>>>> MIT Lincoln Laboratory Phone: (781) 981-0940
>>>>>>>>> 244 Wood Street, S2-252 Fax: (781) 981-5255
>>>>>>>>> Lexington, MA 02420 E-mail: hgk_at_[hidden]
>>>>>>>
>>>>>>> --
>>>>>>> Jeff Squyres
>>>>>>> Cisco Systems
>>>>>>
>>>>>> --
>>>>>> Hahn Kim, hgk_at_[hidden]
>>>>>> MIT Lincoln Laboratory
>>>>>> 244 Wood St., Lexington, MA 02420
>>>>>> Tel: 781-981-0940, Fax: 781-981-5255
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> Cisco Systems
>>>>
>>>> --
>>>> Hahn Kim, hgk_at_[hidden]
>>>> MIT Lincoln Laboratory
>>>> 244 Wood St., Lexington, MA 02420
>>>> Tel: 781-981-0940, Fax: 781-981-5255
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>
>> --
>> Hahn Kim, hgk_at_[hidden]
>> MIT Lincoln Laboratory
>> 244 Wood St., Lexington, MA 02420
>> Tel: 781-981-0940, Fax: 781-981-5255
>
> --
> Jeff Squyres
> Cisco Systems