Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem launching onto Bourne shell
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-10-17 12:05:16


Doh; yes we did. This was a minor glitch in porting the 1.2 series
fix to the trunk/v1.3 (i.e., the fix in v1.2.8 is ok -- whew!).

Fixed on the trunk in r19758; thanks for noticing. I'll file a CMR
for v1.3.
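
For anyone wondering why one stray character matters: assuming the string ends up at the front of the command line that the remote sh/ksh parses, a "(" that opens a subshell and is never closed is a syntax error in any Bourne-style shell. The sketch below is not Open MPI code. It feeds the two strings from the report quoted below to "sh -n" (parse only, nothing is executed). The trailing "echo launched" is a made-up stand-in for whatever follows in the real command line, and check_syntax is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical reproduction, independent of Open MPI.
# "echo launched" stands in for the rest of the remote command line.
broken='( test ! -r ./.profile || . ./.profile; echo launched'
fixed=' test ! -r ./.profile || . ./.profile; echo launched'

# Ask sh to parse the string without executing it (-n) and report the result.
check_syntax() {
    if sh -n -c "$1" 2>/dev/null; then
        echo "parses"
    else
        echo "syntax error"
    fi
}

echo "with '(':    $(check_syntax "$broken")"
echo "without '(': $(check_syntax "$fixed")"
```

Because of -n, no .profile is actually sourced, so this should be safe to run from any directory.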

On Oct 16, 2008, at 7:05 PM, Mostyn Lewis wrote:

> Jeff,
>
> You broke my ksh (and I expect something else)
> Today's SVN 1.4a1r19757
> orte/mca/plm/rsh/plm_rsh_module.c
> line 471:
> tmp = opal_argv_split("( test ! -r ./.profile || . ./.profile;", ' ');
>                        ^
> ARGHH
> No (
> tmp = opal_argv_split(" test ! -r ./.profile || . ./.profile;", ' ');
> and all is well again :)
>
> Regards,
> Mostyn
>
> On Thu, 9 Oct 2008, Jeff Squyres wrote:
>
>> FWIW, the fix has been pushed into the trunk, 1.2.8, and 1.3 SVN
>> branches. So I'll probably take down the hg tree (we use those as
>> temporary branches).
>>
>> On Oct 9, 2008, at 2:32 PM, Hahn Kim wrote:
>>
>>> Hi,
>>> Thanks for providing a fix, sorry for the delay in response. Once
>>> I found out about -x, I've been busy working on the rest of our
>>> code, so I haven't had the time to try out the fix. I'll take a
>>> look at it as soon as I can and will let you know how it works out.
>>> Hahn
>>> On Oct 7, 2008, at 5:41 PM, Jeff Squyres wrote:
>>>> On Oct 7, 2008, at 4:19 PM, Hahn Kim wrote:
>>>>>> you probably want to set the LD_LIBRARY_PATH (and PATH, likely, and
>>>>>> possibly others, such as that LICENSE key, etc.) regardless of
>>>>>> whether it's an interactive or non-interactive login.
>>>>> Right, that's exactly what I want to do. I was hoping that mpirun
>>>>> would run .profile as the FAQ page stated, but the -x fix works for
>>>>> now.
>>>> If you're using Bash, it should be running .bashrc. But it looks like
>>>> you did identify a bug that we're *not* running .profile. I have a
>>>> Mercurial branch up with a fix if you want to give it a spin:
>>>>
>>>> http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/sh-profile-fixes/
>>>>> I just realized that I'm using .bash_profile on the x86 and need to
>>>>> move its contents into .bashrc and call .bashrc from .bash_profile,
>>>>> since eventually I will also be launching MPI jobs onto other x86
>>>>> processors.
>>>>> Thanks to everyone for their help.
>>>>> Hahn
>>>>> On Oct 7, 2008, at 2:16 PM, Jeff Squyres wrote:
>>>>>> On Oct 7, 2008, at 12:48 PM, Hahn Kim wrote:
>>>>>>> Regarding 1., we're actually using 1.2.5. We started using Open MPI
>>>>>>> last winter and just stuck with it. For now, using the -x flag with
>>>>>>> mpirun works. If this really is a bug in 1.2.7, then I think we'll
>>>>>>> stick with 1.2.5 for now, then upgrade later when it's fixed.
>>>>>> It looks like this behavior has been the same throughout the entire
>>>>>> 1.2 series.
>>>>>>> Regarding 2., are you saying I should run the commands you suggest
>>>>>>> from the x86 node running bash, so that ssh logs into the Cell node
>>>>>>> running Bourne?
>>>>>> I'm saying that if "ssh othernode env" gives different answers than
>>>>>> "ssh othernode"/"env", then your .bashrc or .profile or whatever is
>>>>>> dumping out early depending on whether you have an interactive login
>>>>>> or not. This is the real cause of the error -- you probably want to
>>>>>> set the LD_LIBRARY_PATH (and PATH, likely, and possibly others, such
>>>>>> as that LICENSE key, etc.) regardless of whether it's an interactive
>>>>>> or non-interactive login.
>>>>>>> When I run "ssh othernode env" from the x86 node, I get the
>>>>>>> following vanilla environment:
>>>>>>> USER=ha17646
>>>>>>> HOME=/home/ha17646
>>>>>>> LOGNAME=ha17646
>>>>>>> SHELL=/bin/sh
>>>>>>> PWD=/home/ha17646
>>>>>>> When I run "ssh othernode" from the x86 node, then run "env" on the
>>>>>>> Cell, I get the following:
>>>>>>> USER=ha17646
>>>>>>> LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>>>>> HOME=/home/ha17646
>>>>>>> MCS_LICENSE_PATH=/opt/MultiCorePlus/mcf.key
>>>>>>> LOGNAME=ha17646
>>>>>>> TERM=xterm-color
>>>>>>> PATH=/usr/local/bin:/usr/bin:/sbin:/bin:/tools/openmpi-1.2.5/bin:/tools/cmake-2.4.7/bin:/tools
>>>>>>> SHELL=/bin/sh
>>>>>>> PWD=/home/ha17646
>>>>>>> TZ=EST5EDT
>>>>>>> Hahn
>>>>>>> On Oct 7, 2008, at 12:07 PM, Jeff Squyres wrote:
>>>>>>>> Ralph and I just talked about this a bit:
>>>>>>>> 1. In all released versions of OMPI, we *do* source the .profile file
>>>>>>>> on the target node if it exists (because vanilla Bourne shells do not
>>>>>>>> source anything on remote nodes -- Bash does, though, per the FAQ).
>>>>>>>> However, looking in 1.2.7, it looks like it might not be executing
>>>>>>>> that code -- there *may* be a bug in this area. We're checking
>>>>>>>> into it.
>>>>>>>> 2. You might want to check your configuration to see if your .bashrc
>>>>>>>> is dumping out early because it's a non-interactive shell. Check the
>>>>>>>> output of:
>>>>>>>> ssh othernode env
>>>>>>>> vs.
>>>>>>>> ssh othernode
>>>>>>>> env
>>>>>>>> (i.e., a non-interactive running of "env" vs. an interactive login and
>>>>>>>> running "env")
>>>>>>>> On Oct 7, 2008, at 8:53 AM, Ralph Castain wrote:
>>>>>>>>> I am unaware of anything in the code that would "source .profile"
>>>>>>>>> for you. I believe the FAQ page is in error here.
>>>>>>>>> Ralph
>>>>>>>>> On Oct 6, 2008, at 7:47 PM, Hahn Kim wrote:
>>>>>>>>>> Great, that worked, thanks! However, it still concerns me that the
>>>>>>>>>> FAQ page says that mpirun will execute .profile which doesn't seem
>>>>>>>>>> to work for me. Are there any configuration issues that could
>>>>>>>>>> possibly be preventing mpirun from doing this? It would certainly
>>>>>>>>>> be more convenient if I could maintain my environment in a
>>>>>>>>>> single .profile file instead of adding what could potentially be a
>>>>>>>>>> lot of -x arguments to my mpirun command.
>>>>>>>>>> Hahn
>>>>>>>>>> On Oct 6, 2008, at 5:44 PM, Aurélien Bouteiller wrote:
>>>>>>>>>>> You can forward your local env with mpirun -x LD_LIBRARY_PATH. As an
>>>>>>>>>>> alternative you can set specific values with mpirun -x
>>>>>>>>>>> LD_LIBRARY_PATH=/some/where:/some/where/else . More information with
>>>>>>>>>>> mpirun --help (or man mpirun).
>>>>>>>>>>> Aurelien
>>>>>>>>>>> On Oct 6, 2008, at 4:06 PM, Hahn Kim wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> I'm having difficulty launching an Open MPI job onto a machine that
>>>>>>>>>>>> is running the Bourne shell.
>>>>>>>>>>>> Here's my basic setup. I have two machines, one is an x86-based
>>>>>>>>>>>> machine running bash and the other is a Cell-based machine running
>>>>>>>>>>>> Bourne shell. I'm running mpirun from the x86 machine, which
>>>>>>>>>>>> launches a C++ MPI application onto the Cell machine. I get the
>>>>>>>>>>>> following error:
>>>>>>>>>>>> error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory
>>>>>>>>>>>> The basic problem is that LD_LIBRARY_PATH needs to be set to the
>>>>>>>>>>>> directory that contains libstdc++.so.6 for the Cell. I set the
>>>>>>>>>>>> following line in .profile:
>>>>>>>>>>>> export LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>>>>>>>>>> which is the path to the PPC libraries for Cell.
>>>>>>>>>>>> Now if I log directly into the Cell machine and run the program
>>>>>>>>>>>> directly from the command line, I don't get the above error. But
>>>>>>>>>>>> mpirun still fails, even after setting LD_LIBRARY_PATH in .profile.
>>>>>>>>>>>> As a sanity check, I did the following. I ran the following command
>>>>>>>>>>>> from the x86 machine:
>>>>>>>>>>>> mpirun -np 1 --host cab0 env
>>>>>>>>>>>> which, among other things, shows me the following value:
>>>>>>>>>>>> LD_LIBRARY_PATH=/tools/openmpi-1.2.5/lib:
>>>>>>>>>>>> If I log into the Cell machine and run env directly from the command
>>>>>>>>>>>> line, I get the following value:
>>>>>>>>>>>> LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>>>>>>>>>> So it appears that .profile gets sourced when I log in but not when
>>>>>>>>>>>> mpirun runs.
>>>>>>>>>>>> However, according to the OpenMPI FAQ (http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path),
>>>>>>>>>>>> mpirun is supposed to directly call .profile since Bourne shell
>>>>>>>>>>>> doesn't automatically call it for non-interactive shells.
>>>>>>>>>>>> Does anyone have any insight as to why my environment isn't being
>>>>>>>>>>>> set properly? Thanks!
>>>>>>>>>>>> Hahn
>>>>>>>>>>>> --
>>>>>>>>>>>> Hahn Kim, hgk_at_[hidden]
>>>>>>>>>>>> MIT Lincoln Laboratory
>>>>>>>>>>>> 244 Wood St., Lexington, MA 02420
>>>>>>>>>>>> Tel: 781-981-0940, Fax: 781-981-5255
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> users mailing list
>>>>>>>>>>>> users_at_[hidden]
>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>> --
>>>>>>>>>>> * Dr. Aurélien Bouteiller
>>>>>>>>>>> * Sr. Research Associate at Innovative Computing Laboratory
>>>>>>>>>>> * University of Tennessee
>>>>>>>>>>> * 1122 Volunteer Boulevard, suite 350
>>>>>>>>>>> * Knoxville, TN 37996
>>>>>>>>>>> * 865 974 6321
>>>>>>>>>> --
>>>>>>>>>> Hahn Kim
>>>>>>>>>> MIT Lincoln Laboratory Phone: (781) 981-0940
>>>>>>>>>> 244 Wood Street, S2-252 Fax: (781) 981-5255
>>>>>>>>>> Lexington, MA 02420 E-mail: hgk_at_[hidden]
>>>>>>>> --
>>>>>>>> Jeff Squyres
>>>>>>>> Cisco Systems

-- 
Jeff Squyres
Cisco Systems