
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Problem launching onto Bourne shell
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-10-07 17:41:51


On Oct 7, 2008, at 4:19 PM, Hahn Kim wrote:

>> you probably want to set the LD_LIBRARY_PATH (and PATH, likely, and
>> possibly others, such as that LICENSE key, etc.) regardless of
>> whether it's an interactive or non-interactive login.
>
> Right, that's exactly what I want to do. I was hoping that mpirun
> would run .profile as the FAQ page stated, but the -x fix works for
> now.

If you're using Bash, it should be sourcing .bashrc. But it looks like
you did identify a bug: we're *not* sourcing .profile. I have a
Mercurial branch up with a fix if you want to give it a spin:

     http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/sh-profile-fixes/

> I just realized that I'm using .bash_profile on the x86 and need to
> move its contents into .bashrc and call .bashrc from .bash_profile,
> since eventually I will also be launching MPI jobs onto other x86
> processors.
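The chaining described above is the standard idiom: keep the real setup in .bashrc and have .bash_profile delegate to it. A minimal sketch, assuming a stock bash installation:

```shell
# ~/.bash_profile -- bash reads this for interactive *login* shells.
# Delegating to ~/.bashrc keeps a single copy of the environment setup
# that both login and non-login interactive shells will pick up.
if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi
```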
>
> Thanks to everyone for their help.
>
> Hahn
>
> On Oct 7, 2008, at 2:16 PM, Jeff Squyres wrote:
>
>> On Oct 7, 2008, at 12:48 PM, Hahn Kim wrote:
>>
>>> Regarding 1., we're actually using 1.2.5. We started using Open MPI
>>> last winter and just stuck with it. For now, using the -x flag with
>>> mpirun works. If this really is a bug in 1.2.7, then I think we'll
>>> stick with 1.2.5 for now, then upgrade later when it's fixed.
>>
>> It looks like this behavior has been the same throughout the entire
>> 1.2 series.
>>
>>> Regarding 2., are you saying I should run the commands you suggest
>>> from the x86 node running bash, so that ssh logs into the Cell node
>>> running Bourne?
>>
>> I'm saying that if "ssh othernode env" gives different answers than
>> "ssh othernode" followed by "env", then your .bashrc or .profile or
>> whatever is dumping out early depending on whether you have an
>> interactive login or not. This is the real cause of the error -- you
>> probably want to set the LD_LIBRARY_PATH (and PATH, likely, and
>> possibly others, such as that LICENSE key, etc.) regardless of
>> whether it's an interactive or non-interactive login.
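A startup file laid out along those lines might look like the sketch below; the paths are the ones from the env output quoted later in the thread, and any interactive-only setup belongs after the guard:

```shell
# ~/.profile (sketch) -- do all exports first, so they take effect even
# when sshd runs a single command non-interactively (which is how
# mpirun launches its remote daemons).
export LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
export MCS_LICENSE_PATH=/opt/MultiCorePlus/mcf.key
export PATH=/tools/openmpi-1.2.5/bin:$PATH

# $- contains "i" only in interactive shells; a non-interactive login
# stops here with the exports above already in place.
case $- in
    *i*) ;;        # interactive: continue to prompts, aliases, etc.
    *)   return ;; # non-interactive: done
esac
```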
>>
>>>
>>> When I run "ssh othernode env" from the x86 node, I get the
>>> following vanilla environment:
>>>
>>> USER=ha17646
>>> HOME=/home/ha17646
>>> LOGNAME=ha17646
>>> SHELL=/bin/sh
>>> PWD=/home/ha17646
>>>
>>> When I run "ssh othernode" from the x86 node, then run "env" on the
>>> Cell, I get the following:
>>>
>>> USER=ha17646
>>> LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>> HOME=/home/ha17646
>>> MCS_LICENSE_PATH=/opt/MultiCorePlus/mcf.key
>>> LOGNAME=ha17646
>>> TERM=xterm-color
>>> PATH=/usr/local/bin:/usr/bin:/sbin:/bin:/tools/openmpi-1.2.5/bin:/tools/cmake-2.4.7/bin:/tools
>>> SHELL=/bin/sh
>>> PWD=/home/ha17646
>>> TZ=EST5EDT
>>>
>>> Hahn
>>>
>>> On Oct 7, 2008, at 12:07 PM, Jeff Squyres wrote:
>>>
>>>> Ralph and I just talked about this a bit:
>>>>
>>>> 1. In all released versions of OMPI, we *do* source the .profile
>>>> file on the target node if it exists (because vanilla Bourne
>>>> shells do not source anything on remote nodes -- Bash does,
>>>> though, per the FAQ). However, looking in 1.2.7, it looks like it
>>>> might not be executing that code -- there *may* be a bug in this
>>>> area. We're checking into it.
>>>>
>>>> 2. You might want to check your configuration to see if
>>>> your .bashrc is dumping out early because it's a non-interactive
>>>> shell. Check the output of:
>>>>
>>>> ssh othernode env
>>>> vs.
>>>> ssh othernode
>>>> env
>>>>
>>>> (i.e., a non-interactive running of "env" vs. an interactive
>>>> login and running "env")
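What the two commands exercise is whether the remote shell considers itself interactive, which a POSIX shell advertises with an "i" in the special parameter $-. A quick local illustration, no ssh required (just a sketch, independent of any Open MPI machinery):

```shell
#!/bin/sh
# "sh -c CMD" runs non-interactively, like "ssh othernode env":
flags=$(sh -c 'echo "$-"')
case $flags in
    *i*) echo "unexpected: sh -c reported interactive" ;;
    *)   echo "sh -c: non-interactive (no i in flags)" ;;
esac

# "sh -i" forces an interactive shell, like a real login session:
flags=$(sh -ic 'echo "$-"' 2>/dev/null)
case $flags in
    *i*) echo "sh -i: interactive (i present in flags)" ;;
    *)   echo "unexpected: sh -i did not report interactive" ;;
esac
```

Startup files that test $- (or PS1) and bail out early for non-interactive shells are exactly what makes the two env listings differ.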
>>>>
>>>>
>>>>
>>>> On Oct 7, 2008, at 8:53 AM, Ralph Castain wrote:
>>>>
>>>>> I am unaware of anything in the code that would "source .profile"
>>>>> for you. I believe the FAQ page is in error here.
>>>>>
>>>>> Ralph
>>>>>
>>>>> On Oct 6, 2008, at 7:47 PM, Hahn Kim wrote:
>>>>>
>>>>>> Great, that worked, thanks! However, it still concerns me that
>>>>>> the FAQ page says that mpirun will execute .profile, which
>>>>>> doesn't seem to work for me. Are there any configuration issues
>>>>>> that could possibly be preventing mpirun from doing this? It
>>>>>> would certainly be more convenient if I could maintain my
>>>>>> environment in a single .profile file instead of adding what
>>>>>> could potentially be a lot of -x arguments to my mpirun command.
>>>>>>
>>>>>> Hahn
>>>>>>
>>>>>> On Oct 6, 2008, at 5:44 PM, Aurélien Bouteiller wrote:
>>>>>>
>>>>>>> You can forward your local env with mpirun -x LD_LIBRARY_PATH.
>>>>>>> As an alternative, you can set specific values with mpirun -x
>>>>>>> LD_LIBRARY_PATH=/some/where:/some/where/else . More information
>>>>>>> with mpirun --help (or man mpirun).
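Applied to the library-path problem in this thread, the two forms would look something like this; ./my_app stands in for the actual binary, and the paths are the ones quoted elsewhere in the thread:

```shell
# Forward the launching shell's current value to every rank:
mpirun -np 1 --host cab0 -x LD_LIBRARY_PATH ./my_app

# Or pin an explicit value for the remote nodes:
mpirun -np 1 --host cab0 \
    -x LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32 ./my_app
```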
>>>>>>>
>>>>>>> Aurelien
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Le 6 oct. 08 à 16:06, Hahn Kim a écrit :
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm having difficulty launching an Open MPI job onto a machine
>>>>>>>> that
>>>>>>>> is running the Bourne shell.
>>>>>>>>
>>>>>>>> Here's my basic setup. I have two machines: one is an
>>>>>>>> x86-based machine running bash, and the other is a Cell-based
>>>>>>>> machine running the Bourne shell. I'm running mpirun from the
>>>>>>>> x86 machine, which launches a C++ MPI application onto the
>>>>>>>> Cell machine. I get the following error:
>>>>>>>>
>>>>>>>> error while loading shared libraries: libstdc++.so.6: cannot
>>>>>>>> open shared object file: No such file or directory
>>>>>>>>
>>>>>>>> The basic problem is that LD_LIBRARY_PATH needs to be set to
>>>>>>>> the directory that contains libstdc++.so.6 for the Cell. I set
>>>>>>>> the following line in .profile:
>>>>>>>>
>>>>>>>> export LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>>>>>>
>>>>>>>> which is the path to the PPC libraries for Cell.
>>>>>>>>
>>>>>>>> Now if I log directly into the Cell machine and run the
>>>>>>>> program directly from the command line, I don't get the above
>>>>>>>> error. But mpirun still fails, even after setting
>>>>>>>> LD_LIBRARY_PATH in .profile.
>>>>>>>>
>>>>>>>> As a sanity check, I did the following. I ran the following
>>>>>>>> command
>>>>>>>> from the x86 machine:
>>>>>>>>
>>>>>>>> mpirun -np 1 --host cab0 env
>>>>>>>>
>>>>>>>> which, among others things, shows me the following value:
>>>>>>>>
>>>>>>>> LD_LIBRARY_PATH=/tools/openmpi-1.2.5/lib:
>>>>>>>>
>>>>>>>> If I log into the Cell machine and run env directly from the
>>>>>>>> command
>>>>>>>> line, I get the following value:
>>>>>>>>
>>>>>>>> LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>>>>>>
>>>>>>>> So it appears that .profile gets sourced when I log in but
>>>>>>>> not when mpirun runs.
>>>>>>>>
>>>>>>>> However, according to the Open MPI FAQ (http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
>>>>>>>> ), mpirun is supposed to source .profile directly, since the
>>>>>>>> Bourne shell doesn't automatically source it for
>>>>>>>> non-interactive shells.
>>>>>>>>
>>>>>>>> Does anyone have any insight as to why my environment isn't
>>>>>>>> being set properly? Thanks!
>>>>>>>>
>>>>>>>> Hahn
>>>>>>>>
>>>>>>>> --
>>>>>>>> Hahn Kim, hgk_at_[hidden]
>>>>>>>> MIT Lincoln Laboratory
>>>>>>>> 244 Wood St., Lexington, MA 02420
>>>>>>>> Tel: 781-981-0940, Fax: 781-981-5255
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> users_at_[hidden]
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> * Dr. Aurélien Bouteiller
>>>>>>> * Sr. Research Associate at Innovative Computing Laboratory
>>>>>>> * University of Tennessee
>>>>>>> * 1122 Volunteer Boulevard, suite 350
>>>>>>> * Knoxville, TN 37996
>>>>>>> * 865 974 6321
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Hahn Kim
>>>>>> MIT Lincoln Laboratory Phone: (781) 981-0940
>>>>>> 244 Wood Street, S2-252 Fax: (781) 981-5255
>>>>>> Lexington, MA 02420 E-mail: hgk_at_[hidden]
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>

-- 
Jeff Squyres
Cisco Systems