Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem launching onto Bourne shell
From: Hahn Kim (hgk_at_[hidden])
Date: 2008-10-09 14:32:00


Hi,

Thanks for providing a fix, and sorry for the delay in responding. Once
I found out about -x, I got busy working on the rest of our code, so
I haven't had time to try out the fix. I'll take a look at it as
soon as I can and will let you know how it works out.

Hahn

On Oct 7, 2008, at 5:41 PM, Jeff Squyres wrote:

> On Oct 7, 2008, at 4:19 PM, Hahn Kim wrote:
>
>>> you probably want to set the LD_LIBRARY_PATH (and PATH, likely, and
>>> possibly others, such as that LICENSE key, etc.) regardless of
>>> whether it's an interactive or non-interactive login.
>>
>> Right, that's exactly what I want to do. I was hoping that mpirun
>> would run .profile as the FAQ page stated, but the -x fix works for
>> now.
>
> If you're using Bash, it should be running .bashrc. But it looks like
> you did identify a bug that we're *not* running .profile. I have a
> Mercurial branch up with a fix if you want to give it a spin:
>
> http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/sh-profile-fixes/
>
>> I just realized that I'm using .bash_profile on the x86 and need to
>> move its contents into .bashrc and call .bashrc from .bash_profile,
>> since eventually I will also be launching MPI jobs onto other x86
>> processors.
>>
>> Thanks to everyone for their help.
>>
>> Hahn
>>
>> On Oct 7, 2008, at 2:16 PM, Jeff Squyres wrote:
>>
>>> On Oct 7, 2008, at 12:48 PM, Hahn Kim wrote:
>>>
>>>> Regarding 1., we're actually using 1.2.5. We started using Open MPI
>>>> last winter and just stuck with it. For now, using the -x flag with
>>>> mpirun works. If this really is a bug in 1.2.7, then I think we'll
>>>> stick with 1.2.5 for now, then upgrade later when it's fixed.
>>>
>>> It looks like this behavior has been the same throughout the entire
>>> 1.2 series.
>>>
>>>> Regarding 2., are you saying I should run the commands you suggest
>>>> from the x86 node running bash, so that ssh logs into the Cell node
>>>> running Bourne?
>>>
>>> I'm saying that if "ssh othernode env" gives different answers than
>>> "ssh othernode"/"env", then your .bashrc or .profile or whatever is
>>> dumping out early depending on whether you have an interactive login
>>> or not. This is the real cause of the error -- you probably want to
>>> set the LD_LIBRARY_PATH (and PATH, likely, and possibly others, such
>>> as that LICENSE key, etc.) regardless of whether it's an interactive
>>> or non-interactive login.
>>>
>>>>
>>>> When I run "ssh othernode env" from the x86 node, I get the
>>>> following vanilla environment:
>>>>
>>>> USER=ha17646
>>>> HOME=/home/ha17646
>>>> LOGNAME=ha17646
>>>> SHELL=/bin/sh
>>>> PWD=/home/ha17646
>>>>
>>>> When I run "ssh othernode" from the x86 node, then run "env" on the
>>>> Cell, I get the following:
>>>>
>>>> USER=ha17646
>>>> LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>> HOME=/home/ha17646
>>>> MCS_LICENSE_PATH=/opt/MultiCorePlus/mcf.key
>>>> LOGNAME=ha17646
>>>> TERM=xterm-color
>>>> PATH=/usr/local/bin:/usr/bin:/sbin:/bin:/tools/openmpi-1.2.5/bin:/tools/cmake-2.4.7/bin:/tools
>>>> SHELL=/bin/sh
>>>> PWD=/home/ha17646
>>>> TZ=EST5EDT
>>>>
>>>> Hahn
>>>>
>>>> On Oct 7, 2008, at 12:07 PM, Jeff Squyres wrote:
>>>>
>>>>> Ralph and I just talked about this a bit:
>>>>>
>>>>> 1. In all released versions of OMPI, we *do* source the .profile file
>>>>> on the target node if it exists (because vanilla Bourne shells do not
>>>>> source anything on remote nodes -- Bash does, though, per the FAQ).
>>>>> However, looking in 1.2.7, it looks like it might not be executing
>>>>> that code -- there *may* be a bug in this area. We're checking
>>>>> into it.
>>>>>
>>>>> 2. You might want to check your configuration to see if your .bashrc
>>>>> is dumping out early because it's a non-interactive shell. Check the
>>>>> output of:
>>>>>
>>>>> ssh othernode env
>>>>> vs.
>>>>> ssh othernode
>>>>> env
>>>>>
>>>>> (i.e., a non-interactive running of "env" vs. an interactive login and
>>>>> running "env")
>>>>>
>>>>>
>>>>>
>>>>> On Oct 7, 2008, at 8:53 AM, Ralph Castain wrote:
>>>>>
>>>>>> I am unaware of anything in the code that would "source .profile"
>>>>>> for you. I believe the FAQ page is in error here.
>>>>>>
>>>>>> Ralph
>>>>>>
>>>>>> On Oct 6, 2008, at 7:47 PM, Hahn Kim wrote:
>>>>>>
>>>>>>> Great, that worked, thanks! However, it still concerns me that the
>>>>>>> FAQ page says that mpirun will execute .profile which doesn't seem
>>>>>>> to work for me. Are there any configuration issues that could
>>>>>>> possibly be preventing mpirun from doing this? It would certainly
>>>>>>> be more convenient if I could maintain my environment in a
>>>>>>> single .profile file instead of adding what could potentially be a
>>>>>>> lot of -x arguments to my mpirun command.
>>>>>>>
>>>>>>> Hahn
>>>>>>>
>>>>>>> On Oct 6, 2008, at 5:44 PM, Aurélien Bouteiller wrote:
>>>>>>>
>>>>>>>> You can forward your local env with mpirun -x LD_LIBRARY_PATH. As an
>>>>>>>> alternative, you can set specific values with
>>>>>>>> mpirun -x LD_LIBRARY_PATH=/some/where:/some/where/else . More
>>>>>>>> information with mpirun --help (or man mpirun).
>>>>>>>>
>>>>>>>> Aurelien
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 6, 2008, at 4:06 PM, Hahn Kim wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm having difficulty launching an Open MPI job onto a machine that
>>>>>>>>> is running the Bourne shell.
>>>>>>>>>
>>>>>>>>> Here's my basic setup. I have two machines: one is an x86-based
>>>>>>>>> machine running bash and the other is a Cell-based machine running
>>>>>>>>> Bourne shell. I'm running mpirun from the x86 machine, which
>>>>>>>>> launches a C++ MPI application onto the Cell machine. I get the
>>>>>>>>> following error:
>>>>>>>>>
>>>>>>>>> error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory
>>>>>>>>>
>>>>>>>>> The basic problem is that LD_LIBRARY_PATH needs to be set to the
>>>>>>>>> directory that contains libstdc++.so.6 for the Cell. I set the
>>>>>>>>> following line in .profile:
>>>>>>>>>
>>>>>>>>> export LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>>>>>>>
>>>>>>>>> which is the path to the PPC libraries for Cell.
>>>>>>>>>
>>>>>>>>> Now if I log directly into the Cell machine and run the program
>>>>>>>>> directly from the command line, I don't get the above error. But
>>>>>>>>> mpirun still fails, even after setting LD_LIBRARY_PATH in .profile.
>>>>>>>>>
>>>>>>>>> As a sanity check, I did the following. I ran the following command
>>>>>>>>> from the x86 machine:
>>>>>>>>>
>>>>>>>>> mpirun -np 1 --host cab0 env
>>>>>>>>>
>>>>>>>>> which, among other things, shows me the following value:
>>>>>>>>>
>>>>>>>>> LD_LIBRARY_PATH=/tools/openmpi-1.2.5/lib:
>>>>>>>>>
>>>>>>>>> If I log into the Cell machine and run env directly from the command
>>>>>>>>> line, I get the following value:
>>>>>>>>>
>>>>>>>>> LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>>>>>>>
>>>>>>>>> So it appears that .profile gets sourced when I log in but not when
>>>>>>>>> mpirun runs.
>>>>>>>>>
>>>>>>>>> However, according to the Open MPI FAQ
>>>>>>>>> (http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path),
>>>>>>>>> mpirun is supposed to directly call .profile since Bourne shell
>>>>>>>>> doesn't automatically call it for non-interactive shells.
>>>>>>>>>
>>>>>>>>> Does anyone have any insight as to why my environment isn't being
>>>>>>>>> set properly? Thanks!
>>>>>>>>>
>>>>>>>>> Hahn
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Hahn Kim, hgk_at_[hidden]
>>>>>>>>> MIT Lincoln Laboratory
>>>>>>>>> 244 Wood St., Lexington, MA 02420
>>>>>>>>> Tel: 781-981-0940, Fax: 781-981-5255
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> users mailing list
>>>>>>>>> users_at_[hidden]
>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> * Dr. Aurélien Bouteiller
>>>>>>>> * Sr. Research Associate at Innovative Computing Laboratory
>>>>>>>> * University of Tennessee
>>>>>>>> * 1122 Volunteer Boulevard, suite 350
>>>>>>>> * Knoxville, TN 37996
>>>>>>>> * 865 974 6321
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Hahn Kim
>>>>>>> MIT Lincoln Laboratory Phone: (781) 981-0940
>>>>>>> 244 Wood Street, S2-252 Fax: (781) 981-5255
>>>>>>> Lexington, MA 02420 E-mail: hgk_at_[hidden]
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> Cisco Systems
>>>>
>>>> --
>>>> Hahn Kim, hgk_at_[hidden]
>>>> MIT Lincoln Laboratory
>>>> 244 Wood St., Lexington, MA 02420
>>>> Tel: 781-981-0940, Fax: 781-981-5255
>>>
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>
>> --
>> Hahn Kim, hgk_at_[hidden]
>> MIT Lincoln Laboratory
>> 244 Wood St., Lexington, MA 02420
>> Tel: 781-981-0940, Fax: 781-981-5255
>
>
> --
> Jeff Squyres
> Cisco Systems

--
Hahn Kim, hgk_at_[hidden]
MIT Lincoln Laboratory
244 Wood St., Lexington, MA 02420
Tel: 781-981-0940, Fax: 781-981-5255
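
For anyone landing on this thread later: the point about a startup file
"dumping out early" can be reproduced without ssh or mpirun. The following
is only a minimal sketch, not what Open MPI itself does. It writes a
throwaway startup file (the name demo_profile and the variable
DEMO_INTERACTIVE_ONLY are made up for illustration), exports
LD_LIBRARY_PATH before the interactive-shell check, and then sources the
file from a non-interactive /bin/sh started with an empty environment. It
assumes a POSIX sh at /bin/sh and that mktemp is available.

```shell
#!/bin/sh
# Hypothetical demo: build a throwaway startup file in a temp directory.
demo_dir=$(mktemp -d)

cat > "$demo_dir/demo_profile" <<'EOF'
# Settings needed by every shell, interactive or not, go first.
LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
export LD_LIBRARY_PATH

# Stop here for non-interactive shells. "return" is valid because this
# file is sourced with "." rather than executed.
case $- in
  *i*) ;;
  *) return 0 ;;
esac

# Interactive-only settings go below this point.
DEMO_INTERACTIVE_ONLY=1
export DEMO_INTERACTIVE_ONLY
EOF

# Source the file from a non-interactive shell with an empty environment,
# which is similar in spirit to what "ssh othernode env" sees.
result=$(env -i /bin/sh -c '
  . "$1"
  echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
  echo "INTERACTIVE_ONLY=${DEMO_INTERACTIVE_ONLY:-unset}"
' sh "$demo_dir/demo_profile")

echo "$result"
rm -rf "$demo_dir"
```

If the export were placed below the case statement instead, the first line
of output would show an empty value, which is roughly the situation that
"ssh othernode env" revealed earlier in the thread.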