
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Problem launching onto Bourne shell
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-10-07 14:16:02


On Oct 7, 2008, at 12:48 PM, Hahn Kim wrote:

> Regarding 1., we're actually using 1.2.5. We started using Open MPI
> last winter and just stuck with it. For now, using the -x flag with
> mpirun works. If this really is a bug in 1.2.7, then I think we'll
> stick with 1.2.5 for now, then upgrade later when it's fixed.

It looks like this behavior has been the same throughout the entire
1.2 series.

> Regarding 2., are you saying I should run the commands you suggest
> from the x86 node running bash, so that ssh logs into the Cell node
> running Bourne?

I'm saying that if "ssh othernode env" gives different answers than
running "ssh othernode" and then "env", then your .bashrc or .profile
or whatever is exiting early depending on whether you have an
interactive login or not. This is the real cause of the error -- you
probably want to set LD_LIBRARY_PATH (and likely PATH, and possibly
others, such as that MCS_LICENSE_PATH key) regardless of whether it's
an interactive or non-interactive login.
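One way to arrange that is to put the exports above any interactive-only
logic in the startup file. A sketch (the paths are the ones from this
thread's environment listing; substitute your own):

```shell
# Sketch of a ~/.bashrc / ~/.profile layout. The key point: exports come
# *before* any "stop early if non-interactive" guard, so a non-interactive
# login (e.g. "ssh othernode somecommand") still sees them.

# Needed by every login, interactive or not:
export LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
export MCS_LICENSE_PATH=/opt/MultiCorePlus/mcf.key
export PATH=/tools/openmpi-1.2.5/bin:$PATH

# Anything below this guard runs only for interactive logins:
case $- in
    *i*) ;;         # interactive shell: fall through to the rest
    *)   return ;;  # non-interactive: stop here (exports are already set)
esac

# Interactive-only settings (prompt, aliases, TERM tweaks, ...) go here.
```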

>
> When I run "ssh othernode env" from the x86 node, I get the
> following vanilla environment:
>
> USER=ha17646
> HOME=/home/ha17646
> LOGNAME=ha17646
> SHELL=/bin/sh
> PWD=/home/ha17646
>
> When I run "ssh othernode" from the x86 node, then run "env" on the
> Cell, I get the following:
>
> USER=ha17646
> LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
> HOME=/home/ha17646
> MCS_LICENSE_PATH=/opt/MultiCorePlus/mcf.key
> LOGNAME=ha17646
> TERM=xterm-color
> PATH=/usr/local/bin:/usr/bin:/sbin:/bin:/tools/openmpi-1.2.5/bin:/tools/cmake-2.4.7/bin:/tools
> SHELL=/bin/sh
> PWD=/home/ha17646
> TZ=EST5EDT
>
> Hahn
>
> On Oct 7, 2008, at 12:07 PM, Jeff Squyres wrote:
>
>> Ralph and I just talked about this a bit:
>>
>> 1. In all released versions of OMPI, we *do* source the .profile file
>> on the target node if it exists (because vanilla Bourne shells do not
>> source anything on remote nodes -- Bash does, though, per the FAQ).
>> However, looking in 1.2.7, it looks like it might not be executing
>> that code -- there *may* be a bug in this area. We're checking
>> into it.
>>
>> 2. You might want to check your configuration to see if your .bashrc
>> is dumping out early because it's a non-interactive shell. Check the
>> output of:
>>
>> ssh othernode env
>> vs.
>> ssh othernode
>> env
>>
>> (i.e., a non-interactive running of "env" vs. an interactive login
>> and
>> running "env")
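The effect behind this check can also be reproduced locally, without
ssh: a non-interactive /bin/sh never reads .profile on its own, so
anything set there is invisible unless explicitly sourced. A minimal
sketch (the file name and variable below are hypothetical stand-ins):

```shell
# Create a stand-in for a remote user's ~/.profile:
cat > /tmp/demo_profile <<'EOF'
export DEMO_VAR=from_profile
EOF

# Non-interactive sh, nothing sourced -- DEMO_VAR is empty, just like
# LD_LIBRARY_PATH in the "ssh othernode env" case:
sh -c 'echo "without sourcing: $DEMO_VAR"'

# Explicitly sourcing the profile first, as a login shell would:
sh -c '. /tmp/demo_profile; echo "with sourcing: $DEMO_VAR"'
```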
>>
>>
>>
>> On Oct 7, 2008, at 8:53 AM, Ralph Castain wrote:
>>
>>> I am unaware of anything in the code that would "source .profile"
>>> for you. I believe the FAQ page is in error here.
>>>
>>> Ralph
>>>
>>> On Oct 6, 2008, at 7:47 PM, Hahn Kim wrote:
>>>
>>>> Great, that worked, thanks! However, it still concerns me that the
>>>> FAQ page says that mpirun will execute .profile which doesn't seem
>>>> to work for me. Are there any configuration issues that could
>>>> possibly be preventing mpirun from doing this? It would certainly
>>>> be more convenient if I could maintain my environment in a
>>>> single .profile file instead of adding what could potentially be a
>>>> lot of -x arguments to my mpirun command.
>>>>
>>>> Hahn
>>>>
>>>> On Oct 6, 2008, at 5:44 PM, Aurélien Bouteiller wrote:
>>>>
>>>>> You can forward your local env with mpirun -x LD_LIBRARY_PATH.
>>>>> As an alternative, you can set specific values with mpirun -x
>>>>> LD_LIBRARY_PATH=/some/where:/some/where/else . More information
>>>>> with mpirun --help (or man mpirun).
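Concretely, the two forms look like this (the host name "cab0" and the
library path are the ones used elsewhere in this thread; "./my_app" is
a hypothetical program name):

```shell
# Forward the launching shell's current LD_LIBRARY_PATH to the ranks:
mpirun -np 1 --host cab0 -x LD_LIBRARY_PATH ./my_app

# Or set an explicit value for the remote processes only:
mpirun -np 1 --host cab0 \
    -x LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32 ./my_app
```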
>>>>>
>>>>> Aurelien
>>>>>
>>>>>
>>>>>
>>>>> Le 6 oct. 08 à 16:06, Hahn Kim a écrit :
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm having difficulty launching an Open MPI job onto a machine
>>>>>> that
>>>>>> is running the Bourne shell.
>>>>>>
>>>>>> Here's my basic setup. I have two machines, one is an x86-based
>>>>>> machine running bash and the other is a Cell-based machine
>>>>>> running
>>>>>> Bourne shell. I'm running mpirun from the x86 machine, which
>>>>>> launches a C++ MPI application onto the Cell machine. I get the
>>>>>> following error:
>>>>>>
>>>>>> error while loading shared libraries: libstdc++.so.6: cannot open
>>>>>> shared object file: No such file or directory
>>>>>>
>>>>>> The basic problem is that LD_LIBRARY_PATH needs to be set to the
>>>>>> directory that contains libstdc++.so.6 for the Cell. I set the
>>>>>> following line in .profile:
>>>>>>
>>>>>> export LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>>>>
>>>>>> which is the path to the PPC libraries for Cell.
>>>>>>
>>>>>> Now if I log directly into the Cell machine and run the program
>>>>>> directly from the command line, I don't get the above error. But
>>>>>> mpirun still fails, even after setting LD_LIBRARY_PATH
>>>>>> in .profile.
>>>>>>
>>>>>> As a sanity check, I did the following. I ran the following
>>>>>> command
>>>>>> from the x86 machine:
>>>>>>
>>>>>> mpirun -np 1 --host cab0 env
>>>>>>
>>>>>> which, among others things, shows me the following value:
>>>>>>
>>>>>> LD_LIBRARY_PATH=/tools/openmpi-1.2.5/lib:
>>>>>>
>>>>>> If I log into the Cell machine and run env directly from the
>>>>>> command
>>>>>> line, I get the following value:
>>>>>>
>>>>>> LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
>>>>>>
>>>>>> So it appears that .profile gets sourced when I log in but not
>>>>>> when
>>>>>> mpirun runs.
>>>>>>
>>>>>> However, according to the OpenMPI FAQ (http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
>>>>>> ), mpirun is supposed to source .profile, since the Bourne shell
>>>>>> doesn't automatically source it for non-interactive shells.
>>>>>>
>>>>>> Does anyone have any insight as to why my environment isn't being
>>>>>> set properly? Thanks!
>>>>>>
>>>>>> Hahn
>>>>>>
>>>>>> --
>>>>>> Hahn Kim, hgk_at_[hidden]
>>>>>> MIT Lincoln Laboratory
>>>>>> 244 Wood St., Lexington, MA 02420
>>>>>> Tel: 781-981-0940, Fax: 781-981-5255
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> * Dr. Aurélien Bouteiller
>>>>> * Sr. Research Associate at Innovative Computing Laboratory
>>>>> * University of Tennessee
>>>>> * 1122 Volunteer Boulevard, suite 350
>>>>> * Knoxville, TN 37996
>>>>> * 865 974 6321
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Hahn Kim
>>>> MIT Lincoln Laboratory Phone: (781) 981-0940
>>>> 244 Wood Street, S2-252 Fax: (781) 981-5255
>>>> Lexington, MA 02420 E-mail: hgk_at_[hidden]
>>>>
>>>
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>
> --
> Hahn Kim, hgk_at_[hidden]
> MIT Lincoln Laboratory
> 244 Wood St., Lexington, MA 02420
> Tel: 781-981-0940, Fax: 781-981-5255
>

-- 
Jeff Squyres
Cisco Systems