
Open MPI User's Mailing List Archives


From: Tim Prins (tprins_at_[hidden])
Date: 2007-07-09 14:58:20


On Monday 09 July 2007 12:52:29 pm jody wrote:
> Tim,
> thanks for your suggestions.
> There seems to be something wrong with the PATH:
> jody_at_aim-nano_02 ~/progs $ ssh 130.60.49.128 printenv | grep PATH
> PATH=/usr/bin:/bin:/usr/sbin:/sbin
>
> which I don't understand. Logging in via ssh to 130.60.49.128 I get:
>
> jody_at_aim-nano_02 ~/progs $ ssh 130.60.49.128
> Last login: Mon Jul 9 18:26:11 2007 from 130.60.49.129
> jody_at_aim-nano_00 ~ $ cat .bash_profile
> # /etc/skel/.bash_profile
>
> # This file is sourced by bash for login shells. The following line
> # runs your .bashrc and is recommended by the bash info pages.
> [[ -f ~/.bashrc ]] && . ~/.bashrc
>
> PATH=/opt/openmpi/bin:$PATH
> export PATH
> LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH
> export LD_LIBRARY_PATH
>
>
> jody_at_aim-nano_00 ~ $ echo $PATH
> /opt/openmpi/bin:/opt/openmpi/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/3.4.5:/opt/sun-jdk-1.4.2.10/bin:/opt/sun-jdk-1.4.2.10/jre/bin:/opt/sun-jdk-1.4.2.10/jre/javaws:/usr/qt/3/bin
>
> (aim-nano_00 is the name of 130.60.49.128)
> So why is the PATH set when I ssh in by hand,
> but not otherwise?
You must set the path in .bashrc. See
http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path

Make sure:
ssh 130.60.49.128 which orted
works. If it doesn't, there is something wrong with the PATH.
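The .bashrc vs. .bash_profile distinction matters because `ssh host command` starts a non-interactive shell, which never reads ~/.bash_profile. A minimal local sketch of the difference (the MARKER variable and throwaway HOME directory are made up for illustration):

```shell
# Create a throwaway HOME where each startup file sets a marker,
# to show which file each kind of shell actually reads.
demo=$(mktemp -d)
echo 'export MARKER=from_profile' > "$demo/.bash_profile"
echo 'export MARKER=from_bashrc'  > "$demo/.bashrc"

# Login shell: reads ~/.bash_profile (what an interactive "ssh host" gives you).
HOME=$demo bash -l -c 'echo login:$MARKER'

# Plain non-interactive shell: reads neither file by default, which is
# roughly what "ssh host orted" sees unless sshd/bash source ~/.bashrc.
HOME=$demo bash -c 'echo plain:${MARKER:-unset}'
```

So the exports that orted needs belong in ~/.bashrc, and on distributions whose default ~/.bashrc returns early for non-interactive shells, they must go above that guard.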

>
> The suggestion with the --prefix option also didn't work:
> jody_at_aim-nano_02 /home/aim-cari/jody $ mpirun -np 2 --prefix /opt/openmpi
> --hostfile hostfile ./a.out
> [aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file
> dss/dss_peek.c at line 59
> [aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file
> dss/dss_peek.c at line 59
> [aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file
> dss/dss_peek.c at line 59
> [aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file
> dss/dss_peek.c at line 59
> [aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file
> dss/dss_peek.c at line 59
> [aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file
> dss/dss_peek.c at line 59
Often this means that there is a version mismatch. Do all the nodes have the
same version of Open MPI installed? Did you compile your application with
this version of Open MPI?
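One way to check, sketched below: print which installation each node's default environment resolves and its version line; every node (and the mpicc used to build a.out) should agree. The `check_ompi` helper and the ssh invocation are illustrative, not part of the original thread:

```shell
# Report the Open MPI installation a node's default environment resolves.
check_ompi() {
    echo "host=$(hostname)"
    command -v mpirun orted || echo "mpirun/orted not found in PATH"
    # ompi_info's output contains a line like "Open MPI: 1.2.3".
    { ompi_info 2>/dev/null | grep 'Open MPI:'; } || true
}

check_ompi   # on the head node
# Then on every node in the hostfile, e.g.:
#   ssh 130.60.49.128 "$(declare -f check_ompi); check_ompi"
```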

Tim

>
> (after which mpirun seems to hang...)
>
> If I use aim-nano_02 (130.60.49.130) directly instead of a hostfile,
> jody_at_aim-nano_02 /home/aim-cari/jody $ mpirun -np 2 --prefix /opt/openmpi
> --host 130.60.49.130 ./a.out
> it works, as it does if I run it on the machine itself the standard way:
> jody_at_aim-nano_02 /home/aim-cari/jody $ mpirun -np 2 --host
> 130.60.49.130 ./a.out
>
> Is there anything else I could try?
>
> Jody
>
> On 7/9/07, Tim Prins <tprins_at_[hidden]> wrote:
> > jody wrote:
> > > Hi Tim
> > > (I accidentally sent the previous message before it was ready - here's
> > > the complete one)
> > > Thank You for your reply.
> > > Unfortunately my workstation, on which I could successfully run Open MPI
> > > applications, has died. But on my replacement machine (which
> > > I assume I have set up in an equivalent way) I now get errors even
> > > when I try to run an Open MPI application in a simple way:
> > >
> > > jody_at_aim-nano_02 /home/aim-cari/jody $ mpirun -np 2 --hostfile
> > > hostfile ./a.out
> > > bash: orted: command not found
> > > [aim-nano_02:22145] ERROR: A daemon on node 130.60.49.129 failed to
> > > start as expected.
> > > [aim-nano_02:22145] ERROR: There may be more information available from
> > > [aim-nano_02:22145] ERROR: the remote shell (see above).
> > > [aim-nano_02:22145] ERROR: The daemon exited unexpectedly with status
> > > 127.
> > > [aim-nano_02:22145] ERROR: A daemon on node 130.60.49.128 failed to
> > > start as expected.
> > > [aim-nano_02:22145] ERROR: There may be more information available from
> > > [aim-nano_02:22145] ERROR: the remote shell (see above).
> > > [aim-nano_02:22145] ERROR: The daemon exited unexpectedly with status
> > > 127.
> > > However, I set PATH and LD_LIBRARY_PATH to the correct paths both in
> > > .bashrc AND .bash_profile.
> >
> > I assume you are using bash. You might try changing your .profile as
> > well.
> >
> > > For example:
> > > jody_at_aim-nano_02 /home/aim-cari/jody $ ssh 130.60.49.128 echo $PATH
> > > /opt/openmpi/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/4.1.2:/opt/sun-jdk-1.4.2.10/bin:/opt/sun-jdk-1.4.2.10/jre/bin:/opt/sun-jdk-1.4.2.10/jre/javaws:/usr/qt/3/bin
>
> > When you do this, $PATH gets interpreted on the local host, not the
> > remote host. Try instead:
> >
> > ssh 130.60.49.128 printenv | grep PATH
> >
> > > But:
> > > jody_at_aim-nano_02 /home/aim-cari/jody $ ssh 130.60.49.128 orted
> > > bash: orted: command not found
> >
> > You could also do:
> > ssh 130.60.49.128 which orted
> >
> > This will show you the paths it looked in for the orted.
> >
> > > Do You have any suggestions?
> >
> > To avoid dealing with paths (assuming everything is installed in the
> > same directory on all nodes) you can also try the suggestion here
> > (although I think that once it is set up, modifying PATHs is the easier
> > way to go, less typing :):
> > http://www.open-mpi.org/faq/?category=running#mpirun-prefix
> >
> >
> > Hope this helps,
> >
> > Tim
> >
> > > Thank You
> > > Jody
> > >
> > > On 7/9/07, Tim Prins <tprins_at_[hidden]> wrote:
> > >> Hi Jody,
> > >>
> > >> Sorry for the super long delay. I don't know how this one got lost...
> > >>
> > >> I run like this all the time. Unfortunately, it is not as simple as I
> > >> would like. Here is what I do:
> > >>
> > >> 1. Log into the machine using ssh -X
> > >> 2. Run mpirun with the following parameters:
> > >> -mca pls rsh (This makes sure that Open MPI uses the rsh/ssh
> > >> launcher. It may not be necessary depending on your setup)
> > >> -mca pls_rsh_agent "ssh -X" (To make sure X information is
> > >> forwarded. This might not be necessary if you have ssh set up to
> > >> always forward X information)
> > >> --debug-daemons (This ensures that the ssh connections to the
> > >> backend nodes are kept open. Otherwise, they are closed and X
> > >> information cannot be forwarded. Unfortunately, this will also cause
> > >> some debugging output to be printed, but right now there is no other
> > >> way :( )
> > >>
> > >> So, the complete command is:
> > >> mpirun -np 4 -mca pls rsh -mca pls_rsh_agent "ssh -X" --debug-daemons
> > >> xterm -e gdb my_prog
> > >>
> > >> I hope this helps. Let me know if you are still experiencing problems.
> > >>
> > >> Tim
> > >>
> > >> jody wrote:
> > >>> Hi
> > >>> For debugging I usually run each process in a separate X window.
> > >>> This works well if I set the DISPLAY variable to the computer
> > >>> from which I am starting my Open MPI application.
> > >>>
> > >>> This method fails, however, if I log in (via ssh) to my workstation
> > >>> from a third computer and then start my Open MPI application:
> > >>> only the processes running on the workstation I logged into can
> > >>> open their windows on the third computer. The processes on
> > >>> the other computers can't open their windows.
> > >>>
> > >>> This is how I start the processes:
> > >>>
> > >>> mpirun -np 4 -x DISPLAY run_gdb.sh ./TestApp
> > >>>
> > >>> where run_gdb.sh looks like this
> > >>> -------------------------
> > >>> #!/bin/csh -f
> > >>>
> > >>> echo "Running GDB on node `hostname`"
> > >>> xterm -e gdb $*
> > >>> exit 0
> > >>> -------------------------
> > >>> The output from the processes on the other computer:
> > >>> xterm Xt error: Can't open display: localhost:12.0
> > >>>
> > >>> Is there a way to tell Open MPI to forward the X windows
> > >>> over yet another ssh connection?
> > >>>
> > >>> Thanks
> > >>> Jody
> > >>> _______________________________________________
> > >>> users mailing list
> > >>> users_at_[hidden]
> > >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >>
> > >
> >