
From: jody (jody.xha_at_[hidden])
Date: 2007-07-09 12:52:29


Tim,
thanks for your suggestions.
There seems to be something wrong with the PATH:
jody_at_aim-nano_02 ~/progs $ ssh 130.60.49.128 printenv | grep PATH
PATH=/usr/bin:/bin:/usr/sbin:/sbin

which I don't understand. Logging in via ssh to 130.60.49.128 I get:

jody_at_aim-nano_02 ~/progs $ ssh 130.60.49.128
Last login: Mon Jul 9 18:26:11 2007 from 130.60.49.129
jody_at_aim-nano_00 ~ $ cat .bash_profile
# /etc/skel/.bash_profile

# This file is sourced by bash for login shells. The following line
# runs your .bashrc and is recommended by the bash info pages.
[[ -f ~/.bashrc ]] && . ~/.bashrc

PATH=/opt/openmpi/bin:$PATH
export PATH
LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

jody_at_aim-nano_00 ~ $ echo $PATH
/opt/openmpi/bin:/opt/openmpi/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/3.4.5:/opt/sun-jdk-1.4.2.10/bin:/opt/sun-jdk-1.4.2.10/jre/bin:/opt/sun-jdk-1.4.2.10/jre/javaws:/usr/qt/3/bin

(aim-nano_00 is the name of 130.60.49.128)
So why is the path set when I ssh in by hand, but not otherwise?
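
A guess as to why: "ssh host command" starts a non-interactive, non-login
shell, so bash never reads ~/.bash_profile there, and many stock ~/.bashrc
files return early for non-interactive shells. If that is the cause, a
minimal fix, assuming bash on the remote node, would be to put the paths
at the very top of ~/.bashrc instead of ~/.bash_profile:

# top of ~/.bashrc, before any "return if not interactive" guard
PATH=/opt/openmpi/bin:$PATH
export PATH
LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH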

The suggestion with the --prefix option also didn't work:
jody_at_aim-nano_02 /home/aim-cari/jody $ mpirun -np 2 --prefix /opt/openmpi --hostfile hostfile ./a.out
[aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
[aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
[aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
[aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
[aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
[aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59

(after which mpirun seems to hang...)
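
Could this be a version mismatch between the Open MPI installations on
the two machines? One way to compare versions, assuming ompi_info is
installed under the same prefix everywhere:

ompi_info | grep "Open MPI:"
ssh 130.60.49.128 /opt/openmpi/bin/ompi_info | grep "Open MPI:"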

If I use aim-nano_02 (130.60.49.130) with --host instead of a hostfile,
jody_at_aim-nano_02 /home/aim-cari/jody $ mpirun -np 2 --prefix /opt/openmpi --host 130.60.49.130 ./a.out
it works, as it does if I run it on the machine itself in the standard way:
jody_at_aim-nano_02 /home/aim-cari/jody $ mpirun -np 2 --host 130.60.49.130 ./a.out

Is there anything else I could try?

Jody

On 7/9/07, Tim Prins <tprins_at_[hidden]> wrote:
> jody wrote:
> > Hi Tim
> > (I accidentally sent the previous message before it was ready - here's
> > the complete one)
> > Thank you for your reply.
> > Unfortunately my workstation, on which I could successfully run Open MPI
> > applications, has died. But on my replacement machine (which I assume
> > I have set up in an equivalent way) I now get errors even when I try
> > to run an Open MPI application in a simple way:
> >
> > jody_at_aim-nano_02 /home/aim-cari/jody $ mpirun -np 2 --hostfile hostfile ./a.out
> > bash: orted: command not found
> > [aim-nano_02:22145] ERROR: A daemon on node 130.60.49.129 failed to
> > start as expected.
> > [aim-nano_02:22145] ERROR: There may be more information available from
> > [aim-nano_02:22145] ERROR: the remote shell (see above).
> > [aim-nano_02:22145] ERROR: The daemon exited unexpectedly with status 127.
> > [aim-nano_02:22145] ERROR: A daemon on node 130.60.49.128 failed to
> > start as expected.
> > [aim-nano_02:22145] ERROR: There may be more information available from
> > [aim-nano_02:22145] ERROR: the remote shell (see above).
> > [aim-nano_02:22145] ERROR: The daemon exited unexpectedly with status 127.
> >
> > However, I set PATH and LD_LIBRARY_PATH to the correct paths both in
> > .bashrc AND .bash_profile.
> I assume you are using bash. You might try changing your .profile as well.
>
> >
> > For example:
> > jody_at_aim-nano_02 /home/aim-cari/jody $ ssh 130.60.49.128 echo $PATH
> > /opt/openmpi/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/4.1.2:/opt/sun-jdk-1.4.2.10/bin:/opt/sun-jdk-1.4.2.10/jre/bin:/opt/sun-jdk-1.4.2.10/jre/javaws:/usr/qt/3/bin
>
> When you do this, $PATH gets interpreted on the local host, not the
> remote host. Try instead:
>
> ssh 130.60.49.128 printenv | grep PATH
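>
> (With echo, single quotes keep the local shell from expanding the
> variable before ssh sends the command, e.g.:
>
> ssh 130.60.49.128 'echo $PATH'
>
> The unquoted version prints the PATH of the machine you typed it on.)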
>
> >
> > But:
> > jody_at_aim-nano_02 /home/aim-cari/jody $ ssh 130.60.49.128 orted
> > bash: orted: command not found
> >
> You could also do:
> ssh 130.60.49.128 which orted
>
> This will show you the paths it looked in for the orted.
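>
> Another quick check, independent of PATH, is whether the binary exists
> where you expect it (the path below assumes your /opt/openmpi prefix):
>
> ssh 130.60.49.128 ls -l /opt/openmpi/bin/orted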
>
> > Do you have any suggestions?
> To avoid dealing with paths (assuming everything is installed in the
> same directory on all nodes) you can also try the suggestion here
> (although I think that once it is set up, modifying PATHs is the easier
> way to go; less typing :):
> http://www.open-mpi.org/faq/?category=running#mpirun-prefix
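>
> Also, per that FAQ entry, invoking mpirun by its absolute path is
> equivalent to passing --prefix, so this should behave the same way:
>
> /opt/openmpi/bin/mpirun -np 2 --hostfile hostfile ./a.out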
>
>
> Hope this helps,
>
> Tim
> >
> > Thank you
> > Jody
> >
> > On 7/9/07, Tim Prins <tprins_at_[hidden]> wrote:
> >> Hi Jody,
> >>
> >> Sorry for the super long delay. I don't know how this one got lost...
> >>
> >> I run like this all the time. Unfortunately, it is not as simple as I
> >> would like. Here is what I do:
> >>
> >> 1. Log into the machine using ssh -X
> >> 2. Run mpirun with the following parameters:
> >> -mca pls rsh (This makes sure that Open MPI uses the rsh/ssh launcher.
> >> It may not be necessary depending on your setup)
> >> -mca pls_rsh_agent "ssh -X" (To make sure X information is forwarded.
> >> This might not be necessary if you have ssh set up to always forward X
> >> information)
> >> --debug-daemons (This ensures that the ssh connections to the backend
> >> nodes are kept open. Otherwise, they are closed and X information cannot
> >> be forwarded. Unfortunately, this will also cause some debugging output
> >> to be printed, but right now there is no other way :( )
> >>
> >> So, the complete command is:
> >> mpirun -np 4 -mca pls rsh -mca pls_rsh_agent "ssh -X" --debug-daemons
> >> xterm -e gdb my_prog
> >>
> >> I hope this helps. Let me know if you are still experiencing problems.
> >>
> >> Tim
> >>
> >>
> >> jody wrote:
> >>> Hi
> >>> For debugging i usually run each process in a separate X-window.
> >>> This works well if I set the DISPLAY variable to the computer
> >>> from which I am starting my Open MPI application.
> >>>
> >>> This method fails, however, if I log in (via ssh) to my workstation
> >>> from a third computer and then start my Open MPI application:
> >>> only the processes running on the workstation I logged into can
> >>> open their windows on the third computer. The processes on
> >>> the other computers can't open their windows.
> >>>
> >>> This is how I start the processes:
> >>>
> >>> mpirun -np 4 -x DISPLAY run_gdb.sh ./TestApp
> >>>
> >>> where run_gdb.sh looks like this
> >>> -------------------------
> >>> #!/bin/csh -f
> >>>
> >>> echo "Running GDB on node `hostname`"
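> >>> # one gdb window opens per MPI rank; $* passes the program name through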
> >>> xterm -e gdb $*
> >>> exit 0
> >>> -------------------------
> >>> The output from the processes on the other computer:
> >>> xterm Xt error: Can't open display: localhost:12.0
> >>>
> >>> Is there a way to tell Open MPI to forward the X windows
> >>> over yet another ssh connection?
> >>>
> >>> Thanks
> >>> Jody