Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-06-14 07:10:24

You have two options:

1. Ensure that your PATH and LD_LIBRARY_PATH are exactly what you
think they are on the remote nodes. A common problem is setting
PATH/LD_LIBRARY_PATH in the "interactive" portion of .bashrc, so
that they are set only for interactive logins and not for the
non-interactive logins that ssh uses to run remote commands. Try the
following:

        ssh othernode 'echo $PATH'

Note the single quotes; they are necessary to ensure that "echo
$PATH" is evaluated on the *remote* node. Do the same with
$LD_LIBRARY_PATH and ensure that they are really set to the values
that you think they are. Check out the following FAQ entry:

2. Use the --prefix functionality in mpirun to automatically set the
PATH / LD_LIBRARY_PATH values for the remote node. Check out this
FAQ entry:

Note that an equivalent of --prefix, not [yet] mentioned in that FAQ
entry, is to invoke mpirun by its absolute pathname. For example:

     /path/to/mpirun ...

Or you can use OMPI 1.2's --enable-mpirun-prefix-by-default option to
OMPI's configure, which will tell mpirun to always assume that it
needs to use --prefix-like behavior (without you needing to specify
it on the mpirun command line).
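
To make the two options a bit more concrete, here is a rough shell
sketch. It is only an illustration under some assumptions: the install
prefix /opt/openmpi is a placeholder for whatever you passed to
configure, the lib directory is assumed to be $OMPI_PREFIX/lib, and
path_has_dir and check_node are made-up helper names, not part of
Open MPI. The host name, hostfile, and ./hello come from your message.

```shell
#!/bin/sh
# Hypothetical install prefix; substitute the value you gave to configure.
OMPI_PREFIX="${OMPI_PREFIX:-/opt/openmpi}"

# path_has_dir DIR LIST
# Succeeds if DIR is one of the colon-separated entries in LIST.
path_has_dir() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# check_node NODE
# Option 1: ask the non-interactive remote shell for its PATH and
# LD_LIBRARY_PATH. The single quotes keep the variables from being
# expanded on the local node.
check_node() {
    remote_path=$(ssh "$1" 'echo $PATH') || return 2
    remote_ld=$(ssh "$1" 'echo $LD_LIBRARY_PATH') || return 2
    path_has_dir "$OMPI_PREFIX/bin" "$remote_path" ||
        echo "$1: $OMPI_PREFIX/bin is not in PATH"
    path_has_dir "$OMPI_PREFIX/lib" "$remote_ld" ||
        echo "$1: $OMPI_PREFIX/lib is not in LD_LIBRARY_PATH"
}

# Example calls (commented out so that sourcing this file runs nothing):
#   check_node ps3-2
# Option 2: let mpirun set the remote paths itself.
#   mpirun --prefix "$OMPI_PREFIX" -hostfile host.txt -n 4 ./hello
```

If check_node prints a complaint for a node, that node's
non-interactive shell is not picking up your settings, which would
match the "orted: command not found" message.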

Hope that helps.

On Jun 12, 2007, at 11:58 PM, lichanjuan04_at_[hidden] wrote:

> On Wed, 2007-06-13 at 11:47 +0800, lichanjuan04_at_[hidden] wrote:
>> Hi all,
>> I am a first-time user of Open MPI; I have used MPICH before. I found
>> that there are many differences between them, so I am confused.
>> I built Open MPI on a PS3 using the default options, that is:
>> $ ./configure --prefiex=
>> $ make all install
>> I modified my .bash_profile file to add the Open MPI lib and
>> executable directories.
>> I use an NFS file system between the server and the nodes; I installed
>> Open MPI only on the server.
>> I checked the mailing list and FAQ and know that the default launcher
>> is ssh, but I still added "pls_rsh_agent = ssh" to
>> openmpi-mca-params.conf.
>> I tested the hello_c.c example. When I run:
>> $mpiexec -host ps3-2 -n 4 ./hello
>> it runs correctly (ps3-2 is the hostname of the server). I tried it on
>> each node.
>> But when I run:
>> $ mpiexec -hostfile host.txt -n 4 ./hello
>> content of host.txt:
>> ps3-1
>> ps3-2
>> there is an error message:
>> bash: orted: command not found
>> [ps3-1:25154] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
>> [ps3-1:25154] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
>> [ps3-1:25154] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>> [ps3-1:25154] ERROR: A daemon on node ps3-2 failed to start as expected.
>> [ps3-1:25154] ERROR: There may be more information available from
>> [ps3-1:25154] ERROR: the remote shell (see above).
>> [ps3-1:25154] ERROR: The daemon exited unexpectedly with status 127.
>> [ps3-1:25154] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
>> [ps3-1:25154] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
>> --------------------------------------------------------------------------
>> mpiexec was unable to cleanly terminate the daemons for this job.
>> Returned value Timeout instead of ORTE_SUCCESS.
>> --------------------------------------------------------------------------
>> I searched for the same problem in the mailing list and FAQ, which say
>> that PATH and LD_LIBRARY_PATH are not set correctly, but I have made
>> sure they are set.
>> This is my first time using Open MPI, so I hope somebody can help me.
>> Thanks a lot!
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
> Sorry, I forgot some information. I use Open MPI 1.2. I tried to run the
> command against a remote host; for example, running this on ps3-1:
> $ mpiexec -host ps3-2 -n 2 ./a.out
> gives the same error message. I think there is something wrong with
> rsh/ssh, but I don't know what to modify or which file I missed.
> If someone has met the same problem, please tell me the solution. I will
> be grateful. Thanks very much!
> Li chanjuan
> --
> Li, Chanjuan Lanzhou University
> Distributed & Embedded System Lab
> School of Information Science and Engineering
> lichanjuan04_at_[hidden]
> Tianshui South Road 222. Lanzhou
> 730000 .P.R.China
> Tel:+86-931-8912025 Fax:+86-931-8912022

Jeff Squyres
Cisco Systems