Open MPI User's Mailing List Archives

Subject: [OMPI users] ORTE_ERROR_LOG: Timeout in file
From: Hugh Dickinson (h.j.dickinson_at_[hidden])
Date: 2009-04-28 06:57:26

Hi all,

First of all let me make it perfectly clear that I'm a complete
beginner as far as MPI is concerned, so this may well be a trivial
problem.

I've tried to set up Open MPI to use SSH to communicate between nodes
on a heterogeneous cluster. I've set up passwordless SSH and it seems
to be working fine. For example by hand I can do:

ssh nodename uptime

and it returns the appropriate information for each node.
I then tried running a non-MPI program on all the nodes at the same
time:

mpirun -np 10 --hostfile hostfile uptime

Where hostfile is a list of the 10 cluster node names with slots=1
after each one, i.e.

nodename1 slots=1
nodename2 slots=1

Nothing happens! The process just seems to hang. If I interrupt the
process with Ctrl-C I get:


mpirun: killing job...

[] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166

WARNING: mpirun has exited before it received notification that all
started processes had terminated.  You should double check and ensure
that there are no runaway processes still executing.

If, instead of using the hostfile, I specify on the command line the
host from which I'm running mpirun, e.g.:

mpirun -np 1 --host nodename uptime

then it works (i.e. if it doesn't need to communicate with other
nodes). Do I need to tell Open MPI it should be using SSH to
communicate? If so, how do I do this? To be honest I think it's
trying to do so, because before I set up passwordless SSH it
challenged me for lots of passwords.

I'm running Open MPI 1.2.5 installed with Scientific Linux 5.2. Let
me reiterate, it's very likely that I've done something stupid, so
all suggestions are welcome.
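
For reference, the by-hand check above (ssh nodename uptime) can be repeated for every node listed in the hostfile. What follows is only a sketch, not something from the Open MPI distribution: the function names hostfile_hosts and check_ssh_all are made up for illustration, and it assumes the hostfile has one node per line with the node name in the first column. The BatchMode=yes option makes ssh fail rather than prompt for a password, so a node where passwordless login is not really working should show up as FAILED instead of hanging.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch).

# Print the host name (first field) of every line in the hostfile,
# skipping blank lines and lines that start with '#'.
hostfile_hosts() {
    awk '!/^[[:space:]]*(#|$)/ { print $1 }' "$1"
}

# Attempt a non-interactive SSH login to each host in the hostfile.
# -n                   keeps ssh from reading the loop's stdin
# BatchMode=yes        fail instead of asking for a password
# ConnectTimeout=5     give up on unreachable nodes after 5 seconds
check_ssh_all() {
    for host in $(hostfile_hosts "$1"); do
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}
```

Running "check_ssh_all hostfile" from the node where mpirun is launched prints one line per node, which at least separates SSH login problems from problems inside Open MPI itself.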