Hi Hugh 

Just to make sure:
Have you installed Open MPI on all your nodes?
Is it the same version everywhere?
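
A rough way to check both at once is sketched below. The function name
check_nodes is made up for illustration. It assumes the hostfile format
from your message (node name in the first column) and that ompi_info was
installed alongside mpirun. It logs in to each node the same way mpirun
does (a non-interactive SSH session) and reports where orted is found and
which Open MPI version that node sees:

```shell
# Hypothetical helper: for every node in a hostfile, show where orted lives
# and which Open MPI version a non-interactive SSH session picks up.
check_nodes() {
    hostfile="$1"
    # The first column of each non-blank, non-comment line is the node name.
    awk 'NF && $1 !~ /^#/ { print $1 }' "$hostfile" |
    while read -r host; do
        echo "== $host =="
        # -n stops ssh from reading the rest of the host list from stdin.
        ssh -n "$host" 'which orted; ompi_info | grep "Open MPI:"'
    done
}
```

Run it as "check_nodes hostfile". If a node prints no path for orted, or a
different version from the others, that would be the first thing to fix,
since a non-interactive login may not set the same PATH as your normal shell.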

Jody 


On Tue, Apr 28, 2009 at 12:57 PM, Hugh Dickinson
<h.j.dickinson_at_[hidden]> wrote:
> Hi all,
>
> First of all let me make it perfectly clear that I'm a complete beginner as
> far as MPI is concerned, so this may well be a trivial problem!
>
> I've tried to set up Open MPI to use SSH to communicate between nodes on a
> heterogeneous cluster. I've set up passwordless SSH and it seems to be
> working fine. For example by hand I can do:
>
> ssh nodename uptime
>
> and it returns the appropriate information for each node.
> I then tried running a non-MPI program on all the nodes at the same time:
>
> mpirun -np 10 --hostfile hostfile uptime
>
> Where hostfile is a list of the 10 cluster node names with slots=1 after
> each one, i.e.:
>
> nodename1 slots=1
> nodename2 slots=1
> etc...
>
> Nothing happens! The process just seems to hang. If I interrupt the process
> with Ctrl-C I get:
>
> "
>
> mpirun: killing job...
>
> [gamma2.phyastcl.dur.ac.uk:18124] [0,0,0] ORTE_ERROR_LOG: Timeout in file
> base/pls_base_orted_cmds.c at line 275
> [gamma2.phyastcl.dur.ac.uk:18124] [0,0,0] ORTE_ERROR_LOG: Timeout in file
> pls_rsh_module.c at line 1166
> --------------------------------------------------------------------------
> WARNING: mpirun has exited before it received notification that all
> started processes had terminated.  You should double check and ensure
> that there are no runaway processes still executing.
> --------------------------------------------------------------------------
>
> "
>
> If, instead of using the hostfile, I specify on the command line the host
> from which I'm running mpirun, e.g.:
>
> mpirun -np 1 --host nodename uptime
>
> then it works (i.e. if it doesn't need to communicate with other nodes). Do
> I need to tell Open MPI it should be using SSH to communicate? If so, how do
> I do this? To be honest I think it's trying to do so, because before I set
> up passwordless SSH it challenged me for lots of passwords.
>
> I'm running Open MPI 1.2.5 installed with Scientific Linux 5.2. Let me
> reiterate, it's very likely that I've done something stupid, so all
> suggestions are welcome.
>
> Cheers,
>
> Hugh