Raymond Wan wrote:
> Actually, when I run the above mpirun command, I don't see "sleep"
> running locally on machine Y, either. However, if I did this:
> mpirun --host Y --np 3 sleep 1000
> I see 3 instances of "sleep" when I do ps -aedf. Does mpirun try to
> "ssh" all networked machines first before it starts the program (even if
> one of those instances will run locally?). Perhaps unrelated...but when
> I am on Y and I do an rsh to Z, I get a "No route to host". I asked the
> sysadmin about it (I'm not the sysadmin of Y or Z) and he doesn't know
> why but as we should be using ssh anyway, he isn't going to address the
> problem (unless it is a side-effect of my mpirun problem). I only
> presume rsh hasn't been set up properly; ssh works fine, though.

The "No route to host" problem should be independent of whether you are
using rsh or ssh. It points to a problem with your name service: either
DNS, LDAP, NIS, or /etc/hosts (depending on which name service you are
using) is screwed up. You should get that error regardless of whether
you use rsh, ssh, or any other command that needs to resolve hostnames
to IP addresses, such as ping, ftp, or telnet.
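One quick way to check this is to ask each machine's own resolver what it thinks the other hosts' addresses are. Below is a minimal Python sketch, assuming Python 3.9+ and IPv4-only hosts. The script name `resolve_hosts.py` is hypothetical. It goes through the system resolver (`getaddrinfo`), so it sees the same /etc/hosts, DNS, NIS, or LDAP configuration that rsh, ssh, and mpirun would.

```python
import socket
import sys


def resolve(hostname: str) -> list[str]:
    """Return the sorted IPv4 addresses the local resolver gives for hostname.

    Uses the system resolver (getaddrinfo), so the answer reflects whatever
    /etc/hosts, DNS, NIS or LDAP setup this machine is configured with.
    Returns an empty list if the name cannot be resolved at all.
    """
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Hypothetical usage, run on every machine: python3 resolve_hosts.py Y Z
    me = socket.gethostname()
    for name in sys.argv[1:]:
        addrs = resolve(name)
        print(f"{me}: {name} -> {', '.join(addrs) or 'UNRESOLVED'}")
```

If the same name maps to different addresses on different machines, or to an address on a subnet the other machine can't reach, that is the misconfiguration to chase.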

It could be that one of your hosts has misconfigured host information.
I'd start by comparing /etc/hosts on all three systems. What happens if
you replace the hostnames in your MPI hosts file with their IP addresses?
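To try the IP-address experiment without hand-editing, here is a small Python sketch that rewrites a hostfile. It assumes the usual one-host-per-line layout, with optional settings such as `slots=2` after the host name and `#` comment lines. The file name `myhosts` in the usage line is hypothetical. Names are resolved on the machine where you run the script, so a name that fails to resolve raises `socket.gaierror` right away instead of failing later inside mpirun.

```python
import socket
import sys
from typing import Callable


def to_ipv4(hostname: str) -> str:
    """Resolve hostname to one IPv4 address with the local system resolver.

    Raises socket.gaierror if the name does not resolve on this machine.
    """
    return socket.gethostbyname(hostname)


def rewrite_hostfile(text: str, resolve: Callable[[str], str] = to_ipv4) -> str:
    """Replace the host name at the start of each hostfile line with an IP.

    Blank lines and '#' comment lines are copied through unchanged; anything
    after the host name on a line (e.g. 'slots=2') is kept as-is.
    """
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)
            continue
        parts = stripped.split(maxsplit=1)  # [host] or [host, rest-of-line]
        out.append(" ".join([resolve(parts[0])] + parts[1:]))
    return "".join(l + "\n" for l in out)


if __name__ == "__main__":
    # Hypothetical usage: python3 rewrite_hostfile.py myhosts > myhosts.ip
    with open(sys.argv[1]) as f:
        sys.stdout.write(rewrite_hostfile(f.read()))
```

The `resolve` parameter is there only so the lookup can be swapped out, for example with a fixed table when testing the rewriting logic on its own.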