
Open MPI User's Mailing List Archives


Subject: [OMPI users] Can't start program across network
From: Raymond Wan (rwan_at_[hidden])
Date: 2009-03-13 06:17:23


Hi all,

I'm having a problem running mpirun and was wondering if anyone has suggestions on how to find the cause. I have 3 machines that I can use: X, Y, and Z. The important thing is that X differs from Y and Z (the software installed, the version of Linux, etc.), while Y and Z have identical software installations.

All of this works:

[On X] mpirun --host Y,Z --np 2 uname -a
[On X] mpirun --host X,Y,Z --np 3 uname -a
[On Y] mpirun --host Y --np 2 uname -a

(and likewise, other combinations)

What doesn't work is:

[On Y] mpirun --host Y,Z --np 2 uname -a
[On Y] mpirun --host X,Y,Z --np 3 uname -a

...and similarly for machine Z. I can confirm that from any of the 3 machines, I can ssh to the other two without typing in a password; I set up the RSA keys correctly [I think]. When I run the above commands, mpirun just hangs. Adding "--verbose" doesn't produce any information, so I don't know what it's doing. I also tried a longer-running program than "uname", and it didn't appear on any of the machines. In fact [since it hangs], I don't see uname in "top", either. I do, however, see "mpirun" and "orted" in top.
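
The passwordless-ssh claim above can be checked non-interactively from each machine. The following is a minimal sketch, assuming OpenSSH's `BatchMode` and `ConnectTimeout` options; `check_ssh` is a hypothetical helper name, and X, Y, and Z are the placeholder machine names used in this post. It only tests the ssh login itself, so a pass does not rule out other causes of the hang.

```shell
# Hypothetical helper: report whether each named host accepts a
# non-interactive ssh login from the machine this is run on.
check_ssh() {
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # ConnectTimeout=5 keeps a blocked connection from waiting indefinitely.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
            </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}
```

Running `check_ssh X Y Z` on each of the three machines in turn would show whether any direction still needs a password or fails to connect.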

I guess X has some setup that the other two are missing. Any suggestions on how to find out the cause of this problem? Thank you!

Ray

PS: It has been a long time since I got X working, and I might have done something that I no longer remember, but I don't recall seeing this problem before.