
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Can't start program across network -- solved!
From: Raymond Wan (rwan_at_[hidden])
Date: 2009-03-17 00:46:48


Hi Prentice/Jeff,

Prentice Bisbal wrote:
> In an earlier e-mail in this thread, I theorized that this might be a
> problem with your name service. This latest information seems to support
> that theory.

Thank you very much for the suggestions and help! Prentice, after I discussed the contents of your e-mails with our system administrator, he looked into the problem. Indeed, using rsh in one direction gave me "No route to host", while in the other direction I got "Connection refused" -- so something was wrong. As it turns out, the machine that worked had its firewall disabled. On the remaining 7 machines, the firewall was enabled [perhaps by default].
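For anyone who hits the same pair of errors, here is a minimal Python sketch of how one might probe a port and tell the two failures apart. The function names `probe` and `classify` are made up for illustration, and the interpretations in the messages are rules of thumb rather than a definitive diagnosis: a Linux firewall that rejects packets with an ICMP "host prohibited" reply typically surfaces as "No route to host", while "Connection refused" usually means the host was reached but nothing accepted the connection on that port.

```python
import errno
import socket


def classify(code):
    """Map a connect() errno to a rough, non-authoritative interpretation."""
    if code == errno.ECONNREFUSED:
        # The host answered, but no service accepted the connection.
        return "connection refused: host reachable, port closed or rejected"
    if code == errno.EHOSTUNREACH:
        # Often seen when a firewall rejects with an ICMP host-prohibited reply.
        return "no route to host: possibly a firewall rejecting the packets"
    return "other error (errno %s)" % code


def probe(host, port, timeout=3.0):
    """Attempt one TCP connection and describe the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all; a firewall may be silently dropping packets.
        return "timed out: packets may be silently dropped"
    except OSError as exc:
        return classify(exc.errno)
```

Running something like `probe("Z", 514)` from Y, and then `probe("Y", 514)` from Z, would show which direction is blocked (514 being the usual rsh port; the host names here are the placeholders from this thread).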

Why the one computer had its firewall disabled is unknown (since there are two sysadmins and the person I talked to didn't do this), but as this network is behind an institute-wide firewall, he'll bring it down on all of the machines so that I can use MPI on them. Ralph posted a message just over a month ago that says that Open MPI doesn't support restricted port ranges (http://www.open-mpi.org/community/lists/users/2009/02/7997.php), so this seems to be fine.

It's odd that ssh works fine, and since I thought Open MPI would use ssh first, I never suspected a firewall problem.

One question for anyone who might know the answer. I had two computers, Y and Z. On Y I could run this, but on Z I could not:

mpirun --host Y,Z --np 3 uname -a

But it is Y that had its firewall disabled; Z had it enabled. So, was the firewall blocking outgoing traffic? I would have thought that with "--host Y,Z", if the firewall is up on one computer but down on the other, things still shouldn't work...

After all this, there might still be other problems with the network... i.e., perhaps it isn't a matter of the firewall being up or down, but of it being incorrectly configured. But as the sysadmin is happy to take this option, I'll take it too...

Thank you both for your help!

Ray