Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpirun fails on remote applications
From: Micha Feigin (michf_at_[hidden])
Date: 2009-05-12 11:43:58


It is usually best to separate the cluster (mpi) interfaces from the internet
interface.
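
As a minimal sketch of what that separation can look like on the Open MPI
side: the `btl_tcp_if_include` MCA parameter restricts Open MPI's MPI-level
TCP traffic to the named interface. The interface name `eth1` and the helper
`mpirun_cmd` are assumptions for illustration only. The helper just prints
the command so it can be reviewed before running, and `ompi_info` should be
checked for the parameters your version actually supports.

```shell
#!/bin/sh
# Hypothetical name of the cluster-facing interface; substitute your own.
CLUSTER_IF="eth1"

# Print (not run) an mpirun command line that keeps Open MPI's TCP
# traffic on the cluster interface via the btl_tcp_if_include parameter.
mpirun_cmd() {
    echo "mpirun --mca btl_tcp_if_include $CLUSTER_IF $*"
}

# Example: the remote launch from the original report.
mpirun_cmd -host anfield04 -np 4 ./hello
```

Once the printed line looks right, it can be pasted into the shell or the
`echo` can be dropped from the helper.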

On a dedicated cluster it is usually best to have a master node that is
connected to the internet and client nodes that connect to the master node
(tunneling their connection to the internet through it if needed), or to reach
the internet via a gateway machine. That way the cluster machines don't need a
firewall.

In case all machines are connected directly to the internet, it is better to
have one (usually cheap) connection to the internet that can be firewalled, and
a (high-end) connection inside the cluster that doesn't need a firewall.
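
One way to express that split with iptables is sketched below. The interface
names (`eth0` external, `eth1` internal) and the function name
`print_cluster_rules` are assumptions, and the script only prints the rules
for review instead of applying them. A real rule set would also need an
explicit rule for whatever ssh port is used on the external side.

```shell
#!/bin/sh
# Sketch only: prints iptables commands instead of running them.

print_cluster_rules() {
    internal_if="$1"   # cluster-facing interface (assumed name, e.g. eth1)
    external_if="$2"   # internet-facing interface (assumed name, e.g. eth0)

    # Trust everything arriving on the cluster interface, so the random
    # TCP ports used between MPI processes are reachable.
    echo "iptables -A INPUT -i $internal_if -j ACCEPT"
    # Keep replies to connections this host opened itself.
    echo "iptables -A INPUT -i $external_if -m state --state ESTABLISHED,RELATED -j ACCEPT"
    # Drop everything else arriving from the internet side.
    echo "iptables -A INPUT -i $external_if -j DROP"
}

print_cluster_rules eth1 eth0
```

Because the first rule matches on the interface rather than on ports, it does
not depend on which ports the MPI processes happen to pick.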

On Tue, 12 May 2009 10:22:28 -0400
Jeff Squyres <jsquyres_at_[hidden]> wrote:

> Open MPI requires that each MPI process be able to connect to any
> other MPI process in the same job over random TCP ports. It is
> usually easiest to leave the firewall off, or to set up trust
> relationships between your cluster nodes.
>
>
> On May 12, 2009, at 6:04 AM, feng chen wrote:
>
> > Thanks a lot. The firewall it is. It works with the firewall off,
> > but that raises another question: is there any way to run mpirun
> > while the firewall is on? If so, how do we set up the firewall or
> > iptables?
> >
> > thank you
> >
> > From: Micha Feigin <michf_at_[hidden]>
> > To: users_at_[hidden]
> > Sent: Tuesday, May 12, 2009 4:30:30 AM
> > Subject: Re: [OMPI users] mpirun fails on remote applications
> >
> > On Tue, 12 May 2009 11:54:57 +0300
> > Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]> wrote:
> >
> > > sounds like firewall problems to or from anfield04.
> > > Lenny,
> > >
> > > On Tue, May 12, 2009 at 8:18 AM, feng chen <fch6699_at_[hidden]> wrote:
> > >
> >
> > I'm having a similar problem, not sure if it's related (I gave up for
> > the moment on Open MPI 1.3+; 1.2.8 works fine, nothing above that does).
> >
> > 1. Try taking down the firewall and see if it works.
> > 2. Make sure that passwordless ssh is working (not sure if it's
> >    needed for everything, but still ...).
> > 3. Can you test it with Open MPI 1.2.8?
> > 4. Also, does launching the job in the other direction work (4 -> 5
> >    instead of 5 -> 4)?
> > [fch6699_at_anfield04 test]$ mpirun -host anfield05 -np 4 ./hello
> >
> > From what I see on my cluster, my specific problem is that machines
> > have different addresses depending on which machine you are connecting
> > from (they are connected directly to each other, not through a switch
> > with a central name server), and name lookup seems to happen on the
> > master instead of the client node, so it gets the wrong address.
> >
> > > > Hi all,
> > > >
> > > > First of all, I'm new to Open MPI, so I don't know much about MPI
> > > > settings. That's why I'm following the manual and FAQ suggestions
> > > > from the beginning. Everything went well until I tried to run an
> > > > application on a remote node using the 'mpirun -np' command. It
> > > > just hangs there without doing anything: no error messages, no
> > > > complaints, nothing. What confuses me is that I can run the
> > > > application over ssh with no problem, but when it comes to mpirun
> > > > it just sits there and does nothing.
> > > > I'm pretty sure I have everything set up correctly, including
> > > > passwordless sign-in over ssh and environment variables for both
> > > > interactive and non-interactive logins.
> > > > A sample of the commands used follows:
> > > >
> > > >
> > > >
> > > >
> > > > [fch6699_at_anfield05 test]$ mpicc -o hello hello.f
> > > > [fch6699_at_anfield05 test]$ ssh anfield04 ./hello
> > > > 0 of 1: Hello world!
> > > > [fch6699_at_anfield05 test]$ mpirun -host anfield05 -np 4 ./hello
> > > > 0 of 4: Hello world!
> > > > 2 of 4: Hello world!
> > > > 3 of 4: Hello world!
> > > > 1 of 4: Hello world!
> > > > [fch6699_at_anfield05 test]$ mpirun -host anfield04 -np 4 ./hello
> > > > It just hangs there forever! I need help fixing this.
> > > > If I try it another way:
> > > > [fch6699_at_anfield05 test]$ mpirun -hostfile my_hostfile -np 4 ./
> > hell
> > > > Still nothing happens: no warnings, no complaints, no error messages!
> > > >
> > > > All other files related to this issue can be found in the attached
> > > > my_files.tar.gz:
> > > >
> > > > .cshrc
> > > > The output of the "ompi_info --all" command.
> > > > my_hostfile
> > > > hello.c
> > > > output of iptables
> > > >
> > > > The only thing I've noticed is that our ssh port has been changed
> > > > from 22 to another number for security reasons.
> > > > I don't know whether that has anything to do with it.
> > > >
> > > >
> > > > Any help will be highly appreciated!!
> > > >
> > > > thanks in advance!
> > > >
> > > > Kevin
> > > >
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > users mailing list
> > > > users_at_[hidden]
> > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > >
> >
>
>