Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] problem using mpirun over multiple nodes
From: Jagannath Mondal (jagannath.mondal_at_[hidden])
Date: 2011-05-26 13:38:30


Hi Jeff,
  Thanks to you, I figured out the problem. As you suspected, iptables was
acting as a firewall on some of the machines. After I stopped iptables, the
MPI communication works fine. I even tried 5 machines together and the
communication is all right.
Thanks again,
Jagannath
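
For anyone who finds this thread with the same "No route to host (113)"
error while ssh works: below is a minimal Python sketch, not part of the
original exchange, for checking whether a given TCP port on another node can
be reached. The function name probe_tcp is hypothetical, and the sketch only
tests one port at a time; it does not know which ports Open MPI will pick.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt one TCP connection and return (ok, reason).

    ok is True only if the three-way handshake completed.
    reason is a short string describing the outcome.
    """
    try:
        # create_connection resolves the host and connects with a timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        # No answer at all; a firewall that silently drops packets
        # can produce this.
        return False, "timed out"
    except OSError as exc:
        # Map the numeric errno to its symbolic name where possible,
        # e.g. 113 -> EHOSTUNREACH on Linux, 111 -> ECONNREFUSED.
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, f"{name}: {exc.strerror}"
```

As a usage example under assumed values: start any listener on the remote
node (for instance `python3 -m http.server 5000`; the port 5000 is
arbitrary), then call `probe_tcp("192.168.0.31", 5000)` from the other node.
On Linux, a result starting with EHOSTUNREACH corresponds to the errno 113
in the log and often points to a firewall rejecting the connection, although
a real routing problem gives the same result. ECONNREFUSED usually means the
host was reached but nothing is listening on that port.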

On Thu, May 26, 2011 at 5:19 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:

> ssh may be allowed but other random TCP ports may not.
>
> iptables is the typical firewall software that most Linux installations
> use; it may have been enabled by default.
>
> I'm a little doubtful that this is your problem, though, because you're
> apparently able to *launch* your application, which means that OMPI's
> out-of-band communication system was able to make some sockets. So it's a
> little weird that the MPI layer's TCP sockets were borked. But let's check
> for firewall software, first...
>
>
> On May 26, 2011, at 12:42 AM, Jagannath Mondal wrote:
>
> > Hi Jeff,
> > I was wondering how I can check whether there is any firewall
> > software. In fact, I can use ssh to go from one machine to another, but
> > mpirun does not work. Is it possible that ssh works in the presence of
> > a firewall while mpirun does not?
> > Jagannath
> >
> > On Wed, May 25, 2011 at 10:42 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
> > Are you running any firewall software?
> >
> > Sent from my phone. No type good.
> >
> > On May 25, 2011, at 10:41 PM, "Jagannath Mondal" <jagannath.mondal_at_[hidden]> wrote:
> >
> >> Hi,
> >> I am having a problem running mpirun over multiple nodes.
> >> To run a job over two 8-core processors, I generated a hostfile as follows:
> >> yethiraj30 slots=8 max_slots=8
> >> yethiraj31 slots=8 max_slots=8
> >>
> >> These two machines are interconnected and I have installed Open MPI 1.3.3.
> >> I then try to run the replica exchange simulation using the following command:
> >> mpirun -np 16 --hostfile hostfile mdrun_4mpi -s topol_.tpr -multi 16 -replex 100 >& log_replica_test
> >>
> >> But I get the following error and the job does not proceed at all:
> >> btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
> >>
> >> Here are the full details:
> >>
> >> NNODES=16, MYRANK=0, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=1, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=4, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=2, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=6, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=3, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=5, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=7, HOSTNAME=yethiraj30
> >> [yethiraj30][[22604,1],0][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
> >> [yethiraj30][[22604,1],4][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
> >> [yethiraj30][[22604,1],6][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
> >> [yethiraj30][[22604,1],1][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
> >> [yethiraj30][[22604,1],3][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
> >> [yethiraj30][[22604,1],2][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
> >> NNODES=16, MYRANK=10, HOSTNAME=yethiraj31
> >> NNODES=16, MYRANK=12, HOSTNAME=yethiraj31
> >>
> >> I am not sure how to resolve this issue. In general, I can go from one
> >> machine to another without any problem using ssh. But, when I am trying to
> >> run openmpi over both the machines, I get this error. Any help will be
> >> appreciated.
> >>
> >> Jagannath
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>