Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] problem using mpirun over multiple nodes
From: Jagannath Mondal (jagannath.mondal_at_[hidden])
Date: 2011-05-26 00:42:05


Hi Jeff,
    How can I check whether any firewall software is running? I can ssh
from one machine to the other without problems; only mpirun fails. Is it
possible that, with a firewall in place, ssh works but mpirun does not?
Jagannath
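
For what it's worth, a firewall can allow ssh (TCP port 22) while blocking
the other TCP ports that the MPI processes use to talk to each other, and on
Linux a rejected connection can show up as "No route to host". Below is a
minimal Python sketch for probing this. It assumes some listener has been
started on the other node at the chosen port; the function name probe and
port 5000 are made up for illustration and are not part of Open MPI.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connection to host:port and describe the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host was reached, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout (packets may be dropped).
        return "timeout"
    except OSError as exc:
        # Includes "No route to host" (errno 113 on Linux).
        return "error: {}".format(exc.strerror or exc)


# Example use from yethiraj30 (5000 is an arbitrary, hypothetical port):
#   print(probe("yethiraj31", 22))    # the ssh port
#   print(probe("yethiraj31", 5000))  # some other TCP port
```

If port 22 reports "open" while another port with a listener behind it
reports a timeout or a "No route to host" error, a firewall rule on the
target node is a likely suspect. "refused" usually just means that nothing
is listening on that port.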

On Wed, May 25, 2011 at 10:42 PM, Jeff Squyres (jsquyres) <
jsquyres_at_[hidden]> wrote:

> Are you running any firewall software?
>
> Sent from my phone. No type good.
>
> On May 25, 2011, at 10:41 PM, "Jagannath Mondal" <
> jagannath.mondal_at_[hidden]> wrote:
>
> Hi,
> I am having a problem in running mpirun over multiple nodes.
> To run a job over two 8-core processors, I generated a hostfile as
> follows:
> yethiraj30 slots=8 max_slots=8
> yethiraj31 slots=8 max_slots=8
>
> These two machines are interconnected, and I have installed Open MPI 1.3.3.
> When I try to run the replica exchange simulation using the following
> command:
> mpirun -np 16 --hostfile hostfile mdrun_4mpi -s topol_.tpr -multi 16 -replex 100 >& log_replica_test
>
> I get the following error, and the job does not proceed at all:
> btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to
> 192.168.0.31 failed: No route to host (113)
>
> Here are the full details:
>
> NNODES=16, MYRANK=0, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=1, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=4, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=2, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=6, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=3, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=5, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=7, HOSTNAME=yethiraj30
> [yethiraj30][[22604,1],0][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.0.31 failed: No route to host (113)
> [yethiraj30][[22604,1],4][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.0.31 failed: No route to host (113)
> [yethiraj30][[22604,1],6][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.0.31 failed: No route to host (113)
> [yethiraj30][[22604,1],1][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.0.31 failed: No route to host (113)
> [yethiraj30][[22604,1],3][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.0.31 failed: No route to host (113)
> [yethiraj30][[22604,1],2][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.0.31 failed: No route to host (113)
> NNODES=16, MYRANK=10, HOSTNAME=yethiraj31
> NNODES=16, MYRANK=12, HOSTNAME=yethiraj31
>
> I am not sure how to resolve this issue. In general, I can go from one
> machine to another without any problem using ssh. But, when I am trying to
> run openmpi over both the machines, I get this error. Any help will be
> appreciated.
>
> Jagannath
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users