Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] problem using mpirun over multiple nodes
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-05-26 06:19:10


ssh may be allowed but other random TCP ports may not.

iptables is the typical firewall software that most Linux installations use; it may have been enabled by default.

I'm a little doubtful that this is your problem, though, because you're apparently able to *launch* your application, which means that OMPI's out-of-band communication system was able to make some sockets. So it's a little weird that the MPI layer's TCP sockets were borked. But let's check for firewall software, first...
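
One way to test the "ssh works but other TCP ports may not" possibility is to attempt a plain TCP connection to a non-ssh port on the other node and look at how it fails. The following is only a minimal sketch in Python, not something from the original message: the helper name `probe_tcp` and the port number in the usage note are hypothetical. With no listener on the far side, "refused" generally means packets are getting through, whereas "unreachable" (the same errno 113, "No route to host", seen in the log below) or "timeout" is consistent with a firewall rejecting or dropping the traffic.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt one IPv4 TCP connection and classify the result.

    Returns "open", "refused", "unreachable", "timeout", or "error: ...".
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except socket.timeout:
        # No answer at all: packets may be silently dropped.
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.ECONNREFUSED:
            # The host answered, but nothing is listening on that port.
            return "refused"
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            # EHOSTUNREACH is errno 113 on Linux: "No route to host".
            return "unreachable"
        return "error: %s" % exc
    finally:
        sock.close()
```

For example, running `probe_tcp("192.168.0.31", 22)` and `probe_tcp("192.168.0.31", 5000)` from yethiraj30 (port 5000 being an arbitrary, assumed choice) and getting "open" for the first but "unreachable" for the second would point toward a firewall on yethiraj31 rather than a routing problem.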

On May 26, 2011, at 12:42 AM, Jagannath Mondal wrote:

> Hi Jeff,
> I was wondering how I can check whether there is any firewall software. In fact, I can use ssh to go from one machine to another; it is only with mpirun that it does not work. Is it possible that, even with a firewall present, ssh works but mpirun does not?
> Jagannath
>
> On Wed, May 25, 2011 at 10:42 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
> Are you running any firewall software?
>
> Sent from my phone. No type good.
>
> On May 25, 2011, at 10:41 PM, "Jagannath Mondal" <jagannath.mondal_at_[hidden]> wrote:
>
>> Hi,
>> I am having a problem in running mpirun over multiple nodes.
>> To run a job over two 8-core processors, I generated a hostfile as follows:
>> yethiraj30 slots=8 max_slots=8
>> yethiraj31 slots=8 max_slots=8
>>
>> These two machines are connected to each other, and I have installed Open MPI 1.3.3.
>> Then I try to run the replica exchange simulation using the following command:
>> mpirun -np 16 --hostfile hostfile mdrun_4mpi -s topol_.tpr -multi 16 -replex 100 >& log_replica_test
>>
>> But I get the following error, and the job does not proceed at all:
>> btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>>
>> Here are the full details:
>>
>> NNODES=16, MYRANK=0, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=1, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=4, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=2, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=6, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=3, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=5, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=7, HOSTNAME=yethiraj30
>> [yethiraj30][[22604,1],0][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],4][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],6][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],1][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],3][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],2][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> NNODES=16, MYRANK=10, HOSTNAME=yethiraj31
>> NNODES=16, MYRANK=12, HOSTNAME=yethiraj31
>>
>> I am not sure how to resolve this issue. In general, I can go from one machine to another without any problem using ssh. But, when I am trying to run openmpi over both the machines, I get this error. Any help will be appreciated.
>>
>> Jagannath
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/