Hi Jeff,
  Thanks to you, I figured out the problem. As you suspected, iptables was acting as a firewall on some of the machines. After I stopped iptables, the MPI communication works fine. I even tried 5 machines together, and the communication works all right.
Thanks again,
Jagannath

On Thu, May 26, 2011 at 5:19 AM, Jeff Squyres <jsquyres@cisco.com> wrote:
ssh may be allowed but other random TCP ports may not.

iptables is the typical firewall software that most Linux installations use; it may have been enabled by default.

I'm a little doubtful that this is your problem, though, because you're apparently able to *launch* your application, which means that OMPI's out-of-band communication system was able to make some sockets.  So it's a little weird that the MPI layer's TCP sockets were borked.  But let's check for firewall software, first...
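
To illustrate the point that ssh may be allowed while other TCP ports are not, here is a minimal Python sketch that tries plain TCP connections to a host on a few ports and reports the OS error for each failure. The names `can_connect` and `probe_ports` are made up for this example, and the port numbers in the usage comment are placeholders: by default Open MPI's TCP BTL picks its ports dynamically, so this only tells you whether a given port is reachable, not which ports a particular run would use. On Linux, errno 113 is EHOSTUNREACH ("No route to host"), the same error shown in the btl_tcp_endpoint.c messages; a firewall rule that rejects with an ICMP "host prohibited" reply is one common cause of it.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Try a TCP connection to host:port.

    Returns (True, None) on success, or (False, error_text) on failure.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers refused connections, timeouts, and "No route to host".
        return False, str(exc)


def probe_ports(host, ports, timeout=3.0):
    """Probe each port in turn and return {port: (ok, error_text)}."""
    return {port: can_connect(host, port, timeout) for port in ports}


# Example usage (the host is taken from the error message; port 5000 is a
# placeholder and needs something listening on it on the remote side):
#
#   for port, (ok, err) in probe_ports("192.168.0.31", [22, 5000]).items():
#       print(port, "open" if ok else "FAILED: " + err)
```

If port 22 succeeds while a port that definitely has a listener on the remote machine fails with "No route to host", that points toward packet filtering rather than a routing problem.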


On May 26, 2011, at 12:42 AM, Jagannath Mondal wrote:

> Hi Jeff,
>     I was wondering how I can check whether any firewall software is running. In fact, I can use ssh to go from one machine to another; only mpirun does not work. Is it possible that, even in the presence of a firewall, ssh may work but mpirun may not?
> Jagannath
>
> On Wed, May 25, 2011 at 10:42 PM, Jeff Squyres (jsquyres) <jsquyres@cisco.com> wrote:
> Are you running any firewall software?
>
> Sent from my phone. No type good.
>
> On May 25, 2011, at 10:41 PM, "Jagannath Mondal" <jagannath.mondal@gmail.com> wrote:
>
>> Hi,
>> I am having a problem running mpirun over multiple nodes.
>> To run a job over two 8-core machines, I generated a hostfile as follows:
>> yethiraj30 slots=8 max_slots=8
>> yethiraj31 slots=8 max_slots=8
>>
>> These two machines are interconnected, and I have installed Open MPI 1.3.3.
>> I then try to run the replica exchange simulation using the following command:
>> mpirun -np 16 --hostfile  hostfile  mdrun_4mpi -s topol_.tpr -multi 16 -replex 100 >& log_replica_test
>>
>> But I get the following error, and the job does not proceed at all:
>> btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>>
>> Here are the full details:
>>
>> NNODES=16, MYRANK=0, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=1, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=4, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=2, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=6, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=3, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=5, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=7, HOSTNAME=yethiraj30
>> [yethiraj30][[22604,1],0][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],4][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],6][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],1][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],3][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],2][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
>> NNODES=16, MYRANK=10, HOSTNAME=yethiraj31
>> NNODES=16, MYRANK=12, HOSTNAME=yethiraj31
>>
>> I am not sure how to resolve this issue. In general, I can ssh from one machine to the other without any problem, but when I try to run Open MPI across both machines, I get this error. Any help would be appreciated.
>>
>> Jagannath
>> _______________________________________________
>> users mailing list
>> users@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
jsquyres@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

