Have you logged into those nodes and checked the IP addresses of -all- their interfaces? OMPI must be picking up those addresses from somewhere. My best guess is that those nodes have multiple interfaces, some of which are configured with those addresses.

Remember: we don't look at the /etc/hosts file on the node where mpirun is executed to get the addresses. Instead, the processes started on each remote node query the addresses of all available interfaces on that node. The result is frequently different from the address listed in your /etc/hosts file.


On Jul 10, 2011, at 7:45 PM, zhuangchao wrote:

Hello all,
 
 
I run the following command:
 
/data1/cluster/openmpi/bin/mpirun  -d  -machinefile  /tmp/nodes.10515.txt   -np  3  /data1/cluster/mpiblast-pio-1.6/bin/mpiblast   -p blastn -i /data1/cluster/sequences/seq_4.txt -d Baculo_Nucleotide -o /data1/cluster/blast.out/blast.out.10515      -g T -m  0 -F F
 
Then I get the following error from Open MPI:
 
[node7][[3812,1],2][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.5 failed: No route to host (113)
[node7][[3812,1],2][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 159.226.126.15 failed: No route to host (113)
 
The machinefile is defined as follows:
     
     node1
     node5
     node7
 
192.168.0.5 is not the IP of any host in the machinefile. 159.226.126.15 is another IP of node1, but the hostname node1 corresponds to 11.11.11.1 in /etc/hosts.
 
Why do I get this error? Can you help me?
 
Thank you!
     
 
       
_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users