Open MPI User's Mailing List Archives

Subject: [OMPI users] problem with opal_net_private_ipv4
From: Nicolas Niclausse (Nicolas.Niclausse_at_[hidden])
Date: 2010-03-23 09:25:07


Hello,

I'm trying to run Open MPI (1.4.1) across two clusters. On each cluster the
nodes have several interfaces on private networks, and only one of those
networks is reachable from the other cluster.

On cluster1, nodes have 3 interfaces, and only 192.168.159.0/24 is visible
from cluster2.

chicon-3
eth0 inet addr:192.168.160.76 Bcast:192.168.160.255 Mask:255.255.255.0
eth1 inet addr:192.168.159.76 Bcast:192.168.159.255 Mask:255.255.255.0
myri0 inet addr:192.168.162.76 Bcast:192.168.162.255 Mask:255.255.255.0

On cluster2, nodes have 3 interfaces, and only 172.24.0.0/17 (the eth1
network, which contains 172.24.110.8) is visible from cluster1.

netgdx-8
eth0 inet addr:172.24.190.8 Bcast:172.24.191.255 Mask:255.255.192.0
eth1 inet addr:172.24.110.8 Bcast:172.24.127.255 Mask:255.255.128.0
eth2 inet addr:172.24.240.8 Bcast:172.24.255.255 Mask:255.255.192.0
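
As a cross-check of the listings above, here is a minimal sketch that derives
the network each interface belongs to from its address and netmask. It uses
only Python's standard ipaddress module; the dictionary layout and the
network_of helper are just illustrative names, not anything from Open MPI.

```python
import ipaddress

# Addresses and netmasks copied from the two ifconfig listings above.
interfaces = {
    ("chicon-3", "eth0"): ("192.168.160.76", "255.255.255.0"),
    ("chicon-3", "eth1"): ("192.168.159.76", "255.255.255.0"),
    ("chicon-3", "myri0"): ("192.168.162.76", "255.255.255.0"),
    ("netgdx-8", "eth0"): ("172.24.190.8", "255.255.192.0"),
    ("netgdx-8", "eth1"): ("172.24.110.8", "255.255.128.0"),
    ("netgdx-8", "eth2"): ("172.24.240.8", "255.255.192.0"),
}


def network_of(addr, mask):
    # strict=False accepts a host address and returns its enclosing network.
    return ipaddress.ip_network(f"{addr}/{mask}", strict=False)


networks = {key: network_of(*value) for key, value in interfaces.items()}

for (host, ifname), net in networks.items():
    print(f"{host:9} {ifname:6} {net}  broadcast {net.broadcast_address}")
```

The broadcast addresses it prints should match the Bcast fields in the
listings, which is a quick way to confirm the prefix lengths (/24, /18, /17).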

So I'm using this to declare all the other networks as private:

mpirun -machinefile ~/gridnodes --mca opal_net_private_ipv4 \
  "192.168.162.0/24\;192.168.160.0/24\;172.24.192.0/18\;172.24.128.0/18" \
  ./alltoall
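
To make the intent of that list explicit, here is a small sketch that checks
each node address against the four declared networks. It treats the value as a
plain semicolon-separated list of CIDR blocks (shell escaping removed), which
is an assumption about how Open MPI interprets the parameter;
is_declared_private is a hypothetical helper, not an Open MPI function.

```python
import ipaddress

# The list passed to opal_net_private_ipv4 above, without the shell escaping.
private_list = "192.168.162.0/24;192.168.160.0/24;172.24.192.0/18;172.24.128.0/18"
private_nets = [ipaddress.ip_network(n) for n in private_list.split(";")]

# Interface addresses of the two example nodes.
addresses = {
    "chicon-3 eth0": "192.168.160.76",
    "chicon-3 eth1": "192.168.159.76",
    "chicon-3 myri0": "192.168.162.76",
    "netgdx-8 eth0": "172.24.190.8",
    "netgdx-8 eth1": "172.24.110.8",
    "netgdx-8 eth2": "172.24.240.8",
}


def is_declared_private(addr):
    # True if the address falls inside any network from the list.
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in private_nets)


status = {name: is_declared_private(addr) for name, addr in addresses.items()}

for name, private in status.items():
    print(f"{name:15} {'private' if private else 'not in list'}")
```

Under that reading, only the two eth1 addresses fall outside the list, which
is exactly the pair of interfaces that can reach each other.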

But this doesn't work; netgdx-8 still tries to connect to the private eth0
address of chicon-3:

[netgdx-8][[64214,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.160.76 failed: No route to host (113)
[netgdx-8][[64214,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.160.76 failed: No route to host (113)
[netgdx-8][[64214,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.160.76 failed: No route to host (113)
[netgdx-8][[64214,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.160.76 failed: No route to host (113)
[netgdx-8][[64214,1],4][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.160.76 failed: No route to host (113)

The following patch works for me:

diff -u ompi/mca/btl/tcp/btl_tcp_proc.c.orig ompi/mca/btl/tcp/btl_tcp_proc.c
--- ompi/mca/btl/tcp/btl_tcp_proc.c.orig 2010-03-23 14:01:28.000000000 +0100
+++ ompi/mca/btl/tcp/btl_tcp_proc.c 2010-03-23 14:01:50.000000000 +0100
@@ -496,7 +496,7 @@
                                 local_interfaces[i]->ipv4_netmask)) {
                         weights[i][j] = CQ_PRIVATE_SAME_NETWORK;
                     } else {
-                        weights[i][j] = CQ_PRIVATE_DIFFERENT_NETWORK;
+                        weights[i][j] = CQ_NO_CONNECTION;
                     }
                     best_addr[i][j] = peer_interfaces[j]->ipv4_endpoint_addr;
                 }
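
To illustrate what the patch changes, here is a simplified model of the branch
shown in the diff. It is not the Open MPI code: only the three CQ_* names come
from the patch, while their numeric ordering, the same_network helper and the
patched flag are assumptions made for this sketch.

```python
import ipaddress
from enum import IntEnum


class Quality(IntEnum):
    # Names taken from the patch; the numeric ordering is assumed.
    CQ_NO_CONNECTION = 0
    CQ_PRIVATE_DIFFERENT_NETWORK = 1
    CQ_PRIVATE_SAME_NETWORK = 2


def same_network(local_addr, peer_addr, local_prefixlen):
    # Compare both addresses under the local interface's netmask.
    local_net = ipaddress.ip_network(f"{local_addr}/{local_prefixlen}", strict=False)
    return ipaddress.ip_address(peer_addr) in local_net


def private_pair_weight(local_addr, peer_addr, local_prefixlen, patched):
    # Models only the branch in the diff, where the pair is treated as private.
    if same_network(local_addr, peer_addr, local_prefixlen):
        return Quality.CQ_PRIVATE_SAME_NETWORK
    if patched:
        return Quality.CQ_NO_CONNECTION
    return Quality.CQ_PRIVATE_DIFFERENT_NETWORK


# netgdx-8 eth0 (172.24.190.8/18) paired with chicon-3 eth0 (192.168.160.76).
before = private_pair_weight("172.24.190.8", "192.168.160.76", 18, patched=False)
after = private_pair_weight("172.24.190.8", "192.168.160.76", 18, patched=True)
print(before.name, "->", after.name)
```

In this model the unpatched code still gives a nonzero weight to a pair of
private addresses on different networks, so the pair presumably remains a
candidate for connect(); with the patch it is marked as having no connection.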

Why does Open MPI try to connect across different private networks when
"public" networks exist? Is this a bug, or am I missing something?

-- 
Nicolas NICLAUSSE                          Service DREAM
INRIA Sophia Antipolis                     http://www-sop.inria.fr/
2004 route des lucioles - BP 93            Tel: (33/0) 4 92 38 76 93
06902  SOPHIA-ANTIPOLIS cedex (France)     Fax: (33/0) 4 92 38 76 02