Fernando Lemos wrote on 23/03/2010 16:28:
>> I'm trying to run openmpi (1.4.1) on two clusters; on each cluster, several
>> interfaces are private;
>>
>> on cluster1, nodes have 3 interfaces, and only 192.168.159.0/24 is visible
>> from cluster2.
>>
>> chicon-3
>> eth0 inet addr:192.168.160.76 Bcast:192.168.160.255 Mask:255.255.255.0
>> eth1 inet addr:192.168.159.76 Bcast:192.168.159.255 Mask:255.255.255.0
>> myri0 inet addr:192.168.162.76 Bcast:192.168.162.255 Mask:255.255.255.0
>>
>> on cluster2, nodes have 3 interfaces, and only 172.24.110.0/17 is visible
>> from cluster1
>>
>> netgdx-8
>> eth0 inet addr:172.24.190.8 Bcast:172.24.191.255 Mask:255.255.192.0
>> eth1 inet addr:172.24.110.8 Bcast:172.24.127.255 Mask:255.255.128.0
>> eth2 inet addr:172.24.240.8 Bcast:172.24.255.255 Mask:255.255.192.0
>>
>> so i'm using this to declare all the other networks as private:
>>
>> mpirun -machinefile ~/gridnodes --mca opal_net_private_ipv4
>> "192.168.162.0/24\;192.168.160.0/24\;172.24.192.0/18\;172.24.128.0/18"
>> ./alltoall
>>
>> but this doesn't work:
>
> Have you tried -mca btl_tcp_if_include/exclude?
I can't do that because the "public" interface is not always eth1 as in
this example (I have several other clusters with different network
configurations in my setup).
>> Why does openmpi try to connect different private networks, given that
>> "public" networks exist? Is it a bug or am I missing something?
>
> From what I've seen, I believe OpenMPI tries to find the fastest route
> to the nodes. In some cases it's trivial to sort that out, in other
> cases you might need to give it some hints.
Yes, so I thought that "opal_net_private_ipv4" was the right thing for me,
but it doesn't work without the patch.
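
To double-check the subnet arithmetic behind the private list, here is a small Python sketch. It only uses the addresses and netmasks from the ifconfig output quoted above and the four networks passed to opal_net_private_ipv4; the names INTERFACES, PRIVATE, classify and public_interfaces are made up for this illustration. It says nothing about how Open MPI itself interprets the parameter, it only checks which interfaces fall outside the declared private networks.

```python
import ipaddress

# Addresses and netmasks copied from the ifconfig output quoted above.
INTERFACES = {
    "chicon-3": {
        "eth0": ("192.168.160.76", "255.255.255.0"),
        "eth1": ("192.168.159.76", "255.255.255.0"),
        "myri0": ("192.168.162.76", "255.255.255.0"),
    },
    "netgdx-8": {
        "eth0": ("172.24.190.8", "255.255.192.0"),
        "eth1": ("172.24.110.8", "255.255.128.0"),
        "eth2": ("172.24.240.8", "255.255.192.0"),
    },
}

# Networks given to opal_net_private_ipv4 in the mpirun command above.
PRIVATE = [
    ipaddress.ip_network(n)
    for n in (
        "192.168.162.0/24",
        "192.168.160.0/24",
        "172.24.192.0/18",
        "172.24.128.0/18",
    )
]


def classify(addr, mask):
    """Return (network, is_private) for one interface address and netmask."""
    iface = ipaddress.ip_interface(f"{addr}/{mask}")
    is_private = any(iface.ip in net for net in PRIVATE)
    return iface.network, is_private


def public_interfaces():
    """List (host, interface, network) for every interface not declared private."""
    result = []
    for host, ifaces in INTERFACES.items():
        for name, (addr, mask) in ifaces.items():
            network, is_private = classify(addr, mask)
            if not is_private:
                result.append((host, name, str(network)))
    return result


if __name__ == "__main__":
    for host, name, network in public_interfaces():
        print(f"{host} {name} {network}")
```

With these inputs, only eth1 is left on each node, so the private list itself covers the four intended networks. Note that the netmask 255.255.128.0 puts 172.24.110.8 in 172.24.0.0/17.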
--
Nicolas NICLAUSSE Service DREAM
INRIA Sophia Antipolis http://www-sop.inria.fr/
2004 route des lucioles - BP 93 Tel: (33/0) 4 92 38 76 93
06902 SOPHIA-ANTIPOLIS cedex (France) Fax: (33/0) 4 92 38 76 02