Hello,
I have two machines, each with 3 live interfaces: lo, eth0
(intranet) and usb0 (internet). eth0 on one machine cannot reach
usb0 on the other (and vice versa). When I run my MPI program
across these two hosts I get no output at all; even --mca
btl_base_verbose 30 prints nothing. If the hostfile contains
only localhost, everything runs fine.
I tried out the same code and hostfile with two other machines
with two interfaces: lo and eth1, which can access each
other. The program runs fine on these machines.
Next, I set btl_tcp_if_exclude to lo,usb0 (on the first setup),
and also tried the ip-address/mask form, but this does not work
either. When I run the program on one machine and do "ps aux |
grep mpi" on the other, I can see --hnp-uri set to usb0's
ip-address, which it should not be, because I have listed usb0
in btl_tcp_if_exclude. So, what exactly am I doing wrong here?
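For concreteness, the exclude setting described above is passed
on the command line roughly as follows. This is only a sketch:
the hostfile name "myhosts", the process count and the program
name "./my_mpi_prog" are placeholders, not the exact values used.

```shell
# Placeholders (assumed for illustration): myhosts, -np 2, ./my_mpi_prog
# Exclude loopback and usb0 from the TCP BTL and ask for verbose BTL output.
mpirun --mca btl_tcp_if_exclude lo,usb0 \
       --mca btl_base_verbose 30 \
       -np 2 --hostfile myhosts ./my_mpi_prog
```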
I read the optimization FAQ and saw how Open MPI builds the
bipartite graph for connections. But, as I said before, eth0
cannot reach usb0's ip-address and vice versa. How can I get rid
of the usb0 ip-address showing up in --hnp-uri? This is the only
difference between the working and the non-working setups.
Regards,
--
Avinash Malik