On Feb 24, 2006, at 8:23 AM, Emanuel Ziegler wrote:
>> So, the question from the mpirun_debug.out-file is, what IP-
>> addresses do
>> node01 and node02 have, is the local 10.0.0.1 node01, while
>> 10.1.0.1 is
>> Maybe the route on node01 is not correct to node02?
> Ok, I figured out the problem, but didn't solve it completely.
> node01 and node02 both have multiple IP addresses.
> node01 has 10.0.0.1 for TCP (eth1) and 10.1.0.1 for IPoIB (ib0).
> node02 has 10.0.0.2 for TCP (eth1) and 10.1.0.2 for IPoIB (ib0).
> The latter addresses are useless, but don't affect the problem. I
> eth1 on both machines b/c eth0 is only 10/100 MBit and I wanted to
> GBit connections to the file server in the internal network. The
> was that I set up eth0 on node01 (golden client) using DHCP on the
> external network for setup purposes. Hence, it also had an external
> address (220.127.116.11) which was inaccessible from node02.
> Since orterun was started with the parameters
> --nsreplica "0.0.0;tcp://18.104.22.168:54866;tcp://
> --gprreplica "0.0.0;tcp://22.214.171.124:54866;tcp://
> node02 first tried to communicate with 126.96.36.199 which was
> impossible and hung although it would have been able to access
> 10.0.0.1 without any problems. But obviously it never got to this
> Although disabling eth0 with "ifdown eth0" solves the problem, this is
> not applicable to my cluster since this was just a test setup and I
> the external address for my head node.
> Can I configure orterun/orted to use only eth1?
Yes, start mpirun with the arguments "-mca oob_tcp_include eth1 -mca
btl_tcp_if_include eth1" and it should work properly. The parameters
can also be set in either the global or per-user configuration file
for Open MPI (once you have it tested, of course). See our FAQ item
The second argument is because you'll probably run into the exact
same problem when the TCP transport tries to start up (although it
sounds like you're going to be using native IB for communication, it
never hurts to make sure TCP has a chance of working).
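
As a rough sketch of the configuration-file form, the two parameters
named above could be written as follows. The file location shown in
the comment ($HOME/.openmpi/mca-params.conf, the usual per-user
default) is an assumption here; check the FAQ for the paths your
installation actually reads.

```
# Per-user MCA parameter file, assumed to be $HOME/.openmpi/mca-params.conf
# Restrict the out-of-band (startup/control) channel to eth1
oob_tcp_include = eth1
# Restrict the TCP transport used for MPI traffic to eth1
btl_tcp_if_include = eth1
```

With these lines in place, the "-mca" arguments should no longer be
needed on the mpirun command line; values given on the command line
still take precedence over the file if you want to override them for
a single run.
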
Open MPI developer