I'm using Open MPI (current version) on a small cluster of RHEL 5.3
nodes. I now need to run MPI over a specific network adapter, in this
case the eth1 interface of each system. This is not the default
adapter (eth0), which has to carry all other traffic, but I cannot get
MPI to use eth1 and therefore a different IP address.
The exact problem, shown on 2 nodes:
For testing purposes, I connected the eth1 adapters of both machines
directly to each other and access the machines remotely via eth0. If I
now try to run an MPI program (in this case the HPL benchmark) with a
hosts file that specifies 10.0.1.21 and 10.0.1.22 as hosts, I run into
problems. The netstat -a command shows that 10.42.* addresses are used
for the connection. The --debug-daemons flag tells me that MPI
initializes both nodes, but after that the job runs forever and never
terminates. Apart from an initial exchange of a couple of packets, no
network traffic is sent over either of the network adapters.
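For reference, the setup described above corresponds roughly to the
following sketch. The file name hosts, the process count of 2 and the
binary name ./xhpl are assumptions for illustration, not the exact
invocation.

```shell
# Hosts file listing the two eth1 addresses.
# The file name "hosts" is an assumption.
cat > hosts <<'EOF'
10.0.1.21
10.0.1.22
EOF

# Launch the benchmark across both nodes with daemon debugging enabled.
# "./xhpl" and "-np 2" are assumptions; the launch is skipped on
# machines where Open MPI is not installed.
if command -v mpirun >/dev/null 2>&1; then
    mpirun --hostfile hosts -np 2 --debug-daemons ./xhpl
fi

# While the job hangs, the open connections can be checked from a
# second terminal with:
#   netstat -a
```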
Please tell me if any of you have encountered such a problem or setup
and can tell me how to fix it. I tried modifying the routing tables and
playing around with subnetting, but I wasn't able to get a successful
connection. If you need more information, please tell me. Please note
that I'm quite new to Open MPI, so it may well be something about
Open MPI that I just haven't discovered yet.