I finally installed OpenMPI 1.0.2-a7 with libibverbs-1.0-rc5 and
libmthca-1.0-rc5 on Debian sarge with kernel 2.6.15 (from
www.backports.org) in order to use InfiniBand.
While InfiniBand seems to be working (ping with IPoIB works perfectly),
the mpirun/orterun command causes trouble using rsh as well as ssh.
The /usr/local/etc/openmpi-default-hostfile contains
Both hosts are completely identical (apart from network config) and the
problem is symmetric.
I can execute the following commands (all on node01) without problems:
$ mpirun -np 1 hostname
$ rsh node02 hostname
However, the command
$ mpirun -np 4 hostname
hangs. After pressing Ctrl+C it stops, but gives no hint about the cause
of the problem.
The output of
$ mpirun --debug -np 4 hostname
can be found in the attachment. The important line seems to be:
[node02:12018] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect:
connect() failed with errno=113
Unfortunately, I don't know what errno=113 means, but obviously the
connect() call between the nodes is failing.
It doesn't seem to matter whether orted is running or not. No processes are
launched on the remote host.
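
Regarding the errno=113 in the debug output above: a numeric errno can be translated into its symbolic name and message with Python's standard errno and os modules. The following is only a minimal sketch. The helper name describe_errno is made up for illustration and is not part of Open MPI, and errno numbers are platform-specific, so it should be run on the affected node. On Linux, 113 corresponds to EHOSTUNREACH ("No route to host").

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno value.

    describe_errno is a hypothetical helper name used for illustration.
    The number-to-name mapping depends on the operating system.
    """
    # errno.errorcode maps numeric values to names such as "EHOSTUNREACH"
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the human-readable message for the same value
    return name, os.strerror(code)


if __name__ == "__main__":
    # 113 is the value reported by mca_oob_tcp_peer_complete_connect
    name, message = describe_errno(113)
    print(f"errno=113: {name} ({message})")
```

If the result is EHOSTUNREACH, the failure is probably at the TCP/IP level between the hosts (for example routing, a firewall rule, or an interface that is not reachable from the other node) rather than in the rsh/ssh launch itself.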