
Open MPI User's Mailing List Archives


From: Emanuel Ziegler (eziegler_at_[hidden])
Date: 2006-02-23 12:41:55


Hi!

I have finally installed Open MPI 1.0.2-a7 with libibverbs-1.0-rc5 and
libmthca-1.0-rc5 on Debian sarge with kernel 2.6.15 (from
www.backports.org) in order to use InfiniBand.

While InfiniBand itself seems to work (ping over IPoIB works perfectly),
the mpirun/orterun command fails with both rsh and ssh.
The /usr/local/etc/openmpi-default-hostfile contains
   node01 slots=2
   node02 slots=2
Both hosts are identical apart from their network configuration, and the
problem is symmetric.
All of the following commands are run on node01. Both
    $ mpirun -np 1 hostname
    node01
and
    $ rsh node02 hostname
    node02
work as expected, but
    $ mpirun -np 4 hostname
    node01
    node01
hangs. Pressing Ctrl+C stops it, but gives no hint about the cause
of the problem.
The output of
    $ mpirun --debug -np 4 hostname
is attached. The important line seems to be
    [node02:12018] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect:
    connect() failed with errno=113
Unfortunately, I don't know what errno=113 means, but it is apparently a
TCP connection problem.

It doesn't seem to matter whether orted is already running or not. No
processes are launched on the remote host.

Thanks,
  Emanuel