
Open MPI User's Mailing List Archives


Subject: [OMPI users] Communications Problems when application distributed over different nodes
From: Roland Albrecht (r.albrecht_at_[hidden])
Date: 2008-01-16 05:35:55


I'm running an FDTD program (meep) using Open MPI on a mini-cluster
consisting of 2 computers. Since the mainboard on the node was replaced
(with one identical to the old one), I have a problem. I can't find
the configuration change that is now causing it.

Here's my problem:
I can start the meep application with mpirun on each node individually
and the program runs without any problems.
However, when I try to run the program distributed over both computers, I
get the following error message at some point:
572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=110
Perl translates this errno as: Connection timed out at -e line 1.
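
For reference, the same errno lookup can be sketched in Python. This is only an illustration: the helper name `describe_errno` is made up here, and the mapping from number to name is platform-specific (110 corresponds to ETIMEDOUT on Linux, but not necessarily elsewhere).

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, human-readable message) for an errno value."""
    # errno.errorcode maps numeric values to names such as "ETIMEDOUT".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's message text for that value.
    return name, os.strerror(code)


if __name__ == "__main__":
    # 110 is the value from the error message above; its meaning
    # depends on the operating system this runs on.
    print(describe_errno(110))
```

Running this on each node would confirm that both machines interpret the value the same way.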

However, I can't figure out where the problem lies in my network
configuration. SSH tunnels from one computer to the other work. I can
also reach the internet from the node.

The attached archive contains the config.log from the top-level Open MPI
tree, the output of ompi_info --all, and the network configuration of
both computers.

I'm really grateful for any help. Thank you!

Best regards
Roland Albrecht

Roland Albrecht, Dipl. Phys. ETH
Universität des Saarlandes
Fachrichtung 7.3 (Technische Physik)
AG Prof. Dr. Christoph Becher
Campus E2.6, Zimmer 2.04
D-66123 Saarbrücken
Phone:+49(0)681 302 3418
Fax: +49(0)681 302 4676
skype: roland_albrecht

  • application/octet-stream attachment: mpi.rar