Open MPI User's Mailing List Archives

Subject: [OMPI users] Communications Problems when application distributed over different nodes
From: Roland Albrecht (r.albrecht_at_[hidden])
Date: 2008-01-16 05:35:55


Hello

I'm running an FDTD program (meep) using Open MPI on a mini-cluster
consisting of 2 computers. Since the mainboard of the node was replaced
(with one identical to the old one), I have had a problem. I can't find
the change in the configuration that is now causing it.

Here's my problem:
I can start the meep application with mpirun on each node individually,
and the program runs without any problems.
However, when I try to run the program distributed over both computers,
I get the following error message at some point:
...[0,1,1][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=110
Perl translates errno 110 as: Connection timed out
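
For reference, the same errno lookup can be sketched in Python instead of Perl. This is a minimal sketch, assuming a Linux system, where errno 110 is ETIMEDOUT (the numeric value differs on other platforms); the helper name describe_errno is hypothetical and not part of Open MPI or meep.

```python
import errno
import os


def describe_errno(code: int) -> tuple[str, str]:
    """Return the symbolic name and the system message for an errno value."""
    # errno.errorcode maps numeric codes to names on the current platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the C library's message for the code.
    return name, os.strerror(code)


# 110 is the value reported in the Open MPI error message above.
name, message = describe_errno(110)
print(f"errno 110: {name} ({message})")
```
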

However, I can't figure out where the problem lies in my network
configuration. SSH tunnels from one computer to the other work. I can
also reach the internet from the node.

The attached archive contains the config.log from the top-level Open MPI
tree, the output of ompi_info --all, and the network configuration of
both computers.

I'd be really grateful for any help. Thank you!

Best regards
Roland Albrecht

-- 
___________________________________________
Roland Albrecht, Dipl. Phys. ETH
-------------------------------------------
Universität des Saarlandes
Fachrichtung 7.3 (Technische Physik)
AG Prof. Dr. Christoph Becher
Campus E2.6, Zimmer 2.04
D-66123 Saarbrücken
Germany
Phone:+49(0)681 302 3418
Fax: +49(0)681 302 4676
skype: roland_albrecht


  • application/octet-stream attachment: mpi.rar