Hello
I'm running an FDTD program (meep) using open-mpi on a mini-cluster
consisting of 2 computers. Since the mainboard on the node was replaced
(with a model identical to the old one) I have a problem. I can't find
the change in the configuration that is now causing the problem.
Here's my problem:
I can start the meep application with mpirun on each node individually
and the program runs without any problems.
However, when I try to run the program distributed over both computers, I
get the following error message at some point:
...[0,1,1][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=110
Perl translates errno 110 as: Connection timed out at -e line 1.
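
For reference, the same errno lookup can be sketched in Python using only the standard library. This is a minimal sketch: errno numbers are platform-specific, so the result is only meaningful when run on the machine that produced the error. On a typical Linux system, 110 is expected to map to ETIMEDOUT.

```python
import errno
import os

# errno value taken from the Open MPI error message above
code = 110

# Symbolic name (e.g. ETIMEDOUT on Linux), or "unknown" if this platform
# does not define an errno with that number
name = errno.errorcode.get(code, "unknown")

# Human-readable description as reported by the C library
message = os.strerror(code)

print(f"errno {code}: {name} ({message})")
```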
However, I can't figure out where the problem lies in my network
configuration. SSH connections from one computer to the other work, and I
can also reach the internet from the node.
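
A working SSH connection only shows that port 22 is reachable, while the failing connect() may well target a different port. The following is a rough sketch of how a plain TCP connection to an arbitrary host and port could be tested from either computer. The function name can_connect is made up for this illustration, and any host name or port number passed to it would be an assumption, not something taken from my setup.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration: it only opens and closes a plain
    TCP connection and does not speak any MPI protocol.
    """
    try:
        # create_connection resolves the host name and applies the timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name resolution failures
        return False
```

It could be called as can_connect("node2", 5000), where "node2" and 5000 are placeholders, while some program is listening on that port on the other machine. A False result on ports other than 22 would point at filtering or routing between the two computers rather than at meep itself.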
The attached archive contains the config.log from the top of the open-mpi
tree, the output of ompi_info --all, and the network configuration of
both computers.
I'm really grateful for any help. Thank you!
Best regards
Roland Albrecht
--
___________________________________________
Roland Albrecht, Dipl. Phys. ETH
-------------------------------------------
Universität des Saarlandes
Fachrichtung 7.3 (Technische Physik)
AG Prof. Dr. Christoph Becher
Campus E2.6, Zimmer 2.04
D-66123 Saarbrücken
Germany
Phone:+49(0)681 302 3418
Fax: +49(0)681 302 4676
skype: roland_albrecht
- application/octet-stream attachment: mpi.rar