Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Communications Problems when application distributed over, different nodes (Jeff Squyres)
From: Roland Albrecht (r.albrecht_at_[hidden])
Date: 2008-01-19 16:01:10


Hello

It was indeed a problem with the firewall.
Thanks

Best regards
Roland Albrecht

>Do you have the Linux firewall running on either of your machines,
>perchance? This can either block random socket connections between
>nodes (which Open MPI's TCP communication will use) or eat the
>connection requests in a black-hole fashion such that the connections
>will time out.
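
The two failure modes described above (a connection that is rejected versus one that is silently dropped until it times out) can be told apart with a small probe. This is only a minimal sketch, not part of Open MPI: `probe_tcp` is a hypothetical helper name, and the host and port in the usage note are placeholders. A "timeout" result is consistent with a firewall dropping packets, while "refused" usually means the host was reached but nothing was listening on that port.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify one TCP connection attempt.

    Returns "open", "refused", "timeout", or "error".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The remote host answered, but no process listens on this port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all: typical when a firewall silently drops packets.
        return "timeout"
    except OSError:
        # Anything else, e.g. name resolution failure or no route to host.
        return "error"
```

For example, `probe_tcp("node2", 5000)` (hostname and port are assumptions) could be run from one machine while a test listener runs on the other. Because Open MPI's TCP communication uses arbitrary ports, a single successful probe on one port does not prove the firewall lets all of them through.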

On Jan 16, 2008, at 5:35 AM, Roland Albrecht wrote:

> > Hello
> >
> > I'm running an FDTD program (meep) using Open MPI on a mini-cluster
> > consisting of 2 computers. Since the mainboard on the node was
> > replaced (with an identical one), I have had a problem. I can't
> > find the configuration change that is now causing the
> > problem.
> >
> > Here's my problem:
> > I can start the meep application by mpi-run on each node
> > individually and the program runs without any problems.
> > However, when I try to run the program distributed over both
> > computers, at some point I get the following error message:
> > ...[0,1,1][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=110
> > which Perl translates as: Connection timed out at -e line 1.
> >
> > However, I can't figure out where the problem lies in my network
> > configuration. SSH tunnels from one computer to the other work. I
> > can also reach the internet from the node.
> >
> > In the attached archive there's the config.log from the top open-mpi
> > tree, there's the output of ompi_info --all and there's the network
> > configuration of both computers.
> >
> > I'm really grateful for any help. Thank you!
> >
> > Best regards
> > Roland Albrecht