
Open MPI User's Mailing List Archives


Subject: [OMPI users] TCP BTL in different subnets?
From: Jeroen Kleijer (jeroen.kleijer_at_[hidden])
Date: 2008-09-19 19:00:24


Hi,

I'm trying to get an Open MPI application running across several
nodes, but I've hit a snag when the processes are on different
nodes, especially when the machines are on different TCP subnets.
The orted daemons start up fine, but after that the application borks
with the message

[0,1,2][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=111

I've read in this thread
http://thread.gmane.org/gmane.comp.clustering.open-mpi.user/3427/focus=3437
that Open MPI can't do this yet, but that (pre-release?) versions
of Open MPI 1.3 will.
I've tried compiling Open MPI 1.3a (a nightly build) and running my
program with it (compiled with the mpicc from Open MPI 1.3a), but I got
the same error message.

Can anybody confirm that:
1) Open MPI has difficulties using the TCP BTL across different subnets, and
2) there are currently no workarounds for this?

If there are solutions to this, I'd really like to know about them, as
I've been trying to get this working for quite a while now.

Regards,

Jeroen Kleijer