Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] TCP BTL in different subnets?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-09-30 10:38:57


Sorry for the delay in replying -- I thought I had replied to this
already, but I guess I hadn't. :-(

We've talked about this feature several times, but this specific
functionality hasn't made it into the OMPI code base yet. Sorry! :-(

(patches would be gladly accepted, but note that we'll likely be kinda
picky about this code because it's a little hairy and complex...)

On Sep 19, 2008, at 7:00 PM, Jeroen Kleijer wrote:

> Hi,
>
> I'm trying to get an openmpi application running across different
> nodes but seem to have hit a snag when the processes are on different
> nodes, especially when the machines are on different TCP subnets.
> The orted daemons start up fine but after that the application borks
> with the message
>
> [0,1,2][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
> connect() failed with errno=111
>
> I've read in this thread
> http://thread.gmane.org/gmane.comp.clustering.open-mpi.user/3427/focus=3437
> that openmpi currently can't do this yet but (pre-release?) versions
> of openmpi 1.3 will work.
> I've tried compiling openmpi 1.3a (nightly build) and running my
> program with that (compiled with the mpicc of openmpi 1.3a) but I got
> the same error message.
>
> Can anybody confirm that:
> 1) openmpi has difficulties using the tcp btl across different subnets
> 2) there are currently no workarounds for this.
>
> If there are solutions to this I'd really like to know about it as
> I've been trying this for quite a while now.
>
> Regards,
>
> Jeroen Kleijer

-- 
Jeff Squyres
Cisco Systems