Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] TCP BTL in different subnets?
From: Jeroen Kleijer (jeroen.kleijer_at_[hidden])
Date: 2008-09-30 10:53:32


Hi Jeff,

No worries.
I've been able to get the most recent build (1.3a, September 25th) to
compile and it does exactly what I need it to do (which is work
across different subnets) and I can basically support that myself.
(Not quite sure what went wrong the first time I tried this, though.)

Strange thing is, we've searched through the 1.2 code branch for the
function that causes this (is_private_ipv4() in the file
ompi/mca/btl/tcp/btl_tcp_proc.c) and adjusted it to always return
true. This also seems to work! (I don't think this will be accepted
as a patch, as I have absolutely _no_ idea what it'll break, but both
solutions seem to work for me(tm).)
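
For readers following along: a private-address test of this kind usually
comes down to checking the RFC 1918 ranges (10.0.0.0/8, 172.16.0.0/12,
192.168.0.0/16). The snippet below is only a sketch of that idea and is
not the code from btl_tcp_proc.c; the name sketch_is_private_ipv4 and its
string-based signature are made up for illustration.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not Open MPI's is_private_ipv4): returns true
 * if the dotted-quad string falls inside one of the RFC 1918 ranges. */
static bool sketch_is_private_ipv4(const char *dotted)
{
    struct in_addr in;

    /* inet_pton returns 1 only for a valid IPv4 address string. */
    if (inet_pton(AF_INET, dotted, &in) != 1) {
        return false;
    }

    /* Convert to host byte order so the octets can be picked apart. */
    uint32_t addr = ntohl(in.s_addr);
    unsigned a = (addr >> 24) & 0xff;  /* first octet  */
    unsigned b = (addr >> 16) & 0xff;  /* second octet */

    return a == 10 ||                          /* 10.0.0.0/8     */
           (a == 172 && b >= 16 && b <= 31) || /* 172.16.0.0/12  */
           (a == 192 && b == 168);             /* 192.168.0.0/16 */
}
```

Hard-wiring such a check to return true means every address, public or
not, is classified as private, which is presumably why the side effects
of that change are hard to predict.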

Regards,

Jeroen Kleijer

On Tue, Sep 30, 2008 at 4:38 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> Sorry for the delay in replying -- I thought I had replied to this already,
> but I guess I hadn't. :-(
>
> We've talked about this feature several times, but this specific
> functionality hasn't made it into the OMPI code base yet. Sorry! :-(
>
> (patches would be gladly accepted, but note that we'll likely be kinda picky
> about this code because it's a little hairy and complex...)
>
>
> On Sep 19, 2008, at 7:00 PM, Jeroen Kleijer wrote:
>
>> Hi,
>>
>> I'm trying to get an openmpi application running across different
>> nodes but seem to have hit a snag when the processes are on different
>> nodes, especially when the machines are on different TCP subnets.
>> The orted daemons start up fine but after that the application fails
>> with the message
>>
>> [0,1,2][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
>> connect() failed with errno=111
>>
>> I've read in this thread
>>
>> http://thread.gmane.org/gmane.comp.clustering.open-mpi.user/3427/focus=3437
>> that openmpi currently can't do this yet but (pre-release?) versions
>> of openmpi 1.3 will work.
>> I've tried compiling openmpi 1.3a (nightly build) and running my
>> program with that (compiled with the mpicc of openmpi 1.3a) but I got
>> the same error message.
>>
>> Can anybody confirm that:
>> 1) openmpi has difficulties using the tcp btl across different subnets
>> 2) there are currently no workarounds for this.
>>
>> If there are solutions to this I'd really like to know about it as
>> I've been trying this for quite a while now.
>>
>> Regards,
>>
>> Jeroen Kleijer
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> Cisco Systems