Sorry for the delay in replying -- I thought I had replied to this
already, but I guess I hadn't. :-(

We've talked about this feature several times, but this specific
functionality hasn't made it into the OMPI code base yet. Sorry! :-(

(patches would be gladly accepted, but note that we'll likely be kinda
picky about this code because it's a little hairy and complex...)
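
For what it's worth, the errno=111 in the message quoted below is most
likely ECONNREFUSED (that's what 111 means on Linux; I'm assuming the
nodes run Linux, since errno numbering is platform-specific). In other
words, the TCP connect() to the peer was refused. Here's a minimal
sketch for decoding such a number on whatever machine you run it on;
describe_errno is just a helper name I made up for illustration, not
anything from Open MPI:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric errno on this platform.

    Hypothetical helper for illustration; not part of Open MPI.
    """
    # errno.errorcode maps numeric values to symbolic names
    # for the platform the script is running on.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 111 is the value from the quoted error message; run this on
    # the node that reported it, since numbering differs per platform.
    print(describe_errno(111))
```

Run it on the node that printed the error so the number is decoded
against that machine's errno table.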
On Sep 19, 2008, at 7:00 PM, Jeroen Kleijer wrote:
> Hi,
>
> I'm trying to get an openmpi application running across different
> nodes but seem to have hit a snag when the processes are on different
> nodes, especially when the machines are on different TCP subnets.
> The orted daemons start up fine but after that the application borks
> with the message
>
> [0,1,2][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
> connect() failed with errno=111
>
> I've read in this thread
> http://thread.gmane.org/gmane.comp.clustering.open-mpi.user/3427/focus=3437
> that openmpi currently can't do this yet but (pre-release?) versions
> of openmpi 1.3 will work.
> I've tried compiling openmpi 1.3a (nightly build) and running my
> program with that (compiled with the mpicc of openmpi 1.3a) but I got
> the same error message.
>
> Can anybody confirm that:
> 1) openmpi has difficulties using the tcp btl across different subnets
> 2) there are currently no workarounds for this.
>
> If there are solutions to this I'd really like to know about it as
> I've been trying this for quite a while now.
>
> Regards,
>
> Jeroen Kleijer
--
Jeff Squyres
Cisco Systems