I'm trying to run an Open MPI application across multiple nodes,
but I've hit a snag when the processes are on different nodes,
especially when the machines are on different TCP subnets.
The orted daemons start up fine, but after that the application fails with:
connect() failed with errno=111
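
For reference, on Linux errno 111 normally corresponds to ECONNREFUSED, meaning the remote side actively refused the TCP connection (for example, nothing was listening on that address and port). A minimal Python sketch for decoding the number, assuming it is run on the same platform that reported the error (errno values differ between operating systems); `describe_errno` is a hypothetical helper name, not part of Open MPI:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and system message for an errno value.

    The mapping is platform-specific, so run this on the machine
    that actually reported the error.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 111 is the value from the error message above.
    print(describe_errno(111))
```
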
I've read in this thread
that Open MPI can't do this yet, but that (pre-release?) versions
of Open MPI 1.3 will work.
I've tried compiling Open MPI 1.3a (nightly build) and running my
program with it (compiled with the mpicc from Open MPI 1.3a), but I got
the same error message.
Can anybody confirm that:
1) Open MPI has difficulties using the TCP BTL across different subnets
2) there are currently no workarounds for this?
If there are solutions to this, I'd really like to know about them, as
I've been trying to get this working for quite a while now.
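
To make "different TCP subnets" concrete, here is a minimal sketch of checking whether two node addresses fall in the same subnet. The addresses and the /24 prefix length are made-up placeholders for illustration, not the real nodes, and `same_subnet` is a hypothetical helper:

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, prefix_len: int) -> bool:
    """Return True if both addresses belong to the same network
    when the given prefix length (netmask) is applied to each."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b


if __name__ == "__main__":
    # Placeholder addresses: two nodes on one subnet, one on another.
    print(same_subnet("192.168.1.10", "192.168.1.20", 24))
    print(same_subnet("192.168.1.10", "192.168.2.20", 24))
```
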