Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Help diagnosing problem: not being able to run MPI code across computers
From: Angel de Vicente (angelv_at_[hidden])
Date: 2013-05-07 11:25:43


Hi again,

Angel de Vicente <angelv_at_[hidden]> writes:
> Yes, that's just what I did with orted. I saw the port it was trying
> to connect to, telnetted to it, and got "No route to host", which is
> why I was going down the firewall path. Hopefully the sysadmins can
> disable the firewall for the internal network today, and I can see if
> that solves the issue.
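The telnet check in the quote can also be scripted. Below is a minimal Python sketch, assuming nothing beyond the standard library. `probe_tcp` is a hypothetical helper, not part of Open MPI. It attempts a plain TCP connection and reports the errno name, so a result such as `EHOSTUNREACH (113)` can be compared with the `No route to host (113)` code in the Open MPI errors further down (113 is `EHOSTUNREACH` on Linux). A firewall that rejects packets with an ICMP "host prohibited" reply is typically reported by Linux as this same error, which is why it points toward the firewall.

```python
import errno
import os
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> tuple[bool, str]:
    """Attempt a plain TCP connection, much like `telnet host port`.

    Returns (True, "connected") on success, otherwise (False, reason).
    Hypothetical helper for illustration; not part of Open MPI.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.gaierror as exc:
        # The host name could not be resolved at all.
        return False, f"name resolution failed: {exc}"
    except socket.timeout:
        # A firewall that silently drops packets usually ends in a timeout.
        return False, "timed out"
    except OSError as exc:
        if exc.errno is None:
            return False, str(exc)
        # Report the symbolic errno name, e.g. EHOSTUNREACH (113) on Linux.
        name = errno.errorcode.get(exc.errno, "UNKNOWN")
        return False, f"{name} ({exc.errno}): {os.strerror(exc.errno)}"
```

For example, `probe_tcp("161.72.206.3", port)`, with `port` set to the port orted was seen trying (not given in the message), would be expected to return `(True, "connected")` once the firewall lets the traffic through.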

OK, removing the firewall for the private network improved things a
lot.

A simple "Hello World" seems to work without issues, but when I run my
code, I get errors like this:

[angelv_at_comer RTI2D.Parallel]$ mpiexec -prefix $OMPI_PREFIX -hostfile $MPI_HOSTS -n 10 ../../../mancha2D_mpi_h5fc.x mancha.trol

[...]

[comer][[58110,1],0][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
connect() to 161.72.206.3 failed: No route to host (113)
[comer][[58110,1],1][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
[comer][[58110,1],3][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
connect() to 161.72.206.3 failed: No route to host (113)
connect() to 161.72.206.3 failed: No route to host (113)
[comer][[58110,1],1][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
connect() to 161.72.206.3 failed: No route to host (113)
[comer][[58110,1],2][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
connect() to 161.72.206.3 failed: No route to host (113)

But $MPI_HOSTS points to a file containing:
$ cat /net/nas7/polar/minicluster/machinefile-openmpi
c0 slots=5
c1 slots=5
c2 slots=5

c0, c1, and c2 are the names of the machines on the internal network,
but for some reason Open MPI is connecting over the public interfaces
(where the firewall is still active) and failing. I thought that
specifying the internal names of the machines in the machinefile would
be enough to make sure we were using the right interface...
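The part of this assumption that is easy to check, namely which addresses the machinefile names resolve to, can be sketched in Python as below. `parse_hostfile` and `resolve` are hypothetical helpers written for this illustration; they handle only the simple `name slots=N` form shown above, and treating a line without `slots=` as one slot is an assumption of the sketch, not a statement about mpiexec. Note that a private address here would not by itself explain the errors, since they show connections going to 161.72.206.3 anyway.

```python
import socket


def parse_hostfile(text: str) -> list[tuple[str, int]]:
    """Return (hostname, slots) pairs from simple hostfile text.

    Blank lines and '#' comments are skipped. A missing slots= is
    treated as 1 (an assumption of this sketch only).
    """
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        slots = 1
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field.split("=", 1)[1])
        hosts.append((fields[0], slots))
    return hosts


def resolve(host: str) -> list[str]:
    """IPv4 addresses the local resolver returns for `host` (IPv6 ignored)."""
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})
```

Running it on each node, for every entry of `parse_hostfile(open(path).read())` with `path` set to the file in `$MPI_HOSTS`, would show whether c0, c1, and c2 map to internal addresses everywhere.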

Any help? Thanks,

-- 
Ángel de Vicente
http://angel-de-vicente.blogspot.com/