
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Help diagnosing problem: not being able to run MPI code across computers
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-05-07 11:42:37


The list of names in the hostfile specifies the servers that will be used, not the network interfaces. Have a look at the TCP portion of the FAQ:

    http://www.open-mpi.org/faq/?category=tcp

On May 7, 2013, at 11:25 AM, Angel de Vicente <angelv_at_[hidden]> wrote:

> Hi again,
>
> Angel de Vicente <angelv_at_[hidden]> writes:
>> yes, that's just what I did with orted. I saw the port that it was
>> trying to connect and telnet to it, and I got "No route to host", so
>> that's why I was going the firewall path. Hopefully the sysadmins can
>> disable the firewall for the internal network today, and I can see if
>> that solves the issue.
>
> OK, removing the firewall for the private network improved things a
> lot.
>
> A simple "Hello World" seems to work without issues, but if I run my
> code, I have a problem like this:
>
> [angelv_at_comer RTI2D.Parallel]$ mpiexec -prefix $OMPI_PREFIX -hostfile
> $MPI_HOSTS -n 10 ../../../mancha2D_mpi_h5fc.x mancha.trol
>
> [...]
>
> [comer][[58110,1],0][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to 161.72.206.3 failed: No route to host (113)
> [comer][[58110,1],1][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to 161.72.206.3 failed: No route to host (113)
> [comer][[58110,1],3][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to 161.72.206.3 failed: No route to host (113)
> [comer][[58110,1],1][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to 161.72.206.3 failed: No route to host (113)
> [comer][[58110,1],2][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to 161.72.206.3 failed: No route to host (113)
>
> But $MPI_HOSTS points to a file with:
> $ cat /net/nas7/polar/minicluster/machinefile-openmpi
> c0 slots=5
> c1 slots=5
> c2 slots=5
>
> c0, c1, and c2 are the names of the machines on the internal network,
> but for some reason Open MPI is using the public interfaces and
> complaining (the firewall on those is still active). I thought that
> just specifying the machine names in the machinefile would ensure we
> were using the right interface...
>
> Any help? Thanks,
> --
> Ángel de Vicente
> http://angel-de-vicente.blogspot.com/
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/