Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Error message related to infiniband
From: Syed Ahsan Ali (ahsanshah01_at_[hidden])
Date: 2014-01-19 22:18:41


I agree with you, and I am still struggling with the subnet ID settings
because I couldn't find the /var/cache/opensm/opensm.opts file.

Secondly, if OMPI is falling back to TCP, then it should be able to reach
the compute nodes, since they respond to both ping and ssh.
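
A note on that last point: ping uses ICMP and ssh uses TCP port 22, so
neither shows whether other TCP ports on the node are reachable, which is
what the firewall suggestion quoted below is about. A minimal Python sketch
for testing a single TCP port follows. The function name probe_tcp is
hypothetical, and the thread does not say which ports the TCP transport
tried, so any port number passed in is an assumption.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt one TCP connection and return (ok, reason).

    probe_tcp is a hypothetical helper for illustration; it is not
    part of Open MPI.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        # No reply at all within the timeout (often a silently dropping filter).
        return False, "timed out"
    except OSError as exc:
        # Map the errno to its symbolic name, e.g. ECONNREFUSED, EHOSTUNREACH.
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, "%s: %s" % (name, exc)
```

For example, calling probe_tcp("192.168.108.10", 22) and then
probe_tcp("192.168.108.10", 1024) from compute-01-03 would compare the ssh
port against one arbitrarily chosen high port. On Linux, a firewall rule
that rejects traffic with an ICMP host-prohibited reply commonly surfaces as
EHOSTUNREACH, the same errno 113 ("No route to host") seen in the log, so a
result like that on the high port while port 22 connects would point at
filtering rather than routing.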

On Sun, Jan 19, 2014 at 9:38 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> If OMPI finds infiniband support on the node, it will attempt to use it.
> In this case, it would appear you have an incorrectly configured IB adaptor
> on the node, so you get the additional warning about that fact.
>
> OMPI then falls back to look for another transport, in this case TCP.
> However, the TCP transport is unable to create a socket to the remote host.
> The most likely cause is a firewall, so you might want to check that and
> turn it off.
>
>
> On Jan 19, 2014, at 4:19 AM, Syed Ahsan Ali <ahsanshah01_at_[hidden]> wrote:
>
> Dear All,
>
> I am getting InfiniBand errors while running mpirun applications on the
> cluster. I get these errors even when I don't include InfiniBand usage
> flags in the mpirun command. Please advise.
>
> mpirun -np 72 -hostfile hostlist ../bin/regcmMPI regcm.in
>
> --------------------------------------------------------------------------
> [[59183,1],24]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
>
> Module: OpenFabrics (openib)
> Host: compute-01-10.private.dns.zone
>
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> WARNING: There are more than one active ports on host
> 'compute-01-15.private.dns.zone', but the
> default subnet GID prefix was detected on more than one of these
> ports. If these ports are connected to different physical IB
> networks, this configuration will fail in Open MPI. This version of
> Open MPI requires that every physically separate IB subnet that is
> used between connected MPI processes must have different subnet ID
> values.
>
> Please see this FAQ entry for more details:
>
> http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid
>
> NOTE: You can turn off this warning by setting the MCA parameter
> btl_openib_warn_default_gid_prefix to 0.
> --------------------------------------------------------------------------
>
> This is RegCM trunk
> SVN Revision: tag 4.3.5.6 compiled at: data : Sep 3 2013 time:
> 05:10:53
>
> [pmd.pakmet.com:03309] 15 more processes have sent help message
> help-mpi-btl-base.txt / btl:no-nics
> [pmd.pakmet.com:03309] Set MCA parameter "orte_base_help_aggregate" to 0
> to see all help / error messages
> [pmd.pakmet.com:03309] 47 more processes have sent help message
> help-mpi-btl-openib.txt / default subnet prefix
> [compute-01-03.private.dns.zone][[59183,1],1][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.108.10 failed: No route to host (113)
> [compute-01-03.private.dns.zone][[59183,1],2][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.108.10 failed: No route to host (113)
> [compute-01-03.private.dns.zone][[59183,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.108.10 failed: No route to host (113)
> [compute-01-03.private.dns.zone][[59183,1],3][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> [compute-01-03.private.dns.zone][[59183,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.108.10 failed: No route to host (113)
> [compute-01-03.private.dns.zone][[59183,1],7][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.108.10 failed: No route to host (113)
> connect() to 192.168.108.10 failed: No route to host (113)
> [compute-01-03.private.dns.zone][[59183,1],6][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.108.10 failed: No route to host (113)
> [compute-01-03.private.dns.zone][[59183,1],4][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.108.10 failed: No route to host (113)
>
> Ahsan
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Syed Ahsan Ali Bokhari
Electronic Engineer (EE)
Research & Development Division
Pakistan Meteorological Department H-8/4, Islamabad.
Phone # off  +92518358714
Cell # +923155145014