Open MPI User's Mailing List Archives

Subject: [OMPI users] Error message related to infiniband
From: Syed Ahsan Ali (ahsanshah01_at_[hidden])
Date: 2014-01-19 07:19:04


Dear All,

I am getting InfiniBand errors while running MPI applications with mpirun on
the cluster. I get these errors even when I don't pass any InfiniBand-related
flags to mpirun. Please advise.

mpirun -np 72 -hostfile hostlist ../bin/regcmMPI regcm.in
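
For reference: Open MPI probes the openib BTL by default, even when no
InfiniBand options are given on the command line. A minimal sketch of
excluding it explicitly, assuming the non-InfiniBand fallback is acceptable:

mpirun --mca btl ^openib -np 72 -hostfile hostlist ../bin/regcmMPI regcm.in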

--------------------------------------------------------------------------
[[59183,1],24]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: compute-01-10.private.dns.zone

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
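
If the fallback transport is acceptable, the BTLs can also be named
explicitly instead of probed; a sketch using the usual non-InfiniBand set
(tcp, sm, self):

mpirun --mca btl tcp,sm,self -np 72 -hostfile hostlist ../bin/regcmMPI regcm.in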
--------------------------------------------------------------------------
WARNING: There are more than one active ports on host
'compute-01-15.private.dns.zone', but the
default subnet GID prefix was detected on more than one of these
ports. If these ports are connected to different physical IB
networks, this configuration will fail in Open MPI. This version of
Open MPI requires that every physically separate IB subnet that is
used between connected MPI processes must have different subnet ID
values.

Please see this FAQ entry for more details:

  http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_default_gid_prefix to 0.
--------------------------------------------------------------------------
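
As the note above says, the warning itself (not the underlying subnet
configuration) can be silenced by setting that MCA parameter on the command
line:

mpirun --mca btl_openib_warn_default_gid_prefix 0 -np 72 -hostfile hostlist ../bin/regcmMPI regcm.in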

  This is RegCM trunk
   SVN Revision: tag 4.3.5.6 compiled at: data : Sep 3 2013 time: 05:10:53

[pmd.pakmet.com:03309] 15 more processes have sent help message
help-mpi-btl-base.txt / btl:no-nics
[pmd.pakmet.com:03309] Set MCA parameter "orte_base_help_aggregate" to 0 to
see all help / error messages
[pmd.pakmet.com:03309] 47 more processes have sent help message
help-mpi-btl-openib.txt / default subnet prefix
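
Per the hint above, disabling help-message aggregation shows every affected
rank individually:

mpirun --mca orte_base_help_aggregate 0 -np 72 -hostfile hostlist ../bin/regcmMPI regcm.in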
[compute-01-03.private.dns.zone][[59183,1],1][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.108.10 failed: No route to host (113)
[compute-01-03.private.dns.zone][[59183,1],2][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.108.10 failed: No route to host (113)
[compute-01-03.private.dns.zone][[59183,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.108.10 failed: No route to host (113)
[compute-01-03.private.dns.zone][[59183,1],3][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.108.10 failed: No route to host (113)
[compute-01-03.private.dns.zone][[59183,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.108.10 failed: No route to host (113)
[compute-01-03.private.dns.zone][[59183,1],7][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.108.10 failed: No route to host (113)
[compute-01-03.private.dns.zone][[59183,1],6][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.108.10 failed: No route to host (113)
[compute-01-03.private.dns.zone][[59183,1],4][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.108.10 failed: No route to host (113)
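
The "No route to host" failures come from the TCP BTL trying to connect over
an interface the peers cannot reach. A sketch of restricting the TCP BTL to a
known-good network; eth0 here is a placeholder for whatever interface the
compute nodes actually share:

mpirun --mca btl_tcp_if_include eth0 -np 72 -hostfile hostlist ../bin/regcmMPI regcm.in   # eth0 is a placeholder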

Ahsan