Open MPI User's Mailing List Archives

Subject: [OMPI users] tg3 module
From: Leonardo Fialho (lfialho_at_[hidden])
Date: 2008-06-04 11:27:17


Hi All,

I'm experiencing a strange problem. I don't know whether it has already
been reported, but here it is:

When I run Open MPI on one specific cluster, the network card module
(tg3) goes down and comes back up a few minutes later. Naturally, this
results in
"[nodo22][[56833,1],3][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed: No route to host (113)".

When I run the same application on another cluster (with a different
network module), I get no errors.
When I run the same application on the same cluster using MPICH, I also
get no errors.

The kernel on the "dead" node logs the following:
nfs: server 192.168.65.100 not responding, still trying
NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
nfs: server 192.168.65.100 OK
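
The sequence in this log (watchdog timeout, driver reset, link down, link up) can be picked out of a longer kernel log mechanically. The following is a minimal sketch, assuming messages formatted like the ones quoted above. The names `PATTERNS`, `find_link_events` and `count_resets` are hypothetical helpers made up for illustration, not part of Open MPI or the tg3 driver, and the regular expressions are written only against the message shapes shown here.

```python
import re

# Sample lines copied from the kernel log quoted in this message.
SAMPLE_LOG = """\
nfs: server 192.168.65.100 not responding, still trying
NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
nfs: server 192.168.65.100 OK
"""

# Hypothetical patterns (an assumption): they match only the message
# formats seen above and may need adjusting for other driver versions.
PATTERNS = [
    ("watchdog_timeout", re.compile(r"NETDEV WATCHDOG: (\w+): transmit timed out")),
    ("driver_reset", re.compile(r"tg3: (\w+): transmit timed out, resetting")),
    ("link_down", re.compile(r"tg3: (\w+): Link is down")),
    ("link_up", re.compile(r"tg3: (\w+): Link is up")),
]


def find_link_events(log_text):
    """Return (event, interface) pairs in the order they appear in the log."""
    events = []
    for line in log_text.splitlines():
        for name, pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                events.append((name, match.group(1)))
                break  # at most one event per log line
    return events


def count_resets(events):
    """Count how many times the driver reported resetting an interface."""
    return sum(1 for name, _ in events if name == "driver_reset")


if __name__ == "__main__":
    for name, iface in find_link_events(SAMPLE_LOG):
        print(f"{iface}: {name}")
```

Running something like this over the full kernel log of each node would show whether the resets happen only while the Open MPI job is running, which is the correlation described in this report.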

Does anybody know what is happening? I'm using the Open MPI defaults,
without any -mca or -am parameters.

Some info:

[lfialho_at_aoclsp ~]$ /sbin/modinfo tg3
filename: /lib/modules/2.6.17-1.2142_FC4smp/kernel/drivers/net/tg3.ko
version: 3.59
license: GPL
description: Broadcom Tigon3 ethernet driver
author: David S. Miller (davem_at_[hidden]) and Jeff Garzik (jgarzik_at_[hidden])
srcversion: CE9C9B036713CF38C2EE194
depends:
vermagic: 2.6.17-1.2142_FC4smp SMP mod_unload 686 REGPARM 4KSTACKS gcc-4.0
parm: tg3_debug:Tigon3 bitmapped debugging message enable value (int)
[lfialho_at_aoclsp ~]$

[lfialho_at_aoclsp ~]$ uname -a
Linux aoclsp.uab.es 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:57:02 EDT 2006 i686 i686 i386 GNU/Linux
[lfialho_at_aoclsp ~]$

[lfialho_at_aoclsp ~]$ gcc --version
gcc (GCC) 4.0.2 20051125 (Red Hat 4.0.2-8)
Copyright (C) 2005 Free Software Foundation, Inc.
[lfialho_at_aoclsp ~]$

[lfialho_at_aoclsp ~]$ /opt/radic-mpi/bin/ompi_info
                 Package: Open MPI lfialho_at_[hidden] Distribution
                Open MPI: 1.3a1-1
   Open MPI SVN revision: -1
                Open RTE: 1.3a1-1
   Open RTE SVN revision: -1
                    OPAL: 1.3a1-1
       OPAL SVN revision: -1
            Ident string: 1.3a1-1
                  Prefix: /opt/radic-mpi/
 Configured architecture: i686-pc-linux-gnu
          Configure host: aoclsp.uab.es
           Configured by: lfialho
           Configured on: Tue Jun 3 16:16:08 CEST 2008
          Configure host: aoclsp.uab.es
                Built by: lfialho
                Built on: mar jun 3 16:41:19 CEST 2008
              Built host: aoclsp.uab.es
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
      Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/bin/gfortran
      Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/bin/gfortran
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
           Sparse Groups: no
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: no
         MPI I/O support: yes
       MPI_WTIME support: gettimeofday
Symbol visibility support: yes
   FT Checkpoint support: yes (checkpoint thread: no)
           MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.3)
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.3)
           MCA paffinity: linux (MCA v1.0, API v1.1, Component v1.3)
               MCA carto: auto_detect (MCA v1.0, API v1.0, Component v1.3)
               MCA carto: file (MCA v1.0, API v1.0, Component v1.3)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.3)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.3)
         MCA installdirs: env (MCA v1.0, API v1.0, Component v1.3)
         MCA installdirs: config (MCA v1.0, API v1.0, Component v1.3)
                 MCA crs: blcr (MCA v1.0, API v1.0, Component v1.3)
                 MCA crs: self (MCA v1.0, API v1.0, Component v1.3)
                 MCA dpm: orte (MCA v1.0, API v1.0, Component v1.3)
              MCA pubsub: orte (MCA v1.0, API v1.0, Component v1.3)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.1, Component v1.3)
                MCA coll: inter (MCA v1.0, API v1.1, Component v1.3)
                MCA coll: self (MCA v1.0, API v1.1, Component v1.3)
                MCA coll: sm (MCA v1.0, API v1.1, Component v1.3)
                MCA coll: tuned (MCA v1.0, API v1.1, Component v1.3)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.3)
               MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.3)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.3)
                 MCA pml: cm (MCA v1.0, API v1.0, Component v1.3)
                 MCA pml: crcpw (MCA v1.0, API v1.0, Component v1.3)
                 MCA pml: dr (MCA v1.0, API v1.0, Component v1.3)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.3)
                 MCA pml: observer (MCA v1.0, API v1.0, Component v1.3)
                 MCA pml: v (MCA v1.0, API v1.0, Component v1.0)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.3)
              MCA rcache: vma (MCA v1.0, API v1.0, Component v1.3)
                 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.3)
                 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.3)
                 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.3)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.3)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.3)
                 MCA osc: rdma (MCA v1.0, API v1.0, Component v1.3)
                MCA crcp: coord (MCA v1.0, API v1.0, Component v1.3)
                MCA crcp: uncoord (MCA v1.0, API v1.0, Component v1.3)
              MCA errmgr: default (MCA v1.0, API v1.3, Component v1.3)
             MCA grpcomm: basic (MCA v1.0, API v2.0, Component v1.3)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.3)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.3)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                MCA odls: default (MCA v1.0, API v1.3, Component v1.3)
                 MCA ess: env (MCA v1.0, API v1.0, Component v1.3)
                 MCA ess: hnp (MCA v1.0, API v1.0, Component v1.3)
                 MCA ess: singleton (MCA v1.0, API v1.0, Component v1.3)
                 MCA ess: slurm (MCA v1.0, API v1.0, Component v1.3)
                 MCA ess: tool (MCA v1.0, API v1.0, Component v1.3)
                 MCA ras: gridengine (MCA v1.0, API v2.0, Component v1.3)
                 MCA ras: slurm (MCA v1.0, API v2.0, Component v1.3)
               MCA rmaps: rank_file (MCA v1.0, API v1.3, Component v1.3)
               MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.3)
               MCA rmaps: seq (MCA v1.0, API v1.3, Component v1.3)
                 MCA rml: ftrm (MCA v1.0, API v1.0, Component v1.3)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.3)
              MCA routed: tree (MCA v1.0, API v1.0, Component v1.3)
              MCA routed: unity (MCA v1.0, API v1.0, Component v1.3)
                 MCA plm: gridengine (MCA v1.0, API v1.0, Component v1.3)
                 MCA plm: rsh (MCA v1.0, API v1.0, Component v1.3)
                 MCA plm: slurm (MCA v1.0, API v1.0, Component v1.3)
               MCA snapc: full (MCA v1.0, API v1.0, Component v1.3)
               MCA snapc: single (MCA v1.0, API v1.0, Component v1.3)
               MCA filem: rsh (MCA v1.0, API v1.0, Component v1.3)
[lfialho_at_aoclsp ~]$

Thanks,

-- 
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edifcio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478