Open MPI User's Mailing List Archives

Subject: [OMPI users] tg3 module
From: Leonardo Fialho (lfialho_at_[hidden])
Date: 2008-06-04 11:27:17


Hi All,

I'm experiencing a strange problem. I don't know whether it has already
been reported, but here it is:

when I run Open MPI on one specific cluster, the network card module (tg3)
goes down... and comes back up after a few minutes. Of course, this results in
"[nodo22][[56833,1],3][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed: No route to host (113)".
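
For readers comparing this against their own logs, the error line above can be split into its parts with a small script. This is only a sketch: the field names (host, job, sub, rank and so on) are labels chosen for illustration, not Open MPI terminology, and the pattern is fitted to this one message rather than to every message the TCP BTL can print.

```python
import re

# The error message quoted above, joined onto one line.
ERROR_LINE = (
    "[nodo22][[56833,1],3][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] "
    "mca_btl_tcp_frag_recv: readv failed: No route to host (113)"
)

# Hypothetical field names; the layout is inferred from this single message.
ERROR_RE = re.compile(
    r"\[(?P<host>[^\]]+)\]"                                   # [nodo22]
    r"\[\[(?P<job>\d+),(?P<sub>\d+)\],(?P<rank>\d+)\]"        # [[56833,1],3]
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]"   # [file:line:function]
    r"\s+\w+: (?P<call>\w+) failed: (?P<message>.+) \((?P<errno>\d+)\)"
)


def parse_btl_error(text):
    """Return the fields of a BTL TCP error line as a dict, or None if no match."""
    match = ERROR_RE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the purely numeric fields to integers.
    for key in ("job", "sub", "rank", "line", "errno"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    for name, value in parse_btl_error(ERROR_LINE).items():
        print(f"{name}: {value}")
```

On Linux, errno 113 is EHOSTUNREACH, which matches the "No route to host" text: the failing system call is readv on an already established connection, so the peer became unreachable while the job was running.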

When I run the same application on another cluster (with a different network
module), I get no errors.
When I run the same application on the same cluster using MPICH, I get no
errors either.

The kernel on the "dead" node logs this:
nfs: server 192.168.65.100 not responding, still trying
NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
nfs: server 192.168.65.100 OK
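
To check how often this happens on each node, the kernel messages can be counted with a short script. The following is a minimal sketch that assumes the log text has the same wording as the lines quoted above. The event names and the function name `summarize_tg3_events` are hypothetical, chosen only for this example.

```python
import re

# Sample lines copied from the kernel log quoted above.
SAMPLE_LOG = """\
nfs: server 192.168.65.100 not responding, still trying
NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
nfs: server 192.168.65.100 OK
"""

# Hypothetical event names mapped to patterns for the messages of interest.
# Each pattern captures the interface name (for example eth0).
PATTERNS = {
    "watchdog_timeout": re.compile(r"NETDEV WATCHDOG: (\w+): transmit timed out"),
    "driver_reset": re.compile(r"tg3: (\w+): transmit timed out, resetting"),
    "link_down": re.compile(r"tg3: (\w+): Link is down"),
    "link_up": re.compile(r"tg3: (\w+): Link is up"),
}


def summarize_tg3_events(log_text):
    """Count tg3-related events per (interface, event name) in a kernel log."""
    counts = {}
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                key = (match.group(1), name)
                counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    for (iface, event), n in sorted(summarize_tg3_events(SAMPLE_LOG).items()):
        print(f"{iface} {event}: {n}")
```

Running the same function over the saved kernel log of every node would show whether the resets are limited to a few machines or happen across the whole cluster.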

Does anybody know what is happening? I'm using the Open MPI defaults, without
any -mca or -am parameters.

Some info:

[lfialho_at_aoclsp ~]$ /sbin/modinfo tg3
filename: /lib/modules/2.6.17-1.2142_FC4smp/kernel/drivers/net/tg3.ko
version: 3.59
license: GPL
description: Broadcom Tigon3 ethernet driver
author: David S. Miller (davem_at_[hidden]) and Jeff Garzik (jgarzik_at_[hidden])
srcversion: CE9C9B036713CF38C2EE194
depends:
vermagic: 2.6.17-1.2142_FC4smp SMP mod_unload 686 REGPARM 4KSTACKS gcc-4.0
parm: tg3_debug:Tigon3 bitmapped debugging message enable value (int)
[lfialho_at_aoclsp ~]$

[lfialho_at_aoclsp ~]$ uname -a
Linux aoclsp.uab.es 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:57:02 EDT 2006 i686 i686 i386 GNU/Linux
[lfialho_at_aoclsp ~]$

[lfialho_at_aoclsp ~]$ gcc --version
gcc (GCC) 4.0.2 20051125 (Red Hat 4.0.2-8)
Copyright (C) 2005 Free Software Foundation, Inc.
[lfialho_at_aoclsp ~]$

[lfialho_at_aoclsp ~]$ /opt/radic-mpi/bin/ompi_info
                 Package: Open MPI lfialho_at_[hidden] Distribution
                Open MPI: 1.3a1-1
   Open MPI SVN revision: -1
                Open RTE: 1.3a1-1
   Open RTE SVN revision: -1
                    OPAL: 1.3a1-1
       OPAL SVN revision: -1
            Ident string: 1.3a1-1
                  Prefix: /opt/radic-mpi/
 Configured architecture: i686-pc-linux-gnu
          Configure host: aoclsp.uab.es
           Configured by: lfialho
           Configured on: Tue Jun 3 16:16:08 CEST 2008
          Configure host: aoclsp.uab.es
                Built by: lfialho
                Built on: mar jun 3 16:41:19 CEST 2008
              Built host: aoclsp.uab.es
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
      Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/bin/gfortran
      Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/bin/gfortran
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
           Sparse Groups: no
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: no
         MPI I/O support: yes
       MPI_WTIME support: gettimeofday
Symbol visibility support: yes
   FT Checkpoint support: yes (checkpoint thread: no)
           MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.3)
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.3)
           MCA paffinity: linux (MCA v1.0, API v1.1, Component v1.3)
               MCA carto: auto_detect (MCA v1.0, API v1.0, Component v1.3)
               MCA carto: file (MCA v1.0, API v1.0, Component v1.3)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.3)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.3)
         MCA installdirs: env (MCA v1.0, API v1.0, Component v1.3)
         MCA installdirs: config (MCA v1.0, API v1.0, Component v1.3)
                 MCA crs: blcr (MCA v1.0, API v1.0, Component v1.3)
                 MCA crs: self (MCA v1.0, API v1.0, Component v1.3)
                 MCA dpm: orte (MCA v1.0, API v1.0, Component v1.3)
              MCA pubsub: orte (MCA v1.0, API v1.0, Component v1.3)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.1, Component v1.3)
                MCA coll: inter (MCA v1.0, API v1.1, Component v1.3)
                MCA coll: self (MCA v1.0, API v1.1, Component v1.3)
                MCA coll: sm (MCA v1.0, API v1.1, Component v1.3)
                MCA coll: tuned (MCA v1.0, API v1.1, Component v1.3)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.3)
               MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.3)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.3)
                 MCA pml: cm (MCA v1.0, API v1.0, Component v1.3)
                 MCA pml: crcpw (MCA v1.0, API v1.0, Component v1.3)
                 MCA pml: dr (MCA v1.0, API v1.0, Component v1.3)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.3)
                 MCA pml: observer (MCA v1.0, API v1.0, Component v1.3)
                 MCA pml: v (MCA v1.0, API v1.0, Component v1.0)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.3)
              MCA rcache: vma (MCA v1.0, API v1.0, Component v1.3)
                 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.3)
                 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.3)
                 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.3)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.3)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.3)
                 MCA osc: rdma (MCA v1.0, API v1.0, Component v1.3)
                MCA crcp: coord (MCA v1.0, API v1.0, Component v1.3)
                MCA crcp: uncoord (MCA v1.0, API v1.0, Component v1.3)
              MCA errmgr: default (MCA v1.0, API v1.3, Component v1.3)
             MCA grpcomm: basic (MCA v1.0, API v2.0, Component v1.3)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.3)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.3)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                MCA odls: default (MCA v1.0, API v1.3, Component v1.3)
                 MCA ess: env (MCA v1.0, API v1.0, Component v1.3)
                 MCA ess: hnp (MCA v1.0, API v1.0, Component v1.3)
                 MCA ess: singleton (MCA v1.0, API v1.0, Component v1.3)
                 MCA ess: slurm (MCA v1.0, API v1.0, Component v1.3)
                 MCA ess: tool (MCA v1.0, API v1.0, Component v1.3)
                 MCA ras: gridengine (MCA v1.0, API v2.0, Component v1.3)
                 MCA ras: slurm (MCA v1.0, API v2.0, Component v1.3)
               MCA rmaps: rank_file (MCA v1.0, API v1.3, Component v1.3)
               MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.3)
               MCA rmaps: seq (MCA v1.0, API v1.3, Component v1.3)
                 MCA rml: ftrm (MCA v1.0, API v1.0, Component v1.3)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.3)
              MCA routed: tree (MCA v1.0, API v1.0, Component v1.3)
              MCA routed: unity (MCA v1.0, API v1.0, Component v1.3)
                 MCA plm: gridengine (MCA v1.0, API v1.0, Component v1.3)
                 MCA plm: rsh (MCA v1.0, API v1.0, Component v1.3)
                 MCA plm: slurm (MCA v1.0, API v1.0, Component v1.3)
               MCA snapc: full (MCA v1.0, API v1.0, Component v1.3)
               MCA snapc: single (MCA v1.0, API v1.0, Component v1.3)
               MCA filem: rsh (MCA v1.0, API v1.0, Component v1.3)
[lfialho_at_aoclsp ~]$

Thanks,

-- 
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edificio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478