Open MPI User's Mailing List Archives

From: Borenstein, Bernard S (bernard.s.borenstein_at_[hidden])
Date: 2006-07-03 12:58:45


I've built and successfully run the NASA Overflow 2.0aa program with
Open MPI 1.0.2. I'm running on an Opteron Linux cluster running SLES 9
and GM 2.0.24. I built Open MPI 1.1 with the Intel 9 compilers, but when
I try to run Overflow 2.0aa over Myrinet, I get what looks like a data
corruption error and the program dies quickly. There are no MPI errors
at all. If I run using GigE (--mca btl self,tcp), the program runs to
completion correctly.
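
For reference, the two kinds of runs look roughly like this (the
executable name, hostfile, and process count below are placeholders,
not my exact command lines):

# Myrinet run (gm BTL); this is the case that dies:
mpirun --mca btl gm,sm,self -np 16 --hostfile myhosts ./overflowmpi

# GigE run (tcp BTL); this runs to completion:
mpirun --mca btl self,tcp -np 16 --hostfile myhosts ./overflowmpi

Here is my ompi_info output: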

bsb3227_at_mahler:~/openmpi_1.1/bin> ./ompi_info
                Open MPI: 1.1
   Open MPI SVN revision: r10477
                Open RTE: 1.1
   Open RTE SVN revision: r10477
                    OPAL: 1.1
       OPAL SVN revision: r10477
                  Prefix: /home/bsb3227/openmpi_1.1
 Configured architecture: x86_64-unknown-linux-gnu
           Configured by: bsb3227
           Configured on: Fri Jun 30 07:08:54 PDT 2006
          Configure host: mahler
                Built by: bsb3227
                Built on: Fri Jun 30 07:54:46 PDT 2006
              Built host: mahler
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: icc
     C compiler absolute: /opt/intel/cce/9.0.25/bin/icc
            C++ compiler: icpc
   C++ compiler absolute: /opt/intel/cce/9.0.25/bin/icpc
      Fortran77 compiler: ifort
  Fortran77 compiler abs: /opt/intel/fce/9.0.25/bin/ifort
      Fortran90 compiler: /opt/intel/fce/9.0.25/bin/ifort
  Fortran90 compiler abs: /opt/intel/fce/9.0.25/bin/ifort
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
           MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.1)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
                MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
               MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
              MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: gm (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
                  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
                  MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)
                 MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)
                 MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)
                 MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1)
                 MCA ras: tm (MCA v1.0, API v1.0, Component v1.1)
                 MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1)
                 MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)
                MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1)
                MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1)
                MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.1)
                 MCA pls: fork (MCA v1.0, API v1.0, Component v1.1)
                 MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1)
                 MCA pls: slurm (MCA v1.0, API v1.0, Component v1.1)
                 MCA pls: tm (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: slurm (MCA v1.0, API v1.0, Component v1.1)

Here is the ifconfig output for one of the nodes:

bsb3227_at_m045:~> /sbin/ifconfig
eth0 Link encap:Ethernet HWaddr 00:50:45:5D:CD:FE
          inet addr:10.241.194.45  Bcast:10.241.195.255  Mask:255.255.254.0
          inet6 addr: fe80::250:45ff:fe5d:cdfe/64 Scope:Link
          UP BROADCAST NOTRAILERS RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:39913407 errors:0 dropped:0 overruns:0 frame:0
          TX packets:48794587 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:31847343907 (30371.9 Mb)  TX bytes:48231713866 (45997.3 Mb)
          Interrupt:19

eth1 Link encap:Ethernet HWaddr 00:50:45:5D:CD:FF
          inet6 addr: fe80::250:45ff:fe5d:cdff/64 Scope:Link
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
          Interrupt:19

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:23141 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23141 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:20145689 (19.2 Mb) TX bytes:20145689 (19.2 Mb)
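
If the gm BTL's parameter settings would help, I can post those as
well; if I understand the 1.1 ompi_info syntax correctly, they can be
listed with:

./ompi_info --param btl gm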

I hope someone can give me some guidance on how to debug this problem.
Thanx in advance for any help that can be provided.

Bernie Borenstein
The Boeing Company