Open MPI User's Mailing List Archives

From: Galen M. Shipman (gshipman_at_[hidden])
Date: 2006-07-06 11:28:27


Hey Justin,

Please send us your MCA parameters (if any). These could be set in a
config file, in environment variables, or on the command line.
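
In case it helps, here is a rough sketch of how to collect them in one go. It assumes the default per-user file `$HOME/.openmpi/mca-params.conf` and the system-wide file `etc/openmpi-mca-params.conf` under the install prefix. The `PREFIX` variable and the `collect_mca_params` function name are placeholders for illustration, so adjust them to your install.

```shell
#!/bin/sh
# Sketch: print MCA parameters from the usual places.
# PREFIX is an assumption; point it at your Open MPI install prefix.
PREFIX="${PREFIX:-/usr/local}"

collect_mca_params() {
    # 1. Config files (per-user, then system-wide), skipping comments and blank lines.
    for f in "$HOME/.openmpi/mca-params.conf" "$PREFIX/etc/openmpi-mca-params.conf"; do
        if [ -r "$f" ]; then
            echo "== $f"
            grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$f"
        fi
    done
    # 2. Environment variables: Open MPI reads variables named OMPI_MCA_<param>.
    echo "== environment"
    env | grep '^OMPI_MCA_' || true
}

collect_mca_params
# 3. Parameters passed as "--mca <param> <value>" to mpirun are not covered
#    by this script, so please paste the full mpirun command line as well.
```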

Thanks,

Galen

On Jul 6, 2006, at 9:22 AM, Justin Bronder wrote:

> As far as the nightly builds go, I'm still seeing what I believe to be
> this problem in both r10670 and r10652. This is happening with
> both Linux and OS X. Below are the systems and ompi_info for the
> newest revision 10670.
>
> As an example of the error, when running HPL with Myrinet I get the
> following error. Using tcp, everything is fine and I see the
> results I'd expect.
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1 * N ) = 42820214496954887558164928727596662784.0000000 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 156556068835.2711182 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1156439380.5172558 ...... FAILED
> ||Ax-b||_oo . . . . . . . . . . . . . . . . . = 272683853978565028754868928512.000000
> ||A||_oo . . . . . . . . . . . . . . . . . . . = 3822.884181
> ||A||_1 . . . . . . . . . . . . . . . . . . . = 3823.922627
> ||x||_oo . . . . . . . . . . . . . . . . . . . = 37037692483529688659798261760.000000
> ||x||_1 . . . . . . . . . . . . . . . . . . . = 4102704048669982798475494948864.000000
> ===================================================
>
> Finished 1 tests with the following results:
> 0 tests completed and passed residual checks,
> 1 tests completed and failed residual checks,
> 0 tests skipped because of illegal input values.
> ----------------------------------------------------------------------------
>
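
One way to narrow this down is to run the same HPL input once per transport and count the residual checks reported as FAILED. The sketch below assumes an HPL binary named `xhpl` in the current directory and 4 processes, both of which are placeholders. `count_failed` and `compare_btls` are hypothetical helper names. The `--mca btl` selection syntax is the one used elsewhere in this thread.

```shell
#!/bin/sh
# Count residual checks that HPL reports as FAILED in the text read from stdin.
count_failed() {
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    grep -c 'FAILED[[:space:]]*$' || true
}

# Run HPL once per BTL set and report the number of failed checks for each.
# Assumptions: ./xhpl and -np 4 are placeholders for your own binary and job size.
compare_btls() {
    for btl in gm,self tcp,self; do
        n=$(mpirun -np 4 --mca btl "$btl" ./xhpl 2>&1 | count_failed)
        echo "btl=$btl failed_checks=$n"
    done
}
```

Calling `compare_btls` from the HPL run directory prints one line per transport. A non-zero count for `gm,self` alongside zero for `tcp,self` would match the behaviour described above.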
> Linux node41 2.6.16.19 #1 SMP Wed Jun 21 17:22:01 EDT 2006 ppc64
> PPC970FX, altivec supported GNU/Linux
> jbronder_at_node41 ~ $ /usr/local/ompi-gnu-1.1.1a/bin/ompi_info
> Open MPI: 1.1.1a1r10670
> Open MPI SVN revision: r10670
> Open RTE: 1.1.1a1r10670
> Open RTE SVN revision: r10670
> OPAL: 1.1.1a1r10670
> OPAL SVN revision: r10670
> Prefix: /usr/local/ompi-gnu-1.1.1a
> Configured architecture: powerpc64-unknown-linux-gnu
> Configured by: root
> Configured on: Thu Jul 6 10:15:37 EDT 2006
> Configure host: node41
> Built by: root
> Built on: Thu Jul 6 10:28:14 EDT 2006
> Built host: node41
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: gcc
> C compiler absolute: /usr/bin/gcc
> C++ compiler: g++
> C++ compiler absolute: /usr/bin/g++
> Fortran77 compiler: gfortran
> Fortran77 compiler abs: /usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
> Fortran90 compiler: gfortran
> Fortran90 compiler abs: /usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: no, progress: no)
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.1)
> MCA maffinity: first_use (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.1)
> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)
> MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
> MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
> MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.1)
> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.1)
> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.1)
> MCA btl: gm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA btl: self (MCA v1.0, API v1.0, Component v1.1.1)
> MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.1)
> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
> MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.1)
> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
> MCA gpr: replica (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.1)
> MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.1)
> MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1.1)
> MCA ns: replica (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA ras: dash_host (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA ras: hostfile (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA ras: localhost (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA ras: tm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA rds: hostfile (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA rds: resfile (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA rmaps: round_robin (MCA v1.0, API v1.0,
> Component v1.1.1)
> MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
> MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA rml: oob (MCA v1.0, API v1.0, Component v1.1.1)
> MCA pls: fork (MCA v1.0, API v1.0, Component v1.1.1)
> MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1.1)
> MCA pls: tm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA sds: env (MCA v1.0, API v1.0, Component v1.1.1)
> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1.1)
> MCA sds: seed (MCA v1.0, API v1.0, Component v1.1.1)
> MCA sds: singleton (MCA v1.0, API v1.0, Component
> v1.1.1)
> Configured as:
> ./configure \
> --prefix=$PREFIX \
> --enable-mpi-f77 \
> --enable-mpi-f90 \
> --enable-mpi-profile \
> --enable-mpi-cxx \
> --enable-pty-support \
> --enable-shared \
> --enable-smp-locks \
> --enable-io-romio \
> --with-tm=/usr/local/pbs \
> --without-xgrid \
> --without-slurm \
> --with-gm=/opt/gm
>
> Darwin node90.meldrew.clusters.umaine.edu 8.6.0 Darwin Kernel
> Version 8.6.0: Tue Mar 7 16:58:48 PST 2006;
> root:xnu-792.6.70.obj~1/RELEASE_PPC Power Macintosh powerpc
> node90:~/src/hpl jbronder$ /usr/local/ompi-xl/bin/ompi_info
> Open MPI: 1.1.1a1r10670
> Open MPI SVN revision: r10670
> Open RTE: 1.1.1a1r10670
> Open RTE SVN revision: r10670
> OPAL: 1.1.1a1r10670
> OPAL SVN revision: r10670
> Prefix: /usr/local/ompi-xl
> Configured architecture: powerpc-apple-darwin8.6.0
> Configured by:
> Configured on: Thu Jul 6 10:05:20 EDT 2006
> Configure host: node90.meldrew.clusters.umaine.edu
> Built by: root
> Built on: Thu Jul 6 10:37:40 EDT 2006
> Built host: node90.meldrew.clusters.umaine.edu
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (lower case)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: /opt/ibmcmp/vac/6.0/bin/xlc
> C compiler absolute: /opt/ibmcmp/vac/6.0/bin/xlc
> C++ compiler: /opt/ibmcmp/vacpp/6.0/bin/xlc++
> C++ compiler absolute: /opt/ibmcmp/vacpp/6.0/bin/xlc++
> Fortran77 compiler: /opt/ibmcmp/xlf/8.1/bin/xlf_r
> Fortran77 compiler abs: /opt/ibmcmp/xlf/8.1/bin/xlf_r
> Fortran90 compiler: /opt/ibmcmp/xlf/8.1/bin/xlf90_r
> Fortran90 compiler abs: /opt/ibmcmp/xlf/8.1/bin/xlf90_r
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: no, progress: no)
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> MCA memory: darwin (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA maffinity: first_use (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA timer: darwin (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)
> MCA coll: hierarch (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
> MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.1)
> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.1)
> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.1)
> MCA btl: self (MCA v1.0, API v1.0, Component v1.1.1)
> MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA btl: gm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.1)
> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
> MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.1)
> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
> MCA gpr: replica (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.1)
> MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.1)
> MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1.1)
> MCA ns: replica (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA ras: dash_host (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA ras: hostfile (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA ras: localhost (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA ras: tm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA rds: hostfile (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA rds: resfile (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA rmaps: round_robin (MCA v1.0, API v1.0,
> Component v1.1.1)
> MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
> MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA rml: oob (MCA v1.0, API v1.0, Component v1.1.1)
> MCA pls: fork (MCA v1.0, API v1.0, Component v1.1.1)
> MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1.1)
> MCA pls: tm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA sds: env (MCA v1.0, API v1.0, Component v1.1.1)
> MCA sds: seed (MCA v1.0, API v1.0, Component v1.1.1)
> MCA sds: singleton (MCA v1.0, API v1.0, Component
> v1.1.1)
> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1.1)
> Configured as:
> ./configure \
> --prefix=$PREFIX \
> --with-tm=/usr/local/pbs/ \
> --with-gm=/opt/gm \
> --enable-static \
> --disable-cxx
> On 7/3/06, George Bosilca <bosilca_at_[hidden]> wrote:
> Bernard,
>
> A bug in the Open MPI GM driver was discovered after the 1.1 release.
> A patch for the 1.1 is on the way. However, I don't know if it will
> be available before the 1.1.1. Meanwhile, you can use the nightly
> build version or a fresh check-out from the SVN repository. Both of
> them have the GM bug corrected.
>
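
To confirm which build a given install actually is, the SVN revision can be read from `ompi_info`. A minimal sketch, assuming the output format shown in this thread (a line such as "Open MPI SVN revision: r10670"). `svn_revision` is a hypothetical helper name.

```shell
#!/bin/sh
# Extract the numeric SVN revision from ompi_info output read on stdin.
svn_revision() {
    sed -n 's/^[[:space:]]*Open MPI SVN revision:[[:space:]]*r\([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Usage (assumes ompi_info from the install under test is on PATH):
#   rev=$(ompi_info | svn_revision)
#   [ "$rev" -gt 10477 ] && echo "newer than the 1.1 build in this thread (r10477)"
```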
> Sorry for the troubles,
> george.
>
> On Jul 3, 2006, at 12:58 PM, Borenstein, Bernard S wrote:
>
> > I've built and successfully run the NASA Overflow 2.0aa program with
> > Open MPI 1.0.2. I'm running on an Opteron Linux cluster running SLES 9
> > and GM 2.0.24. I built Open MPI 1.1 with the Intel 9 compilers, and
> > when I try to run Overflow 2.0aa over Myrinet, I get what looks like a
> > data corruption error and the program dies quickly.
> > There are no MPI errors at all. If I run using GigE (--mca btl self,tcp),
> > the program runs to completion correctly. Here is my ompi_info output:
> >
> > bsb3227_at_mahler:~/openmpi_1.1/bin> ./ompi_info
> > Open MPI: 1.1
> > Open MPI SVN revision: r10477
> > Open RTE: 1.1
> > Open RTE SVN revision: r10477
> > OPAL: 1.1
> > OPAL SVN revision: r10477
> > Prefix: /home/bsb3227/openmpi_1.1
> > Configured architecture: x86_64-unknown-linux-gnu
> > Configured by: bsb3227
> > Configured on: Fri Jun 30 07:08:54 PDT 2006
> > Configure host: mahler
> > Built by: bsb3227
> > Built on: Fri Jun 30 07:54:46 PDT 2006
> > Built host: mahler
> > C bindings: yes
> > C++ bindings: yes
> > Fortran77 bindings: yes (all)
> > Fortran90 bindings: yes
> > Fortran90 bindings size: small
> > C compiler: icc
> > C compiler absolute: /opt/intel/cce/9.0.25/bin/icc
> > C++ compiler: icpc
> > C++ compiler absolute: /opt/intel/cce/9.0.25/bin/icpc
> > Fortran77 compiler: ifort
> > Fortran77 compiler abs: /opt/intel/fce/9.0.25/bin/ifort
> > Fortran90 compiler: /opt/intel/fce/9.0.25/bin/ifort
> > Fortran90 compiler abs: /opt/intel/fce/9.0.25/bin/ifort
> > C profiling: yes
> > C++ profiling: yes
> > Fortran77 profiling: yes
> > Fortran90 profiling: yes
> > C++ exceptions: no
> > Thread support: posix (mpi: no, progress: no)
> > Internal debug support: no
> > MPI parameter check: runtime
> > Memory profiling support: no
> > Memory debugging support: no
> > libltdl support: yes
> > MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component
> > v1.1)
> > MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1)
> > MCA maffinity: first_use (MCA v1.0, API v1.0, Component
> > v1.1)
> > MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1)
> > MCA timer: linux (MCA v1.0, API v1.0, Component v1.1)
> > MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> > MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> > MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
> > MCA coll: hierarch (MCA v1.0, API v1.0, Component
> > v1.1)
> > MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
> > MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
> > MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
> > MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
> > MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
> > MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1)
> > MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
> > MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
> > MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
> > MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
> > MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
> > MCA btl: gm (MCA v1.0, API v1.0, Component v1.1)
> > MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
> > MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
> > MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
> > MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
> > MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
> > MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
> > MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
> > MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
> > MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
> > MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
> > MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> > MCA ras: dash_host (MCA v1.0, API v1.0, Component
> > v1.1)
> > MCA ras: hostfile (MCA v1.0, API v1.0, Component
> > v1.1)
> > MCA ras: localhost (MCA v1.0, API v1.0, Component
> > v1.1)
> > MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1)
> > MCA ras: tm (MCA v1.0, API v1.0, Component v1.1)
> > MCA rds: hostfile (MCA v1.0, API v1.0, Component
> > v1.1)
> > MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)
> > MCA rmaps: round_robin (MCA v1.0, API v1.0, Component
> > v1.1)
> > MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1)
> > MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1)
> > MCA rml: oob (MCA v1.0, API v1.0, Component v1.1)
> > MCA pls: fork (MCA v1.0, API v1.0, Component v1.1)
> > MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1)
> > MCA pls: slurm (MCA v1.0, API v1.0, Component v1.1)
> > MCA pls: tm (MCA v1.0, API v1.0, Component v1.1)
> > MCA sds: env (MCA v1.0, API v1.0, Component v1.1)
> > MCA sds: seed (MCA v1.0, API v1.0, Component v1.1)
> > MCA sds: singleton (MCA v1.0, API v1.0, Component
> > v1.1)
> > MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1)
> > MCA sds: slurm (MCA v1.0, API v1.0, Component v1.1)
> >
> > Here is the ifconfig for one of the nodes :
> >
> > bsb3227_at_m045:~> /sbin/ifconfig
> > eth0 Link encap:Ethernet HWaddr 00:50:45:5D:CD:FE
> > inet addr:10.241.194.45 Bcast:10.241.195.255
> > Mask:255.255.254.0
> > inet6 addr: fe80::250:45ff:fe5d:cdfe/64 Scope:Link
> > UP BROADCAST NOTRAILERS RUNNING MULTICAST MTU:1500
> > Metric:1
> > RX packets:39913407 errors:0 dropped:0 overruns:0 frame:0
> > TX packets:48794587 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:1000
> > RX bytes:31847343907 (30371.9 Mb) TX bytes:48231713866
> > (45997.3 Mb)
> > Interrupt:19
> >
> > eth1 Link encap:Ethernet HWaddr 00:50:45:5D:CD:FF
> > inet6 addr: fe80::250:45ff:fe5d:cdff/64 Scope:Link
> > UP BROADCAST MULTICAST MTU:1500 Metric:1
> > RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:1000
> > RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
> > Interrupt:19
> >
> > lo Link encap:Local Loopback
> > inet addr:127.0.0.1 Mask: 255.0.0.0
> > inet6 addr: ::1/128 Scope:Host
> > UP LOOPBACK RUNNING MTU:16436 Metric:1
> > RX packets:23141 errors:0 dropped:0 overruns:0 frame:0
> > TX packets:23141 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:0
> > RX bytes:20145689 (19.2 Mb) TX bytes:20145689 (19.2 Mb)
> >
> > I hope someone can give me some guidance on how to debug this problem.
> > Thanks in advance for any help that can be provided.
> >
> > Bernie Borenstein
> > The Boeing Company
> > <config.log.gz>
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> "Half of what I say is meaningless; but I say it so that the other
> half may reach you"
> Kahlil Gibran
>