Open MPI User's Mailing List Archives

From: George Bosilca (bosilca_at_[hidden])
Date: 2006-07-03 14:31:28


Bernard,

A bug in the Open MPI GM driver was discovered after the 1.1 release.
A patch for 1.1 is on the way, but I don't know whether it will be
available before 1.1.1. In the meantime, you can use a nightly build
or a fresh checkout from the SVN repository; both have the GM bug
fixed.

   Sorry for the troubles,
     george.
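
While waiting for a fixed build, one option is to keep running without
the gm BTL, as the quoted report below already does with
--mca btl self,tcp. The Python sketch below is only an illustration of
that idea: it reads MCA lines in the format that ompi_info prints and
builds an mpirun argument list naming every BTL except one. The helper
names list_components and mpirun_args_without, the SAMPLE text, and the
./a.out program are assumptions made for this example; they are not part
of Open MPI.

```python
import re

# Sample lines in the format shown in the quoted ompi_info output.
SAMPLE = """\
MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
MCA btl: gm (MCA v1.0, API v1.0, Component v1.1)
MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
"""

# Matches "MCA <framework>: <component> (... Component v<version>)",
# with an optional leading "> " so quoted email text also parses.
MCA_LINE = re.compile(
    r"^\s*(?:>\s*)?MCA\s+(\w+):\s+(\w+)\s+\(.*Component\s+v([\d.]+)\)"
)


def list_components(text, framework):
    """Return {component_name: component_version} for one MCA framework."""
    found = {}
    for line in text.splitlines():
        match = MCA_LINE.match(line)
        if match and match.group(1) == framework:
            found[match.group(2)] = match.group(3)
    return found


def mpirun_args_without(text, excluded, nprocs, program):
    """Build an mpirun argument list naming every listed BTL except one."""
    btls = [name for name in list_components(text, "btl") if name != excluded]
    return ["mpirun", "--mca", "btl", ",".join(btls),
            "-np", str(nprocs), program]


if __name__ == "__main__":
    # Hypothetical job: 4 processes of ./a.out, skipping the gm BTL.
    print(" ".join(mpirun_args_without(SAMPLE, "gm", 4, "./a.out")))
```

With the sample input this yields the arguments
mpirun --mca btl self,sm,tcp -np 4 ./a.out. Unlike the self,tcp
selection in the report, it keeps the shared-memory BTL (sm), because
that component is listed as well.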

On Jul 3, 2006, at 12:58 PM, Borenstein, Bernard S wrote:

> I've built and successfully run the NASA Overflow 2.0aa program with
> Open MPI 1.0.2. I'm running on an Opteron Linux cluster running SLES 9
> and GM 2.0.24. I built Open MPI 1.1 with the Intel 9 compilers, and
> when I try to run Overflow 2.0aa over Myrinet, I get what looks like
> a data corruption error and the program dies quickly.
> There are no MPI errors at all. If I run using GigE
> (--mca btl self,tcp), the program runs to completion correctly.
> Here is my ompi_info output:
>
> bsb3227_at_mahler:~/openmpi_1.1/bin> ./ompi_info
> Open MPI: 1.1
> Open MPI SVN revision: r10477
> Open RTE: 1.1
> Open RTE SVN revision: r10477
> OPAL: 1.1
> OPAL SVN revision: r10477
> Prefix: /home/bsb3227/openmpi_1.1
> Configured architecture: x86_64-unknown-linux-gnu
> Configured by: bsb3227
> Configured on: Fri Jun 30 07:08:54 PDT 2006
> Configure host: mahler
> Built by: bsb3227
> Built on: Fri Jun 30 07:54:46 PDT 2006
> Built host: mahler
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: icc
> C compiler absolute: /opt/intel/cce/9.0.25/bin/icc
> C++ compiler: icpc
> C++ compiler absolute: /opt/intel/cce/9.0.25/bin/icpc
> Fortran77 compiler: ifort
> Fortran77 compiler abs: /opt/intel/fce/9.0.25/bin/ifort
> Fortran90 compiler: /opt/intel/fce/9.0.25/bin/ifort
> Fortran90 compiler abs: /opt/intel/fce/9.0.25/bin/ifort
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: no, progress: no)
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1)
> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1)
> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
> MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1)
> MCA timer: linux (MCA v1.0, API v1.0, Component v1.1)
> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
> MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
> MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
> MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1)
> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
> MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
> MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
> MCA btl: gm (MCA v1.0, API v1.0, Component v1.1)
> MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
> MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
> MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
> MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
> MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)
> MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)
> MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)
> MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1)
> MCA ras: tm (MCA v1.0, API v1.0, Component v1.1)
> MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1)
> MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)
> MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1)
> MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1)
> MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1)
> MCA rml: oob (MCA v1.0, API v1.0, Component v1.1)
> MCA pls: fork (MCA v1.0, API v1.0, Component v1.1)
> MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1)
> MCA pls: slurm (MCA v1.0, API v1.0, Component v1.1)
> MCA pls: tm (MCA v1.0, API v1.0, Component v1.1)
> MCA sds: env (MCA v1.0, API v1.0, Component v1.1)
> MCA sds: seed (MCA v1.0, API v1.0, Component v1.1)
> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1)
> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1)
> MCA sds: slurm (MCA v1.0, API v1.0, Component v1.1)
>
> Here is the ifconfig output for one of the nodes:
>
> bsb3227_at_m045:~> /sbin/ifconfig
> eth0 Link encap:Ethernet HWaddr 00:50:45:5D:CD:FE
> inet addr:10.241.194.45 Bcast:10.241.195.255 Mask:255.255.254.0
> inet6 addr: fe80::250:45ff:fe5d:cdfe/64 Scope:Link
> UP BROADCAST NOTRAILERS RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:39913407 errors:0 dropped:0 overruns:0 frame:0
> TX packets:48794587 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:31847343907 (30371.9 Mb) TX bytes:48231713866 (45997.3 Mb)
> Interrupt:19
>
> eth1 Link encap:Ethernet HWaddr 00:50:45:5D:CD:FF
> inet6 addr: fe80::250:45ff:fe5d:cdff/64 Scope:Link
> UP BROADCAST MULTICAST MTU:1500 Metric:1
> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
> Interrupt:19
>
> lo Link encap:Local Loopback
> inet addr:127.0.0.1 Mask:255.0.0.0
> inet6 addr: ::1/128 Scope:Host
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:23141 errors:0 dropped:0 overruns:0 frame:0
> TX packets:23141 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:20145689 (19.2 Mb) TX bytes:20145689 (19.2 Mb)
>
> I hope someone can give me some guidance on how to debug this problem.
> Thanks in advance for any help that can be provided.
>
> Bernie Borenstein
> The Boeing Company
> <config.log.gz>

"Half of what I say is meaningless; but I say it so that the other
half may reach you"
                                   Kahlil Gibran