Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-07-28 12:37:45


Trolling through some really old mails that never got replies... :-(

I'm afraid that the guy who did the GM code in Open MPI is currently on
vacation, but we have made a small number of changes since 1.1 that may have
fixed your issue.

Could you try one of the 1.1.1 release candidate tarballs and see if you
still have the problem?

    http://www.open-mpi.org/software/ompi/v1.1/

On 7/3/06 12:58 PM, "Borenstein, Bernard S"
<bernard.s.borenstein_at_[hidden]> wrote:

> I've built and successfully run the NASA Overflow 2.0aa program with
> Open MPI 1.0.2. I'm running on an Opteron Linux cluster running SLES 9
> and GM 2.0.24. I built Open MPI 1.1 with the Intel 9 compilers, and when
> I try to run Overflow 2.0aa over Myrinet, I get what looks like a data
> corruption error and the program dies quickly.
> There are no MPI errors at all. If I run using GigE (--mca btl self,tcp),
> the program runs to completion correctly. Here is my ompi_info output:
>
> bsb3227_at_mahler:~/openmpi_1.1/bin> ./ompi_info
> Open MPI: 1.1
> Open MPI SVN revision: r10477
> Open RTE: 1.1
> Open RTE SVN revision: r10477
> OPAL: 1.1
> OPAL SVN revision: r10477
> Prefix: /home/bsb3227/openmpi_1.1
> Configured architecture: x86_64-unknown-linux-gnu
> Configured by: bsb3227
> Configured on: Fri Jun 30 07:08:54 PDT 2006
> Configure host: mahler
> Built by: bsb3227
> Built on: Fri Jun 30 07:54:46 PDT 2006
> Built host: mahler
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: icc
> C compiler absolute: /opt/intel/cce/9.0.25/bin/icc
> C++ compiler: icpc
> C++ compiler absolute: /opt/intel/cce/9.0.25/bin/icpc
> Fortran77 compiler: ifort
> Fortran77 compiler abs: /opt/intel/fce/9.0.25/bin/ifort
> Fortran90 compiler: /opt/intel/fce/9.0.25/bin/ifort
> Fortran90 compiler abs: /opt/intel/fce/9.0.25/bin/ifort
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: no, progress: no)
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1)
> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1)
> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
> MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1)
> MCA timer: linux (MCA v1.0, API v1.0, Component v1.1)
> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
> MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
> MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
> MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1)
> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
> MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
> MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
> MCA btl: gm (MCA v1.0, API v1.0, Component v1.1)
> MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
> MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
> MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
> MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
> MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)
> MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)
> MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)
> MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1)
> MCA ras: tm (MCA v1.0, API v1.0, Component v1.1)
> MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1)
> MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)
> MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1)
> MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1)
> MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1)
> MCA rml: oob (MCA v1.0, API v1.0, Component v1.1)
> MCA pls: fork (MCA v1.0, API v1.0, Component v1.1)
> MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1)
> MCA pls: slurm (MCA v1.0, API v1.0, Component v1.1)
> MCA pls: tm (MCA v1.0, API v1.0, Component v1.1)
> MCA sds: env (MCA v1.0, API v1.0, Component v1.1)
> MCA sds: seed (MCA v1.0, API v1.0, Component v1.1)
> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1)
> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1)
> MCA sds: slurm (MCA v1.0, API v1.0, Component v1.1)
>
> Here is the ifconfig for one of the nodes :
>
> bsb3227_at_m045:~> /sbin/ifconfig
> eth0 Link encap:Ethernet HWaddr 00:50:45:5D:CD:FE
> inet addr:10.241.194.45 Bcast:10.241.195.255 Mask:255.255.254.0
> inet6 addr: fe80::250:45ff:fe5d:cdfe/64 Scope:Link
> UP BROADCAST NOTRAILERS RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:39913407 errors:0 dropped:0 overruns:0 frame:0
> TX packets:48794587 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:31847343907 (30371.9 Mb) TX bytes:48231713866 (45997.3 Mb)
> Interrupt:19
>
> eth1 Link encap:Ethernet HWaddr 00:50:45:5D:CD:FF
> inet6 addr: fe80::250:45ff:fe5d:cdff/64 Scope:Link
> UP BROADCAST MULTICAST MTU:1500 Metric:1
> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
> Interrupt:19
>
> lo Link encap:Local Loopback
> inet addr:127.0.0.1 Mask:255.0.0.0
> inet6 addr: ::1/128 Scope:Host
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:23141 errors:0 dropped:0 overruns:0 frame:0
> TX packets:23141 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:20145689 (19.2 Mb) TX bytes:20145689 (19.2 Mb)
>
> I hope someone can give me some guidance on how to debug this problem.
> Thanks in advance for any help that can be provided.
>
> Bernie Borenstein
> The Boeing Company

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems