Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Communication between OpenMPI and ClusterTools
From: Alexander Shabarshin (ashabarshin_at_[hidden])
Date: 2008-07-29 09:03:40


Hello

Yes, you are right: the subnets are different, but the routes are set up
correctly, and ping, ssh, etc. all work fine between them.

Alexander Shabarshin

P.S. Between the Linux machines I even tried different versions of OpenMPI
(1.2.4 and 1.2.5). Those versions work together correctly, but not with
ClusterTools...
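
A minimal sketch of the same-subnet check discussed in this thread, using
Python's standard ipaddress module. Only 10.0.0.2 comes from the error output
quoted below; the other addresses and the /24 prefix length are assumptions
for illustration. The sketch only compares network prefixes and says nothing
about how the TCP BTL itself decides reachability.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, prefix_len: int) -> bool:
    """Return True if both IPv4 addresses fall in the same network
    for the given prefix length (e.g. 24 for a 255.255.255.0 netmask)."""
    # strict=False lets us pass a host address and have it masked down
    # to its network address instead of raising ValueError.
    net_a = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    net_b = ipaddress.ip_network(f"{addr_b}/{prefix_len}", strict=False)
    return net_a == net_b


# 10.0.0.2 appears in the mpirun error output; the peers are hypothetical.
print(same_subnet("10.0.0.2", "10.0.0.7", 24))  # True
print(same_subnet("10.0.0.2", "10.0.1.7", 24))  # False
```

Running this with the real addresses and netmasks of a Linux node and a
SunFire node shows whether the two hosts are in the situation described in
the reply below.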

----- Original Message -----
From: "Terry Dontje" <Terry.Dontje_at_[hidden]>
To: <users_at_[hidden]>
Sent: Tuesday, July 29, 2008 7:20 AM
Subject: Re: [OMPI users] Communication between OpenMPI and ClusterTools

> I have not tested this type of setup, so the following disclaimer needs to
> be said: these are not exactly the same release. They are close, but
> their code could contain something that makes them incompatible.
> One idea that comes to mind is whether the two nodes are on the same subnet.
> If they are not on the same subnet, I think there is a bug in which the TCP
> BTL will recuse itself from communications between the two nodes.
>
> --td
>
>
>
> Date: Mon, 28 Jul 2008 16:58:57 -0400
> From: "Alexander Shabarshin" <ashabarshin_at_[hidden]>
> Subject: [OMPI users] Communication between OpenMPI and ClusterTools
> To: <users_at_[hidden]>
> Message-ID: <010001c8f0f4$c1ec8990$e7afcea7_at_Shabarshin>
> Content-Type: text/plain; format=flowed; charset="koi8-r";
> reply-type=original
>
> Hello
>
> I am trying to launch the same MPI sample code on Linux PC servers (Intel
> processors) with OpenMPI 1.2.5 and on SunFire X2100 servers (AMD Opteron)
> with Solaris 10 and ClusterTools 7.1 (which looks like OpenMPI 1.2.5), using
> TCP over Ethernet. Linux PC with Linux PC works fine. SunFire with SunFire
> works fine. But when I launch the same task on Linux AND SunFire, I get this
> error message:
>
> --------------------------------------------------------------------------
> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> mpirun noticed that job rank 1 with PID 25782 on node 10.0.0.2 exited on
> signal 15 (Terminated).
>
> It was launched with this command:
>
> mpirun --mca btl tcp,self --hostfile mpshosts -np 2 /mpi/sample
>
> /mpi/sample exists on both platforms, compiled properly for each
> platform.
>
> The Linux machines have the Sun-style path replicated for the orted launch:
> /opt/SUNWhpc/HPC7.1/bin/orted
>
> Servers are pingable from each other. SSH works fine in both directions.
> But OpenMPI doesn't work on these servers... How can I make them
> understand each other? Thank you!
>
> Alexander Shabarshin
>
> P.S. This is the output of the ompi_info diagnostic for ClusterTools 7.1:
>
> Open MPI: 1.2.5r16572-ct7.1b003r3852
> Open MPI SVN revision: 0
> Open RTE: 1.2.5r16572-ct7.1b003r3852
> Open RTE SVN revision: 0
> OPAL: 1.2.5r16572-ct7.1b003r3852
> OPAL SVN revision: 0
> Prefix: /opt/SUNWhpc/HPC7.1
> Configured architecture: i386-pc-solaris2.10
> Configured by: root
> Configured on: Tue Oct 30 17:37:07 EDT 2007
> Configure host: burpen-csx10-0
> Built by:
> Built on: Tue Oct 30 17:52:10 EDT 2007
> Built host: burpen-csx10-0
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: cc
> C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
> C++ compiler: CC
> C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
> Fortran77 compiler: f77
> Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
> Fortran90 compiler: f95
> Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: yes
> Thread support: no
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: yes
> mpirun default --prefix: yes
> MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.5)
> MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.5)
> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.5)
> MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.5)
> MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.5)
> MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.5)
> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.5)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.5)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.5)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.5)
> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.5)
> MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.5)
> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.5)
> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.5)
> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.5)
> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.5)
> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.5)
> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.5)
> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.5)
> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
> MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.5)
> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.5)
> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.5)
> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.5)
> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.5)
> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.5)
> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.5)
> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.5)
> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.5)
> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.5)
> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.5)
> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.5)
> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.5)
> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.5)
> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.5)
> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.5)
> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.5)
> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.5)
> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.5)
> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.5)
> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.5)
> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.5)
> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.5)
> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.5)
> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.5)
> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.5)
> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.5)
> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.5)
> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.5)
> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.5)
>
> and the output of the ompi_info diagnostic for OpenMPI 1.2.5 compiled on Linux:
>
> Open MPI: 1.2.5
> Open MPI SVN revision: r16989
> Open RTE: 1.2.5
> Open RTE SVN revision: r16989
> OPAL: 1.2.5
> OPAL SVN revision: r16989
> Prefix: /usr/local
> Configured architecture: i686-pc-linux-gnu
> Configured by: shaos
> Configured on: Thu Jul 24 12:07:38 EDT 2008
> Configure host: remote-linux
> Built by: shaos
> Built on: Thu Jul 24 12:23:40 EDT 2008
> Built host: remote-linux
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: no
> Fortran90 bindings size: na
> C compiler: gcc
> C compiler absolute: /usr/bin/gcc
> C++ compiler: g++
> C++ compiler absolute: /usr/bin/g++
> Fortran77 compiler: g77
> Fortran77 compiler abs: /usr/bin/g77
> Fortran90 compiler: none
> Fortran90 compiler abs: none
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: no
> C++ exceptions: no
> Thread support: posix (mpi: no, progress: no)
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: yes
> mpirun default --prefix: no
> MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.5)
> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.5)
> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.5)
> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.5)
> MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.5)
> MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.5)
> MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.5)
> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.5)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.5)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.5)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.5)
> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.5)
> MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.5)
> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.5)
> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.5)
> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.5)
> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.5)
> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.5)
> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.5)
> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.5)
> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.5)
> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.5)
> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.5)
> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.5)
> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.5)
> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.5)
> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.5)
> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.5)
> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.5)
> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.5)
> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.5)
> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.5)
> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.5)
> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.5)
> MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.5)
> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.5)
> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.5)
> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.5)
> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.5)
> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.5)
> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.5)
> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.5)
> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.5)
> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.5)
> MCA pls: slurm (MCA v1.0, API v1.3, Component v1.2.5)
> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.5)
> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.5)
> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.5)
> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.5)
> MCA sds: slurm (MCA v1.0, API v1.0, Component v1.2.5)
>
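
The two ompi_info listings above can be compared mechanically rather than by
eye. What follows is a minimal sketch, assuming each listing is available as
plain text; the function name mca_components is illustrative, and the two
short excerpts are copied from the btl entries of the listings above. It only
reports which components one build lists and the other does not; it does not
show whether any such difference causes the "unable to reach" error.

```python
import re


def mca_components(ompi_info_text: str) -> set:
    """Collect (framework, component) pairs from lines shaped like
    'MCA <framework>: <component> (MCA v1.0, ...'.
    A leading '> ' quote prefix is tolerated because search() is used."""
    pattern = re.compile(r"MCA\s+(\w+):\s+(\w+)\s+\(")
    found = set()
    for line in ompi_info_text.splitlines():
        match = pattern.search(line)
        if match:
            found.add(match.groups())
    return found


# Excerpts taken from the ClusterTools 7.1 and Linux listings above.
solaris_excerpt = """
> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.5)
> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.5)
> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
> MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.5)
"""
linux_excerpt = """
> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.5)
> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.5)
> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
"""

only_solaris = mca_components(solaris_excerpt) - mca_components(linux_excerpt)
print(sorted(only_solaris))  # [('btl', 'udapl')]
```

Applied to the full listings, the same comparison also picks up entries such
as the tm and slurm components that appear in only one of the two builds.
Both builds list the tcp BTL, so the difference is not a missing tcp
component on either side.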