
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Communication between OpenMPI and ClusterTools
From: Terry Dontje (Terry.Dontje_at_[hidden])
Date: 2008-07-29 12:09:47


>
> Date: Tue, 29 Jul 2008 09:03:40 -0400
> From: "Alexander Shabarshin" <ashabarshin_at_[hidden]>
> Subject: Re: [OMPI users] Communication between OpenMPI and ClusterTools
> To: <users_at_[hidden]>
> Message-ID: <001e01c8f17b$867d2900$0349130a_at_Shabarshin>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response
>
> Hello
>
> Yes, you are right - the subnets are different, but the routes are set up
> correctly and everything like ping, ssh etc. works OK between them.
That isn't a routing problem, though; it is a matter of how the TCP BTL in
Open MPI decides which interfaces it can use to reach the other node (a
decision made completely outside the TCP stack and the layers below it).
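If the TCP BTL is picking the wrong interface, one thing that is sometimes
worth trying is to tell it explicitly which interfaces to use. A minimal
sketch, assuming the NIC that should carry MPI traffic is eth0 on the Linux
boxes and bge0 on the SunFires (the interface names here are placeholders -
check ifconfig -a on each host; if a single comma-separated list does not
behave well across both hosts, the parameter can instead be set per machine
in each host's openmpi-mca-params.conf):

mpirun --mca btl tcp,self \
       --mca btl_tcp_if_include eth0,bge0 \
       --hostfile mpshosts -np 2 /mpi/sample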
> Alexander Shabarshin
>
> P.S. Between the Linux machines I even tried different versions of OpenMPI,
> 1.2.4 and 1.2.5 - these versions work together correctly, but not with
> ClusterTools...
Are the Linux boxes on the same subnet? (A quick way to check is sketched below.)

--td
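For reference, a quick way to compare the two nodes' subnets is to look at
each host's interface addresses, netmasks, and routing table; a minimal
sketch (both commands exist on Linux and on Solaris 10, though the exact
paths and output formats differ):

/sbin/ifconfig -a   (on Solaris 10: ifconfig -a)
netstat -rn

If the directly connected networks reported by netstat -rn differ between
the two hosts, the nodes are on different subnets.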
> ----- Original Message -----
> From: "Terry Dontje" <Terry.Dontje_at_[hidden]>
> To: <users_at_[hidden]>
> Sent: Tuesday, July 29, 2008 7:20 AM
> Subject: Re: [OMPI users] Communication between OpenMPI and ClusterTools
>> > I have not tested this type of setup, so the following disclaimer needs to
>> > be said: these are not exactly the same release number. They are close, but
>> > their code could contain something that makes them incompatible.
>> > One idea that comes to mind is whether the two nodes are on the same subnet.
>> > If they are not on the same subnet, I think there is a bug in which the TCP
>> > BTL will recuse itself from communications between the two nodes.
>> >
>> > --td
>> >
>> >
>> >
>> > Date: Mon, 28 Jul 2008 16:58:57 -0400
>> > From: "Alexander Shabarshin" <ashabarshin_at_[hidden]>
>> > Subject: [OMPI users] Communication between OpenMPI and ClusterTools
>> > To: <users_at_[hidden]>
>> > Message-ID: <010001c8f0f4$c1ec8990$e7afcea7_at_Shabarshin>
>> > Content-Type: text/plain; format=flowed; charset="koi8-r";
>> > reply-type=original
>> >
>> > Hello
>> >
>> > I am trying to launch the same MPI sample code on Linux PC (Intel processor)
>> > servers with OpenMPI 1.2.5 and on SunFire X2100 (AMD Opteron) servers with
>> > Solaris 10 and ClusterTools 7.1 (which appears to be OpenMPI 1.2.5), using
>> > TCP over Ethernet. Linux PC with Linux PC works fine. SunFire with SunFire
>> > works fine. But when I launch the same task on Linux AND SunFire together,
>> > I get this error message:
>> >
>> > --------------------------------------------------------------------------
>> > Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
>> > If you specified the use of a BTL component, you may have
>> > forgotten a component (such as "self") in the list of
>> > usable components.
>> > --------------------------------------------------------------------------
>> > --------------------------------------------------------------------------
>> > Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
>> > If you specified the use of a BTL component, you may have
>> > forgotten a component (such as "self") in the list of
>> > usable components.
>> > --------------------------------------------------------------------------
>> > --------------------------------------------------------------------------
>> > It looks like MPI_INIT failed for some reason; your parallel process is
>> > likely to abort. There are many reasons that a parallel process can
>> > fail during MPI_INIT; some of which are due to configuration or
>> > environment
>> > problems. This failure appears to be an internal failure; here's some
>> > additional information (which may only be relevant to an Open MPI
>> > developer):
>> >
>> > PML add procs failed
>> > --> Returned "Unreachable" (-12) instead of "Success" (0)
>> > --------------------------------------------------------------------------
>> > *** An error occurred in MPI_Init
>> > *** before MPI was initialized
>> > *** MPI_ERRORS_ARE_FATAL (goodbye)
>> > mpirun noticed that job rank 1 with PID 25782 on node 10.0.0.2 exited on
>> > signal 15 (Terminated).
>> >
>> > It was launched with this command:
>> >
>> > mpirun --mca btl tcp,self --hostfile mpshosts -np 2 /mpi/sample
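One additional check that is sometimes suggested in this situation is to
re-run the same job with the BTL framework's verbosity raised, so the TCP
BTL reports more about which peers and interfaces it considers usable
(parameter name as in Open MPI 1.2.x; how much detail gets printed may vary):

mpirun --mca btl tcp,self --mca btl_base_verbose 30 \
       --hostfile mpshosts -np 2 /mpi/sample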
>> >
>> > /mpi/sample exists on both platforms, compiled properly for each particular
>> > platform.
>> >
>> > The Linux machines replicate the Sun-style path for the orted launch:
>> > /opt/SUNWhpc/HPC7.1/bin/orted
>> >
>> > The servers can ping each other, and SSH works fine in both directions.
>> > But OpenMPI doesn't work across these servers... How can I make them
>> > understand each other? Thank you!
>> >
>> > Alexander Shabarshin
>> >
>> > P.S. This is output of ompi_info diagnostic for ClusterTools 7.1:
>> >
>> > Open MPI: 1.2.5r16572-ct7.1b003r3852
>> > Open MPI SVN revision: 0
>> > Open RTE: 1.2.5r16572-ct7.1b003r3852
>> > Open RTE SVN revision: 0
>> > OPAL: 1.2.5r16572-ct7.1b003r3852
>> > OPAL SVN revision: 0
>> > Prefix: /opt/SUNWhpc/HPC7.1
>> > Configured architecture: i386-pc-solaris2.10
>> > Configured by: root
>> > Configured on: Tue Oct 30 17:37:07 EDT 2007
>> > Configure host: burpen-csx10-0
>> > Built by:
>> > Built on: Tue Oct 30 17:52:10 EDT 2007
>> > Built host: burpen-csx10-0
>> > C bindings: yes
>> > C++ bindings: yes
>> > Fortran77 bindings: yes (all)
>> > Fortran90 bindings: yes
>> > Fortran90 bindings size: small
>> > C compiler: cc
>> > C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
>> > C++ compiler: CC
>> > C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
>> > Fortran77 compiler: f77
>> > Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
>> > Fortran90 compiler: f95
>> > Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
>> > C profiling: yes
>> > C++ profiling: yes
>> > Fortran77 profiling: yes
>> > Fortran90 profiling: yes
>> > C++ exceptions: yes
>> > Thread support: no
>> > Internal debug support: no
>> > MPI parameter check: runtime
>> > Memory profiling support: no
>> > Memory debugging support: no
>> > libltdl support: yes
>> > Heterogeneous support: yes
>> > mpirun default --prefix: yes
>> > MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>> > MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>> > MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA coll: self (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA io: romio (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.5)
>> > MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.5)
>> > MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
>> > MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.5)
>> > MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.5)
>> > MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>> > MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.5)
>> > MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.5)
>> > MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA sds: env (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.5)
>> >
>> > and output of ompi_info diagnostic for OpenMPI 1.2.5 compiled on Linux:
>> >
>> > Open MPI: 1.2.5
>> > Open MPI SVN revision: r16989
>> > Open RTE: 1.2.5
>> > Open RTE SVN revision: r16989
>> > OPAL: 1.2.5
>> > OPAL SVN revision: r16989
>> > Prefix: /usr/local
>> > Configured architecture: i686-pc-linux-gnu
>> > Configured by: shaos
>> > Configured on: Thu Jul 24 12:07:38 EDT 2008
>> > Configure host: remote-linux
>> > Built by: shaos
>> > Built on: Thu Jul 24 12:23:40 EDT 2008
>> > Built host: remote-linux
>> > C bindings: yes
>> > C++ bindings: yes
>> > Fortran77 bindings: yes (all)
>> > Fortran90 bindings: no
>> > Fortran90 bindings size: na
>> > C compiler: gcc
>> > C compiler absolute: /usr/bin/gcc
>> > C++ compiler: g++
>> > C++ compiler absolute: /usr/bin/g++
>> > Fortran77 compiler: g77
>> > Fortran77 compiler abs: /usr/bin/g77
>> > Fortran90 compiler: none
>> > Fortran90 compiler abs: none
>> > C profiling: yes
>> > C++ profiling: yes
>> > Fortran77 profiling: yes
>> > Fortran90 profiling: no
>> > C++ exceptions: no
>> > Thread support: posix (mpi: no, progress: no)
>> > Internal debug support: no
>> > MPI parameter check: runtime
>> > Memory profiling support: no
>> > Memory debugging support: no
>> > libltdl support: yes
>> > Heterogeneous support: yes
>> > mpirun default --prefix: no
>> > MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>> > MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>> > MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA coll: self (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA io: romio (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.5)
>> > MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.5)
>> > MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
>> > MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.5)
>> > MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.5)
>> > MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>> > MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.5)
>> > MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.5)
>> > MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA pls: slurm (MCA v1.0, API v1.3, Component v1.2.5)
>> > MCA sds: env (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.5)
>> > MCA sds: slurm (MCA v1.0, API v1.0, Component v1.2.5)
>> >
>> >
>> >
>> > _______________________________________________
>> > users mailing list
>> > users_at_[hidden]
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users