Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpirun hangs
From: Maciej Kazulak (kazulakm_at_[hidden])
Date: 2009-01-05 17:01:44


2009/1/3 Maciej Kazulak <kazulakm_at_[hidden]>

> Hi,
>
> I have a weird problem. After a fresh install mpirun refuses to work:
>
> box% ./hello
> Process 0 on box out of 1
> box% mpirun -np 1 ./hello
> # hangs here, no output, nothing at all; on another terminal:
> box% ps axl | egrep 'mpirun|orted'
> 0 1000 24162 7687 20 0 86704 2744 - Sl+ pts/2 0:00 mpirun
> -np 1 ./hello
> 1 1000 24165 1 20 0 76016 2088 - Ss ? 0:00 orted
> --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename box
> --universe ncl_at_box:default-universe-24162 --nsreplica "0.0.0;tcp://
> 192.168.1.8:21500" --gprreplica "0.0.0;tcp://192.168.1.8:21500" --set-sid
> 0 1000 24171 23924 20 0 6020 732 pipe_w S+ pts/3 0:00 egrep
> mpirun|orted
>
> Is there some post-install configuration I forgot to do? I couldn't find
> anything useful in the FAQ or in the docs that come with the package.
> Following the advice in this thread
> http://www.open-mpi.org/community/lists/users/2007/08/3845.php I tried
> --debug-daemons, but there was no output whatsoever, just as above.
> I also tried MTT:
>
> box% cat developer.ini trivial.ini | ../client/mtt - alreadyinstalled_dir=/usr
> ompi:version:full:1.2.8
> *** WARNING: Test: cxx_hello, np=2, variant=1: TIMED OUT (failed)
> *** WARNING: Test: c_ring, np=2, variant=1: TIMED OUT (failed)
> *** WARNING: Test: cxx_ring, np=2, variant=1: TIMED OUT (failed)
> *** WARNING: Test: c_hello, np=2, variant=1: TIMED OUT (failed)
> *** WARNING: Test: c_ring, np=2, variant=1: TIMED OUT (failed)
> *** WARNING: Test: c_hello, np=2, variant=1: TIMED OUT (failed)
>
> MTT Results Summary
> hostname: box
> uname: Linux box 2.6.28-gentoo #2 SMP Thu Jan 1 15:27:59 CET 2009 x86_64
> Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz GenuineIntel GNU/Linux
> who am i:
> +-------------+-----------------+------+------+----------+------+
> | Phase       | Section         | Pass | Fail | Time out | Skip |
> +-------------+-----------------+------+------+----------+------+
> | MPI install | my installation |    1 |    0 |        0 |    0 |
> | MPI install | my installation |    1 |    0 |        0 |    0 |
> | Test Build  | trivial         |    1 |    0 |        0 |    0 |
> | Test Build  | trivial         |    1 |    0 |        0 |    0 |
> | Test Run    | trivial         |    0 |    0 |        4 |    0 |
> | Test Run    | trivial         |    0 |    0 |        2 |    0 |
> +-------------+-----------------+------+------+----------+------+
>
> box% ompi_info
> Open MPI: 1.2.8
> Open MPI SVN revision: r19718
> Open RTE: 1.2.8
> Open RTE SVN revision: r19718
> OPAL: 1.2.8
> OPAL SVN revision: r19718
> Prefix: /usr
> Configured architecture: x86_64-pc-linux-gnu
> Configured by: root
> Configured on: Sat Jan 3 01:03:53 CET 2009
> Configure host: box
> Built by: root
> Built on: Sat, 3 Jan 2009, 01:06:54 CET
> Built host: box
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: no
> Fortran90 bindings: no
> Fortran90 bindings size: na
> C compiler: x86_64-pc-linux-gnu-gcc
> C compiler absolute: /usr/bin/x86_64-pc-linux-gnu-gcc
> C++ compiler: x86_64-pc-linux-gnu-g++
> C++ compiler absolute: /usr/bin/x86_64-pc-linux-gnu-g++
> Fortran77 compiler: x86_64-pc-linux-gnu-gfortran
> Fortran77 compiler abs: /usr/bin/x86_64-pc-linux-gnu-gfortran
> Fortran90 compiler: none
> Fortran90 compiler abs: none
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: no
> Fortran90 profiling: no
> C++ exceptions: no
> Thread support: posix (mpi: no, progress: no)
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: yes
> mpirun default --prefix: yes
> MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.8)
> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.8)
> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.8)
> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.8)
> MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.8)
> MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.8)
> MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.8)
> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.8)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.8)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.8)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.8)
> MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.8)
> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.8)
> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.8)
> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.8)
> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.8)
> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.8)
> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.8)
> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.8)
> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.8)
> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.8)
> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.8)
> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.8)
> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.8)
> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.8)
> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.8)
> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.8)
> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.8)
> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.8)
> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.8)
> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.8)
> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.8)
> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.8)
> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.8)
> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.8)
> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.8)
> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.8)
> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.8)
> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.8)
> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.8)
> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.8)
> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.8)
> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.8)
> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.8)
> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.8)
> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.8)
>
> I tried 1.2.6-r1 earlier with the same results, which only leads me to
> think I must be doing something wrong, but I'm out of ideas for now.
> Anyone?
>

Never mind.

Interesting, though. I thought that in such a simple scenario shared memory
(or whatever is fastest) would be used for IPC, but no: even with a single
process it still wants to use TCP/IP to communicate between mpirun and orted.
What surprised me even more is that it won't use loopback for that. Hence my
somewhat over-restrictive iptables rules were the problem: I had allowed
traffic from 127.0.0.1 to 127.0.0.1 on lo, but not from <eth0_addr> to
<eth0_addr> on eth0, so both processes ended up waiting on I/O.
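
To illustrate, roughly (a sketch rather than my exact ruleset; 192.168.1.8 is
the eth0 address you can see in the ps output above):

# what I had: only explicit loopback-to-loopback traffic was allowed
iptables -A INPUT -i lo -s 127.0.0.1 -d 127.0.0.1 -j ACCEPT

# what was missing: mpirun and orted talk over the eth0 address, so the
# box also needs to be allowed to talk to itself on that address
iptables -A INPUT -s 192.168.1.8 -d 192.168.1.8 -j ACCEPT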

Can I somehow configure it to use something other than TCP/IP here? Or at
least switch it to loopback?
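
(What I had in mind is something along the lines of

box% mpirun --mca oob_tcp_include lo -np 1 ./hello

i.e. telling the out-of-band TCP layer to stick to the loopback interface.
I'm only guessing that an oob_tcp_include parameter exists in 1.2.8 and that
this is how it would be spelled, so treat it as a sketch of what I'm after,
not something I've tested.)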