Open MPI User's Mailing List Archives

Subject: [OMPI users] program stalls in __write_nocancel()
From: Peter Beerli (beerli_at_[hidden])
Date: 2008-11-05 22:12:24


On some of my larger problems (50 or more nodes, 'long' runs of more
than 5 hours), my program stalls and does not continue. The program is
set up as a master-worker, and the master seems to get stuck in a write
to stdout; see the gdb backtrace below (it took all day to get there on
50 nodes). The function handle_message simply prints to stdout in this
case. Of course the workers keep sending messages to the master, but
the master is stuck in a write that never finishes. Any idea where to
look next?

Smaller runs look fine, and valgrind did not find problems in my code
(though it complains a lot about Open MPI). I also attach the ompi_info
output to show versions (OS is Mac OS X 10.5.5). Any hint is welcome!

thanks
Peter

(gdb) bt
#0 0x00000037528c0e50 in __write_nocancel () from /lib64/libc.so.6
#1 0x00000037528694b3 in _IO_new_file_write () from /lib64/libc.so.6
#2 0x00000037528693c6 in _IO_new_do_write () from /lib64/libc.so.6
#3 0x000000375286a822 in _IO_new_file_xsputn () from /lib64/libc.so.6
#4 0x000000375285f4f8 in fputs () from /lib64/libc.so.6
#5 0x000000000045e9de in handle_message (
    rawmessage=0x4bb8830 "M0:[ 12] Swapping between 4 temperatures.\n", ' ' <repeats 11 times>, "Temperature | Accepted | Swaps between temperatures\n", ' ' <repeats 16 times>, "1e+06 | 0.00 | |\n", ' ' <repeats 15 times>, "3.0000 | 0.08 | 1 ||"...,
    sender=12, world=0x448d8b0) at migrate_mpi.c:3663
#6 0x000000000045362a in mpi_runloci_master (loci=1, who=0x4541fc0,
    world=0x448d8b0, options_readsum=0, menu=0) at migrate_mpi.c:228
#7 0x000000000044ed86 in run_sampler (options=0x448dc20, data=0x4465a10,
    universe=0x42b90c0, usize=4, outfilepos=0x7fff0ff98ee0,
    Gmax=0x7fff0ff98ee8) at main.c:885
#8 0x000000000044dff2 in main (argc=3, argv=0x7fff0ff99008) at main.c:422

petal:~>ompi_info
                Open MPI: 1.2.8
   Open MPI SVN revision: r19718
                Open RTE: 1.2.8
   Open RTE SVN revision: r19718
                    OPAL: 1.2.8
       OPAL SVN revision: r19718
                  Prefix: /home/beerli/openmpi
 Configured architecture: x86_64-unknown-linux-gnu
           Configured by: beerli
           Configured on: Mon Nov 3 15:00:02 EST 2008
          Configure host: petal
                Built by: beerli
                Built on: Mon Nov 3 15:08:02 EST 2008
              Built host: petal
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
      Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/bin/gfortran
      Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/bin/gfortran
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: no
           MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.8)
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.8)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.8)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.8)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.8)
         MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.8)
         MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.8)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.8)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.2.8)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.8)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.8)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.8)
               MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.8)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.8)
              MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.8)
                 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.8)
                 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.8)
              MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.8)
              MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.8)
              MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.8)
                  MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.8)
                  MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.8)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.8)
               MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.8)
                MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.8)
                MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.8)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA pls: slurm (MCA v1.0, API v1.3, Component v1.2.8)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.8)
                 MCA sds: slurm (MCA v1.0, API v1.0, Component v1.2.8)