
Subject: [OMPI users] Bugs in MPI_Abort() -- MPI_Finalize()?
From: Yves Caniou (yves.caniou_at_[hidden])
Date: 2010-06-02 05:49:26


Dear All,

As already mentioned on this mailing list, I found that a simple Hello World
program does not necessarily terminate: the program just hangs after
MPI_Finalize(). I can print the flag returned by MPI_Finalized(), which
confirms that the MPI part of the code has finished, but the final exit() or
return never completes.
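For reference, here is a minimal sketch of the kind of program that shows the
problem (my real code is equivalent; the MPI_Finalized() check is how I
confirm that the MPI part has completed):

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, finalized;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      printf("Hello from rank %d\n", rank);

      MPI_Finalize();

      /* MPI_Finalized() is one of the few calls allowed after
       * MPI_Finalize(); the flag comes back 1, so MPI itself is done... */
      MPI_Finalized(&finalized);
      printf("rank %d: finalized = %d\n", rank, finalized);

      /* ...yet the process hangs here instead of returning to the shell. */
      return 0;
  }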

So I tried to use MPI_Abort() instead, and observed two different behaviors
(a description of the architecture is given below). Either the run ends with
a segfault, or the application does not return to the shell even though the
string "MPI_ABORT was [...] here)." appears on screen (the program just
hangs, as with MPI_Finalize()).
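Concretely, the abort path is just a sketch like this, where each process
passes its own rank as the error code (which is why the output below reports
"errorcode 0" for rank 0):

  /* Sketch: abort with the caller's rank as the error code. */
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Abort(MPI_COMM_WORLD, rank);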

This is annoying, since I need to chain several executions in a single batch
script: multiple submissions cost a lot of time in the queues. So if you have
any tips to work around the hang, I will gladly take them (even if it means
recompiling Open MPI with specific options, of course).

Thank you!

.Yves.

Here is an example of the output produced on screen. Note that the errorcode
is the rank of the process that called MPI_Abort().

############################################
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec has exited due to process rank 0 with PID 18062 on
node ha8000-1 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--------------------------------------------------------------------------
[ha8000-1:18060] *** Process received signal ***
[ha8000-1:18060] Signal: Segmentation fault (11)
[ha8000-1:18060] Signal code: Address not mapped (1)
[ha8000-1:18060] Failing at address: 0x2aaaac1bd940
Segmentation fault
############################################

The architecture is a Quad-Core AMD Opteron(tm) Processor 8356 with a
MYRICOM Inc. Myri-10G Dual-Protocol NIC (10G-PCIE-8A) Ethernet controller;
the version of OMPI is 1.4.2, compiled with GCC 4.5.
$>ompi_info
                 Package: Open MPI p10015_at_ha8000-1 Distribution
                Open MPI: 1.4.2
   Open MPI SVN revision: r23093
   Open MPI release date: May 04, 2010
                Open RTE: 1.4.2
   Open RTE SVN revision: r23093
   Open RTE release date: May 04, 2010
                    OPAL: 1.4.2
       OPAL SVN revision: r23093
       OPAL release date: May 04, 2010
            Ident string: 1.4.2
                  Prefix: /home/p10015/openmpi
 Configured architecture: x86_64-unknown-linux-gnu
          Configure host: ha8000-1
           Configured by: p10015
           Configured on: Wed May 19 19:01:19 JST 2010
          Configure host: ha8000-1
                Built by: p10015
                Built on: Wed May 19 21:03:33 JST 2010
              Built host: ha8000-1
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
               C compiler: /home/p10015/gcc/bin/x86_64-unknown-linux-gnu-gcc-4.5.0
     C compiler absolute:
            C++ compiler: /home/p10015/gcc/bin/x86_64-unknown-linux-gnu-g++
   C++ compiler absolute:
      Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/bin/gfortran
      Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/bin/gfortran
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: yes, progress: yes)
           Sparse Groups: yes
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: yes
         MPI I/O support: yes
       MPI_WTIME support: gettimeofday
Symbol visibility support: yes
   FT Checkpoint support: no (checkpoint thread: no)
           MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4.2)
              MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4.2)
           MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.2)
               MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4.2)
               MCA carto: file (MCA v2.0, API v2.0, Component v1.4.2)
           MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.2)
           MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.4.2)
               MCA timer: linux (MCA v2.0, API v2.0, Component v1.4.2)
         MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4.2)
         MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA dpm: orte (MCA v2.0, API v2.0, Component v1.4.2)
              MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.4.2)
           MCA allocator: basic (MCA v2.0, API v2.0, Component v1.4.2)
           MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.4.2)
                MCA coll: basic (MCA v2.0, API v2.0, Component v1.4.2)
                MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.4.2)
                MCA coll: inter (MCA v2.0, API v2.0, Component v1.4.2)
                MCA coll: self (MCA v2.0, API v2.0, Component v1.4.2)
                MCA coll: sm (MCA v2.0, API v2.0, Component v1.4.2)
                MCA coll: sync (MCA v2.0, API v2.0, Component v1.4.2)
                MCA coll: tuned (MCA v2.0, API v2.0, Component v1.4.2)
                  MCA io: romio (MCA v2.0, API v2.0, Component v1.4.2)
               MCA mpool: fake (MCA v2.0, API v2.0, Component v1.4.2)
               MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.4.2)
               MCA mpool: sm (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA pml: cm (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA pml: csum (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA pml: v (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA bml: r2 (MCA v2.0, API v2.0, Component v1.4.2)
              MCA rcache: vma (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA btl: self (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA btl: sm (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA btl: tcp (MCA v2.0, API v2.0, Component v1.4.2)
                MCA topo: unity (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA osc: rdma (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA iof: hnp (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA iof: orted (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA iof: tool (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA oob: tcp (MCA v2.0, API v2.0, Component v1.4.2)
                MCA odls: default (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA ras: slurm (MCA v2.0, API v2.0, Component v1.4.2)
               MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.4.2)
               MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.4.2)
               MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.4.2)
               MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA rml: oob (MCA v2.0, API v2.0, Component v1.4.2)
              MCA routed: binomial (MCA v2.0, API v2.0, Component v1.4.2)
              MCA routed: direct (MCA v2.0, API v2.0, Component v1.4.2)
              MCA routed: linear (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA plm: rsh (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA plm: slurm (MCA v2.0, API v2.0, Component v1.4.2)
               MCA filem: rsh (MCA v2.0, API v2.0, Component v1.4.2)
              MCA errmgr: default (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA ess: env (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA ess: hnp (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA ess: singleton (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA ess: slurm (MCA v2.0, API v2.0, Component v1.4.2)
                 MCA ess: tool (MCA v2.0, API v2.0, Component v1.4.2)
             MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.4.2)
             MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.4.2)

-- 
Yves Caniou
Associate Professor at Université Lyon 1,
Member of the team project INRIA GRAAL in the LIP ENS-Lyon,
Délégation CNRS in Japan French Laboratory of Informatics (JFLI),
  * in Information Technology Center, The University of Tokyo,
    2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
    tel: +81-3-5841-0540
  * in National Institute of Informatics
    2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
    tel: +81-3-4212-2412 
http://graal.ens-lyon.fr/~ycaniou/