
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Bugs in MPI_Abort() -- MPI_Finalize()?
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-06-02 08:03:05


Weird - it works fine for me:

sjc-vpn5-109:mpi rhc$ mpirun -n 3 ./abort
Hello, World, I am 1 of 3
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 22980 on
node sjc-vpn5-109.cisco.com exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
Hello, World, I am 0 of 3
Hello, World, I am 2 of 3

I built it with gcc 4.2.1, though - I know we have a problem with shared
memory hanging when built with gcc 4.4.x, so I wonder if the issue here is
your use of gcc 4.5?

Can you try running this again with -mca btl ^sm?
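
For reference, a test along these lines is just a hello-world in which one
rank calls MPI_Abort(). The actual abort.c is not posted in this thread, so
the sketch below is only a plausible equivalent; which rank aborts and the
error code it passes are assumptions.

/* build:  mpicc abort.c -o abort
 * run:    mpirun -n 3 ./abort
 * to rule out the shared-memory BTL:  mpirun -n 3 -mca btl ^sm ./abort */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello, World, I am %d of %d\n", rank, size);

    if (rank == 1) {
        /* abort from one rank; error code 2 matches the output above,
         * but any value behaves the same way */
        MPI_Abort(MPI_COMM_WORLD, 2);
    }

    MPI_Finalize();
    return 0;
}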

On Wed, Jun 2, 2010 at 3:49 AM, Yves Caniou <yves.caniou_at_[hidden]> wrote:

> Dear All,
>
> As already said on this mailing list, I found that a simple Hello_world
> program does not necessarily end: the program just hangs after
> MPI_Finalize(). Printing MPI_FINALIZED confirms that the MPI part of the
> code has finished, but the exit() or return() never completes.
>
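(The check described above presumably relies on MPI_Finalized(), which may
legally be called after MPI_Finalize(). A minimal sketch of that kind of
check, not the poster's actual code:)

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int finalized = 0;

    MPI_Init(&argc, &argv);
    /* ... hello-world body ... */
    MPI_Finalize();

    MPI_Finalized(&finalized);  /* one of the few calls allowed after MPI_Finalize() */
    printf("MPI finalized: %d\n", finalized);

    return 0;                   /* the reported hang happens at or after this point */
}
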
> So I tried MPI_Abort() instead, and observed two different behaviors
> (a description of the architecture is given below). Either the run ends
> with a segfault, or the application does not return to the shell even
> though the string "MPI_ABORT was [...] here)." appears on screen (the
> program just hangs, as with MPI_Finalize()).
>
> This is annoying since I need several executions in one batch script,
> because separate submissions cost a lot of time in the queues. So if you
> have any tips to work around the hang, I will take them (even if it means
> recompiling Open MPI with specific options, of course).
>
> Thank you!
>
> .Yves.
>
> Here is an example of the output produced on screen. Note that errorcode is
> the rank of the process which called MPI_Abort().
>
> ############################################
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 0.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec has exited due to process rank 0 with PID 18062 on
> node ha8000-1 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpiexec (as reported here).
> --------------------------------------------------------------------------
> [ha8000-1:18060] *** Process received signal ***
> [ha8000-1:18060] Signal: Segmentation fault (11)
> [ha8000-1:18060] Signal code: Address not mapped (1)
> [ha8000-1:18060] Failing at address: 0x2aaaac1bd940
> Segmentation fault
> ############################################
>
> The architecture is a Quad-Core AMD Opteron(tm) Processor 8356 with a
> MYRICOM Inc. Myri-10G Dual-Protocol NIC (10G-PCIE-8A) Ethernet controller;
> the OMPI version is 1.4.2, compiled with GCC 4.5.
> $>ompi_info
> Package: Open MPI p10015_at_ha8000-1 Distribution
> Open MPI: 1.4.2
> Open MPI SVN revision: r23093
> Open MPI release date: May 04, 2010
> Open RTE: 1.4.2
> Open RTE SVN revision: r23093
> Open RTE release date: May 04, 2010
> OPAL: 1.4.2
> OPAL SVN revision: r23093
> OPAL release date: May 04, 2010
> Ident string: 1.4.2
> Prefix: /home/p10015/openmpi
> Configured architecture: x86_64-unknown-linux-gnu
> Configure host: ha8000-1
> Configured by: p10015
> Configured on: Wed May 19 19:01:19 JST 2010
> Configure host: ha8000-1
> Built by: p10015
> Built on: Wed May 19 21:03:33 JST 2010
> Built host: ha8000-1
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: /home/p10015/gcc/bin/x86_64-unknown-linux-gnu-gcc-4.5.0
> C compiler absolute:
> C++ compiler: /home/p10015/gcc/bin/x86_64-unknown-linux-gnu-g++
> C++ compiler absolute:
> Fortran77 compiler: gfortran
> Fortran77 compiler abs: /usr/bin/gfortran
> Fortran90 compiler: gfortran
> Fortran90 compiler abs: /usr/bin/gfortran
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: yes, progress: yes)
> Sparse Groups: yes
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: yes
> mpirun default --prefix: yes
> MPI I/O support: yes
> MPI_WTIME support: gettimeofday
> Symbol visibility support: yes
> FT Checkpoint support: no (checkpoint thread: no)
> MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4.2)
> MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4.2)
> MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.2)
> MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4.2)
> MCA carto: file (MCA v2.0, API v2.0, Component v1.4.2)
> MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.2)
> MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.4.2)
> MCA timer: linux (MCA v2.0, API v2.0, Component v1.4.2)
> MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4.2)
> MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4.2)
> MCA dpm: orte (MCA v2.0, API v2.0, Component v1.4.2)
> MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.4.2)
> MCA allocator: basic (MCA v2.0, API v2.0, Component v1.4.2)
> MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.4.2)
> MCA coll: basic (MCA v2.0, API v2.0, Component v1.4.2)
> MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.4.2)
> MCA coll: inter (MCA v2.0, API v2.0, Component v1.4.2)
> MCA coll: self (MCA v2.0, API v2.0, Component v1.4.2)
> MCA coll: sm (MCA v2.0, API v2.0, Component v1.4.2)
> MCA coll: sync (MCA v2.0, API v2.0, Component v1.4.2)
> MCA coll: tuned (MCA v2.0, API v2.0, Component v1.4.2)
> MCA io: romio (MCA v2.0, API v2.0, Component v1.4.2)
> MCA mpool: fake (MCA v2.0, API v2.0, Component v1.4.2)
> MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.4.2)
> MCA mpool: sm (MCA v2.0, API v2.0, Component v1.4.2)
> MCA pml: cm (MCA v2.0, API v2.0, Component v1.4.2)
> MCA pml: csum (MCA v2.0, API v2.0, Component v1.4.2)
> MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.4.2)
> MCA pml: v (MCA v2.0, API v2.0, Component v1.4.2)
> MCA bml: r2 (MCA v2.0, API v2.0, Component v1.4.2)
> MCA rcache: vma (MCA v2.0, API v2.0, Component v1.4.2)
> MCA btl: self (MCA v2.0, API v2.0, Component v1.4.2)
> MCA btl: sm (MCA v2.0, API v2.0, Component v1.4.2)
> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.4.2)
> MCA topo: unity (MCA v2.0, API v2.0, Component v1.4.2)
> MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.4.2)
> MCA osc: rdma (MCA v2.0, API v2.0, Component v1.4.2)
> MCA iof: hnp (MCA v2.0, API v2.0, Component v1.4.2)
> MCA iof: orted (MCA v2.0, API v2.0, Component v1.4.2)
> MCA iof: tool (MCA v2.0, API v2.0, Component v1.4.2)
> MCA oob: tcp (MCA v2.0, API v2.0, Component v1.4.2)
> MCA odls: default (MCA v2.0, API v2.0, Component v1.4.2)
> MCA ras: slurm (MCA v2.0, API v2.0, Component v1.4.2)
> MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.4.2)
> MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.4.2)
> MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.4.2)
> MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.4.2)
> MCA rml: oob (MCA v2.0, API v2.0, Component v1.4.2)
> MCA routed: binomial (MCA v2.0, API v2.0, Component v1.4.2)
> MCA routed: direct (MCA v2.0, API v2.0, Component v1.4.2)
> MCA routed: linear (MCA v2.0, API v2.0, Component v1.4.2)
> MCA plm: rsh (MCA v2.0, API v2.0, Component v1.4.2)
> MCA plm: slurm (MCA v2.0, API v2.0, Component v1.4.2)
> MCA filem: rsh (MCA v2.0, API v2.0, Component v1.4.2)
> MCA errmgr: default (MCA v2.0, API v2.0, Component v1.4.2)
> MCA ess: env (MCA v2.0, API v2.0, Component v1.4.2)
> MCA ess: hnp (MCA v2.0, API v2.0, Component v1.4.2)
> MCA ess: singleton (MCA v2.0, API v2.0, Component v1.4.2)
> MCA ess: slurm (MCA v2.0, API v2.0, Component v1.4.2)
> MCA ess: tool (MCA v2.0, API v2.0, Component v1.4.2)
> MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.4.2)
> MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.4.2)
>
> --
> Yves Caniou
> Associate Professor at Université Lyon 1,
> Member of the team project INRIA GRAAL in the LIP ENS-Lyon,
> Délégation CNRS in Japan French Laboratory of Informatics (JFLI),
> * in Information Technology Center, The University of Tokyo,
> 2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
> tel: +81-3-5841-0540
> * in National Institute of Informatics
> 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
> tel: +81-3-4212-2412
> http://graal.ens-lyon.fr/~ycaniou/
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users