Weird - it works fine for me:
sjc-vpn5-109:mpi rhc$ mpirun -n 3 ./abort
Hello, World, I am 1 of 3
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 2.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 22980 on
node sjc-vpn5-109.cisco.com exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
Hello, World, I am 0 of 3
Hello, World, I am 2 of 3
I built it with gcc 4.2.1, though - I know we have a problem with shared memory hanging when built with gcc 4.4.x, so I wonder if the issue here is your use of gcc 4.5?
Can you try running this again with -mca btl ^sm?
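For reference, the rerun I have in mind would look roughly like the sketch
below. The run_without_sm wrapper and the MPIRUN variable are only
illustrative names, not part of Open MPI. The "-mca btl ^sm" option excludes
the shared-memory BTL and leaves the other listed BTLs (self and tcp in the
ompi_info output) available. The caret is quoted because some shells treat it
specially.

```shell
# Illustrative helper (hypothetical name): launch an MPI program with the
# shared-memory BTL excluded.
# Usage: run_without_sm NPROCS program [args...]
run_without_sm() {
    nprocs=$1; shift
    # MPIRUN can be overridden (e.g. with mpiexec); it defaults to mpirun.
    "${MPIRUN:-mpirun}" -n "$nprocs" -mca btl '^sm' "$@"
}
```

With the test program from this thread, that would be called as
"run_without_sm 3 ./abort".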
Dear All,
As I already mentioned on this mailing list, I found that a simple Hello_world
program does not necessarily terminate: the program just hangs after
MPI_Finalize(). I can printf the MPI_FINALIZED flag, which confirms that the
MPI part of the code has finished, but the exit() or return never completes.
So I tried to use MPI_Abort(), and observed two different behaviors
(description of the architecture is given below).
Either it ends with a segfault, or the application doesn't return to the
shell, even though the string "MPI_ABORT was [...] here)." appears on screen
(the program just hangs, as with MPI_Finalize()).
This is annoying because I need to run several executions in one batch script,
since separate submissions cost a lot of time in the queues. So if you have
any tips to work around the hang, I will take them (even if it means
recompiling Open MPI with specific options, of course).
Thank you!
.Yves.
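
One possible stopgap for the batch-script case described above is to give each
run a wall-clock limit, so that a stuck mpirun does not block the remaining
runs. This is only a sketch and does not address the cause of the hang. It
assumes GNU coreutils timeout is available on the cluster; run_guarded, the
600-second limit and the input names are made up for illustration. Processes
left behind after a forced kill may still need manual cleanup.

```shell
# Hypothetical helper: run a command with a wall-clock limit.
# Usage: run_guarded LIMIT_SECONDS command [args...]
run_guarded() {
    limit=$1; shift
    # Send SIGTERM after $limit seconds, then SIGKILL 10 seconds later
    # if the command is still alive.
    timeout -k 10 "$limit" "$@"
    rc=$?
    # GNU timeout reports 124 on a timeout, or 137 if SIGKILL was needed.
    if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}

# Example batch loop (commented out; inputs are placeholders):
# for input in case1 case2 case3; do
#     run_guarded 600 mpirun -n 3 ./abort "$input" \
#         || echo "run $input did not exit cleanly" >&2
# done
```

The loop carries on with the next input whether a run finishes, aborts, or is
cut off by the limit.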
Here is an example of the output produced on screen. Note that errorcode is
the rank of the process which called MPI_Abort().
############################################
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 0.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec has exited due to process rank 0 with PID 18062 on
node ha8000-1 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--------------------------------------------------------------------------
[ha8000-1:18060] *** Process received signal ***
[ha8000-1:18060] Signal: Segmentation fault (11)
[ha8000-1:18060] Signal code: Address not mapped (1)
[ha8000-1:18060] Failing at address: 0x2aaaac1bd940
Segmentation fault
############################################
The architecture is a Quad-Core AMD Opteron(tm) Processor 8356, Ethernet
controller: MYRICOM Inc. Myri-10G Dual-Protocol NIC (10G-PCIE-8A). The
version of OMPI is 1.4.2, and it was compiled with GCC 4.5.
$>ompi_info
Package: Open MPI p10015@ha8000-1 Distribution
Open MPI: 1.4.2
Open MPI SVN revision: r23093
Open MPI release date: May 04, 2010
Open RTE: 1.4.2
Open RTE SVN revision: r23093
Open RTE release date: May 04, 2010
OPAL: 1.4.2
OPAL SVN revision: r23093
OPAL release date: May 04, 2010
Ident string: 1.4.2
Prefix: /home/p10015/openmpi
Configured architecture: x86_64-unknown-linux-gnu
Configure host: ha8000-1
Configured by: p10015
Configured on: Wed May 19 19:01:19 JST 2010
Configure host: ha8000-1
Built by: p10015
Built on: Wed May 19 21:03:33 JST 2010
Built host: ha8000-1
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: /home/p10015/gcc/bin/x86_64-unknown-linux-gnu-gcc-4.5.0
C compiler absolute:
C++ compiler: /home/p10015/gcc/bin/x86_64-unknown-linux-gnu-g++
C++ compiler absolute:
Fortran77 compiler: gfortran
Fortran77 compiler abs: /usr/bin/gfortran
Fortran90 compiler: gfortran
Fortran90 compiler abs: /usr/bin/gfortran
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: no
Thread support: posix (mpi: yes, progress: yes)
Sparse Groups: yes
Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: yes
mpirun default --prefix: yes
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol visibility support: yes
FT Checkpoint support: no (checkpoint thread: no)
MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4.2)
MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4.2)
MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.2)
MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4.2)
MCA carto: file (MCA v2.0, API v2.0, Component v1.4.2)
MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.2)
MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.4.2)
MCA timer: linux (MCA v2.0, API v2.0, Component v1.4.2)
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4.2)
MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4.2)
MCA dpm: orte (MCA v2.0, API v2.0, Component v1.4.2)
MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.4.2)
MCA allocator: basic (MCA v2.0, API v2.0, Component v1.4.2)
MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.4.2)
MCA coll: basic (MCA v2.0, API v2.0, Component v1.4.2)
MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.4.2)
MCA coll: inter (MCA v2.0, API v2.0, Component v1.4.2)
MCA coll: self (MCA v2.0, API v2.0, Component v1.4.2)
MCA coll: sm (MCA v2.0, API v2.0, Component v1.4.2)
MCA coll: sync (MCA v2.0, API v2.0, Component v1.4.2)
MCA coll: tuned (MCA v2.0, API v2.0, Component v1.4.2)
MCA io: romio (MCA v2.0, API v2.0, Component v1.4.2)
MCA mpool: fake (MCA v2.0, API v2.0, Component v1.4.2)
MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.4.2)
MCA mpool: sm (MCA v2.0, API v2.0, Component v1.4.2)
MCA pml: cm (MCA v2.0, API v2.0, Component v1.4.2)
MCA pml: csum (MCA v2.0, API v2.0, Component v1.4.2)
MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.4.2)
MCA pml: v (MCA v2.0, API v2.0, Component v1.4.2)
MCA bml: r2 (MCA v2.0, API v2.0, Component v1.4.2)
MCA rcache: vma (MCA v2.0, API v2.0, Component v1.4.2)
MCA btl: self (MCA v2.0, API v2.0, Component v1.4.2)
MCA btl: sm (MCA v2.0, API v2.0, Component v1.4.2)
MCA btl: tcp (MCA v2.0, API v2.0, Component v1.4.2)
MCA topo: unity (MCA v2.0, API v2.0, Component v1.4.2)
MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.4.2)
MCA osc: rdma (MCA v2.0, API v2.0, Component v1.4.2)
MCA iof: hnp (MCA v2.0, API v2.0, Component v1.4.2)
MCA iof: orted (MCA v2.0, API v2.0, Component v1.4.2)
MCA iof: tool (MCA v2.0, API v2.0, Component v1.4.2)
MCA oob: tcp (MCA v2.0, API v2.0, Component v1.4.2)
MCA odls: default (MCA v2.0, API v2.0, Component v1.4.2)
MCA ras: slurm (MCA v2.0, API v2.0, Component v1.4.2)
MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.4.2)
MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.4.2)
MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.4.2)
MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.4.2)
MCA rml: oob (MCA v2.0, API v2.0, Component v1.4.2)
MCA routed: binomial (MCA v2.0, API v2.0, Component v1.4.2)
MCA routed: direct (MCA v2.0, API v2.0, Component v1.4.2)
MCA routed: linear (MCA v2.0, API v2.0, Component v1.4.2)
MCA plm: rsh (MCA v2.0, API v2.0, Component v1.4.2)
MCA plm: slurm (MCA v2.0, API v2.0, Component v1.4.2)
MCA filem: rsh (MCA v2.0, API v2.0, Component v1.4.2)
MCA errmgr: default (MCA v2.0, API v2.0, Component v1.4.2)
MCA ess: env (MCA v2.0, API v2.0, Component v1.4.2)
MCA ess: hnp (MCA v2.0, API v2.0, Component v1.4.2)
MCA ess: singleton (MCA v2.0, API v2.0, Component v1.4.2)
MCA ess: slurm (MCA v2.0, API v2.0, Component v1.4.2)
MCA ess: tool (MCA v2.0, API v2.0, Component v1.4.2)
MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.4.2)
MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.4.2)
--
Yves Caniou
Associate Professor at Université Lyon 1,
Member of the team project INRIA GRAAL in the LIP ENS-Lyon,
Délégation CNRS in Japan French Laboratory of Informatics (JFLI),
* in Information Technology Center, The University of Tokyo,
2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
tel: +81-3-5841-0540
* in National Institute of Informatics
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
tel: +81-3-4212-2412
http://graal.ens-lyon.fr/~ycaniou/
_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users