Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Error Running Executable Linking C++, C, F77 and F90
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-02-29 15:29:11


FWIW, this kind of error usually occurs when an invalid communicator
value is passed to an MPI function. OMPI just reports
"MPI_COMM_WORLD" because it has no other communicator to report.

You might want to set the MCA parameter mpi_abort_delay, which controls how
long (in seconds) Open MPI waits before killing the job when an error occurs
(negative values mean never abort; the process spins forever). This gives you
time to attach a debugger to the application and poke around to see what value
was passed for the communicator, what the state of the process was, etc.
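
For example (the process count, binary name, and node/PID below are just
placeholders, loosely based on your log):

    # keep the failing processes alive instead of aborting
    # (a negative delay means "never abort; spin forever")
    shell$ export OMPI_MCA_mpi_abort_delay=-1
    shell$ mpirun -np 16 ./your_app

    # when the "invalid communicator" banner appears, attach a debugger
    # on the node/PID it names, e.g. node109, PID 11337 in your output:
    shell$ ssh node109
    shell$ gdb -p 11337
    (gdb) bt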

On Feb 25, 2008, at 4:13 PM, Si Hammond wrote:

> Hi Guys,
>
> We have a very large executable written in C++, C, F77 and F90 (and we
> use all of these compilers!). Our code compiles and links fine, but when
> we run it on our cluster (under PBSPro) we get the errors at the bottom
> of this email.
> I wondered if you guys could shed any light on this? It seems to be an
> odd error that MPI_COMM_WORLD is an invalid communicator. Do you think
> it's a hardware fault or a compilation issue? For reference, we're using
> Open MPI 1.2.5 with InfiniBand connected via a Voltaire switch. The
> processors are Intel dual-core, and the compilers are GNU C (gcc), C++
> (g++) and gfortran.
>
>
> [node207:12109] *** An error occurred in MPI_Allreduce
> [node109:11337] *** An error occurred in MPI_Allreduce
> [node109:11337] *** on communicator MPI_COMM_WORLD
> [node109:11337] *** MPI_ERR_COMM: invalid communicator
> [node109:11337] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node117:11236] *** An error occurred in MPI_Allreduce
> [node117:11236] *** on communicator MPI_COMM_WORLD
> [node117:11236] *** MPI_ERR_COMM: invalid communicator
> [node117:11236] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node113:11288] *** An error occurred in MPI_Allreduce
> [node113:11288] *** on communicator MPI_COMM_WORLD
> [node113:11288] *** MPI_ERR_COMM: invalid communicator
> [node113:11288] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node111:11295] *** An error occurred in MPI_Allreduce
> [node111:11295] *** on communicator MPI_COMM_WORLD
> [node111:11295] *** MPI_ERR_COMM: invalid communicator
> [node111:11295] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node110:11295] *** An error occurred in MPI_Allreduce
> [node110:11295] *** on communicator MPI_COMM_WORLD
> [node110:11295] *** MPI_ERR_COMM: invalid communicator
> [node110:11295] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node115:11496] *** An error occurred in MPI_Allreduce
> [node115:11496] *** on communicator MPI_COMM_WORLD
> [node115:11496] *** MPI_ERR_COMM: invalid communicator
> [node115:11496] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node118:11239] *** An error occurred in MPI_Allreduce
> [node118:11239] *** on communicator MPI_COMM_WORLD
> [node118:11239] *** MPI_ERR_COMM: invalid communicator
> [node118:11239] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node116:11249] *** An error occurred in MPI_Allreduce
> [node116:11249] *** on communicator MPI_COMM_WORLD
> [node116:11249] *** MPI_ERR_COMM: invalid communicator
> [node116:11249] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node119:11239] *** An error occurred in MPI_Allreduce
> [node119:11239] *** on communicator MPI_COMM_WORLD
> [node119:11239] *** MPI_ERR_COMM: invalid communicator
> [node119:11239] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node207:12109] *** on communicator MPI_COMM_WORLD
> [node207:12109] *** MPI_ERR_COMM: invalid communicator
> [node207:12109] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node114:11261] *** An error occurred in MPI_Allreduce
> [node114:11261] *** on communicator MPI_COMM_WORLD
> [node114:11261] *** MPI_ERR_COMM: invalid communicator
> [node114:11261] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node206:12030] *** An error occurred in MPI_Allreduce
> [node206:12030] *** on communicator MPI_COMM_WORLD
> [node206:12030] *** MPI_ERR_COMM: invalid communicator
> [node206:12030] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node117:11237] *** An error occurred in MPI_Allreduce
> [node113:11287] *** An error occurred in MPI_Allreduce
> [node111:11293] *** An error occurred in MPI_Allreduce
> [node110:11293] *** An error occurred in MPI_Allreduce
> [node110:11293] *** on communicator MPI_COMM_WORLD
> [node110:11293] *** MPI_ERR_COMM: invalid communicator
> [node110:11293] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node115:11495] *** An error occurred in MPI_Allreduce
> [node118:11237] *** An error occurred in MPI_Allreduce
> [node118:11237] *** on communicator MPI_COMM_WORLD
> [node118:11237] *** MPI_ERR_COMM: invalid communicator
> [node118:11237] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node116:11247] *** An error occurred in MPI_Allreduce
> [node116:11247] *** on communicator MPI_COMM_WORLD
> [node116:11247] *** MPI_ERR_COMM: invalid communicator
> [node116:11247] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node119:11238] *** An error occurred in MPI_Allreduce
> [node114:11262] *** An error occurred in MPI_Allreduce
> [node206:12029] *** An error occurred in MPI_Allreduce
> [node206:12029] *** on communicator MPI_COMM_WORLD
> [node206:12029] *** MPI_ERR_COMM: invalid communicator
> [node206:12029] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node117:11238] *** An error occurred in MPI_Allreduce
> [node113:11289] *** An error occurred in MPI_Allreduce
> [node111:11294] *** An error occurred in MPI_Allreduce
> [node110:11294] *** An error occurred in MPI_Allreduce
> [node110:11294] *** on communicator MPI_COMM_WORLD
> [node110:11294] *** MPI_ERR_COMM: invalid communicator
> [node110:11294] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node115:11497] *** An error occurred in MPI_Allreduce
> [node115:11497] *** on communicator MPI_COMM_WORLD
> [node118:11238] *** An error occurred in MPI_Allreduce
> [node118:11238] *** on communicator MPI_COMM_WORLD
> [node118:11238] *** MPI_ERR_COMM: invalid communicator
> [node118:11238] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node116:11248] *** An error occurred in MPI_Allreduce
> [node116:11248] *** on communicator MPI_COMM_WORLD
> [node116:11248] *** MPI_ERR_COMM: invalid communicator
> [node116:11248] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node119:11240] *** An error occurred in MPI_Allreduce
> [node114:11263] *** An error occurred in MPI_Allreduce
> [node114:11263] *** on communicator MPI_COMM_WORLD
> [node114:11263] *** MPI_ERR_COMM: invalid communicator
> [node114:11263] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node206:12031] *** An error occurred in MPI_Allreduce
> [node206:12031] *** on communicator MPI_COMM_WORLD
> [node206:12031] *** MPI_ERR_COMM: invalid communicator
> [node206:12031] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node117:11237] *** on communicator MPI_COMM_WORLD
> [node117:11237] *** MPI_ERR_COMM: invalid communicator
> [node117:11237] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node113:11287] *** on communicator MPI_COMM_WORLD
> [node113:11287] *** MPI_ERR_COMM: invalid communicator
> [node113:11287] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node111:11293] *** on communicator MPI_COMM_WORLD
> [node111:11293] *** MPI_ERR_COMM: invalid communicator
> [node111:11293] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node115:11495] *** on communicator MPI_COMM_WORLD
> [node115:11495] *** MPI_ERR_COMM: invalid communicator
> [node115:11495] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node119:11238] *** on communicator MPI_COMM_WORLD
> [node119:11238] *** MPI_ERR_COMM: invalid communicator
> [node119:11238] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node114:11262] *** on communicator MPI_COMM_WORLD
> [node114:11262] *** MPI_ERR_COMM: invalid communicator
> [node114:11262] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node117:11238] *** on communicator MPI_COMM_WORLD
> [node117:11238] *** MPI_ERR_COMM: invalid communicator
> [node117:11238] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node113:11289] *** on communicator MPI_COMM_WORLD
> [node113:11289] *** MPI_ERR_COMM: invalid communicator
> [node113:11289] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node111:11294] *** on communicator MPI_COMM_WORLD
> [node111:11294] *** MPI_ERR_COMM: invalid communicator
> [node111:11294] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node115:11497] *** MPI_ERR_COMM: invalid communicator
> [node115:11497] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node119:11240] *** on communicator MPI_COMM_WORLD
> [node119:11240] *** MPI_ERR_COMM: invalid communicator
> [node119:11240] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node109:11335] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
> [node109:11335] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_tm_module.c at line 572
> [node109:11335] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
> [node109:11335] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
> [node109:11335] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_tm_module.c at line 603
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons for this job.
> Returned value Timeout instead of ORTE_SUCCESS.
> --------------------------------------------------------------------------
> [node117:11235] OOB: Connection to HNP lost
> [node113:11286] OOB: Connection to HNP lost
> [node111:11292] OOB: Connection to HNP lost
> [node115:11494] OOB: Connection to HNP lost
> [node119:11237] OOB: Connection to HNP lost
> [node116:11246] OOB: Connection to HNP lost
> [node206:12028] OOB: Connection to HNP lost
> [node114:11260] OOB: Connection to HNP lost
>
> ----------------------------------------------------------------------------------------------------------
>
> OMPI Info Output
>
> Open MPI: 1.2.5
> Open MPI SVN revision: r16989
> Open RTE: 1.2.5
> Open RTE SVN revision: r16989
> OPAL: 1.2.5
> OPAL SVN revision: r16989
> Prefix: /opt/ompi/1.2.5/gnu/64
> Configured architecture: x86_64-unknown-linux-gnu
> Configured by: root
> Configured on: Sun Jan 20 13:29:39 GMT 2008
> Configure host: mg1
> Built by: root
> Built on: Sun Jan 20 13:37:14 GMT 2008
> Built host: mg1
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: gcc
> C compiler absolute: /usr/bin/gcc
> C++ compiler: g++
> C++ compiler absolute: /usr/bin/g++
> Fortran77 compiler: gfortran
> Fortran77 compiler abs: /usr/bin/gfortran
> Fortran90 compiler: gfortran
> Fortran90 compiler abs: /usr/bin/gfortran
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: no, progress: no)
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: yes
> mpirun default --prefix: no
> MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.5)
> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.5)
> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.5)
> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.5)
> MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.5)
> MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.5)
> MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.5)
> MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.5)
> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.5)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.5)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.5)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.5)
> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.5)
> MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.5)
> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.5)
> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.5)
> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.5)
> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.5)
> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.5)
> MCA btl: openib (MCA v1.0, API v1.0.1, Component v1.2.5)
> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.5)
> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.5)
> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
> MCA mtl: psm (MCA v1.0, API v1.0, Component v1.2.5)
> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.5)
> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.5)
> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.5)
> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.5)
> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.5)
> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.5)
> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.5)
> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.5)
> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.5)
> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.5)
> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.5)
> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.5)
> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.5)
> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.5)
> MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.5)
> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.5)
> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.5)
> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.5)
> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.5)
> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.5)
> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.5)
> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.5)
> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.5)
> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.5)
> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.5)
> MCA pls: slurm (MCA v1.0, API v1.3, Component v1.2.5)
> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.5)
> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.5)
> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.5)
> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.5)
> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.5)
> MCA sds: slurm (MCA v1.0, API v1.0, Component v1.2.5)
>
>
> --
> Si Hammond
>
> Performance Prediction and Analysis Lab,
> High Performance Systems Group,
> University of Warwick, UK
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
Cisco Systems