Open MPI User's Mailing List Archives

Subject: [OMPI users] MPI Reduce Error when using C++, C, F77 and F90 in same executable
From: Si Hammond (simon.hammond_at_[hidden])
Date: 2008-02-20 16:58:32


Hi Guys,

We have a very large executable written in C++, C, F77 and F90 (and we
use all of these compilers!). Our code compiles and links fine, but when
we run it on our cluster (under PBSPro) we get the errors at the bottom
of this email.

I wondered if you guys could shed any light on this? It seems an odd
error that MPI_COMM_WORLD is an invalid communicator. Do you think it's a
hardware fault or a compilation issue? For reference, we're using Open MPI
1.2.5 with InfiniBand connected via a Voltaire switch. Processors are
Intel Dual Core. Compilers are GNU C (gcc), C++ (g++) and gfortran.

[node207:12109] *** An error occurred in MPI_Allreduce
[node109:11337] *** An error occurred in MPI_Allreduce
[node109:11337] *** on communicator MPI_COMM_WORLD
[node109:11337] *** MPI_ERR_COMM: invalid communicator
[node109:11337] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node117:11236] *** An error occurred in MPI_Allreduce
[node117:11236] *** on communicator MPI_COMM_WORLD
[node117:11236] *** MPI_ERR_COMM: invalid communicator
[node117:11236] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node113:11288] *** An error occurred in MPI_Allreduce
[node113:11288] *** on communicator MPI_COMM_WORLD
[node113:11288] *** MPI_ERR_COMM: invalid communicator
[node113:11288] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node111:11295] *** An error occurred in MPI_Allreduce
[node111:11295] *** on communicator MPI_COMM_WORLD
[node111:11295] *** MPI_ERR_COMM: invalid communicator
[node111:11295] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node110:11295] *** An error occurred in MPI_Allreduce
[node110:11295] *** on communicator MPI_COMM_WORLD
[node110:11295] *** MPI_ERR_COMM: invalid communicator
[node110:11295] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node115:11496] *** An error occurred in MPI_Allreduce
[node115:11496] *** on communicator MPI_COMM_WORLD
[node115:11496] *** MPI_ERR_COMM: invalid communicator
[node115:11496] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node118:11239] *** An error occurred in MPI_Allreduce
[node118:11239] *** on communicator MPI_COMM_WORLD
[node118:11239] *** MPI_ERR_COMM: invalid communicator
[node118:11239] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node116:11249] *** An error occurred in MPI_Allreduce
[node116:11249] *** on communicator MPI_COMM_WORLD
[node116:11249] *** MPI_ERR_COMM: invalid communicator
[node116:11249] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node119:11239] *** An error occurred in MPI_Allreduce
[node119:11239] *** on communicator MPI_COMM_WORLD
[node119:11239] *** MPI_ERR_COMM: invalid communicator
[node119:11239] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node207:12109] *** on communicator MPI_COMM_WORLD
[node207:12109] *** MPI_ERR_COMM: invalid communicator
[node207:12109] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node114:11261] *** An error occurred in MPI_Allreduce
[node114:11261] *** on communicator MPI_COMM_WORLD
[node114:11261] *** MPI_ERR_COMM: invalid communicator
[node114:11261] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node206:12030] *** An error occurred in MPI_Allreduce
[node206:12030] *** on communicator MPI_COMM_WORLD
[node206:12030] *** MPI_ERR_COMM: invalid communicator
[node206:12030] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node117:11237] *** An error occurred in MPI_Allreduce
[node113:11287] *** An error occurred in MPI_Allreduce
[node111:11293] *** An error occurred in MPI_Allreduce
[node110:11293] *** An error occurred in MPI_Allreduce
[node110:11293] *** on communicator MPI_COMM_WORLD
[node110:11293] *** MPI_ERR_COMM: invalid communicator
[node110:11293] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node115:11495] *** An error occurred in MPI_Allreduce
[node118:11237] *** An error occurred in MPI_Allreduce
[node118:11237] *** on communicator MPI_COMM_WORLD
[node118:11237] *** MPI_ERR_COMM: invalid communicator
[node118:11237] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node116:11247] *** An error occurred in MPI_Allreduce
[node116:11247] *** on communicator MPI_COMM_WORLD
[node116:11247] *** MPI_ERR_COMM: invalid communicator
[node116:11247] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node119:11238] *** An error occurred in MPI_Allreduce
[node114:11262] *** An error occurred in MPI_Allreduce
[node206:12029] *** An error occurred in MPI_Allreduce
[node206:12029] *** on communicator MPI_COMM_WORLD
[node206:12029] *** MPI_ERR_COMM: invalid communicator
[node206:12029] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node117:11238] *** An error occurred in MPI_Allreduce
[node113:11289] *** An error occurred in MPI_Allreduce
[node111:11294] *** An error occurred in MPI_Allreduce
[node110:11294] *** An error occurred in MPI_Allreduce
[node110:11294] *** on communicator MPI_COMM_WORLD
[node110:11294] *** MPI_ERR_COMM: invalid communicator
[node110:11294] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node115:11497] *** An error occurred in MPI_Allreduce
[node115:11497] *** on communicator MPI_COMM_WORLD
[node118:11238] *** An error occurred in MPI_Allreduce
[node118:11238] *** on communicator MPI_COMM_WORLD
[node118:11238] *** MPI_ERR_COMM: invalid communicator
[node118:11238] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node116:11248] *** An error occurred in MPI_Allreduce
[node116:11248] *** on communicator MPI_COMM_WORLD
[node116:11248] *** MPI_ERR_COMM: invalid communicator
[node116:11248] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node119:11240] *** An error occurred in MPI_Allreduce
[node114:11263] *** An error occurred in MPI_Allreduce
[node114:11263] *** on communicator MPI_COMM_WORLD
[node114:11263] *** MPI_ERR_COMM: invalid communicator
[node114:11263] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node206:12031] *** An error occurred in MPI_Allreduce
[node206:12031] *** on communicator MPI_COMM_WORLD
[node206:12031] *** MPI_ERR_COMM: invalid communicator
[node206:12031] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node117:11237] *** on communicator MPI_COMM_WORLD
[node117:11237] *** MPI_ERR_COMM: invalid communicator
[node117:11237] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node113:11287] *** on communicator MPI_COMM_WORLD
[node113:11287] *** MPI_ERR_COMM: invalid communicator
[node113:11287] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node111:11293] *** on communicator MPI_COMM_WORLD
[node111:11293] *** MPI_ERR_COMM: invalid communicator
[node111:11293] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node115:11495] *** on communicator MPI_COMM_WORLD
[node115:11495] *** MPI_ERR_COMM: invalid communicator
[node115:11495] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node119:11238] *** on communicator MPI_COMM_WORLD
[node119:11238] *** MPI_ERR_COMM: invalid communicator
[node119:11238] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node114:11262] *** on communicator MPI_COMM_WORLD
[node114:11262] *** MPI_ERR_COMM: invalid communicator
[node114:11262] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node117:11238] *** on communicator MPI_COMM_WORLD
[node117:11238] *** MPI_ERR_COMM: invalid communicator
[node117:11238] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node113:11289] *** on communicator MPI_COMM_WORLD
[node113:11289] *** MPI_ERR_COMM: invalid communicator
[node113:11289] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node111:11294] *** on communicator MPI_COMM_WORLD
[node111:11294] *** MPI_ERR_COMM: invalid communicator
[node111:11294] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node115:11497] *** MPI_ERR_COMM: invalid communicator
[node115:11497] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node119:11240] *** on communicator MPI_COMM_WORLD
[node119:11240] *** MPI_ERR_COMM: invalid communicator
[node119:11240] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node109:11335] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[node109:11335] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_tm_module.c at line 572
[node109:11335] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[node109:11335] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[node109:11335] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_tm_module.c at line 603
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
[node117:11235] OOB: Connection to HNP lost
[node113:11286] OOB: Connection to HNP lost
[node111:11292] OOB: Connection to HNP lost
[node115:11494] OOB: Connection to HNP lost
[node119:11237] OOB: Connection to HNP lost
[node116:11246] OOB: Connection to HNP lost
[node206:12028] OOB: Connection to HNP lost
[node114:11260] OOB: Connection to HNP lost

-- 
Si Hammond
Performance Prediction and Analysis Lab,
High Performance Systems Group,
University of Warwick, UK