Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Bug in my code or in v1.4.3?
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-01-10 09:15:20


The test code looks ok to me.
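
To narrow this down, one option is to record every mismatching slot of the gathered array rather than aborting at the first one, since the pattern of wrong values can hint at where the data went astray. Below is a minimal sketch of that check with no MPI calls, so it can be tried in isolation. `find_mismatches` is a hypothetical helper name; it comes from neither the original test nor Open MPI.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of MPI or of the original test):
// return every index i where tab[i] != i, i.e. every slot of the
// gathered array that does not hold the rank the test expects there.
std::vector<std::size_t> find_mismatches(const std::vector<int>& tab)
{
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < tab.size(); ++i) {
        if (tab[i] != static_cast<int>(i)) {
            bad.push_back(i);
        }
    }
    return bad;
}
```

In the test program, the array filled by MPI_Allgather could be copied into a `std::vector<int>` and passed to this function, printing all returned indices before deciding whether to call MPI_Abort.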

I will mention that Open MPI 1.4.3 is *very* old; it is now two generations behind the current one. The current stable release is 1.6.5, and the current feature series (1.7.x) is likely to transition to stable (1.8.x) in a few months. I don't follow Ubuntu at all, but I guess I'm a bit surprised that a) they're so far out of date, and b) they don't even have the last release of the Open MPI 1.4.x series (which was 1.4.5, released Feb 14, 2012).

So yes, it could be a bug in Open MPI -- it's really hard to say with a version that old. I would say that the first step is upgrading to at least Open MPI 1.4.5 -- 1.6.5, if possible.
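
To make the "at least 1.4.5" check concrete, here is a minimal sketch that compares a package version string, such as the `1.4.3-2.1ubuntu3` reported by apt-cache, against a minimum version. The helper names `parse_version` and `is_at_least` are hypothetical and not part of Open MPI. The sketch assumes the upstream version is the leading dotted numeric part of the string and ignores everything after it.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not an Open MPI API): extract the leading dotted
// numeric components of a version string,
// e.g. "1.4.3-2.1ubuntu3" -> {1, 4, 3}.
std::vector<int> parse_version(const std::string& s)
{
    std::vector<int> parts;
    std::size_t i = 0;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
        int value = 0;
        while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
            value = value * 10 + (s[i] - '0');
            ++i;
        }
        parts.push_back(value);
        if (i < s.size() && s[i] == '.') {
            ++i;    // skip the dot and read the next component
        } else {
            break;  // anything else (e.g. '-') ends the upstream version
        }
    }
    return parts;
}

// Hypothetical helper: true if `installed` is the same as or newer than
// `minimum`, comparing component by component; missing components count as 0.
bool is_at_least(const std::string& installed, const std::string& minimum)
{
    const std::vector<int> a = parse_version(installed);
    const std::vector<int> b = parse_version(minimum);
    const std::size_t n = a.size() > b.size() ? a.size() : b.size();
    for (std::size_t k = 0; k < n; ++k) {
        const int x = k < a.size() ? a[k] : 0;
        const int y = k < b.size() ? b[k] : 0;
        if (x != y) {
            return x > y;
        }
    }
    return true;
}
```

With these helpers, `is_at_least("1.4.3-2.1ubuntu3", "1.4.5")` is false for the Ubuntu 12.04 package quoted below, while `is_at_least("1.6.5", "1.4.5")` is true.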

On Jan 10, 2014, at 5:49 AM, David Froger <david.froger_at_[hidden]> wrote:

> Dear all,
>
> We are migrating a code using OpenMPI from Ubuntu 10.04 to Ubuntu 12.04, and
> have encountered some problems.
>
> Below is a test code that works on Ubuntu 10.04, but fails on Ubuntu 12.04.
>
> The question is: is there a bug in the test code, or is it due to a bug in
> OpenMPI?
>
> Thanks for any help,
> David
>
> ==============================================================================
> OpenMPI versions
> ==============================================================================
>
> We use the default OpenMPI versions on both versions of Ubuntu:
>
> $ apt-cache policy openmpi-bin # On Ubuntu 10.04
> openmpi-bin:
> Installed: 1.4.1-2
> Candidate: 1.4.1-2
> Version table:
> *** 1.4.1-2 0
> 500 http://ubuntu.lucid.miroir.rocq.inria.fr/ lucid/universe Packages
> 100 /var/lib/dpkg/status
>
> $ apt-cache policy openmpi-bin # On Ubuntu 12.04
> openmpi-bin:
> Installed: 1.4.3-2.1ubuntu3
> Candidate: 1.4.3-2.1ubuntu3
> Version table:
> *** 1.4.3-2.1ubuntu3 0
> 500 http://ubuntu.precise.miroir.rocq.inria.fr/ precise/universe amd64 Packages
> 100 /var/lib/dpkg/status
>
> ==============================================================================
> Error messages
> ==============================================================================
>
> The test code given below works on Ubuntu 10.04, but sometimes fails on
> 12.04, with the following output for example:
>
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 10 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> Error rank 10 tab[0] = 8
> Error rank 11 tab[0] = 7
> Error rank 12 tab[0] = 6
> Error rank 13 tab[0] = 10
> Error rank 14 tab[2] = 10
> --------------------------------------------------------------------------
> mpiexec has exited due to process rank 10 with PID 10284 on
> node saphene exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpiexec (as reported here).
> --------------------------------------------------------------------------
> [saphene:10273] 4 more processes have sent help message help-mpi-api.txt / mpi-abort
> [saphene:10273] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>
> ==============================================================================
> Test code
> ==============================================================================
>
> Here is the code:
>
> #include <iostream>
> #include <mpi.h>
>
> using namespace std;
>
> int main(int argc, char** argv)
> {
>     int ierr;
>     ierr = MPI_Init(&argc, &argv);
>
>     if(ierr != MPI_SUCCESS){
>         cout << "Error initializing mpi" << endl;
>         MPI_Abort(MPI_COMM_WORLD, ierr);
>     }
>
>     // get the number of processes
>     int numProcess;
>     MPI_Comm_size(MPI_COMM_WORLD, &numProcess);
>
>     // get the rank of this process
>     int rank;
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     for(int it=0; it<20; it++){
>         // gather all ranks into an array
>         int *tab = new int[numProcess];
>         ierr = MPI_Allgather(&rank, 1, MPI_INT, tab, 1, MPI_INT, MPI_COMM_WORLD);
>
>         if(ierr != MPI_SUCCESS){
>             cout << "Error MPI_Allgather rank:" << rank << endl;
>             MPI_Abort(MPI_COMM_WORLD, ierr);
>         }
>
>         // check that tab[i] holds rank i for every process
>         for(int i=0; i<numProcess; i++){
>             if(tab[i] != i){
>                 cout << "Error rank " << rank << " tab[" << i << "] = " << tab[i] << endl;
>                 MPI_Abort(MPI_COMM_WORLD, 1);
>             }
>         }
>         delete [] tab;
>     }
>
>     MPI_Finalize();
>     cout << "Exit normally" << endl;
>     return 0;
> }

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/