Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_Allgather problem
From: Brett Tully (brett.tully_at_[hidden])
Date: 2012-01-27 05:02:39


Interesting. In the same set of updates, I installed OpenFOAM from their
Ubuntu deb package, and it claims to ship with Open MPI. I just downloaded
their third-party source tarball and unpacked it to see which version of
Open MPI they use, and it is 1.5.3. However, when I run man openmpi or
ompi_info, I still get the same version as before (1.4.3). How do I determine
for sure what is being included when I compile something using mpicc?

Thanks,
Brett.
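
A minimal sketch of one way to check this from inside a program, assuming the
header in use is Open MPI's mpi.h, which defines OPEN_MPI together with
OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION and OMPI_RELEASE_VERSION. The helper
name describe_compiled_mpi is hypothetical. The __has_include guard is only
there so the sketch also builds where no MPI header is installed; when
building with mpicc, a plain #include <mpi.h> does the same job.

```c
#include <stdio.h>
#include <stddef.h>

/* Pull in mpi.h only if the compiler can find it on its include path. */
#if defined(__has_include)
#  if __has_include(<mpi.h>)
#    include <mpi.h>
#  endif
#endif

/*
 * Hypothetical helper: writes a description of the MPI header that was
 * visible at compile time into buf. Returns the snprintf result.
 */
int describe_compiled_mpi(char *buf, size_t len)
{
#if defined(OPEN_MPI)
    /* Open MPI headers expose their own release number as macros. */
    return snprintf(buf, len, "Open MPI %d.%d.%d (MPI standard %d.%d)",
                    OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION,
                    OMPI_RELEASE_VERSION, MPI_VERSION, MPI_SUBVERSION);
#elif defined(MPI_VERSION)
    /* Some other MPI implementation: only the standard level is portable. */
    return snprintf(buf, len, "non-Open MPI header (MPI standard %d.%d)",
                    MPI_VERSION, MPI_SUBVERSION);
#else
    return snprintf(buf, len, "no mpi.h visible at compile time");
#endif
}
```

Calling describe_compiled_mpi(buf, sizeof buf) and printing buf from a program
built with mpicc reports the header that build actually used, which may differ
from what ompi_info on the PATH reports. Open MPI's wrapper compilers also
accept --showme, which prints the underlying compiler command line, including
include and library paths, without running it.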

On Thu, Jan 26, 2012 at 10:05 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:

> What version did you upgrade to? (we don't control the Ubuntu packaging)
>
> I see a bullet in the soon-to-be-released 1.4.5 release notes:
>
> - Fix obscure cases where MPI_ALLGATHER could crash. Thanks to Andrew
> Senin for reporting the problem.
>
> But that would be surprising if this is what fixed your issue, especially
> since it's not released yet. :-)
>
>
>
> On Jan 26, 2012, at 5:24 AM, Brett Tully wrote:
>
> > As of two days ago, this problem has disappeared and the tests that I had
> > written and run each night are now passing. Having looked through the
> > update log of my machine (Ubuntu 11.10), it appears that I got a new
> > version of mpi-default-dev (0.6ubuntu1). I would like to understand this
> > problem in more detail -- is it possible to see what changed in this update?
> > Thanks,
> > Brett.
> >
> >
> >
> > On Fri, Dec 9, 2011 at 6:43 PM, teng ma <tma_at_[hidden]> wrote:
> > I guess your output is from different ranks. You can add the rank to the
> > print statement to tell them apart, as follows:
> >
> > (void) printf("rank %d: gathered[%d].node = %d\n", rank, i, gathered[i].node);
> >
> > On my side, I did not see anything wrong with your code under Open MPI
> > 1.4.3. After adding the rank, the output is:
> > rank 5: gathered[0].node = 0
> > rank 5: gathered[1].node = 1
> > rank 5: gathered[2].node = 2
> > rank 5: gathered[3].node = 3
> > rank 5: gathered[4].node = 4
> > rank 5: gathered[5].node = 5
> > rank 3: gathered[0].node = 0
> > rank 3: gathered[1].node = 1
> > rank 3: gathered[2].node = 2
> > rank 3: gathered[3].node = 3
> > rank 3: gathered[4].node = 4
> > rank 3: gathered[5].node = 5
> > rank 1: gathered[0].node = 0
> > rank 1: gathered[1].node = 1
> > rank 1: gathered[2].node = 2
> > rank 1: gathered[3].node = 3
> > rank 1: gathered[4].node = 4
> > rank 1: gathered[5].node = 5
> > rank 0: gathered[0].node = 0
> > rank 0: gathered[1].node = 1
> > rank 0: gathered[2].node = 2
> > rank 0: gathered[3].node = 3
> > rank 0: gathered[4].node = 4
> > rank 0: gathered[5].node = 5
> > rank 4: gathered[0].node = 0
> > rank 4: gathered[1].node = 1
> > rank 4: gathered[2].node = 2
> > rank 4: gathered[3].node = 3
> > rank 4: gathered[4].node = 4
> > rank 4: gathered[5].node = 5
> > rank 2: gathered[0].node = 0
> > rank 2: gathered[1].node = 1
> > rank 2: gathered[2].node = 2
> > rank 2: gathered[3].node = 3
> > rank 2: gathered[4].node = 4
> > rank 2: gathered[5].node = 5
> >
> > Is that what you expected?
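
Output from several ranks can interleave on a shared terminal, which makes
unlabelled lines hard to attribute. A minimal sketch of one way to reduce
that, assuming each rank first formats its whole report into one buffer and
then emits it with a single call. The helper name format_gathered and the
plain int array are hypothetical stand-ins for the struct in the test
function.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: formats one "rank R: gathered[i].node = N" line per
 * entry into buf. Returns the number of characters written, or -1 if buf
 * is too small to hold the full report.
 */
int format_gathered(char *buf, size_t len, int rank,
                    const int *nodes, int size)
{
    size_t used = 0;

    if (len == 0)
        return -1;
    buf[0] = '\0';

    for (int i = 0; i < size; ++i) {
        int n = snprintf(buf + used, len - used,
                         "rank %d: gathered[%d].node = %d\n",
                         rank, i, nodes[i]);
        /* snprintf reports the length it needed; stop on truncation. */
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Each rank would then pass buf to a single fputs(buf, stdout). This makes
interleaving within one rank's report less likely, but it does not impose any
ordering between ranks.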
> >
> > On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully <brett.tully_at_[hidden]> wrote:
> > Dear all,
> >
> > I have not used Open MPI much before, but am maintaining a large legacy
> > application. We noticed a bug involving a call to MPI_Allgather, as
> > summarised in this post on Stack Overflow:
> > http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results
> >
> > In the process of looking further into the problem, I noticed that the
> > following function results in strange behaviour.
> >
> > #include <mpi.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> >
> > void test_all_gather() {
> >
> >     /* One int per rank; transferred as raw bytes below. */
> >     struct _TEST_ALL_GATHER {
> >         int node;
> >     };
> >
> >     int ierr, size, rank;
> >     ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
> >     ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >
> >     struct _TEST_ALL_GATHER local;
> >     struct _TEST_ALL_GATHER *gathered;
> >
> >     /* Room for one entry from every rank in the communicator. */
> >     gathered = (struct _TEST_ALL_GATHER*) malloc(size * sizeof(*gathered));
> >
> >     local.node = rank;
> >
> >     MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
> >                   gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
> >                   MPI_COMM_WORLD);
> >
> >     /* Loop over the communicator size (the posted snippet used an
> >        undeclared numnodes here). */
> >     int i;
> >     for (i = 0; i < size; ++i) {
> >         (void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
> >     }
> >
> >     free(gathered);
> > }
> >
> > At one point, this function printed the following:
> > gathered[0].node = 2
> > gathered[1].node = 3
> > gathered[2].node = 2
> > gathered[3].node = 3
> > gathered[4].node = 4
> > gathered[5].node = 5
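
Because every rank stores its own rank in node, a correct MPI_Allgather over
MPI_COMM_WORLD should leave gathered[i].node == i on every rank. A minimal
sketch of a check for that property, assuming the same one-int layout as the
test function; the struct tag test_all_gather and the helper name
count_allgather_mismatches are hypothetical.

```c
#include <stddef.h>

/* Same layout as the struct in the test function: one int per rank. */
struct test_all_gather {
    int node;
};

/*
 * Hypothetical helper: counts the slots whose node value differs from the
 * slot index. Zero means the gathered array is in rank order.
 */
size_t count_allgather_mismatches(const struct test_all_gather *gathered,
                                  size_t size)
{
    size_t bad = 0;

    for (size_t i = 0; i < size; ++i) {
        if (gathered[i].node != (int)i)
            ++bad;
    }
    return bad;
}
```

Applied to the output listed above, this reports two mismatches (slots 0 and
1 hold 2 and 3), so a nightly test could fail on any non-zero count instead of
relying on a visual comparison of the printed lines.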
> >
> > Can anyone suggest a place to start looking into why this might be
> > happening? There is a section of the code that calls MPI_Comm_split, but I
> > am not sure if that is related...
> >
> > Running on Ubuntu 11.10 and a summary of ompi_info:
> > Package: Open MPI buildd_at_allspice Distribution
> > Open MPI: 1.4.3
> > Open MPI SVN revision: r23834
> > Open MPI release date: Oct 05, 2010
> > Open RTE: 1.4.3
> > Open RTE SVN revision: r23834
> > Open RTE release date: Oct 05, 2010
> > OPAL: 1.4.3
> > OPAL SVN revision: r23834
> > OPAL release date: Oct 05, 2010
> > Ident string: 1.4.3
> > Prefix: /usr
> > Configured architecture: x86_64-pc-linux-gnu
> > Configure host: allspice
> > Configured by: buildd
> >
> > Thanks!
> > Brett
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >
> > --
> > | Teng Ma Univ. of Tennessee |
> > | tma_at_[hidden] Knoxville, TN |
> > | http://web.eecs.utk.edu/~tma/ |
> >
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>