Looking at the change log for 1.5.1 I see:
- Use memmove (instead of memcpy) when necessary (e.g., source and destination overlap).

Could this be the change that fixed my problem, assuming I am indeed using 1.5.3 now that OpenFOAM is installed?
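
A minimal sketch of what that change-log entry guards against. This is not Open MPI's actual code: shift_up is a hypothetical helper name used only for illustration, and the assumption is simply that a copy's source and destination ranges can overlap inside one buffer.

```c
#include <string.h>

/* Hypothetical helper (not taken from Open MPI): copy the first `count`
 * ints of `buf` to the position `shift` elements further along the same
 * array. The caller must ensure buf holds at least count + shift ints.
 * Source and destination overlap whenever shift < count, so memmove is
 * used; calling memcpy on overlapping ranges is undefined behaviour. */
void shift_up(int *buf, size_t count, size_t shift)
{
    memmove(buf + shift, buf, count * sizeof *buf);
}
```

For example, with buf = {0, 1, 2, 3, 4, 5}, calling shift_up(buf, 4, 2) leaves buf as {0, 1, 0, 1, 2, 3}. With memcpy in place of memmove the same call is undefined, and what ends up in the buffer can vary with the C library and CPU.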

On Fri, Jan 27, 2012 at 10:02 AM, Brett Tully wrote:
Interesting. In the same set of updates, I installed OpenFOAM from their Ubuntu deb package, which claims to ship with openmpi. I downloaded their third-party source tarball and unpacked it to see which version of openmpi they use: it is 1.5.3. However, both man openmpi and ompi_info still report the same version as before (1.4.3). How can I determine for sure which version is being used when I compile something with mpicc?

Thanks,
Brett.



On Thu, Jan 26, 2012 at 10:05 PM, Jeff Squyres <jsquyres@cisco.com> wrote:
What version did you upgrade to?  (we don't control the Ubuntu packaging)

I see a bullet in the soon-to-be-released 1.4.5 release notes:

- Fix obscure cases where MPI_ALLGATHER could crash.  Thanks to Andrew
 Senin for reporting the problem.

But it would be surprising if that is what fixed your issue, especially since it has not been released yet.  :-)



On Jan 26, 2012, at 5:24 AM, Brett Tully wrote:

> As of two days ago, this problem has disappeared and the tests that I had written and run each night are now passing. Having looked through the update log of my machine (Ubuntu 11.10) it appears as though I got a new version of mpi-default-dev (0.6ubuntu1). I would like to understand this problem in more detail -- is it possible to see what changed in this update?
> Thanks,
> Brett.
>
>
>
> On Fri, Dec 9, 2011 at 6:43 PM, teng ma <tma@eecs.utk.edu> wrote:
> I guess your output is from different ranks. You can add the rank to the print statement to tell them apart, as follows:
>
> (void) printf("rank %d: gathered[%d].node = %d\n", rank, i, gathered[i].node);
>
> From my side, I did not see anything wrong with your code under Open MPI 1.4.3. After I add the rank, the output is:
> rank 5: gathered[0].node = 0
> rank 5: gathered[1].node = 1
> rank 5: gathered[2].node = 2
> rank 5: gathered[3].node = 3
> rank 5: gathered[4].node = 4
> rank 5: gathered[5].node = 5
> rank 3: gathered[0].node = 0
> rank 3: gathered[1].node = 1
> rank 3: gathered[2].node = 2
> rank 3: gathered[3].node = 3
> rank 3: gathered[4].node = 4
> rank 3: gathered[5].node = 5
> rank 1: gathered[0].node = 0
> rank 1: gathered[1].node = 1
> rank 1: gathered[2].node = 2
> rank 1: gathered[3].node = 3
> rank 1: gathered[4].node = 4
> rank 1: gathered[5].node = 5
> rank 0: gathered[0].node = 0
> rank 0: gathered[1].node = 1
> rank 0: gathered[2].node = 2
> rank 0: gathered[3].node = 3
> rank 0: gathered[4].node = 4
> rank 0: gathered[5].node = 5
> rank 4: gathered[0].node = 0
> rank 4: gathered[1].node = 1
> rank 4: gathered[2].node = 2
> rank 4: gathered[3].node = 3
> rank 4: gathered[4].node = 4
> rank 4: gathered[5].node = 5
> rank 2: gathered[0].node = 0
> rank 2: gathered[1].node = 1
> rank 2: gathered[2].node = 2
> rank 2: gathered[3].node = 3
> rank 2: gathered[4].node = 4
> rank 2: gathered[5].node = 5
>
> Is that what you expected?
>
> On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully <brett.tully@oxyntix.com> wrote:
> Dear all,
>
> I have not used OpenMPI much before, but am maintaining a large legacy application. We noticed a bug to do with a call to MPI_Allgather as summarised in this post to Stackoverflow: http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results
>
> In the process of looking further into the problem, I noticed that the following function results in strange behaviour.
>
> void test_all_gather() {
>
>     struct _TEST_ALL_GATHER {
>         int node;
>     };
>
>     int ierr, size, rank;
>     ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
>     ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     struct _TEST_ALL_GATHER local;
>     struct _TEST_ALL_GATHER *gathered;
>
>     gathered = (struct _TEST_ALL_GATHER*) malloc(size * sizeof(*gathered));
>
>     local.node = rank;
>
>     MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
>         gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE, MPI_COMM_WORLD);
>
>     int i;
>     for (i = 0; i < size; ++i) {
>         (void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
>     }
>
>     free(gathered);
> }
>
> At one point, this function printed the following:
> gathered[0].node = 2
> gathered[1].node = 3
> gathered[2].node = 2
> gathered[3].node = 3
> gathered[4].node = 4
> gathered[5].node = 5
>
> Can anyone suggest a place to start looking into why this might be happening? There is a section of the code that calls MPI_Comm_split, but I am not sure if that is related...
>
> Running on Ubuntu 11.10 and a summary of ompi_info:
> Package: Open MPI buildd@allspice Distribution
> Open MPI: 1.4.3
> Open MPI SVN revision: r23834
> Open MPI release date: Oct 05, 2010
> Open RTE: 1.4.3
> Open RTE SVN revision: r23834
> Open RTE release date: Oct 05, 2010
> OPAL: 1.4.3
> OPAL SVN revision: r23834
> OPAL release date: Oct 05, 2010
> Ident string: 1.4.3
> Prefix: /usr
> Configured architecture: x86_64-pc-linux-gnu
> Configure host: allspice
> Configured by: buildd
>
> Thanks!
> Brett
>
> _______________________________________________
> users mailing list
> users@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> --
> | Teng Ma          Univ. of Tennessee |
> | tma@cs.utk.edu        Knoxville, TN |
> | http://web.eecs.utk.edu/~tma/       |
>


--
Jeff Squyres
jsquyres@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

