Yes, your output is what I was expecting. Actually, your output is what I get if I compile the code I attached in my first email. However, our application is actually doing some 'smart' stuff when you dynamically allocate memory by putting headers around the memory block -- I am guessing that this can interfere with MPI_Allgather(). What is strange is that this problem doesn't surface on the other machine that we are working with (OpenSUSE) nor does it appear if we run it with valgrind. This is probably a dumb question, but if you were to see this problem, where is the first place your gut would tell you to look?
Thanks,
Brett.
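
For readers unfamiliar with the "headers around the memory block" idea mentioned above, here is a minimal sketch of such an allocator. It is not the application's actual code: the names guarded_malloc, guarded_check, guarded_free, GUARD_MAGIC and the header layout are all hypothetical, chosen only to show how a header and a trailing canary can reveal a write that runs past the end of a buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical canary value; the real application's header layout is unknown. */
#define GUARD_MAGIC 0xDEADBEEFu

/* Header placed in front of every user block (assumed layout).
 * The union pads it so the user pointer keeps maximum alignment. */
typedef union {
    struct {
        size_t   size;   /* number of user bytes requested */
        uint32_t magic;  /* canary in front of the user block */
    } info;
    max_align_t pad;
} guard_header;

/* Allocate n user bytes with a header before them and a canary after them. */
void *guarded_malloc(size_t n)
{
    if (n > SIZE_MAX - sizeof(guard_header) - sizeof(uint32_t))
        return NULL;                      /* total size would overflow */

    unsigned char *raw = malloc(sizeof(guard_header) + n + sizeof(uint32_t));
    if (raw == NULL)
        return NULL;

    guard_header *h = (guard_header *)raw;
    h->info.size = n;
    h->info.magic = GUARD_MAGIC;

    /* The tail canary may be unaligned, so copy it byte-wise. */
    uint32_t tail = GUARD_MAGIC;
    memcpy(raw + sizeof(guard_header) + n, &tail, sizeof tail);

    return raw + sizeof(guard_header);
}

/* Return 0 if both canaries are intact, -1 if the front one is damaged,
 * 1 if the tail one is damaged (something wrote past the user block). */
int guarded_check(const void *p)
{
    const unsigned char *user = p;
    const guard_header *h = (const guard_header *)(user - sizeof(guard_header));

    if (h->info.magic != GUARD_MAGIC)
        return -1;

    uint32_t tail;
    memcpy(&tail, user + h->info.size, sizeof tail);
    return tail == GUARD_MAGIC ? 0 : 1;
}

/* Release a block obtained from guarded_malloc. */
void guarded_free(void *p)
{
    if (p != NULL)
        free((unsigned char *)p - sizeof(guard_header));
}
```

Calling guarded_check on the receive buffer right after MPI_Allgather() would show whether the gather wrote outside the block. Note that in a scheme like this, a pointer obtained from plain malloc() must never be passed to guarded_free(), because the pointer arithmetic would step back into memory that was never a header.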

On Fri, Dec 9, 2011 at 6:43 PM, teng ma <tma@eecs.utk.edu> wrote:
I guess your output is from different ranks. You can add the rank to the print statement to tell them apart, as follows:

(void) printf("rank %d: gathered[%d].node = %d\n", rank, i, gathered[i].node);

On my side, I did not see anything wrong with your code under Open MPI 1.4.3. After I added the rank, the output is:
rank 5: gathered[0].node = 0
rank 5: gathered[1].node = 1
rank 5: gathered[2].node = 2
rank 5: gathered[3].node = 3
rank 5: gathered[4].node = 4
rank 5: gathered[5].node = 5
rank 3: gathered[0].node = 0
rank 3: gathered[1].node = 1
rank 3: gathered[2].node = 2
rank 3: gathered[3].node = 3
rank 3: gathered[4].node = 4
rank 3: gathered[5].node = 5
rank 1: gathered[0].node = 0
rank 1: gathered[1].node = 1
rank 1: gathered[2].node = 2
rank 1: gathered[3].node = 3
rank 1: gathered[4].node = 4
rank 1: gathered[5].node = 5
rank 0: gathered[0].node = 0
rank 0: gathered[1].node = 1
rank 0: gathered[2].node = 2
rank 0: gathered[3].node = 3
rank 0: gathered[4].node = 4
rank 0: gathered[5].node = 5
rank 4: gathered[0].node = 0
rank 4: gathered[1].node = 1
rank 4: gathered[2].node = 2
rank 4: gathered[3].node = 3
rank 4: gathered[4].node = 4
rank 4: gathered[5].node = 5
rank 2: gathered[0].node = 0
rank 2: gathered[1].node = 1
rank 2: gathered[2].node = 2
rank 2: gathered[3].node = 3
rank 2: gathered[4].node = 4
rank 2: gathered[5].node = 5

Is that what you expected?
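
As the listing above shows, lines from different ranks arrive in arbitrary order. One way to make such output easier to read is to build a single line per rank and print it in one call. The sketch below is only an illustration; format_gathered and its parameters are hypothetical names, and it takes a plain int array rather than the struct from the original function.

```c
#include <stdio.h>

/* Format all gathered node ids seen by one rank into a single line.
 * Returns the number of characters written (excluding the terminating
 * NUL), or -1 if buf is too small to hold the whole line. */
int format_gathered(char *buf, size_t cap, int rank, const int *nodes, int count)
{
    int used = snprintf(buf, cap, "rank %d:", rank);
    if (used < 0 || (size_t)used >= cap)
        return -1;

    for (int i = 0; i < count; ++i) {
        size_t left = cap - (size_t)used;
        int n = snprintf(buf + used, left, " [%d]=%d", i, nodes[i]);
        if (n < 0 || (size_t)n >= left)
            return -1;                    /* line was truncated */
        used += n;
    }
    return used;
}
```

Each rank would then call puts() once on its own line. This reduces interleaving within a rank's report, although MPI gives no guarantee about the order in which output from different ranks appears.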

On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully <brett.tully@oxyntix.com> wrote:
Dear all,

I have not used Open MPI much before, but I am maintaining a large legacy application. We noticed a bug involving a call to MPI_Allgather, as summarised in this post on Stack Overflow: http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results

In the process of looking further into the problem, I noticed that the following function results in strange behaviour.

void test_all_gather() {

    struct _TEST_ALL_GATHER {
        int node;
    };

    int ierr, size, rank;
    ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
    ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    struct _TEST_ALL_GATHER local;
    struct _TEST_ALL_GATHER *gathered;

    gathered = (struct _TEST_ALL_GATHER*) malloc(size * sizeof(*gathered));

    local.node = rank;

    MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE, 
        gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE, MPI_COMM_WORLD);

    int i;
    for (i = 0; i < size; ++i) {
        (void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
    }

    FREE(gathered);
}

At one point, this function printed the following:
gathered[0].node = 2
gathered[1].node = 3
gathered[2].node = 2
gathered[3].node = 3
gathered[4].node = 4
gathered[5].node = 5

Can anyone suggest a place to start looking into why this might be happening? There is a section of the code that calls MPI_Comm_split, but I am not sure if that is related...
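
Because every rank stores its own rank in local.node, a correct gather should leave gathered[i].node equal to i on every rank. A small check along the following lines could be run right after the MPI_Allgather call to flag the first bad slot as soon as it appears. This is a sketch only; count_mismatches is a hypothetical helper, and the struct is renamed here to avoid the reserved leading-underscore identifier.

```c
#include <stddef.h>

/* Same single-int layout as the struct in the test function above. */
struct test_all_gather {
    int node;
};

/* Count entries whose node field differs from their index.
 * If first_bad is not NULL, it receives the index of the first
 * mismatch, or -1 when every entry is as expected. */
int count_mismatches(const struct test_all_gather *gathered, int size,
                     int *first_bad)
{
    int bad = 0;

    if (first_bad != NULL)
        *first_bad = -1;

    for (int i = 0; i < size; ++i) {
        if (gathered[i].node != i) {
            if (bad == 0 && first_bad != NULL)
                *first_bad = i;
            ++bad;
        }
    }
    return bad;
}
```

Applied to the output shown above (2, 3, 2, 3, 4, 5), the check reports two mismatches, at indices 0 and 1, which are the only slots that differ from their index.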

Running on Ubuntu 11.10 and a summary of ompi_info:
Package: Open MPI buildd@allspice Distribution
Open MPI: 1.4.3
Open MPI SVN revision: r23834
Open MPI release date: Oct 05, 2010
Open RTE: 1.4.3
Open RTE SVN revision: r23834
Open RTE release date: Oct 05, 2010
OPAL: 1.4.3
OPAL SVN revision: r23834
OPAL release date: Oct 05, 2010
Ident string: 1.4.3
Prefix: /usr
Configured architecture: x86_64-pc-linux-gnu
Configure host: allspice
Configured by: buildd

Thanks!
Brett

_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
| Teng Ma          Univ. of Tennessee |
| tma@cs.utk.edu        Knoxville, TN |
| http://web.eecs.utk.edu/~tma/       |

