Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_Allgather problem
From: Brett Tully (brett.tully_at_[hidden])
Date: 2011-12-13 13:18:14


Yes, your output is what I was expecting. In fact, it is what I get if I
compile the code I attached to my first email. However, our application
does some 'smart' things when memory is dynamically allocated, wrapping
headers around each memory block, and I am guessing that this can
interfere with MPI_Allgather(). What is strange is that the problem
doesn't surface on the other machine we are working with (OpenSUSE), nor
does it appear if we run under valgrind. This is probably a dumb question,
but if you were to see this problem, where is the first place your gut
would tell you to look?
Thanks,
Brett.
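
One way to probe the header-wrapping hypothesis above is to put guard bytes after each payload and check them once the collective call returns. The following is only a minimal sketch under stated assumptions: the application's real allocator is not shown in this thread, so hdr_t, hdr_malloc, hdr_guard_ok and hdr_free are hypothetical names and the layout is invented for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTE 0xA5
#define GUARD_LEN  16

/* Hypothetical header; the union keeps the payload suitably aligned. */
typedef union {
    size_t size;        /* payload size requested by the caller */
    max_align_t align;  /* forces worst-case alignment (C11) */
} hdr_t;

/* Assumed layout: [hdr_t][payload: n bytes][GUARD_LEN guard bytes] */
void *hdr_malloc(size_t n) {
    unsigned char *base = malloc(sizeof(hdr_t) + n + GUARD_LEN);
    if (base == NULL) return NULL;
    ((hdr_t *)base)->size = n;
    memset(base + sizeof(hdr_t) + n, GUARD_BYTE, GUARD_LEN);
    return base + sizeof(hdr_t);   /* caller only ever sees the payload */
}

/* Returns 1 if the guard bytes after the payload are intact, else 0. */
int hdr_guard_ok(const void *p) {
    const unsigned char *payload = p;
    const hdr_t *h = (const hdr_t *)(payload - sizeof(hdr_t));
    const unsigned char *guard = payload + h->size;
    for (size_t i = 0; i < GUARD_LEN; ++i) {
        if (guard[i] != GUARD_BYTE) return 0;
    }
    return 1;
}

/* Must be paired with hdr_malloc: it frees the base, not the payload. */
void hdr_free(void *p) {
    if (p != NULL) free((unsigned char *)p - sizeof(hdr_t));
}
```

Calling hdr_guard_ok(gathered) right after MPI_Allgather() would show whether anything wrote past the end of the receive buffer. The sketch also shows why a pointer from plain malloc() must never be handed to a header-aware free routine, or the other way round: the two disagree about where the block starts.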

On Fri, Dec 9, 2011 at 6:43 PM, teng ma <tma_at_[hidden]> wrote:

> I guess your output is from different ranks. You can add rank information
> to the print statement to tell them apart, as follows:
>
> (void) printf("rank %d: gathered[%d].node = %d\n", rank, i,
> gathered[i].node);
>
> On my side, I did not see anything wrong with your code under Open MPI
> 1.4.3. After I added the rank, the output is:
> rank 5: gathered[0].node = 0
> rank 5: gathered[1].node = 1
> rank 5: gathered[2].node = 2
> rank 5: gathered[3].node = 3
> rank 5: gathered[4].node = 4
> rank 5: gathered[5].node = 5
> rank 3: gathered[0].node = 0
> rank 3: gathered[1].node = 1
> rank 3: gathered[2].node = 2
> rank 3: gathered[3].node = 3
> rank 3: gathered[4].node = 4
> rank 3: gathered[5].node = 5
> rank 1: gathered[0].node = 0
> rank 1: gathered[1].node = 1
> rank 1: gathered[2].node = 2
> rank 1: gathered[3].node = 3
> rank 1: gathered[4].node = 4
> rank 1: gathered[5].node = 5
> rank 0: gathered[0].node = 0
> rank 0: gathered[1].node = 1
> rank 0: gathered[2].node = 2
> rank 0: gathered[3].node = 3
> rank 0: gathered[4].node = 4
> rank 0: gathered[5].node = 5
> rank 4: gathered[0].node = 0
> rank 4: gathered[1].node = 1
> rank 4: gathered[2].node = 2
> rank 4: gathered[3].node = 3
> rank 4: gathered[4].node = 4
> rank 4: gathered[5].node = 5
> rank 2: gathered[0].node = 0
> rank 2: gathered[1].node = 1
> rank 2: gathered[2].node = 2
> rank 2: gathered[3].node = 3
> rank 2: gathered[4].node = 4
> rank 2: gathered[5].node = 5
>
> Is that what you expected?
>
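
The rank-tagging idea above can be taken one step further by building each rank's report in a single buffer and writing it with one call, which makes interleaved lines from different ranks less likely (though not impossible, since the launcher still merges the output streams). This is a minimal sketch; format_gathered is a hypothetical helper, not part of the original test code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes one "rank R: gathered[i].node = v" line per
 * entry into buf. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if buf is too small. */
int format_gathered(char *buf, size_t cap, int rank,
                    const int *nodes, int n) {
    size_t used = 0;
    if (cap == 0) return -1;
    buf[0] = '\0';
    for (int i = 0; i < n; ++i) {
        int w = snprintf(buf + used, cap - used,
                         "rank %d: gathered[%d].node = %d\n",
                         rank, i, nodes[i]);
        /* snprintf reports the length it wanted; treat truncation as failure */
        if (w < 0 || (size_t)w >= cap - used) return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

Each rank would then emit its report with a single fputs(buf, stdout) call instead of one printf per entry.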
> On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully <brett.tully_at_[hidden]> wrote:
>
>> Dear all,
>>
>> I have not used OpenMPI much before, but am maintaining a large legacy
>> application. We noticed a bug to do with a call to MPI_Allgather as
>> summarised in this post to Stackoverflow:
>> http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results
>>
>> In the process of looking further into the problem, I noticed that the
>> following function results in strange behaviour.
>>
>> void test_all_gather() {
>>
>>     struct _TEST_ALL_GATHER {
>>         int node;
>>     };
>>
>>     int ierr, size, rank;
>>     ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
>>     ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>     struct _TEST_ALL_GATHER local;
>>     struct _TEST_ALL_GATHER *gathered;
>>
>>     /* One slot per rank in the communicator. */
>>     gathered = (struct _TEST_ALL_GATHER*) malloc(size *
>>         sizeof(*gathered));
>>
>>     local.node = rank;
>>
>>     /* Each rank contributes one struct, sent as raw bytes. */
>>     MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
>>         gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
>>         MPI_COMM_WORLD);
>>
>>     int i;
>>     for (i = 0; i < size; ++i) {
>>         (void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
>>     }
>>
>>     /* Release with the routine that matches malloc(). */
>>     free(gathered);
>> }
>>
>> At one point, this function printed the following:
>> gathered[0].node = 2
>> gathered[1].node = 3
>> gathered[2].node = 2
>> gathered[3].node = 3
>> gathered[4].node = 4
>> gathered[5].node = 5
>>
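
MPI_Allgather places the block contributed by rank i at position i of every rank's receive buffer, so in this test gathered[i].node should equal i on all ranks. A small check like the one below can flag a bad result automatically instead of relying on reading the printout. It is only a sketch: first_mismatch and count_mismatches are hypothetical helper names, and the node values are assumed to have been copied into a plain int array.

```c
#include <stddef.h>

/* Returns the index of the first entry where nodes[i] != i,
 * or -1 if every entry matches its index. */
int first_mismatch(const int *nodes, int n) {
    for (int i = 0; i < n; ++i) {
        if (nodes[i] != i) return i;
    }
    return -1;
}

/* Returns how many entries differ from their index. */
int count_mismatches(const int *nodes, int n) {
    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (nodes[i] != i) ++bad;
    }
    return bad;
}
```

Applied to the output listed above (2, 3, 2, 3, 4, 5), the check reports two wrong entries, at indices 0 and 1, while the remaining four slots hold the expected values.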
>> Can anyone suggest a place to start looking into why this might be
>> happening? There is a section of the code that calls MPI_Comm_split, but I
>> am not sure if that is related...
>>
>> Running on Ubuntu 11.10 and a summary of ompi_info:
>> Package: Open MPI buildd_at_allspice Distribution
>> Open MPI: 1.4.3
>> Open MPI SVN revision: r23834
>> Open MPI release date: Oct 05, 2010
>> Open RTE: 1.4.3
>> Open RTE SVN revision: r23834
>> Open RTE release date: Oct 05, 2010
>> OPAL: 1.4.3
>> OPAL SVN revision: r23834
>> OPAL release date: Oct 05, 2010
>> Ident string: 1.4.3
>> Prefix: /usr
>> Configured architecture: x86_64-pc-linux-gnu
>> Configure host: allspice
>> Configured by: buildd
>>
>> Thanks!
>> Brett
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
>
> --
> | Teng Ma Univ. of Tennessee |
> | tma_at_[hidden] Knoxville, TN |
> | http://web.eecs.utk.edu/~tma/ |