Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_Allgather problem
From: Brett Tully (brett.tully_at_[hidden])
Date: 2012-01-26 05:24:52


As of two days ago, this problem has disappeared and the tests that I had
written and run each night are now passing. Having looked through the
update log of my machine (Ubuntu 11.10) it appears as though I got a new
version of mpi-default-dev (0.6ubuntu1). I would like to understand this
problem in more detail -- is it possible to see what changed in this update?
Thanks,
Brett.

>
> On Fri, Dec 9, 2011 at 6:43 PM, teng ma <tma_at_[hidden]> wrote:
>
>> I guess your output comes from different ranks. You can add the rank to
>> the printf call to tell them apart, as follows:
>>
>> (void) printf("rank %d: gathered[%d].node = %d\n", rank, i,
>> gathered[i].node);
>>
>> On my side, I did not see anything wrong with your code under Open MPI
>> 1.4.3. After I added the rank, the output is:
>> rank 5: gathered[0].node = 0
>> rank 5: gathered[1].node = 1
>> rank 5: gathered[2].node = 2
>> rank 5: gathered[3].node = 3
>> rank 5: gathered[4].node = 4
>> rank 5: gathered[5].node = 5
>> rank 3: gathered[0].node = 0
>> rank 3: gathered[1].node = 1
>> rank 3: gathered[2].node = 2
>> rank 3: gathered[3].node = 3
>> rank 3: gathered[4].node = 4
>> rank 3: gathered[5].node = 5
>> rank 1: gathered[0].node = 0
>> rank 1: gathered[1].node = 1
>> rank 1: gathered[2].node = 2
>> rank 1: gathered[3].node = 3
>> rank 1: gathered[4].node = 4
>> rank 1: gathered[5].node = 5
>> rank 0: gathered[0].node = 0
>> rank 0: gathered[1].node = 1
>> rank 0: gathered[2].node = 2
>> rank 0: gathered[3].node = 3
>> rank 0: gathered[4].node = 4
>> rank 0: gathered[5].node = 5
>> rank 4: gathered[0].node = 0
>> rank 4: gathered[1].node = 1
>> rank 4: gathered[2].node = 2
>> rank 4: gathered[3].node = 3
>> rank 4: gathered[4].node = 4
>> rank 4: gathered[5].node = 5
>> rank 2: gathered[0].node = 0
>> rank 2: gathered[1].node = 1
>> rank 2: gathered[2].node = 2
>> rank 2: gathered[3].node = 3
>> rank 2: gathered[4].node = 4
>> rank 2: gathered[5].node = 5
>>
>> Is that what you expected?
>>
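When several ranks print line by line to the same terminal, their lines can interleave, which makes a listing like the one above hard to read. A minimal sketch of one way to reduce that, assuming a hypothetical helper `format_gathered` (not part of MPI or of the original code) and a struct named `test_all_gather` that mirrors the one in the quoted test function: build the whole rank-labelled report in one buffer, then write it with a single call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Mirrors the single-int struct used in the quoted test function. */
struct test_all_gather {
    int node;
};

/*
 * Hypothetical helper: append one line per entry to buf, in the form
 * "rank R: gathered[i].node = V\n".
 * Returns the number of characters written (excluding the final NUL),
 * or -1 if buf is too small to hold the full report.
 */
int format_gathered(char *buf, size_t cap, int rank,
                    const struct test_all_gather *gathered, int n)
{
    size_t used = 0;

    assert(buf != NULL && cap > 0);
    buf[0] = '\0';

    for (int i = 0; i < n; ++i) {
        int w = snprintf(buf + used, cap - used,
                         "rank %d: gathered[%d].node = %d\n",
                         rank, i, gathered[i].node);
        /* snprintf reports the length it wanted; treat truncation as failure. */
        if (w < 0 || (size_t)w >= cap - used) {
            return -1;
        }
        used += (size_t)w;
    }
    return (int)used;
}
```

After the `MPI_Allgather` call, each rank could call `format_gathered(buf, sizeof buf, rank, gathered, size)` and pass `buf` to `fputs`. This does not guarantee that output from different ranks never interleaves, but one write per rank makes it much less likely than one write per line.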
>> On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully <brett.tully_at_[hidden]> wrote:
>>
>>> Dear all,
>>>
>>> I have not used OpenMPI much before, but am maintaining a large legacy
>>> application. We noticed a bug to do with a call to MPI_Allgather as
>>> summarised in this post to Stackoverflow:
>>> http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results
>>>
>>> In the process of looking further into the problem, I noticed that the
>>> following function results in strange behaviour.
>>>
>>> #include <mpi.h>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>>
>>> void test_all_gather() {
>>>
>>>     struct _TEST_ALL_GATHER {
>>>         int node;
>>>     };
>>>
>>>     int ierr, size, rank;
>>>     ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>     ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>
>>>     struct _TEST_ALL_GATHER local;
>>>     struct _TEST_ALL_GATHER *gathered;
>>>
>>>     /* One slot per rank in the communicator. */
>>>     gathered = (struct _TEST_ALL_GATHER*) malloc(size *
>>>         sizeof(*gathered));
>>>
>>>     /* Each rank contributes its own rank number. */
>>>     local.node = rank;
>>>
>>>     MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
>>>         gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
>>>         MPI_COMM_WORLD);
>>>
>>>     int i;
>>>     for (i = 0; i < size; ++i) {
>>>         (void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
>>>     }
>>>
>>>     free(gathered);
>>> }
>>>
>>> At one point, this function printed the following:
>>> gathered[0].node = 2
>>> gathered[1].node = 3
>>> gathered[2].node = 2
>>> gathered[3].node = 3
>>> gathered[4].node = 4
>>> gathered[5].node = 5
>>>
>>> Can anyone suggest a place to start looking into why this might be
>>> happening? There is a section of the code that calls MPI_Comm_split, but I
>>> am not sure if that is related...
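One possible starting point is to check the gathered buffer in code instead of reading the printed values. The sketch below assumes, as in the test function above, that every rank contributes its own rank number, so slot `i` should hold `i` on every rank once `MPI_Allgather` returns. The helper `count_mismatches` and the struct name `test_all_gather` are illustrative names, not part of MPI or of the original application.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the single-int struct used in the test function. */
struct test_all_gather {
    int node;
};

/*
 * Hypothetical helper: count the slots whose payload differs from
 * their index. A correct all-gather of rank numbers yields 0.
 */
int count_mismatches(const struct test_all_gather *gathered, int size)
{
    int bad = 0;

    assert(size >= 0);
    assert(gathered != NULL || size == 0);

    for (int i = 0; i < size; ++i) {
        if (gathered[i].node != i) {
            ++bad;
        }
    }
    return bad;
}
```

Applied to the output shown above (2, 3, 2, 3, 4, 5), this would report two mismatches, in slots 0 and 1. A nightly test could call it right after `MPI_Allgather` and fail, for example via `MPI_Abort`, whenever the result is nonzero.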
>>>
>>> Running on Ubuntu 11.10 and a summary of ompi_info:
>>> Package: Open MPI buildd_at_allspice Distribution
>>> Open MPI: 1.4.3
>>> Open MPI SVN revision: r23834
>>> Open MPI release date: Oct 05, 2010
>>> Open RTE: 1.4.3
>>> Open RTE SVN revision: r23834
>>> Open RTE release date: Oct 05, 2010
>>> OPAL: 1.4.3
>>> OPAL SVN revision: r23834
>>> OPAL release date: Oct 05, 2010
>>> Ident string: 1.4.3
>>> Prefix: /usr
>>> Configured architecture: x86_64-pc-linux-gnu
>>> Configure host: allspice
>>> Configured by: buildd
>>>
>>> Thanks!
>>> Brett
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>>
>>
>> --
>> | Teng Ma Univ. of Tennessee |
>> | tma_at_[hidden] Knoxville, TN |
>> | http://web.eecs.utk.edu/~tma/ |
>>
>
>