Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] possible bug exercised by mpi4py
From: Lisandro Dalcin (dalcinl_at_[hidden])
Date: 2012-05-24 11:57:36


On 24 May 2012 12:40, George Bosilca <bosilca_at_[hidden]> wrote:
> On May 24, 2012, at 11:22 , Jeff Squyres wrote:
>
>> On May 24, 2012, at 11:10 AM, Lisandro Dalcin wrote:
>>
>>>> So I checked them all, and I found SCATTERV, GATHERV, and REDUCE_SCATTER all had the issue.  Now fixed on the trunk, and will be in 1.6.1.
>>>
>>> Please be careful with REDUCE_SCATTER[_BLOCK]. My understanding of
>>> the MPI standard is that the length of the recvcounts array is the
>>> local group size
>>> (http://www.mpi-forum.org/docs/mpi22-report/node113.htm#Node113)
>>
>>
>> I read that this morning and it made my head hurt.
>>
>> I read it to be: reduce the data in the local group, scatter the results to the remote group.
>>
>> As such, the reduce COUNT is sum(recvcounts), and is used for the reduction in the local group.  Then use recvcounts to scatter it to the remote group.
>>
>> …right?
>
> Right, you reduce locally but you scatter remotely. As such, the size of the recvcounts array is the remote group size. Because the local group performs a reduce (where every process participates with the same amount of data), you only need a total count, which in this case is the sum of all recvcounts. This requirement is enforced by the fact that the input buffer has size sum of all recvcounts, which makes sense only if you know the remote group's receive counts.
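
The "reduce locally, scatter remotely" reading quoted above can be sketched in plain Python, with no MPI calls. This only illustrates that one interpretation, not what the standard mandates; the function name, the group sizes, and the values are hypothetical.

```python
from operator import add


def reduce_then_scatter(send_vectors, recvcounts, op=add):
    """Reduce the input vectors element-wise, then cut the result into blocks.

    send_vectors: one vector per contributing process (hypothetical group A).
    recvcounts:   block sizes for the receiving processes (hypothetical group B).
    """
    total = sum(recvcounts)
    for vec in send_vectors:
        if len(vec) != total:
            raise ValueError("each input vector must hold sum(recvcounts) elements")

    # Element-wise reduction across all contributing processes.
    reduced = list(send_vectors[0])
    for vec in send_vectors[1:]:
        reduced = [op(a, b) for a, b in zip(reduced, vec)]

    # Block i gets recvcounts[i] consecutive elements of the reduced vector.
    blocks, start = [], 0
    for count in recvcounts:
        blocks.append(reduced[start:start + count])
        start += count
    return blocks


# Assumed setup: 2 contributing processes, 3 receiving processes.
local_inputs = [[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]]
remote_recvcounts = [1, 2, 3]
blocks = reduce_then_scatter(local_inputs, remote_recvcounts)
```

Under this reading, recvcounts has one entry per receiving process, and every contributing process must supply sum(recvcounts) elements.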

The standard says this:

"Within each group, all processes provide the same recvcounts
argument, and provide input vectors of sum_i^n recvcounts[i] elements
stored in the send buffers, where n is the size of the group"

So I read "Within each group, ... where n is the size of the group"
as referring to the LOCAL group size.
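
To make the practical difference concrete, here is a small sketch in plain Python (no MPI calls) that checks which reading a given recvcounts array is consistent with. The helper name and the group sizes are hypothetical, and the sketch takes no position on which reading is correct.

```python
def consistent_readings(recvcounts, local_size, remote_size):
    """Return the readings under which len(recvcounts) is the expected length."""
    readings = []
    if len(recvcounts) == local_size:
        readings.append("local")   # n is the LOCAL group size
    if len(recvcounts) == remote_size:
        readings.append("remote")  # n is the REMOTE group size
    return readings


# Assumed intercommunicator: 2 local processes, 3 remote processes.
only_local = consistent_readings([3, 3], local_size=2, remote_size=3)
only_remote = consistent_readings([2, 2, 2], local_size=2, remote_size=3)
# With equal group sizes the two readings cannot be told apart by length.
both = consistent_readings([1, 1], local_size=2, remote_size=2)
```

When the two groups differ in size, a caller and an implementation that follow different readings disagree on how many entries of recvcounts are valid.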

>
> I don't see much difference from the other collectives. The generic behavior is that you apply the operation on the local group but the result is moved into the remote group.
>

Well, for me this one IS different (for example, SCATTER is
unidirectional for intercommunicators, but REDUCE_SCATTER is
bidirectional). The "recvbuf" is a local buffer, but you understand
"recvcounts" as remote.

Mmm, the standard is really confusing on this point...

-- 
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169