Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] overlapping memcpy in ompi_coll_tuned_allgather_intra_bruck
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-02-04 18:31:56


On Feb 4, 2008, at 11:56 AM, Number Cruncher wrote:

> George Bosilca wrote:
>>
>> Now, the overlapping case is a real exception. Obviously, it happened
>> for at least two people (as per a mailing list search) in about 4
>> years, but without affecting the correctness of the application. Is
>> that a good enough reason to affect the overall performance of all
>> parallel applications using Open MPI? You can already guess my stance.
>>
>
> Thanks for the reply. I agree with your pragmatic approach in general,
> and the lack of widespread problems certainly makes this low priority.
> However, there *is* a reason for the memmove/memcpy distinction,
> otherwise there'd only be a single API point in libc. And, as you
> state,
> that reason is performance. One day someone will write some optimized
> memcpy that *isn't* a simple forward copy.
>
> I'm old enough to remember the Z80 instructions LDDR and LDIR
> (http://www.sincuser.f9.co.uk/044/mcode.htm) for assembly-level memory
> copying. A memmove would have to choose between the two; memcpy could
> legitimately use either and would corrupt overlapping memory 50% of
> the
> time.

I did start with the Z80 too ... but now it looks like it was in the
"ice age" :)

>> However, I can imagine a way to rewrite the last step of the Bruck
>> algorithm to avoid this problem without affecting the overall
>> performance.
>
> Totally agree. The vast majority of Open MPI code uses memcpy correctly.
> This would just be a local bug fix. Can I volunteer?

Of course, feel free to join the fun. Here is what I had in mind. The
final step in the Bruck algorithm can be completely discarded for the
first half of the processes, if we compute the receive buffer smartly.
For the other half, I guess we can do the copy one non-overlapping
piece of data at a time, possibly without the need for an additional
buffer.
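
Something along these lines (completely untested and only a sketch of
the idea; the helper name, the contiguous-datatype assumption, and
block_bytes == rcount * rext are mine, and this is not the current code
in coll_tuned_allgather.c):

#include <stddef.h>
#include <string.h>

/* Rotate the last "rank" blocks of rbuf to the front without any single
 * overlapping memcpy.  The destination is below the source, so copying
 * forward in chunks no larger than the gap (src - dst) keeps every
 * individual memcpy's source and destination disjoint, and no extra
 * buffer is needed. */
static void shift_blocks_left(char *rbuf, size_t block_bytes,
                              int rank, int size)
{
    char  *dst = rbuf;
    char  *src = rbuf + (size_t)(size - rank) * block_bytes;
    size_t remaining = (size_t)rank * block_bytes;
    size_t gap = (size_t)(size - rank) * block_bytes;   /* src - dst */

    while (remaining > 0) {
        size_t chunk = (remaining < gap) ? remaining : gap;
        memcpy(dst, src, chunk);   /* dst + chunk <= src: no overlap */
        dst += chunk;
        src += chunk;
        remaining -= chunk;
    }
}

For ranks up to size/2 the gap is at least as large as the data to
move, so the loop degenerates to the single memcpy we do today (there
is no overlap for those ranks anyway); only the ranks in the upper half
pay for a few extra iterations.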

   Thanks,
     george.

>
>
> Regards,
> Simon
>>
>> Thanks,
>> George.
>>
>> On Jan 30, 2008, at 9:41 AM, Number Cruncher wrote:
>>
>>> I'm getting many "Source and destination overlap in memcpy" errors
>>> when
>>> running my application on an odd number of procs.
>>>
>>> I believe this is because the Allgather collective is using Bruck's
>>> algorithm and doing a shift on the buffer as a finalisation step
>>> (coll_tuned_allgather.c):
>>>
>>> tmprecv = (char*) rbuf;
>>> tmpsend = (char*) rbuf + (size - rank) * rcount * rext;
>>>
>>> err = ompi_ddt_copy_content_same_ddt(rdtype, rank * rcount,
>>> tmprecv, tmpsend);
>>>
>>> Unfortunately ompi_ddt_copy_content_same_ddt does a memcpy,
>>> instead of
>>> the memmove which is needed here. For this buffer-left-shift, any
>>> forward-copying memcpy should actually be OK as it won't overwrite
>>> itself during the copy, but this violates the precondition of
>>> memcpy and
>>> may break for some implementations.
>>>
>>> I think this issue was dismissed too lightly previously:
>>> http://www.open-mpi.org/community/lists/users/2007/08/3873.php
>>>
>>> Thanks,
>>> Simon
>>>
>>>


