Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Memchecker and Wait
From: Shiqing Fan (fan_at_[hidden])
Date: 2009-08-12 13:24:35


Hi Allen,

Sorry for the confusion. Your application doesn't use non-blocking
communications, so the receive buffers are still valid after you call
MPI_Recv_init; that's why the first two printfs didn't trigger any
complaints. But MPI_Wait still checks the buffer and marks it invalid
after packing the message, because blocking and non-blocking
communications share some common code paths and memchecker currently
can't distinguish between them. So for your case I suggest you disable
memchecker, and I'll look for a better solution for handling memchecker
in both cases.
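
Memchecker is a configure-time option (--enable-memchecker), so disabling
it means rebuilding Open MPI without that flag. If rebuilding is not
convenient, one possible stopgap, sketched below and untested, is to
re-mark the receive buffer for Valgrind yourself after the wait, using
the client requests from valgrind/memcheck.h:
----------------------------------------------------------
/* Untested sketch: run with at least 2 processes; only ranks 0 and 1
 * participate. After MPI_Wait, tell Valgrind that the receive buffer is
 * addressable and defined again, overriding the annotation left behind
 * by memchecker. The client-request macro is a no-op when the program
 * is not run under Valgrind. */
#include <stdio.h>
#include <string.h>
#include "mpi.h"
#include <valgrind/memcheck.h>

int main(int argc, char *argv[])
{
  int rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
    char buffer_in[100];
    MPI_Request req_in;
    MPI_Status status;

    memset( buffer_in, 1, sizeof(buffer_in) );
    MPI_Recv_init( buffer_in, 100, MPI_CHAR, 1, 123, MPI_COMM_WORLD,
                   &req_in );
    MPI_Start( &req_in );
    MPI_Wait( &req_in, &status );
    /* Re-validate the buffer for Valgrind before reading it. */
    VALGRIND_MAKE_MEM_DEFINED( buffer_in, sizeof(buffer_in) );
    printf( "After wait: %p: %d\n", (void *) buffer_in, buffer_in[4] );
    MPI_Request_free( &req_in );
  }
  else if (rank == 1) {
    char buffer_out[100];
    memset( buffer_out, 2, sizeof(buffer_out) );
    MPI_Send( buffer_out, 100, MPI_CHAR, 0, 123, MPI_COMM_WORLD );
  }

  MPI_Finalize();
  return 0;
}
----------------------------------------------------------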

Thanks,
Shiqing

Allen Barnett wrote:
> Hi Shiqing:
> That is very clever to invalidate the buffer memory until the comm
> completes! However, I guess I'm still confused by my results. Lines 30
> and 31 identified by valgrind are the lines after the Wait, and, if I
> comment out the prints before the Wait, I still get the valgrind errors
> on the "After wait" prints.
>
> If I add prints after the Request_free calls, then I no longer receive
> the valgrind errors when accessing "buffer_in" from that point on. So,
> it appears that the buffer is marked invalid until the request is freed.
>
> Perhaps I don't understand the sequence of events in MPI. I thought the
> buffer was ok to use after the Wait, and requests could be safely
> recycled.
>
> Or maybe valgrind is pointing to the wrong lines; however, the
> addresses it reports as invalid are exactly those in the buffer that
> are being accessed in the post-Wait prints. Here is a snippet of a more
> instrumented example program with line numbers.
> ----------------------------------------------
> 25 MPI_Recv_init( buffer_in, 100, MPI_CHAR, 1, 123, MPI_COMM_WORLD,
> &req_in );
> 26 printf( "Before start: %p: %d\n", &buffer_in[0], buffer_in[0] );
> 27 printf( "Before start: %p: %d\n", &buffer_in[1], buffer_in[1] );
> 28 MPI_Start( &req_in );
> 29 printf( "Before wait: %p: %d\n", &buffer_in[2], buffer_in[2] );
> 30 printf( "Before wait: %p: %d\n", &buffer_in[3], buffer_in[3] );
> 31 MPI_Wait( &req_in, &status );
> 32 printf( "After wait: %p: %d\n", &buffer_in[4], buffer_in[4] );
> 33 printf( "After wait: %p: %d\n", &buffer_in[5], buffer_in[5] );
> 34 MPI_Request_free( &req_in );
> 35 printf( "After free: %p: %d\n", &buffer_in[6], buffer_in[6] );
> 36 printf( "After free: %p: %d\n", &buffer_in[7], buffer_in[7] );
> --------------------------------------------------
> And the valgrind output
>
> Before start: 0x7ff0003c0: 1
> Before start: 0x7ff0003c1: 1
> Before wait: 0x7ff0003c2: 1
> Before wait: 0x7ff0003c3: 1
> ==17395==
> ==17395== Invalid read of size 1
> ==17395== at 0x400CB7: main (waittest.c:32)
> ==17395== Address 0x7ff0003c4 is on thread 1's stack
> After wait: 0x7ff0003c4: 2
> ==17395==
> ==17395== Invalid read of size 1
> ==17395== at 0x400CDB: main (waittest.c:33)
> ==17395== Address 0x7ff0003c5 is on thread 1's stack
> After wait: 0x7ff0003c5: 2
> After free: 0x7ff0003c6: 2
> After free: 0x7ff0003c7: 2
>
> Here valgrind is complaining about the prints on lines 32 and 33, and
> the memory addresses are consistent with buffer_in[4] and buffer_in[5].
> So I'm still puzzled.
>
> Thanks,
> Allen
>
> On Wed, 2009-08-12 at 10:31 +0200, Shiqing Fan wrote:
>
>> Hi Allen,
>>
>> The invalid reads come from lines 30 and 31 of your code, and I guess
>> they are the two printfs before MPI_Wait.
>>
>> In Open MPI, when memchecker is enabled, OMPI internally marks the
>> receive buffer as invalid immediately after the receive starts, for
>> MPI semantic checks. In this case it simply warns the user that they
>> are accessing the receive buffer before the receive has finished,
>> which is not allowed by the MPI standard.
>>
>> For a non-blocking receive, the communication only completes after
>> MPI_Wait is called. After that point the user buffers are declared
>> valid again; that's why the printfs after MPI_Wait don't cause any
>> warnings from Valgrind. Hope this helps. :-)
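>>
>> As a minimal fragment, the timing rule looks like this (same
>> persistent request as in your test; a rough sketch with only the
>> relevant calls, declarations included but error handling omitted):
>>
>>   char buf[100];
>>   MPI_Request req;
>>   MPI_Status status;
>>
>>   MPI_Recv_init( buf, 100, MPI_CHAR, 1, 123, MPI_COMM_WORLD, &req );
>>   MPI_Start( &req );
>>   /* Receive in flight: reading buf here is what memchecker flags,
>>      and it is not allowed by the MPI standard. */
>>   MPI_Wait( &req, &status );
>>   /* Receive complete: buf may be read again from this point on. */
>>   printf( "%d\n", buf[0] );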
>>
>>
>> Regards,
>> Shiqing
>>
>>
>> Allen Barnett wrote:
>>
>>> Hi:
>>> I'm trying to use the memchecker/valgrind capability of Open MPI
>>> 1.3.3 to help debug my MPI application. I noticed a rather odd thing:
>>> after waiting on a Recv request, valgrind declares my receive buffer
>>> to be invalid memory. Is this just a fluke of valgrind, or is OMPI
>>> doing something internally?
>>>
>>> This is on a 64-bit RHEL 5 system using GCC 4.3.2 and Valgrind 3.4.1.
>>>
>>> Here is an example:
>>> ----------------------------------------------------------
>>> #include <stdio.h>
>>> #include <string.h>
>>> #include "mpi.h"
>>>
>>> int main(int argc, char *argv[])
>>> {
>>>   int rank, size;
>>>
>>>   MPI_Init(&argc, &argv);
>>>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>
>>>   if ( size != 2 ) {
>>>     if ( rank == 0 )
>>>       printf("Please run with 2 processes.\n");
>>>     MPI_Finalize();
>>>     return 1;
>>>   }
>>>
>>>   if (rank == 0) {
>>>     char buffer_in[100];
>>>     MPI_Request req_in;
>>>     MPI_Status status;
>>>     memset( buffer_in, 1, sizeof(buffer_in) );
>>>     MPI_Recv_init( buffer_in, 100, MPI_CHAR, 1, 123, MPI_COMM_WORLD,
>>>                    &req_in );
>>>     MPI_Start( &req_in );
>>>     printf( "Before wait: %p: %d\n", buffer_in, buffer_in[3] );
>>>     printf( "Before wait: %p: %d\n", buffer_in, buffer_in[4] );
>>>     MPI_Wait( &req_in, &status );
>>>     printf( "After wait: %p: %d\n", buffer_in, buffer_in[3] );
>>>     printf( "After wait: %p: %d\n", buffer_in, buffer_in[4] );
>>>     MPI_Request_free( &req_in );
>>>   }
>>>   else {
>>>     char buffer_out[100];
>>>     memset( buffer_out, 2, sizeof(buffer_out) );
>>>     MPI_Send( buffer_out, 100, MPI_CHAR, 0, 123, MPI_COMM_WORLD );
>>>   }
>>>
>>>   MPI_Finalize();
>>>   return 0;
>>> }
>>> ----------------------------------------------------------
>>>
>>> Doing "mpirun -np 2 -mca btl ^sm valgrind ./a.out" yields:
>>>
>>> Before wait: 0x7ff0003b0: 1
>>> Before wait: 0x7ff0003b0: 1
>>> ==15487==
>>> ==15487== Invalid read of size 1
>>> ==15487== at 0x400C6B: main (waittest.c:30)
>>> ==15487== Address 0x7ff0003b3 is on thread 1's stack
>>> After wait: 0x7ff0003b0: 2
>>> ==15487==
>>> ==15487== Invalid read of size 1
>>> ==15487== at 0x400C8B: main (waittest.c:31)
>>> ==15487== Address 0x7ff0003b4 is on thread 1's stack
>>> After wait: 0x7ff0003b0: 2
>>>
>>> Also, if I run this program with the shared memory BTL active,
>>> valgrind reports several "conditional jump or move depends on
>>> uninitialized value" warnings in the SM BTL and about 24k lost bytes
>>> at the end (mostly from allocations in MPI_Init).
>>>
>>> Thanks,
>>> Allen
>>>
>>>
>>>
>>

-- 
--------------------------------------------------------------
Shiqing Fan                          http://www.hlrs.de/people/fan
High Performance Computing           Tel.: +49 711 685 87234
  Center Stuttgart (HLRS)            Fax.: +49 711 685 65832
Address: Allmandring 30              email: fan_at_[hidden]
70569 Stuttgart