Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Memchecker and Wait
From: Allen Barnett (allen_at_[hidden])
Date: 2009-08-12 09:12:07


Hi Shiqing:
Invalidating the buffer memory until the communication completes is a clever
trick! However, I'm still confused by my results. Lines 30 and 31, the ones
valgrind identifies, are the lines *after* the Wait, and if I comment out the
prints before the Wait, I still get the valgrind errors on the "After wait"
prints.

If I add prints after the Request_free calls, then I no longer receive
the valgrind errors when accessing "buffer_in" from that point on. So,
it appears that the buffer is marked invalid until the request is freed.

Perhaps I don't understand the sequence of events in MPI. I thought the
buffer was ok to use after the Wait, and requests could be safely
recycled.
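For reference, here is the lifecycle I had in mind (a minimal sketch, not a
complete program, and the `recv_loop` function is just my framing; it assumes a
peer rank sends one matching message per iteration): the buffer should be
off-limits only between MPI_Start and MPI_Wait, and the same request should be
restartable until MPI_Request_free.

```c
#include <stdio.h>
#include "mpi.h"

/* Persistent-receive lifecycle as I understand the standard: the
   buffer belongs to MPI only while a Start/Wait pair is in flight.
   After MPI_Wait the buffer is valid again and the same request
   can be restarted; MPI_Request_free retires it at the end. */
void recv_loop(int iterations)
{
    char buffer_in[100];
    MPI_Request req_in;
    MPI_Status status;

    MPI_Recv_init(buffer_in, 100, MPI_CHAR, 1, 123, MPI_COMM_WORLD, &req_in);
    for (int i = 0; i < iterations; ++i) {
        MPI_Start(&req_in);          /* buffer handed to MPI      */
        MPI_Wait(&req_in, &status);  /* buffer returned to user   */
        printf("got: %d\n", buffer_in[0]);  /* should be legal here */
    }
    MPI_Request_free(&req_in);       /* request recycled N times  */
}
```

(This needs to be compiled with mpicc and launched under mpirun with a
matching sender, so it is only a sketch of the sequence, not a test case.)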

Or maybe valgrind is pointing to the wrong lines; however, the addresses it
reports as invalid are exactly the buffer elements being accessed in the
post-Wait prints. Here is a snippet of a more heavily instrumented example
program, with line numbers.
----------------------------------------------
25 MPI_Recv_init( buffer_in, 100, MPI_CHAR, 1, 123, MPI_COMM_WORLD, &req_in );
26 printf( "Before start: %p: %d\n", &buffer_in[0], buffer_in[0] );
27 printf( "Before start: %p: %d\n", &buffer_in[1], buffer_in[1] );
28 MPI_Start( &req_in );
29 printf( "Before wait: %p: %d\n", &buffer_in[2], buffer_in[2] );
30 printf( "Before wait: %p: %d\n", &buffer_in[3], buffer_in[3] );
31 MPI_Wait( &req_in, &status );
32 printf( "After wait: %p: %d\n", &buffer_in[4], buffer_in[4] );
33 printf( "After wait: %p: %d\n", &buffer_in[5], buffer_in[5] );
34 MPI_Request_free( &req_in );
35 printf( "After free: %p: %d\n", &buffer_in[6], buffer_in[6] );
36 printf( "After free: %p: %d\n", &buffer_in[7], buffer_in[7] );
--------------------------------------------------
And the valgrind output:

Before start: 0x7ff0003c0: 1
Before start: 0x7ff0003c1: 1
Before wait: 0x7ff0003c2: 1
Before wait: 0x7ff0003c3: 1
==17395==
==17395== Invalid read of size 1
==17395== at 0x400CB7: main (waittest.c:32)
==17395== Address 0x7ff0003c4 is on thread 1's stack
After wait: 0x7ff0003c4: 2
==17395==
==17395== Invalid read of size 1
==17395== at 0x400CDB: main (waittest.c:33)
==17395== Address 0x7ff0003c5 is on thread 1's stack
After wait: 0x7ff0003c5: 2
After free: 0x7ff0003c6: 2
After free: 0x7ff0003c7: 2

Here valgrind is complaining about the prints on lines 32 and 33, and the
memory addresses are consistent with buffer_in[4] and buffer_in[5]. So I'm
still puzzled.

Thanks,
Allen

On Wed, 2009-08-12 at 10:31 +0200, Shiqing Fan wrote:
> Hi Allen,
>
> The invalid reads come from line 30 and 31 of your code, and I guess
> they are the two 'printf's before MPI_Wait.
>
> In Open MPI, when memchecker is enabled, OMPI internally marks the receive
> buffer as invalid immediately after the receive starts, for MPI semantic
> checks. In this case, it warns the user that they are accessing the
> receive buffer before the receive has finished, which is not allowed
> according to the MPI standard.
>
> For a non-blocking receive, the communication only completes after
> MPI_Wait is called. After that point, the user buffers are declared
> valid again, that's why the 'printf's after MPI_Wait don't cause any
> warnings from Valgrind. Hope this helps. :-)
>
>
> Regards,
> Shiqing
>
>
> Allen Barnett wrote:
> > Hi:
> > I'm trying to use the memchecker/valgrind capability of OpenMPI 1.3.3 to
> > help debug my MPI application. I noticed a rather odd thing: After
> > Waiting on a Recv Request, valgrind declares my receive buffer as
> > invalid memory. Is this just a fluke of valgrind, or is OMPI doing
> > something internally?
> >
> > This is on a 64-bit RHEL 5 system using GCC 4.3.2 and Valgrind 3.4.1.
> >
> > Here is an example:
> > ----------------------------------------------------------
> > #include <stdio.h>
> > #include <string.h>
> > #include "mpi.h"
> >
> > int main(int argc, char *argv[])
> > {
> >   int rank, size;
> >
> >   MPI_Init(&argc, &argv);
> >   MPI_Comm_size(MPI_COMM_WORLD, &size);
> >   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >
> >   if ( size != 2 ) {
> >     if ( rank == 0 )
> >       printf("Please run with 2 processes.\n");
> >     MPI_Finalize();
> >     return 1;
> >   }
> >
> >   if (rank == 0) {
> >     char buffer_in[100];
> >     MPI_Request req_in;
> >     MPI_Status status;
> >     memset( buffer_in, 1, sizeof(buffer_in) );
> >     MPI_Recv_init( buffer_in, 100, MPI_CHAR, 1, 123, MPI_COMM_WORLD,
> >                    &req_in );
> >     MPI_Start( &req_in );
> >     printf( "Before wait: %p: %d\n", buffer_in, buffer_in[3] );
> >     printf( "Before wait: %p: %d\n", buffer_in, buffer_in[4] );
> >     MPI_Wait( &req_in, &status );
> >     printf( "After wait: %p: %d\n", buffer_in, buffer_in[3] );
> >     printf( "After wait: %p: %d\n", buffer_in, buffer_in[4] );
> >     MPI_Request_free( &req_in );
> >   }
> >   else {
> >     char buffer_out[100];
> >     memset( buffer_out, 2, sizeof(buffer_out) );
> >     MPI_Send( buffer_out, 100, MPI_CHAR, 0, 123, MPI_COMM_WORLD );
> >   }
> >
> >   MPI_Finalize();
> >   return 0;
> > }
> > ----------------------------------------------------------
> >
> > Doing "mpirun -np 2 -mca btl ^sm valgrind ./a.out" yields:
> >
> > Before wait: 0x7ff0003b0: 1
> > Before wait: 0x7ff0003b0: 1
> > ==15487==
> > ==15487== Invalid read of size 1
> > ==15487== at 0x400C6B: main (waittest.c:30)
> > ==15487== Address 0x7ff0003b3 is on thread 1's stack
> > After wait: 0x7ff0003b0: 2
> > ==15487==
> > ==15487== Invalid read of size 1
> > ==15487== at 0x400C8B: main (waittest.c:31)
> > ==15487== Address 0x7ff0003b4 is on thread 1's stack
> > After wait: 0x7ff0003b0: 2
> >
> > Also, if I run this program with the shared memory BTL active, valgrind
> > reports several "conditional jump or move depends on uninitialized
> > value" warnings inside the SM BTL, and about 24k lost bytes at the end
> > (mostly from allocations in MPI_Init).
> >
> > Thanks,
> > Allen
> >
> >
>
>

-- 
Allen Barnett
Transpire, Inc
E-Mail: allen_at_[hidden]
Skype:  allenbarnett
Ph:     518-887-2930