Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Error message when using MPI_Type_struct()
From: Aurélien Bouteiller (bouteill_at_[hidden])
Date: 2009-01-09 10:53:42


Hi Thomas,

The message you get comes from the convertor, which is in charge of
packing and unpacking the data. Because you add an extra int to the
wire data yourself, the convertor on the receiver side gets confused:
it receives a message that is not in the expected format.
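The bounds printed in the datatype_pack.h:38 log quoted below can be reproduced by hand under one assumption that the thread does not confirm: a 32-bit build in which MPI_Address stores absolute addresses in a signed 32-bit MPI_Aint. Under that assumption, the stack address of buf (0xbff25fbc) becomes negative while the heap address of the malloc'd piggy (0x9113008) stays positive. The minimal C sketch below recomputes true_lb, true_ub and true_extent from those two addresses. The helpers as_signed32, struct_bounds and decode_first_log are hypothetical. Read back as unsigned pointers, the resulting lower bound lies above the upper bound, which may be why the bounds look switched.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: a 32-bit build where MPI_Aint is a signed 32-bit integer.
 * Reinterpret a 32-bit address as such a value, using 64-bit arithmetic
 * so the conversion is well defined in C. */
int64_t as_signed32(uint32_t addr)
{
    return addr >= 0x80000000u ? (int64_t)addr - 0x100000000LL
                               : (int64_t)addr;
}

/* Hypothetical helper: bounds of a struct type made of one 4-byte
 * MPI_INT at each of two absolute addresses, ordered as signed values. */
void struct_bounds(uint32_t a, uint32_t b, int64_t *lb, int64_t *ub)
{
    int64_t sa = as_signed32(a);
    int64_t sb = as_signed32(b);

    *lb = sa < sb ? sa : sb;
    *ub = (sa > sb ? sa : sb) + 4; /* end of the higher MPI_INT */
}

/* Recompute the numbers from the first log quoted in this thread. */
void decode_first_log(void)
{
    uint32_t buf_addr = 0xbff25fbcu;   /* element 0: buf, on the stack */
    uint32_t piggy_addr = 0x09113008u; /* element 1: malloc'd piggy */
    int64_t lb, ub;

    struct_bounds(buf_addr, piggy_addr, &lb, &ub);
    printf("true_lb %lld true_ub %lld true_extent %lld\n",
           (long long)lb, (long long)ub, (long long)(ub - lb));
    /* Converted back to unsigned 32-bit pointers, the range runs backwards */
    printf("lb above ub as pointers: %s\n",
           (uint32_t)lb > (uint32_t)ub ? "yes" : "no");
}
```

Calling decode_first_log() prints true_lb -1074634820, true_ub 152121356 and true_extent 1226756176, the same values as the log. The same conversion turns the stack address 0xbfef8920 from the older datatype_pack.h:40 log into its true_lb of -1074820832.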

In my opinion, what you should do is create a new convertor (there is
an MCA framework for this) that allocates this extra int for you.
Then, because you will use the same convertor at both ends, you will
be able to correctly unpack what you sent. As a bonus, you will be
able to use the mpool instead of malloc, which should reduce the
overhead of creating the intermediate buffer.
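The idea of one wire format shared by the sending and receiving sides can be sketched in plain C. This is a minimal illustration under stated assumptions, not Open MPI's convertor interface. piggy_pack and piggy_unpack are hypothetical names, the payload is treated as contiguous bytes, and plain malloc stands in for the mpool allocation mentioned above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers, NOT Open MPI's convertor API. They only show a
 * wire format known to both ends: the piggybacked int first, then the
 * payload bytes, copied through an intermediate buffer. */

/* Sender side: returns a malloc'd wire buffer (caller frees) or NULL.
 * payload must point to at least len valid bytes. */
unsigned char *piggy_pack(const void *payload, size_t len, int piggy,
                          size_t *wire_len)
{
    unsigned char *wire = malloc(sizeof piggy + len);

    if (wire == NULL)
        return NULL;
    memcpy(wire, &piggy, sizeof piggy);        /* extra int first */
    memcpy(wire + sizeof piggy, payload, len); /* then the user data */
    *wire_len = sizeof piggy + len;
    return wire;
}

/* Receiver side: same layout, so it strips the int before the payload.
 * Returns 0 on success, -1 if the sizes do not match that layout. */
int piggy_unpack(const unsigned char *wire, size_t wire_len,
                 void *payload, size_t len, int *piggy)
{
    if (wire_len != sizeof *piggy + len)
        return -1;
    memcpy(piggy, wire, sizeof *piggy);
    memcpy(payload, wire + sizeof *piggy, len);
    return 0;
}
```

The receiver passes the bytes it got to piggy_unpack, which recovers both the piggybacked value and the original data because both sides agree on the layout. A real convertor would also have to handle non-contiguous datatypes and heterogeneous byte order, which this sketch ignores.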

Aurelien

On 8 Jan 2009, at 05:13, Thomas Ropars wrote:

> Hi,
>
> I am submitting this old question again because I didn't get any
> answer last time.
>
> My problem is the following:
> I am trying to implement a piggyback mechanism. Specifically, I want
> to piggyback an integer on every message.
> To do that, I dynamically create a new datatype for each send.
> The code I use to do that is described below. This code works fine.
> But if the integer I piggyback (named "piggy" in the code) is
> allocated using malloc, I still get the correct result, but I get
> the following kind of message:
>
> ../../ompi/datatype/datatype_pack.h:38
> Pointer 0xbff25fbc size 4 is outside [0xbff25fbc,0x911300c] for
> base ptr (nil) count 1 and data
> Datatype 0x9183be8[] size 8 align 4 id 0 length 3 used 2
> true_lb -1074634820 true_ub 152121356 (true_extent 1226756176) lb
> -1074634820 ub 152121356 (extent 1226756176)
> nbElems 2 loops 0 flags 102 (commited )-c-----GD--[---][---]
> contain MPI_INT
> --C---P-D--[ C ][INT] MPI_INT count 1 disp 0xbff25fbc
> (-1074634820) extent 4 (size 4)
> --C---P-D--[ C ][INT] MPI_INT count 1 disp 0x9113008
> (152121352) extent 4 (size 4)
> -------G---[---][---] MPI_END_LOOP prev 2 elements first elem
> displacement -1074634820 size of data 8
> Optimized description
> -cC---P-DB-[ C ][ERR] MPI_INT count 1 disp 0xbff25fbc
> (-1074634820) extent 4 (size 4)
> -cC---P-DB-[ C ][ERR] MPI_INT count 1 disp 0x9113008
> (152121352) extent 4 (size 4)
> -------G---[---][---] MPI_END_LOOP prev 2 elements first elem
> displacement -1074634820 size of data 8
>
> My questions are: What does this message mean? Is there an error in
> my code? And what can I do to avoid this message?
>
> Regards,
>
> Thomas
>
> Thomas Ropars wrote:
>> Hi,
>>
>> I'm currently implementing a mechanism to piggyback information on
>> messages. On message sending, I dynamically create a new datatype
>> composed of the original buffer and of the data to piggyback.
>>
>> For instance, if I want to piggyback an integer on each message, I
>> use the following code:
>>
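>> int send(void *buf,
>>          size_t count,
>>          struct ompi_datatype_t* datatype,
>>          int dst,
>>          int tag,
>>          mca_pml_base_send_mode_t sendmode,
>>          ompi_communicator_t* comm)
>> {
>>     MPI_Datatype type[2];
>>     int blocklen[2];
>>     MPI_Aint disp[2];
>>     MPI_Datatype datatype_out;
>>     int piggy = 0;   /* the integer to piggyback */
>>     int rc;
>>
>>     /* element 0: the user's buffer; element 1: the piggybacked int */
>>     type[0] = datatype;
>>     type[1] = MPI_INT;
>>     blocklen[0] = (int)count;
>>     blocklen[1] = 1;
>>
>>     /* Absolute addresses, so the new type is sent from MPI_BOTTOM.
>>      * MPI-1 calls; MPI-2 renamed them MPI_Get_address and
>>      * MPI_Type_create_struct. */
>>     MPI_Address(buf, &disp[0]);
>>     MPI_Address(&piggy, &disp[1]);
>>
>>     /* the new type is an output argument: pass its address */
>>     MPI_Type_struct(2, blocklen, disp, type, &datatype_out);
>>     MPI_Type_commit(&datatype_out);
>>
>>     /* then I call the original send function and send my new
>>        datatype */
>>     rc = original_send(MPI_BOTTOM, 1, datatype_out, dst, tag, sendmode,
>>                        comm);
>>
>>     /* a new datatype is created for each send, so release it */
>>     MPI_Type_free(&datatype_out);
>>     return rc;
>> }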
>>
>> This code works well. But if the data I want to piggyback is
>> dynamically allocated, I get this kind of error message:
>>
>> ../../ompi/datatype/datatype_pack.h:40
>> Pointer 0x823fab0 size 4 is outside [0xbfef8920,0x823fab4] for
>> base ptr (nil) count 1 and data
>> Datatype 0x8240b90[] size 8 align 4 id 0 length 3 used 2
>> true_lb -1074820832 true_ub 136575668 (true_extent 1211396500) lb
>> -1074820832 ub 136575668 (extent 1211396500)
>> nbElems 2 loops 0 flags 102 (commited )-c-----GD--[---][---
>>
>> Despite this message, the function still works correctly.
>>
>> Can someone explain to me what this message means? It seems that in
>> the first part of the error message, the lower bound and the upper
>> bound of the datatype are swapped, but I don't know why.
>>
>>
>> Regards.
>>
>> Thomas Ropars
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>