Open MPI User's Mailing List Archives

From: George Bosilca (bosilca_at_[hidden])
Date: 2007-04-18 18:14:45


I am the developer and the maintainer of the data-type engine in Open
MPI. And I'm stunned (!) It never occurred to me that someone would
ever use a data-type description that needs more than 32K entries on
the internal stack.

Let me explain a little bit. The stack is used to efficiently parse
the data-type description. The 32K limit is not a limit on the number
of predefined MPI types in the data-type, but on the number of
different data descriptions (a description is like a vector of a
predefined type). As an example, an MPI_Type_struct with count 10 will
use 11 entries. So in order to exceed this limit one has to use an
MPI_Type_struct with a count bigger than 32K (which might be the case
with the Boost library you're using in your code).
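
Just to make the arithmetic concrete, here is a rough sketch of the
kind of construction that would exceed the limit (the count of 40,000
and the use of MPI_DOUBLE are arbitrary; none of this is taken from
your code):

    #include <mpi.h>

    #define COUNT 40000   /* anything above 32K */

    /* Following the count-10 -> 11-entries example above, this struct
       would need COUNT + 1 entries on the internal stack. */
    void build_big_struct(MPI_Datatype *big_struct)
    {
        static int          lens[COUNT];
        static MPI_Aint     disps[COUNT];
        static MPI_Datatype types[COUNT];
        int i;

        for (i = 0; i < COUNT; i++) {
            lens[i]  = 1;
            disps[i] = i * sizeof(double);
            types[i] = MPI_DOUBLE;
        }
        MPI_Type_struct(COUNT, lens, disps, types, big_struct);
        MPI_Type_commit(big_struct);
    }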

In conclusion, if your data-type description contains more than 32K
entries, the current implementation will definitely not work for you.
How many entries are in your data-type description? There is an easy
way to figure out if this is the problem with your code. Attaching gdb
to your process and setting a breakpoint in the
ompi_generic_simple_pack function is the first step. Once there,
running "call ompi_ddt_dump(pData)" in gdb will print a high-level
description of the data as represented internally in Open MPI. If you
can provide the output of this call I can tell you in a few seconds
whether this is the real issue or not.
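
In other words, something roughly like this (the PID is of course a
placeholder):

    $ gdb -p <pid of your MPI process>
    (gdb) break ompi_generic_simple_pack
    (gdb) continue
    ... wait until the breakpoint is hit ...
    (gdb) call ompi_ddt_dump(pData)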

However, this raises another question about the performance you
expect from your code. A data description with more than 32K items
cannot be efficiently optimized by any automatic data-type engine.
Moreover, it cannot be easily parsed. If it is possible to identify
repetitive access patterns, I suggest using them to improve the
data-type description.
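
For example, if the data really is N repetitions of the same small
record, a sketch along these lines (the record layout here is made up)
keeps the description down to a handful of entries no matter how large
N is:

    #include <mpi.h>

    /* Describe the repeated record once, then let a contiguous type
       express the N repetitions: the description stays tiny. */
    void build_repeated_type(int n, MPI_Datatype *many_records)
    {
        int          lens[2]  = { 2, 1 };   /* 2 doubles + 1 int, made-up layout */
        MPI_Aint     disps[2] = { 0, 2 * sizeof(double) };
        MPI_Datatype types[2] = { MPI_DOUBLE, MPI_INT };
        MPI_Datatype record;

        MPI_Type_struct(2, lens, disps, types, &record);
        MPI_Type_contiguous(n, record, many_records);
        MPI_Type_commit(many_records);
        MPI_Type_free(&record);
    }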

   Thanks,
     george.

On Apr 18, 2007, at 4:16 PM, Michael Gauckler wrote:

> Dear Open-MPI Developers,
>
> Investigations of the segmentation fault (see previous postings
> "Signal: Segmentation fault (11) Problem") lead us to suspect that
> Open MPI allows only a limited number of elements in the
> description of user-defined MPI_Datatypes.
>
> Our application segmentation-faults when a large user-defined data
> structure is passed to MPI_Send.
>
> The segmentation fault happens in the function
> ompi_generic_simple_pack in datatype_pack.c when trying to access
> pElem (Bad address). The pointer pElem is set at line 276, where
> it is retrieved as
>
> 276: pElem = &(description[pos_desc]);
>
> pos_desc is of type uint32_t with the value 0xffff929f
> (4294939295); it is assigned from a variable of type int16_t that
> holds a negative value, which leads to indexing the description
> array at a negative position and produces the segmentation fault.
> The origin of pos_desc can be found in the same function at line 271:
>
> 271: pos_desc = pStack->index;
>
> The structure to which pStack points is of type dt_stack,
> defined in ompi/datatype/convertor.h starting at line 65, where
> index is an int16_t and commented with “index in the element
> description”:
>
>
> typedef struct dt_stack {
>     int16_t   index; /**< index in the element description */
>     int16_t   type;  /**< the type used for the last pack/unpack (original or DT_BYTE) */
>     size_t    count; /**< number of times we still have to do it */
>     ptrdiff_t disp;  /**< actual displacement depending on the count field */
> } dt_stack_t;
>
>
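> A tiny standalone program illustrating the conversion at work (the
> value 40000 is just for demonstration):
>
>     #include <stdio.h>
>     #include <stdint.h>
>
>     int main(void)
>     {
>         int16_t  index    = (int16_t) 40000; /* positions past 32767 wrap to negative */
>         uint32_t pos_desc = index;           /* the negative value becomes a huge index */
>         printf("index = %d, pos_desc = 0x%x\n", index, (unsigned) pos_desc);
>         /* prints: index = -25536, pos_desc = 0xffff9c40 */
>         return 0;
>     }
>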
> We therefore conclude that MPI_Datatypes constructed with
> Open MPI (as of release 1.2.1a of April 10th, 2007) are limited
> to a maximum of 32,768 separate entries in their description.
>
> Although changing the type of the index to int32_t solves the
> segmentation fault for us, I would be happy if the author /
> maintainer of the code could have a look at it and decide whether
> this is a viable fix. Having spent a lot of time hunting this issue
> down in the Open MPI code, I would be glad to see it fixed in
> upcoming releases.
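>
> For reference, the change we tried amounts to this one line in the
> struct shown above:
>
>     -    int16_t index; /**< index in the element description */
>     +    int32_t index; /**< index in the element description */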
>
> Thanx and regards,
> Michael Gauckler
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users


