Open MPI User's Mailing List Archives

From: Michael Gauckler (mailing lists) (maillists_at_[hidden])
Date: 2007-04-19 04:34:57

Hi George,

Thank you for the prompt reply. Indeed we are constructing a data-type
description with more than 32k entries.

I attached a screenshot of the pData structure (displayed with the TotalView
debugger); I hope this helps. Unfortunately, I was not able to use gdb to
execute the call you mentioned.

Let me explain how our code relates to the BOOST libraries: The code
I'm debugging at the moment does not use any BOOST library to interface MPI,
but it uses the same ideas of how to automatically create the data-types as
the BOOST Parallel/Message Passing/MPI [1] library. This is because the
library is based on our ideas and grew out of the goal of factoring our
message passing code into an open-source library (see [2]).

Even though such an automatically created data-type description might not
lead to optimal performance, I think large descriptors should be
supported for several reasons:

- even when using MPI, not all parts of the code are performance critical
- other MPI implementations support it
- passing large/complicated data structures with the BOOST Parallel/Message
  Passing/MPI library (which supports LAM, MPICH and Open-MPI out of the
  box, see [3]) will probably lead to the same problem
- the fix has minor to no impact on the rest of the code base; at the very
  least, appropriate error handling would be expected when a data-type
  descriptor is too large.
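
To illustrate the last point, here is a minimal sketch of the kind of error
handling I have in mind. It is not Open MPI code: the function
stack_set_index and the DESC_* status codes are hypothetical names invented
for this example, and only the field layout of dt_stack is taken from the
struct quoted in my original report further down in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch (not Open MPI's own). */
enum { DESC_OK = 0, DESC_ERR_TOO_LARGE = -1 };

/* Same field layout as the dt_stack struct quoted in the original report. */
typedef struct dt_stack {
    int16_t   index;  /* index in the element description */
    int16_t   type;   /* type used for the last pack/unpack */
    size_t    count;  /* number of times we still have to do it */
    ptrdiff_t disp;   /* actual displacement depending on count */
} dt_stack_t;

/* Hypothetical helper: store a description position in the 16-bit index
 * field, but report an error instead of silently truncating it. */
static int stack_set_index(dt_stack_t *entry, uint32_t pos_desc)
{
    assert(entry != NULL);
    if (pos_desc > (uint32_t)INT16_MAX) {
        return DESC_ERR_TOO_LARGE;  /* would not fit, leave entry untouched */
    }
    entry->index = (int16_t)pos_desc;
    return DESC_OK;
}
```

With a check like this, a too large descriptor would surface as an error
code at the point where the index is stored, rather than as a segmentation
fault later on when the index is read back.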

I hope we can now agree that both the problem and the solution have been
identified, and that you are willing to fix the issue in an upcoming release
of Open-MPI. If there is anything else I can help with, please let me know.

 Michael Gauckler




-----Original Message-----
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
Behalf Of George Bosilca
Sent: Thursday, April 19, 2007 12:15 AM
To: Open MPI Users
Subject: Re: [OMPI users] Datatype construction, serious limitation (was:
Signal: Segmentation fault (11) Problem)

I am the developer and the maintainer of the data-type engine in Open MPI.
And I'm stunned (!) It never occurred to me that someone would ever use a
data-type description that needs more than 32K entries on the internal stack.

Let me explain a little. The stack is used to parse the data-type
description efficiently. The 32K limit is not a limit on the number of
predefined MPI types in the data-type, but a limit on the number of
different data descriptions (a description is like a vector of a predefined
type). As an example, an MPI_Type_struct with count 10 will use 11 entries.
So in order to overflow this data description one has to use an
MPI_Type_struct with a count bigger than 32K (which might be the case with
the BOOST library you're using in your code).

In conclusion, if your data-type description contains more than 32K entries,
the current implementation will definitely not work for you. How many
entries are in your data-type description? There is an easy way to figure
out if this is the problem with your code.
Attaching gdb to your process and setting a breakpoint in the
ompi_generic_simple_pack function is the first step. Once there, doing in
gdb "call ompi_ddt_dump(pData)" will print a high level description of the
data as represented internally in Open MPI. If you can provide the output of
this call I can tell you in a few seconds if this is the real issue or not.

However, this raises another question about the performance you expect from
your code. A data description with more than 32K items cannot be
efficiently optimized by any automatic data-type engine. Moreover, it cannot
be easily parsed. I suggest that if it's possible to identify access
patterns that are repetitive, one should use them in order to improve the
data-type description.
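
As a rough sketch of what identifying a repetitive access pattern could
look like, the helper below checks whether a list of byte displacements is
evenly spaced. The names strided_desc_t and detect_uniform_stride are
hypothetical and not part of Open MPI or MPI; the point is only that, when
such a check succeeds, many separate struct entries could be described by a
single strided type (for instance one built with MPI_Type_create_hvector)
instead of one entry per block.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical compact description: `count` blocks, `stride` bytes apart. */
typedef struct {
    size_t    count;   /* number of blocks */
    ptrdiff_t first;   /* displacement of the first block, in bytes */
    ptrdiff_t stride;  /* distance between consecutive blocks, in bytes */
} strided_desc_t;

/* Return true and fill *out if disps[0..n-1] are evenly spaced.
 * Fewer than two displacements give no pattern to detect. */
static bool detect_uniform_stride(const ptrdiff_t *disps, size_t n,
                                  strided_desc_t *out)
{
    if (disps == NULL || out == NULL || n < 2) {
        return false;
    }
    ptrdiff_t stride = disps[1] - disps[0];
    for (size_t i = 2; i < n; ++i) {
        if (disps[i] - disps[i - 1] != stride) {
            return false;  /* irregular layout, keep the explicit list */
        }
    }
    out->count  = n;
    out->first  = disps[0];
    out->stride = stride;
    return true;
}
```

For displacements such as 0, 16, 32, 48 this yields count 4 and stride 16,
so four entries collapse into one description; an irregular list is left
as it is.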


On Apr 18, 2007, at 4:16 PM, Michael Gauckler wrote:

> Dear Open-MPI Developers,
> investigations into the segmentation fault (see previous postings
> "Signal: Segmentation fault (11) Problem") lead us to suspect that
> Open-MPI allows only a limited number of elements in the description
> of user-defined MPI_Datatypes.
> Our application segmentation-faults when a large user-defined data
> structure is passed to MPI_Send.
> The segmentation fault happens in the function
> ompi_generic_simple_pack in datatype_pack.c when trying to access
> pElem (Bad address). The structure pElem is set in line 276, where it
> is retrieved as
> 276: pElem = &(description[pos_desc]);
> pos_desc is of type uint32_t with the value 0xffff929f (4294939295),
> which itself is set on line 271 by a variable of type int16_t and
> value -1. This leads to the indexing of the description structure at
> position -1, producing the segmentation fault. The origin of
> pos_desc can be found in the same function at line 271:
> 271: pos_desc = pStack->index;
> The structure to which pStack is pointing is of type dt_stack, defined
> in ompi/datatype/convertor.h starting at line 65, where index is an
> int16_t and commented with "index in the element description":
> typedef struct dt_stack {
>     int16_t index;   /**< index in the element description */
>     int16_t type;    /**< the type used for the last pack/unpack
>                           (original or DT_BYTE) */
>     size_t count;    /**< number of times we still have to do it */
>     ptrdiff_t disp;  /**< actual displacement depending on the
>                           count field */
> } dt_stack_t;
> We therefore conclude that MPI_Datatypes constructed with Open-MPI
> (release 1.2.1a of April 10th 2007) are limited to a maximum of
> 32'768 separate entries.
> Although changing the type of the index to int32_t solves the problem
> of the segmentation fault, I would be happy if the author / maintainer
> of the code could have a look at it and decide if this is a viable fix.
> Having spent a lot of time tracking the issue down in the Open-MPI
> code, I would be glad to see the issue fixed in upcoming releases.
> Thanks and regards,
> Michael Gauckler
> _______________________________________________
> users mailing list
> users_at_[hidden]
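
The conversion described in the report above can be reproduced in isolation.
The sketch below is not taken from Open MPI: load_pos_desc and store_index
are hypothetical helpers that only mimic an assignment like
"pos_desc = pStack->index" with the types named in the report (int16_t for
the index, uint32_t for pos_desc). The position 37535 is an illustrative
value, chosen because narrowing it to 16 bits and widening it again gives
0xffff929f, the value of pos_desc quoted in the report.

```c
#include <stdint.h>

/* Mimic "pos_desc = pStack->index": a negative int16_t converted to
 * uint32_t wraps modulo 2^32 and becomes a very large position. */
static uint32_t load_pos_desc(int16_t stored_index)
{
    return (uint32_t)stored_index;
}

/* Keep only the low 16 bits of a position and reinterpret them as a
 * two's complement int16_t, the way a 16-bit index field would. */
static int16_t store_index(uint32_t pos_desc)
{
    uint16_t low = (uint16_t)(pos_desc & 0xFFFFu);
    int32_t value = (low >= 0x8000u) ? (int32_t)low - 0x10000
                                     : (int32_t)low;
    return (int16_t)value;
}
```

A position below 32768 survives the round trip unchanged, while anything
from 32768 upwards comes back as a value near 2^32, which is far outside
the description array.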