Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] RFC: Changing 32-bit build behavior/sizes for MPI_Count and MPI_Offset
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-02-18 14:40:52


Just a reminder -- this RFC timed out today.

If there are no objections to this, I'll commit the patch on #4205 to the trunk tomorrow evening.

No one has come up with a patch yet for the v1.7 branch (because of ABI reasons, it must be different than what we do on the trunk), but since that is definitely a bug fix, it can go in at any time.

On Feb 10, 2014, at 7:14 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:

> WHAT: On trunk, force MPI_Count/MPI_Offset to be 32 bits when building in 32 bit mode (they are currently 64 bit, even in a 32 bit build). On v1.7, leave the sizes at 64 bit (for ABI reasons), but put error checking in the MPI API layer to ensure we won't over/underflow 32 bits.
>
> WHY: See ticket #4205 (https://svn.open-mpi.org/trac/ompi/ticket/4205)
>
> WHERE: On trunk, this can be solved entirely in configury. In v1.7/v1.8, make changes in the MPI API layer (e.g., check MPI_Send to ensure (count*size_of_datatype)<2B)
>
> TIMEOUT: I'll tentatively say next Tuesday teleconf, Feb 18, 2014, but it can be pushed back -- there's no real rush; this isn't a hot issue (but it is wrong and should be fixed).
>
> MORE DETAIL:
>
> I noticed that MPI_Get_elements_x() and MPI_Type_size_x() were giving wrong answers when compiled in 32 bit mode on a 64 bit machine. This is because in that build:
>
> - size_t: 4 bytes
> - ptrdiff_t: 4 bytes
> - MPI_Aint: 4 bytes
> - MPI_Offset: 8 bytes
> - MPI_Count: 8 bytes
>
> Some data points:
>
> 1. MPI-3 says that MPI_Count must be big enough to hold both an MPI_Aint and MPI_Offset.
>
> 2. The entire PML/BML/BTL/convertor infrastructure uses size_t as its underlying computation type.
>
> 3. The _x tests were failing in 32 bit builds because they take (count,datatype) input that intentionally results in a number of bytes that is larger than 2 billion, assigned that value to a size_t (which is 32 bits), caused an overflow, and therefore got the wrong answer.
>
> To solve this:
>
> - On the trunk, we can just not allow MPI_Count (and therefore MPI_Offset) to be larger than size_t. This means that on 32 bit builds -- on both 32 and 64 bit systems -- sizeof(MPI_Aint) == sizeof(MPI_Offset) == sizeof(MPI_Count) == 4. There is a patch for this on #4205.
>
> - Because of ABI issues, we cannot change the size of MPI_Count/MPI_Offset on v1.7, so we can just check for over/underflow in the MPI API. For example, we can check that (count * size_of_datatype) < 2 billion (other checks will also be necessary; this is just an example). I have no patch for this yet.
>
> As a side effect, this means that -- for 32 bit builds -- we will not support large filesystems well (e.g., filesystems with 64 bit offsets). BlueGene is an example of such a system (not that OMPI supports BlueGene, but...). Specifically: for 32 bit builds, we'll only allow MPI_Offset to be 32 bits. I don't think that this is a major issue, because 32 bit builds are not a huge issue for the OMPI community, but I raise the point in the spirit of full disclosure. Fixing it to allow 32 bit MPI_Aint but 64 bit MPI_Offset and MPI_Count would likely mean re-tooling the PML/BML/BTL/convertor infrastructure to use something other than size_t, and I have zero desire to do that! (please, no OMPI vendor reveal that they're going to seriously build giant 32 bit systems...)
>
> Also, while investigating this issue, I discovered that the configury for determining the Fortran MPI_ADDRESS_KIND, MPI_OFFSET_KIND, and MPI_COUNT_KIND values were unrelated to the C types that we discovered for these concepts. The patch on #4205 fixes this issue as well -- the Fortran MPI_*_KIND value are now directly correlated with the C types that were discovered.
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/