Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: Changing 32-bit build behavior/sizes for MPI_Count and MPI_Offset
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-02-18 14:40:52


Just a reminder -- this RFC timed out today.

If there are no objections to this, I'll commit the patch on #4205 to the trunk tomorrow evening.

No one has come up with a patch yet for the v1.7 branch (for ABI reasons, it must be different from what we do on the trunk), but since that is definitely a bug fix, it can go in at any time.

On Feb 10, 2014, at 7:14 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:

> WHAT: On trunk, force MPI_Count/MPI_Offset to be 32 bits when building in 32 bit mode (they are currently 64 bit, even in a 32 bit build). On v1.7, leave the sizes at 64 bit (for ABI reasons), but put error checking in the MPI API layer to ensure we won't over/underflow 32 bits.
>
> WHY: See ticket #4205 (https://svn.open-mpi.org/trac/ompi/ticket/4205)
>
> WHERE: On trunk, this can be solved entirely in configury. In v1.7/v1.8, make changes in the MPI API layer (e.g., check MPI_Send to ensure (count*size_of_datatype)<2B)
>
> TIMEOUT: I'll tentatively say next Tuesday teleconf, Feb 18, 2014, but it can be pushed back -- there's no real rush; this isn't a hot issue (but it is wrong and should be fixed).
>
> MORE DETAIL:
>
> I noticed that MPI_Get_elements_x() and MPI_Type_size_x() were giving wrong answers when compiled in 32 bit mode on a 64 bit machine. This is because in that build:
>
> - size_t: 4 bytes
> - ptrdiff_t: 4 bytes
> - MPI_Aint: 4 bytes
> - MPI_Offset: 8 bytes
> - MPI_Count: 8 bytes
>
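For reference, the mismatch can be reproduced with a trivial program compiled in 32 bit mode on a 64 bit machine (e.g., "mpicc -m32 sizes.c"); this is only an illustration of the observation above, not part of the proposed patch:

    /* Print the sizes in question; in a 32 bit build the first three
       come out as 4 bytes and the last two as 8 bytes. */
    #include <stdio.h>
    #include <stddef.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);
        printf("size_t:     %d\n", (int) sizeof(size_t));
        printf("ptrdiff_t:  %d\n", (int) sizeof(ptrdiff_t));
        printf("MPI_Aint:   %d\n", (int) sizeof(MPI_Aint));
        printf("MPI_Offset: %d\n", (int) sizeof(MPI_Offset));
        printf("MPI_Count:  %d\n", (int) sizeof(MPI_Count));
        MPI_Finalize();
        return 0;
    }
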
> Some data points:
>
> 1. MPI-3 says that MPI_Count must be big enough to hold both an MPI_Aint and MPI_Offset.
>
> 2. The entire PML/BML/BTL/convertor infrastructure uses size_t as its underlying computation type.
>
> 3. The _x tests were failing in 32 bit builds because they take (count,datatype) input that intentionally results in a number of bytes larger than 2 billion; that value was assigned to a size_t (which is 32 bits), overflowed, and therefore produced the wrong answer.
>
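As an illustration of point 3, a (count,datatype) combination whose total size exceeds 32 bits is easy to construct with a large contiguous datatype. This is not one of the actual test cases, just a sketch of the kind of input that wraps around when the internal math is done in a 32 bit size_t:

    /* Sketch: 2^30 MPI_INTs is 4 GiB, which does not fit in a 32 bit
       size_t, so a 32 bit build reports a wrapped-around size. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        MPI_Datatype big;
        MPI_Count size;

        MPI_Init(&argc, &argv);

        MPI_Type_contiguous(1 << 30, MPI_INT, &big);
        MPI_Type_commit(&big);

        MPI_Type_size_x(big, &size);
        printf("MPI_Type_size_x reports %lld bytes (expected 4294967296)\n",
               (long long) size);

        MPI_Type_free(&big);
        MPI_Finalize();
        return 0;
    }
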
> To solve this:
>
> - On the trunk, we can just not allow MPI_Count (and therefore MPI_Offset) to be larger than size_t. This means that on 32 bit builds -- on both 32 and 64 bit systems -- sizeof(MPI_Aint) == sizeof(MPI_Offset) == sizeof(MPI_Count) == 4. There is a patch for this on #4205.
>
> - Because of ABI issues, we cannot change the size of MPI_Count/MPI_Offset on v1.7, so we can just check for over/underflow in the MPI API. For example, we can check that (count * size_of_datatype) < 2 billion (other checks will also be necessary; this is just an example). I have no patch for this yet.
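To make the v1.7 idea concrete, here is a rough sketch of the kind of over/underflow check described above. It is not the actual patch (none exists yet), it is written against the public MPI_Type_size for readability rather than Open MPI's internal datatype routines, and the helper name and exact limit are made up:

    /* Hypothetical helper: returns nonzero if (count * size of datatype)
       cannot be represented in a 32 bit size_t.  The MPI API layer
       (e.g., MPI_Send) could call something like this and raise an
       error before handing the request down to the PML. */
    #include <stdint.h>
    #include <mpi.h>

    static int count_overflows_32bit_size_t(int count, MPI_Datatype datatype)
    {
        int type_size;
        uint64_t total;

        MPI_Type_size(datatype, &type_size);

        /* Do the multiplication in 64 bits, then compare against the
           32 bit limit (use INT32_MAX instead if the internal math is
           signed -- that is the "2 billion" figure above). */
        total = (uint64_t) count * (uint64_t) type_size;
        return total > (uint64_t) UINT32_MAX;
    }

Other entry points that take a count, displacement, or offset would need similar guards, which is what "other checks will also be necessary" refers to.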
>
> As a side effect, this means that -- for 32 bit builds -- we will not support large filesystems well (e.g., filesystems with 64 bit offsets). BlueGene is an example of such a system (not that OMPI supports BlueGene, but...). Specifically: for 32 bit builds, we'll only allow MPI_Offset to be 32 bits. I don't think that this is a major issue, because 32 bit builds are not a huge concern for the OMPI community, but I raise the point in the spirit of full disclosure. Fixing it to allow a 32 bit MPI_Aint but 64 bit MPI_Offset and MPI_Count would likely mean re-tooling the PML/BML/BTL/convertor infrastructure to use something other than size_t, and I have zero desire to do that! (please, let no OMPI vendor reveal that they're seriously going to build giant 32 bit systems...)
>
> Also, while investigating this issue, I discovered that the configury for determining the Fortran MPI_ADDRESS_KIND, MPI_OFFSET_KIND, and MPI_COUNT_KIND values was unrelated to the C types that we discovered for these concepts. The patch on #4205 fixes this issue as well -- the Fortran MPI_*_KIND values are now directly correlated with the C types that were discovered.
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/