Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer
From: Tomoya Adachi (adachi.tomoya_at_[hidden])
Date: 2012-03-16 03:28:38


Hi George,

I'm a member of the Fujitsu MPI development team.
Thank you for picking up the issue.

We checked the changesets and unfortunately found they are incomplete.

Our testing method is as follows:
- Using LLVM clang to compile the trunk with -ftrapv (traps on signed integer overflow),
  because GCC's -ftrapv is broken :-(
- Checking algorithms with 600MB MPI_BYTE messages on an 8-node cluster.
  'v' functions (which take int *displs) are checked with a 300MB message per process,
  i.e. count[] = {300M, 300M, ..., 300M} and displs[] = {0, 300M, ..., 2100M}
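
To make the 'v' test layout concrete, here is a minimal C sketch. It assumes "300M" means 300 * 10^6 elements (with 300 MiB the last displacement would no longer fit in a 32-bit int). The helper name fill_layout is hypothetical and is not part of Open MPI. It shows that every displs[] entry still fits in an int while the total element count does not.

```c
#include <limits.h>

/* Hypothetical helper (not Open MPI code): fill counts[] and displs[] for
 * nprocs ranks that each contribute per_rank elements (per_rank >= 0).
 * Returns the total element count as a long long, or -1 if some
 * displacement would not fit in an int. */
static long long fill_layout(int nprocs, int per_rank, int *counts, int *displs)
{
    long long offset = 0;   /* running offset kept in a 64-bit type */
    for (int i = 0; i < nprocs; i++) {
        if (offset > INT_MAX) {
            return -1;      /* displs[i] itself would overflow */
        }
        counts[i] = per_rank;
        displs[i] = (int)offset;
        offset += per_rank;
    }
    return offset;          /* may exceed INT_MAX even when every displs[i] fits */
}
```

With 8 ranks and 300,000,000 elements per rank, the last displacement is 2,100,000,000, which fits in a 32-bit int, but the total is 2,400,000,000, which does not. Any int variable that accumulates the counts therefore overflows.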

Then, we detected five issues.

- ompi_datatype_copy_content_same_ddt does not work correctly (partially fixed in r26097)
  * the second argument of opal_datatype_copy_content_same_ddt() should be 'length'
  * pDestBuf and pSrcBuf should be advanced in the loop

- Reduce_scatter algorithms overflow not in a multiplication but in an addition,
  such as "total_count += rcounts[i];"

- 'binomial' algorithms for Gather and Scatter still have integer overflow
  ("mycount *= rcount;" and "total_recv += mycount;")
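
The addition and multiplication overflows listed above can be avoided by accumulating in a wide type. The following is a minimal sketch, not the Open MPI code: the function names sum_counts and subtree_bytes are hypothetical, counts are assumed to be non-negative, and size_t is assumed to be 64 bits wide. Every operand is cast before it is used, following the advice in the quoted thread below.

```c
#include <stddef.h>

/* Hypothetical helper: sum per-rank counts in size_t instead of int,
 * so that "total_count += rcounts[i]" cannot overflow a 32-bit int. */
static size_t sum_counts(const int *rcounts, int size)
{
    size_t total = 0;
    for (int i = 0; i < size; i++) {
        total += (size_t)rcounts[i];
    }
    return total;
}

/* Hypothetical helper: bytes covered by a subtree of subtree_ranks ranks,
 * each contributing rcount elements of extent rext. Each operand is cast
 * first, so no intermediate product is computed in int. */
static size_t subtree_bytes(int subtree_ranks, int rcount, ptrdiff_t rext)
{
    return (size_t)subtree_ranks * (size_t)rcount * (size_t)rext;
}
```

For eight counts of 300,000,000 the sum is 2,400,000,000, which is representable in a 64-bit size_t but not in a 32-bit int.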

But some collectives still do not work, for the following reasons:

- PML abends when count >= 2^31 because the convertor functions use a (u)int32_t count
  (binomial and recursive halving collective algorithms are affected)

- ompi_datatype_create_indexed also has a problem when the sum of pBlockLength[] >= 2^31
  (used in some Allgatherv algorithms)
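
Both the first issue (pass 'length' and advance pDestBuf and pSrcBuf in the loop) and the 32-bit count limit come down to handling a large element count in int-sized pieces. A minimal sketch of that pattern follows. It is an illustration only: copy_chunk is a hypothetical stand-in that copies contiguous bytes with memcpy, not opal_datatype_copy_content_same_ddt(), and chunked_copy is likewise a made-up name.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a per-chunk routine whose count is an int.
 * Here it simply copies count * extent contiguous bytes. */
static void copy_chunk(int count, size_t extent, char *dst, const char *src)
{
    memcpy(dst, src, (size_t)count * extent);
}

/* Copy `count` elements of `extent` bytes in chunks of at most max_chunk
 * elements. Returns the number of chunks copied, or 0 if max_chunk is
 * zero or does not fit in an int. */
static size_t chunked_copy(size_t count, size_t extent, size_t max_chunk,
                           char *dst, const char *src)
{
    size_t chunks = 0;
    if (max_chunk == 0 || max_chunk > (size_t)INT_MAX) {
        return 0;
    }
    while (count > 0) {
        size_t length = count < max_chunk ? count : max_chunk;
        /* Pass the chunk length, not the full count. */
        copy_chunk((int)length, extent, dst, src);
        /* Advance both buffers past the chunk just copied. */
        dst += length * extent;
        src += length * extent;
        count -= length;
        chunks++;
    }
    return chunks;
}
```

Chunking like this only helps where the work can be split into independent pieces. It does not remove the need for wider count types in interfaces that must describe the whole message at once.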

Changing the datatype (convertor) interfaces and internals to use (s)size_t
might be hard work (but should it be done in the future?).
Do you have any good ideas?

Regards,
Tomoya Adachi
MPI development team, Fujitsu

(2012/03/06 7:25), George Bosilca wrote:
> I gave it a try (r26103). It was messy, and I hope I got it right. Let's soak it for a few days with our nightly testing to see how it behaves.
>
> george.
>
> On Mar 5, 2012, at 16:37 , N.M. Maclaren wrote:
>
>> On Mar 5 2012, George Bosilca wrote:
>>>
>>> I was afraid about all those little intermediary steps. I asked a compiler guy and apparently reversing the order (aka starting with the ptrdiff_t variable) will not solve anything. The only portable way to solve this is to cast every single member, to prevent __any__ compiler from hurting us.
>>
>> That is true, but even that may not help, given that each version of
>> the C standard has been incompatible with its predecessors. And see
>> below.
>>
>>>> In my copy of C99, section 6.5 Expressions says "the order of evaluation of subexpressions and the order in which side effects take place are both unspecified." There is a footnote 71 that "specifies the precedence of operators in the evaluation of an expression, which is the same as the order of the major subclauses of this subclause, highest precedence first." It is the footnote that implies multiplication (6.5.5 Multiplicative operators) has higher precedence than addition (6.5.6 Additive operators) in the expression "(char*) rbuf + rank * rcount * rext". But, the main text states that there is no ordering of the subexpression "rank * rcount * rext". When the compiler chooses to evaluate "rank * rcount" first, the overflow described by Yuki can result. I think you are correct that the subexpression will get promoted to (ptrdiff_t), but that is not quite the same thing.
>>
>> No, it's not as simple as that :-(
>>
>> That was the intent during the standardisation of C90, but those of
>> us who tried failed to get any explicit statement into it, and the
>> situation during C99 was that "but everybody knows that" the syntax
>> rules also define the evaluation order. We failed to get that stated
>> then, either :-( That interpretation was apparently also the one
>> assumed by C++03, too, and now is explicitly (if informally) stated in
>> C++11. So you theoretically can just cast the first operand to the
>> maximum precision and it will all work.
>>
>> What it means by the "order of evaluation of subexpressions" is that
>> the assignments in '(a = b) + (c = d) + (e = f)' can take place in
>> any order, which is a different issue.
>> HOWEVER, about half of the C communities have given C99 the thumbs
>> down, I doubt that C11 will be taken much notice of, gcc is the
>> de facto standard definer, and most compilers have optimisation
>> options that say "ignore the standard when it helps to go faster".
>> So the only feasible rule is to do your damnedest to defend yourself
>> against the aberrations, ambiguities and inconsistencies of C, and
>> hope for the best. I.e. what George recommends.
>>
>> But will even that work reliably in the medium term? I wouldn't
>> bet on it :-(
>>
>>
>> Regards,
>> Nick Maclaren.
>>
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel