Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer
From: George Bosilca (bosilca_at_[hidden])
Date: 2012-03-05 17:25:54


I gave it a try (r26103). It was messy, and I hope I got it right. Let's soak it for a few days with our nightly testing to see how it behaves.

  george.

On Mar 5, 2012, at 16:37 , N.M. Maclaren wrote:

> On Mar 5 2012, George Bosilca wrote:
>>
>> I was afraid about all those little intermediary steps. I asked a compiler guy and apparently reversing the order (aka starting with the ptrdiff_t variable) will not solve anything. The only portable way to solve this is to cast every single member, to prevent __any__ compiler from hurting us.
>
> That is true, but even that may not help, given that each version of
> the C standard has been incompatible with its predecessors. And see
> below.
>
>>> In my copy of C99, section 6.5 Expressions says "the order of evaluation of subexpressions and the order in which side effects take place are both unspecified." There is a footnote 71 that "specifies the precedence of operators in the evaluation of an expression, which is the same as the order of the major subclauses of this subclause, highest precedence first." It is the footnote that implies multiplication (6.5.5 Multiplicative operators) has higher precedence than addition (6.5.6 Additive operators) in the expression "(char*) rbuf + rank * rcount * rext". But the main text states that there is no ordering of the subexpression "rank * rcount * rext". When the compiler chooses to evaluate "rank * rcount" first, the overflow described by Yuki can result. I think you are correct that the subexpression will get promoted to (ptrdiff_t), but that is not quite the same thing.
>
> No, it's not as simple as that :-(
>
> That was the intent during the standardisation of C90, but those of
> us who tried failed to get any explicit statement into it, and the
> situation during C99 was that "but everybody knows that" the syntax
> rules also define the evaluation order. We failed to get that stated
> then, either :-( That interpretation was apparently also the one
> assumed by C++03, too, and now is explicitly (if informally) stated in
> C++11. So you theoretically can just cast the first operand to the
> maximum precision and it will all work.
>
> What it means by the "order of evaluation of subexpressions" is that
> the assignments in '(a = b) + (c = d) + (e = f)' can take place in
> any order, which is a different issue.
>
> HOWEVER, about half of the C communities have given C99 the thumbs
> down, I doubt that C11 will be taken much notice of, gcc is the
> de facto standard definer, and most compilers have optimisation
> options that say "ignore the standard when it helps to go faster".
> So the only feasible rule is to do your damnedest to defend yourself
> against the aberrations, ambiguities and inconsistencies of C, and
> hope for the best. I.e. what George recommends.
>
> But will even that work reliably in the medium term? I wouldn't
> bet on it :-(
>
>
> Regards,
> Nick Maclaren.
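
For readers of the archive, the defensive style recommended in the thread (cast every operand of "rank * rcount * rext" before multiplying) might look roughly like the sketch below. It is not the code committed in r26103. The helper names block_offset and int_product_overflows are hypothetical, and the sketch assumes a 32-bit int and a 64-bit ptrdiff_t.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from Open MPI): byte offset of one rank's block.
 * Every operand is widened to ptrdiff_t before any multiplication, so no
 * intermediate product is ever computed in plain int. */
static ptrdiff_t block_offset(int rank, int rcount, ptrdiff_t rext)
{
    assert(rank >= 0 && rcount >= 0);
    return (ptrdiff_t)rank * (ptrdiff_t)rcount * (ptrdiff_t)rext;
}

/* Hypothetical helper: returns 1 if rank * rcount would not fit in an int,
 * i.e. if the uncast intermediate product would overflow. The check is done
 * in a 64-bit type so that it does not overflow itself. */
static int int_product_overflows(int rank, int rcount)
{
    int64_t wide = (int64_t)rank * (int64_t)rcount;
    return wide > INT_MAX || wide < INT_MIN;
}
```

Under those assumptions, int_product_overflows(3, 1 << 30) reports that the uncast product does not fit in an int, while block_offset(3, 1 << 30, 4) yields 12884901888, a byte offset well past 2 GiB. A caller would then add that offset to (char *) rbuf.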