Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer
From: Larry Baker (baker_at_[hidden])
Date: 2012-03-05 13:40:10


George,

I think Yuki's interpretation is correct.

>> The following is one of the suspicious parts.
>> (There is a lot of similar code in ompi/coll/tuned/*.c)
>>
>> --- in ompi/coll/tuned/coll_tuned_allgather.c (V1.4.X's trunk)---
>> 398 tmprecv = (char*) rbuf + rank * rcount * rext;
>> -----------------------------------------------------------------
>>
>> If this condition is met, "rank * rcount" overflows.
>> So we tentatively fixed it as follows:
>> (cast int to size_t)
>> --- in ompi/coll/tuned/coll_tuned_allgather.c --------------
>> 398 tmprecv = (char*) rbuf + (size_t)rank * rcount * rext;
>> ------------------------------------------------------------
>
> Based on my understanding of the C standard this operation should be
> done on the most extended type, in this particular case the one of
> the rext (ptrdiff_t). Thus I would say the displacement should be
> correctly computed.

In my copy of C99, section 6.5 Expressions says "the order of
evaluation of subexpressions and the order in which side effects take
place are both unspecified." Footnote 71 adds that the syntax
"specifies the precedence of operators in the evaluation of an
expression, which is the same as the order of the major subclauses of
this subclause, highest precedence first." That footnote is what gives
multiplication (6.5.5 Multiplicative operators) higher precedence
than addition (6.5.6 Additive operators) in the expression "(char*)
rbuf + rank * rcount * rext". The same syntax makes the multiplicative
operators group left to right, so the subexpression is evaluated as
"(rank * rcount) * rext". The usual arithmetic conversions are applied
to each operator separately: "rank * rcount" is computed in int,
because both operands are int, and only its result is converted to
ptrdiff_t for the multiplication by rext. The overflow described by
Yuki can therefore occur in that first multiplication. I think you are
correct that the result gets promoted to (ptrdiff_t), but by then it
is too late, so that is not quite the same thing.
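
To make the difference concrete, here is a minimal sketch, not taken from the Open MPI sources. The helper names offset_widened and int_product_overflows are hypothetical, and it assumes a 32-bit int and a 64-bit size_t. The first helper mirrors Yuki's patched expression; the second checks, in a wider type so that no overflow is actually triggered, whether the int-only product "rank * rcount" would fit in an int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper mirroring the patched line: the cast widens
 * rank first, so both multiplications are carried out in size_t
 * (assuming size_t is at least as wide as ptrdiff_t). */
static size_t offset_widened(int rank, int rcount, ptrdiff_t rext)
{
    assert(rank >= 0 && rcount >= 0 && rext >= 0);
    return (size_t)rank * rcount * rext;
}

/* Hypothetical helper: returns 1 if rank * rcount, the product that
 * the unpatched expression computes in int, would not fit in an int.
 * The check itself is done in long long, so it never overflows as
 * long as int is narrower than long long. */
static int int_product_overflows(int rank, int rcount)
{
    long long wide = (long long)rank * rcount;
    return wide > INT_MAX || wide < INT_MIN;
}
```

For example, with rank = 3, rcount = 1 << 30 and rext = 4, int_product_overflows reports that the int product does not fit, while offset_widened still yields the intended 12 GiB displacement under the assumptions above.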

Larry Baker
US Geological Survey
650-329-5608
baker_at_[hidden]