I've talked with both Brian and Rich about the measurements and they are
ok with the new findings. I also have not received any objections to
putting 1097 into the v1.2 branch. So I would like to instruct Tim
Mattox to bring the 1097 change over to the v1.2 branch and make a new
1.2 RC.
Terry Dontje wrote:
> Nikolay and Community,
> Sorry to be so late in responding to your email, but I've been working
> with Pak to determine whether my decision as RM yesterday was hasty or
> not. To answer your question, we are still trying to determine whether
> the message queue support can go in, and below is my perspective on
> whether it should.
> A couple of things have transpired in the 24 hours since our concall.
> As Jeff surmised earlier this morning, Pak did accidentally have
> debugging enabled, which skewed the numbers quite a bit. After making
> sure debugging was disabled for both builds (v1.2 and the tmp branch
> with the message queue fixes), we then fretted over the numbers.
> It looks to me that there is quite a bit of variance in the numbers that
> the OSU latency, IMB latency and mpi_ping all produce.
> For example, using the OSU latency tests we saw the MX MTL show a 0.01
> us difference between v1.2 and the tmp branch (in favor of v1.2).
> However, the mean, trimmed mean and median show about a 0.02-0.07 us
> difference (in favor of the tmp branch). To me the data looks pretty
> much the same, and the fact that we are measuring averages (i.e. none
> of the tests pick out the minimum value) makes these numbers even
> harder to really nail down IMHO. I've seen essentially the same effect
> in the other tests (IMB and mpi_ping).
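For anyone who wants to repeat this kind of comparison, here is a minimal Python sketch of the summary statistics mentioned above (minimum, mean, trimmed mean, median) computed over per-iteration latency samples. The function names, the 10% trim fraction and the sample values are all hypothetical; none of them come from the OSU, IMB or mpi_ping tools or from the measurements discussed in this thread.

```python
import statistics


def trimmed_mean(samples, trim_fraction=0.1):
    """Mean after dropping the lowest and highest trim_fraction of samples.

    trim_fraction is an illustrative default, not a value used in the thread.
    """
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(samples)
    k = int(len(ordered) * trim_fraction)  # samples to drop at each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(kept)


def summarize(samples):
    """Return the summary statistics for one build's latency samples."""
    return {
        "min": min(samples),
        "mean": statistics.fmean(samples),
        "trimmed_mean": trimmed_mean(samples),
        "median": statistics.median(samples),
    }


if __name__ == "__main__":
    # Hypothetical latencies in microseconds, made up for illustration only.
    builds = {
        "v1.2": [1.48, 1.47, 1.50, 1.49, 1.47, 1.52, 1.48, 1.47, 1.49, 1.61],
        "tmp": [1.47, 1.49, 1.48, 1.50, 1.47, 1.48, 1.55, 1.47, 1.49, 1.48],
    }
    for name, samples in builds.items():
        stats = summarize(samples)
        line = ", ".join(f"{key}={value:.3f}" for key, value in stats.items())
        print(f"{name}: {line}")
```

Reporting the minimum next to the averages can help separate a real shift in latency from run-to-run noise, since a few slow iterations pull the mean up but leave the minimum unchanged.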
> For the SM timings using the mpi_ping tests, we have seen a range of
> average latencies from 1.47-1.5 us for both the tmp branch and v1.2, so
> they seem like moral equivalents to me. Rich Graham has led me to
> believe that he might get more consistent numbers, but we are not able
> to, so I can only deduce that the numbers are essentially the same.
> In conclusion, I believe the performance of both the CM PML (MX MTL)
> and the SM BTL is about the same between the tmp branch and v1.2.
> Because of this I would like to request that the 1097 CMR be put into
> 1.2.4. If others disagree with my assessment above, I think a
> discussion will need to ensue, and I would welcome further testing by
> others that may show whether the changes have regressed performance.
> I would like to set a timeout of 12 noon ET for others to comment on
> whether these new findings put our fears at ease. At that time, if no
> dissenting comments have been received, I would like to ask Tim to
> pull in these changes and rebuild 1.2.4.
> Nikolay Piskun wrote:
>> Just to verify, before I start testing this: there will be no
>> message queue debugging support in this version, correct? That all
>> goes into the 1.3 release.
>> Best Regards,
>> P.S. It looks like it is time for us to be more formally involved in
>> this work.
>> Nikolay Piskun
>> Director of Continuing Engineering, TotalView Technologies
>> 24 Prime Parkway, Natick, MA 01760
>> devel mailing list