George Bosilca wrote:
> The default values for the large message fragments are not optimized
> for the new generation processors. This might be something to
> investigate, in order to see if we can have the same bandwidth as they
> do or not.
Are you suggesting bumping up the btl_sm_max_send_size value from 32K to
> On Mar 17, 2009, at 18:23 , Eugene Loh wrote:
>> A colleague of mine ran some microkernels on an 8-way Barcelona box
>> (Sun x2200M2 at 2.3 GHz). Here are some performance comparisons with
>> Scali. The performance tests are modified versions of the HPCC
>> pingpong tests. The OMPI version is the trunk with my "single-queue"
>> fixes... otherwise, OMPI latency at higher np would be noticeably worse.
>>
>>           latency (ns)      bandwidth (MB/s)
>>          (8-byte msgs)       (2M-byte msgs)
>>          =============       ==============
>>     np   Scali    OMPI       Scali    OMPI
>>      2     327     661        1458    1295
>>      4     369     670        1517    1287
>>      8     414     758        1535    1294
>>
>> OMPI latency is nearly 2x slower than Scali's. Presumably,
>> "fastpath" PML latency optimizations would help us a lot here.
>> Thankfully, our latency is flat with np with the recent
>> "single-queue" fixes... otherwise our high-np latency story would be
>> so much worse. We're behind on bandwidth as well, though not as
>> pitifully so.
>> devel mailing list
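
For readers who want to check the "nearly 2x" latency claim and the size of the bandwidth gap, here is a minimal sketch that recomputes the ratios from the numbers quoted above. The figures are copied from Eugene's table; the dictionary and function names are illustrative only and are not part of OMPI or the HPCC tests.

```python
# Figures copied from the quoted table: np -> (Scali, OMPI).
# Names below are hypothetical helpers for this note, not any real API.
latency_ns = {2: (327, 661), 4: (369, 670), 8: (414, 758)}
bandwidth_mb_s = {2: (1458, 1295), 4: (1517, 1287), 8: (1535, 1294)}


def latency_slowdown(np):
    """Return OMPI latency divided by Scali latency (higher is worse for OMPI)."""
    scali, ompi = latency_ns[np]
    return ompi / scali


def bandwidth_fraction(np):
    """Return OMPI bandwidth as a fraction of Scali bandwidth (lower is worse)."""
    scali, ompi = bandwidth_mb_s[np]
    return ompi / scali


if __name__ == "__main__":
    for np in sorted(latency_ns):
        print(
            f"np={np}: latency {latency_slowdown(np):.2f}x Scali, "
            f"bandwidth {bandwidth_fraction(np):.0%} of Scali"
        )
```

On these numbers, OMPI latency comes out at roughly 1.8x to 2.0x Scali's, and OMPI bandwidth at roughly 84% to 89% of Scali's, which matches the description in the quoted message.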