Something like this. We can play with the eager size too, maybe 4K is
On Mar 18, 2009, at 06:43, Terry Dontje wrote:
> George Bosilca wrote:
>> The default values for the large-message fragments are not
>> optimized for newer-generation processors. This might be
>> worth investigating, to see whether we can match their
>> bandwidth.
> Are you suggesting bumping up the btl_sm_max_send_size value from
> 32K to something greater?
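One rough way to see why the fragment size matters for the 2M-byte bandwidth test is to count how many fragments a single message is cut into. The sketch below is a simplified model, not Open MPI's actual protocol code: it assumes one eager fragment of up to `eager_limit` bytes followed by fragments of at most `max_send_size` bytes. The 4K eager limit and the `fragment_count` helper are hypothetical illustrations.

```python
# Simplified fragmentation model (an assumption, not Open MPI's real logic):
# the first fragment carries up to `eager_limit` bytes, and the rest of the
# message is sent in fragments of at most `max_send_size` bytes.
def fragment_count(msg_bytes: int, eager_limit: int, max_send_size: int) -> int:
    """Return the number of fragments needed to move msg_bytes."""
    if msg_bytes <= eager_limit:
        return 1  # fits entirely in the eager fragment
    remainder = msg_bytes - eager_limit
    # Integer ceiling division: number of max_send_size fragments for the rest.
    return 1 + (remainder + max_send_size - 1) // max_send_size


MSG = 2 * 1024 * 1024  # the 2M-byte message size from the bandwidth test
EAGER = 4 * 1024       # hypothetical eager limit of 4K

for max_send in (32 * 1024, 64 * 1024, 128 * 1024, 256 * 1024):
    n = fragment_count(MSG, EAGER, max_send)
    print(f"max_send_size={max_send // 1024:>4}K -> {n} fragments")
```

Under this model, going from 32K to 64K fragments roughly halves the per-fragment overhead on a 2M-byte message (65 vs. 33 fragments); whether that actually closes the bandwidth gap on this hardware is something only a measurement would show.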
>> On Mar 17, 2009, at 18:23, Eugene Loh wrote:
>>> A colleague of mine ran some microkernels on an 8-way Barcelona
>>> box (Sun x2200M2 at 2.3 GHz). Here are some performance
>>> comparisons with Scali. The performance tests are modified
>>> versions of the HPCC pingpong tests. The OMPI version is the
>>> trunk with my "single-queue" fixes... otherwise, OMPI latency at
>>> higher np would be noticeably worse.
>>>      latency (ns)      bandwidth (MB/s)
>>>      (8-byte msgs)     (2M-byte msgs)
>>>      =============     ================
>>> np   Scali   OMPI      Scali    OMPI
>>>  2     327    661       1458    1295
>>>  4     369    670       1517    1287
>>>  8     414    758       1535    1294
>>> OMPI latency is roughly 1.8x to 2x Scali's. Presumably,
>>> "fastpath" PML latency optimizations would help us a lot here.
>>> Thankfully, with the recent "single-queue" fixes our latency is
>>> flat with np; otherwise, our high-np latency story would be much
>>> worse. We're behind on bandwidth as well (roughly 84-89% of
>>> Scali's), though not as pitifully so.
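The "nearly 2x" latency and "not as pitifully" bandwidth comparisons can be checked directly against the table. This is a small sketch that simply transcribes the table's numbers and computes OMPI-to-Scali ratios per process count; the `ratios` helper is a hypothetical name, and no new measurements are involved.

```python
# Numbers transcribed from the table: nprocs -> (Scali, OMPI)
LATENCY_NS = {2: (327, 661), 4: (369, 670), 8: (414, 758)}        # 8-byte msgs
BANDWIDTH_MBS = {2: (1458, 1295), 4: (1517, 1287), 8: (1535, 1294)}  # 2M-byte msgs


def ratios():
    """Return {nprocs: (OMPI/Scali latency ratio, OMPI/Scali bandwidth ratio)}."""
    out = {}
    for nprocs in LATENCY_NS:
        scali_lat, ompi_lat = LATENCY_NS[nprocs]
        scali_bw, ompi_bw = BANDWIDTH_MBS[nprocs]
        out[nprocs] = (ompi_lat / scali_lat, ompi_bw / scali_bw)
    return out


for nprocs, (lat, bw) in sorted(ratios().items()):
    print(f"np={nprocs}: latency {lat:.2f}x Scali's, bandwidth {bw:.0%} of Scali's")
```

Latency comes out between about 1.82x and 2.02x Scali's, while bandwidth sits at roughly 84% to 89% of Scali's, which matches the qualitative reading above.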
>>> devel mailing list