
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] RFC: opal_cache_line_size
From: Rayson Ho (raysonlogin_at_[hidden])
Date: 2012-04-23 17:04:29

On Mon, Apr 23, 2012 at 4:21 PM, Jeffrey Squyres <jsquyres_at_[hidden]> wrote:
> No one replied to this RFC.  Does anyone have an opinion about it?
> I have attached a patch (including some debugging output) showing my initial implementation.  If no one objects by the end of this week, I'll commit to the trunk.

I have a naive question: why do we need to find the cache line size of
the L1? If it is to avoid cache line ping-pong, shouldn't we set
"opal_cache_line_size" to at least the line size of the L2?

I am not a cache coherency expert (so correct me if I am wrong), but I
think most modern processors track memory ownership (in the MOSI or
MOESI protocols) at L2 line granularity. So if the L1 line size is
smaller than the L2 line size, we will still get the cache line
ping-pong effect on those processors.

I quickly googled and found that in modern AMD & Intel processors, the
L1 line size is the same as the L2 line size, and the same is true for
the K computer's SPARC64 VIIIfx.

However, Itanium has an L1 line size of 32 bytes and an L2 line size of
64 bytes. And it is the L2 that interfaces with the bus logic:

So if we dirty an L1 cache line, the cache coherency logic would mark
the whole 64-byte L2 line as dirty (Modified). Thus if another
thread/processor owns a separate L1 line that is adjacent to the first
one, and therefore shares the same L2 line, we would still get false
sharing...


Open Grid Scheduler / Grid Engine

Scalable Grid Engine Support Program

> Terry: please add this to the agenda tomorrow.
> On Mar 30, 2012, at 1:09 PM, Jeffrey Squyres wrote:
>> I was just recently reminded of a comment that is near the top of opal_init_util():
>>    /* JMS See note in runtime/opal.h -- this is temporary; to be
>>       replaced with real hwloc information soon (in trunk/v1.5 and
>>       beyond, only).  This *used* to be a #define, so it's important
>>       to define it very early.  */
>>    opal_cache_line_size = 128;
>> A few points:
>> 1. On my platforms, hwloc tells me that my cache line size is 64, not 128.  Probably not a tragedy, but...
>> 2. I see opal_cache_line_size being used in a lot of BTL and PML initialization locations.  I see it being used in opal/class/free_list.*, too.
>> 3. I poked around with this yesterday to see if we could have hwloc initialize the opal_cache_line_size value.  Points to remember:
>> - we initialize the opal hwloc framework in opal_init(), but we do not load the local machine's architecture then (because it can be expensive, particularly if lots of MPI processes are all doing it simultaneously)
>> - instead, the local machine topology is discovered once by each orted (using hwloc) and then RML sent to each local MPI process, where it is locally loaded into each MPI proc's hwloc tree
>> - this happens during the orte_init() in ompi_mpi_init()
>> Meaning: we can initialize the opal_cache_line_size in MPI processes during orte_init().
>> Is this acceptable to everyone?
>> If so, I can go ahead and code this up.  I would probably leave the initial value hard-coded to 128 (just in case something uses it before orte_init()), and then later during orte_init(), reset it to the smallest L1 cache line size that hwloc finds on the machine.
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to:
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
