1 Fast Configurable-Cache Tuning with a Unified Second-Level Cache
Ann Gordon-Ross and Frank Vahid*
Department of Computer Science and Engineering, University of California, Riverside
*Also with the Center for Embedded Computer Systems, UC Irvine
Nikil Dutt
Center for Embedded Computer Systems, School of Information and Computer Science, University of California, Irvine
This work was supported by the U.S. National Science Foundation and by the Semiconductor Research Corporation.
2 Cache Hierarchy Optimizations
The cache hierarchy is a good candidate for optimization
Applications require highly diverse cache configurations for optimal energy consumption in the cache subsystem
Over 50% energy savings are possible in the cache subsystem through configuration [Gordon-Ross 04]
[Figure: ARM920T power breakdown (Segars 01)]
3 Previous Cache Tuning Methodologies
Previous methods limit configurability to facilitate easier heuristic development
[Diagram: microprocessor, separate I$ and D$, tuner, main memory]
Single-level cache subsystem with separate caches - fewer than 50 configurations
[Diagram: microprocessor, separate level one and level two I$ and D$, tuner, main memory]
Multi-level cache subsystem with separate caches - a few hundred configurations
4 Motivation
Unified second-level caches are commonplace in desktop computers and are becoming increasingly popular in embedded microprocessors
Current cache tuning heuristics do not directly apply due to the complexity of tuning in the presence of a unified second level of cache - a circular dependency
The search space explodes to 18,000 configurations
[Diagram: interdependence of the L1 I$, L1 D$, and L2 U$]
A change in any cache affects the performance of all other caches in the hierarchy
5 Motivation
We present an effective and efficient cache tuning heuristic for a highly configurable cache hierarchy that includes a unified second level of cache
[Diagram: microprocessor, separate L1 I$ and D$, unified L2 U$, tuner, main memory]
6 Level One Configurable Cache
The base cache consists of four 2 KByte banks that may individually be shut down for size configuration
Line size is configurable
Way concatenation allows configurable associativity
For evaluation of energy savings, we used a base cache of 8 KB with a 32 byte line size and 4-way associativity
[Diagram: way shutdown (8 KBytes down to 4 KBytes and 2 KB) and way concatenation (8 KBytes, 2-way)]
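To make the level one configuration space concrete, the sketch below enumerates the configurations reachable through bank shutdown, way concatenation, and line size selection. The particular line size options (16, 32, 64 bytes) and the rule that associativity cannot exceed the number of active banks are illustrative assumptions, not taken from the slides.

```python
# Hypothetical sketch of the level one configurable cache's parameter space.
# Sizes come from shutting down 2 KB banks, associativity from way concatenation;
# the line size options (16/32/64 bytes) are assumed for illustration.
from itertools import product

SIZES_KB = [2, 4, 8]          # 1, 2, or 4 active 2 KB banks
ASSOCIATIVITIES = [1, 2, 4]   # direct-mapped, 2-way, or 4-way via way concatenation
LINE_SIZES = [16, 32, 64]     # bytes (assumed options)

def l1_configs():
    """Yield (size_kb, assoc, line_size) tuples that are physically realizable."""
    for size_kb, assoc, line in product(SIZES_KB, ASSOCIATIVITIES, LINE_SIZES):
        # Assumed constraint: a cache built from 2 KB banks cannot have more
        # ways than it has active banks.
        if assoc <= size_kb // 2:
            yield (size_kb, assoc, line)

if __name__ == "__main__":
    configs = list(l1_configs())
    print(f"{len(configs)} level one configurations, e.g. {configs[0]}")
```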
7 Level Two Configurable Cache
For maximum configurability, the level two cache uses Motorola M*CORE-style way management
Ways can be designated as instruction, data, unified, or off
Line size is configurable
For evaluation of energy savings, we used a base cache of 64 KB with a 64 byte line size and 4 fully unified ways
[Diagram: ways designated as I-way, D-way, or U-way]
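A minimal sketch of the M*CORE-style way designation follows, assuming a 4-way, 64 KB level two cache in which each 16 KB way can independently be an instruction way, a data way, a unified way, or off. The enumeration and the capacity calculation are illustrative, not the authors' exact model.

```python
# Hypothetical sketch: enumerate level two way designations in an
# M*CORE-style way-managed cache (4 ways, each 16 KB of a 64 KB cache).
from itertools import combinations_with_replacement

WAY_ROLES = ("instr", "data", "unified", "off")

def l2_way_assignments(num_ways=4):
    """Yield multisets of way roles; way order does not matter for capacity."""
    yield from combinations_with_replacement(WAY_ROLES, num_ways)

def capacities(assignment, way_kb=16):
    """Effective instruction/data capacity seen by each access stream."""
    i_kb = sum(way_kb for r in assignment if r in ("instr", "unified"))
    d_kb = sum(way_kb for r in assignment if r in ("data", "unified"))
    return i_kb, d_kb

if __name__ == "__main__":
    base = ("unified",) * 4                      # base configuration from the slide
    print(capacities(base))                      # (64, 64): both streams see 64 KB
    print(sum(1 for _ in l2_way_assignments()))  # 35 distinct role multisets
```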
8 Alternating Cache Exploration with Additive Way Tuning (ACE-AWT)
[Flow diagram, alternating between the instruction (I) and data (D) hierarchies:]
Tune level one sizes
Tune level two size
Tune level one line sizes
Tune level two line size
Tune level two associativity
Tune level one associativities
These steps are difficult because changing size and changing associativity are one and the same in a way-management-style cache
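The skeleton below captures only the interleaved step ordering suggested by this slide; it treats the level two size and associativity as independent knobs, which the slide notes is not really possible in a way-managed cache. The candidate value lists and simulate_energy are placeholders, not the authors' implementation.

```python
# Illustrative skeleton of the alternating exploration order shown above.
# simulate_energy() and the candidate value lists are hypothetical placeholders.

def tune_parameter(config, param, candidates, simulate_energy):
    """Greedily pick the candidate value that minimizes total cache energy."""
    best_val, best_energy = config[param], simulate_energy(config)
    for val in candidates:
        trial = dict(config, **{param: val})
        energy = simulate_energy(trial)
        if energy < best_energy:
            best_val, best_energy = val, energy
    return dict(config, **{param: best_val})

def ace_awt(config, simulate_energy):
    # Step order follows the flow on this slide; candidate values are assumed.
    steps = [
        ("l1_i_size_kb", [2, 4, 8]),
        ("l1_d_size_kb", [2, 4, 8]),
        ("l2_size_kb",   [16, 32, 64]),
        ("l1_i_line",    [16, 32, 64]),
        ("l1_d_line",    [16, 32, 64]),
        ("l2_line",      [16, 32, 64]),
        ("l2_assoc",     [1, 2, 4]),
        ("l1_i_assoc",   [1, 2, 4]),
        ("l1_d_assoc",   [1, 2, 4]),
    ]
    for param, candidates in steps:
        config = tune_parameter(config, param, candidates, simulate_energy)
    return config
```

In practice, simulate_energy would wrap a cache simulator and an energy model; here it is only a stand-in to show where each configuration would be evaluated.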
9 ACE-AWT - First Phase
The first phase is applied during size exploration
10 ACE-AWT - Fine Tuning Phase
Start with the resulting cache from the first phase
The fine tuning phase is applied during associativity exploration
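As a rough illustration of how a fine-tuning pass might adjust the level two ways, the sketch below starts from the first-phase result and greedily tries re-designating one way at a time, keeping a change only if it lowers energy. The way roles and simulate_energy are assumptions carried over from the earlier sketches; this is not the authors' exact additive way tuning algorithm.

```python
# Hypothetical sketch of a greedy, way-by-way fine-tuning pass over the
# level two cache, starting from the first-phase result.
WAY_ROLES = ("instr", "data", "unified", "off")

def fine_tune_l2_ways(ways, simulate_energy):
    """ways: tuple of role strings, one entry per level two way."""
    best_ways, best_energy = ways, simulate_energy(ways)
    improved = True
    while improved:
        improved = False
        for i in range(len(best_ways)):
            for role in WAY_ROLES:
                if role == best_ways[i]:
                    continue
                trial = best_ways[:i] + (role,) + best_ways[i + 1:]
                energy = simulate_energy(trial)
                if energy < best_energy:
                    best_ways, best_energy = trial, energy
                    improved = True
    return best_ways
```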
11 Results - Energy Savings
The heuristic achieved near-optimal results (when the optimal could be computed)
62% energy savings compared to the base cache
Yet it searched only 0.2% of the search space
It also improved performance by 35% compared to the base cache, due to tuned line sizes
12 Conclusions and Future Work
We developed an efficient and effective cache tuning heuristic for a two-level cache with a unified second level of cache
18,000 possible configurations
Compared to a reasonable base cache configuration:
62% energy savings
Explores only 0.2% of the search space
35% improvement in performance
Future work includes applying the tuning heuristic to different execution phases of the application