A Highly Configurable Cache Architecture for Embedded Systems Chuanjun Zhang*, Frank Vahid**, and Walid Najjar* *University of California, Riverside **The Center for Embedded Computer Systems at UC Irvine ISCA 2003 Reviewer: Liang Zhu

Outline
Why a Configurable Cache?
Configurable Associativity by Way Concatenation
Configurable Way Concatenation and Way Shutdown
Conclusions

Computing Total Memory-Related Energy
Considers CPU stall energy and off-chip memory energy; excludes CPU active energy. Thus, it represents all memory-related energy.
Measured quantities (from SimpleScalar): cache_hits, cache_misses, cycles.
energy_mem = energy_dynamic + energy_static
energy_dynamic = cache_hits * energy_hit + cache_misses * energy_miss
energy_static = cycles * energy_static_per_cycle
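
For illustration, here is a minimal C sketch of the slide's energy model; all numeric values are hypothetical placeholders, and the three counters stand in for SimpleScalar output.

```c
/* Minimal sketch of the slide's energy model. The formula structure follows
 * the slide; all numeric values below are hypothetical placeholders, and the
 * three counters stand in for SimpleScalar output. */
#include <stdio.h>

int main(void) {
    /* Measured quantities (would come from SimpleScalar) */
    double cache_hits   = 1.0e7;   /* hypothetical */
    double cache_misses = 2.0e5;   /* hypothetical */
    double cycles       = 1.2e7;   /* hypothetical */

    /* Per-event energies in joules (hypothetical placeholders) */
    double energy_hit              = 0.5e-9;
    double energy_miss             = 20.0e-9;  /* includes off-chip access and CPU stall energy */
    double energy_static_per_cycle = 0.05e-9;

    double energy_dynamic = cache_hits * energy_hit + cache_misses * energy_miss;
    double energy_static  = cycles * energy_static_per_cycle;
    double energy_mem     = energy_dynamic + energy_static;

    printf("energy_mem = %.3e J\n", energy_mem);
    return 0;
}
```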

Why Focus on the Cache: It Impacts Performance and Power
Performance impacts are well known.
Power: ARM920T caches consume 50% of total processor system power (Segars '01); the M*CORE unified cache consumes 50% of total processor system power (Lee/Moyer/Arends '99).

Cache Associativity
Reduces miss rate, thus improving performance.
But what is the impact on power and energy?

Associativity is Costly
Associativity improves hit rate, but at the cost of more power per access.
Are the power savings from reduced misses outweighed by the increased power per hit?
(Figure: energy per access for an 8 Kbyte cache.)

Associativity and Energy
The best-performing cache is not always the lowest-energy cache; in some cases its energy is significantly poorer.

Associativity Dilemma
Direct-mapped cache: poor hit rate on most examples, but low power per access.
Four-way set-associative cache: good hit rate on nearly all examples, but high power per access.

So What's the Best Cache?
Looking at popular embedded processors, there is clearly no standard cache.
The dilemma: direct-mapped gives good performance and energy for most programs; four-way gives good performance for all programs, but at the cost of higher power per access for all programs.
Should we design for the average case or the worst case?

Solution to the Dilemma
A configurable cache that can be configured as four-way, two-way, or one-way.
Start with a four-way set-associative base cache; ways can be concatenated to form a two-way cache, and further concatenated to a direct-mapped cache.
(Diagram: four ways in four-way mode; two concatenated ways in two-way mode; a single way in direct-mapped mode.)
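
A minimal sketch of how the address split changes under way concatenation, assuming the paper's 8 KB, four-way, 32-byte-line base cache; the way-enable details are abstracted to a simple bank count.

```c
/* Sketch of the address split for the 8 KB, four-way, 32-byte-line base
 * cache, under each concatenation setting. The exact way-enable logic is
 * abstracted away here; only the index/tag widths and the number of 2 KB
 * banks enabled per access are shown. */
#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS  5u     /* log2(32-byte line) */
#define TOTAL_LINES  256u   /* 8 KB / 32 B */

static void split_address(uint32_t addr, unsigned ways) {
    unsigned sets       = TOTAL_LINES / ways;        /* 64, 128, or 256 sets */
    unsigned index_bits = 0;
    while ((1u << index_bits) < sets)                /* 6, 7, or 8 index bits */
        index_bits++;

    unsigned index = (unsigned)((addr >> OFFSET_BITS) & (sets - 1u));
    unsigned tag   = (unsigned)(addr >> (OFFSET_BITS + index_bits));

    /* Only 'ways' of the four banks are enabled per access, which is where
     * the dynamic-energy savings of fewer ways come from. */
    printf("%u-way: %u index bits, index=%3u, tag=0x%x, banks enabled=%u/4\n",
           ways, index_bits, index, tag, ways);
}

int main(void) {
    uint32_t addr = 0x1234ABC0u;  /* arbitrary example address */
    split_address(addr, 4);       /* conventional four-way */
    split_address(addr, 2);       /* two ways, each two banks concatenated */
    split_address(addr, 1);       /* direct-mapped: all four banks concatenated */
    return 0;
}
```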

Original Cache Layout

Configurable Cache Design: Way Concatenation
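
The sketch below reconstructs plausible way-enable logic from the address split above. The gate equations and signal names (c0, c1, a11, a12) are an illustrative reconstruction under the stated assumptions, not the paper's exact configuration circuit.

```c
/* Plausible way-enable generation for way concatenation. Configuration bits
 * c1,c0 select the mode (11 = four-way, 01 = two-way, 00 = direct-mapped);
 * a11 and a12 stand for the "extra" index bits of the 8 KB cache. These gate
 * equations are an illustrative reconstruction, not the paper's exact
 * configuration circuit. */
#include <stdio.h>

static void way_enables(int c0, int c1, int a11, int a12, int we[4]) {
    we[0] = (c0 || !a11) && (c1 || !a12);
    we[1] = (c0 ||  a11) && (c1 || !a12);
    we[2] = (c0 || !a11) && (c1 ||  a12);
    we[3] = (c0 ||  a11) && (c1 ||  a12);
}

int main(void) {
    int we[4];
    const int cfg[3][2] = { {1, 1}, {0, 1}, {0, 0} };
    const char *name[3] = { "four-way", "two-way", "direct-mapped" };

    for (int i = 0; i < 3; i++) {
        way_enables(cfg[i][0], cfg[i][1], /*a11=*/1, /*a12=*/0, we);
        printf("%-13s: way enables = %d %d %d %d\n",
               name[i], we[0], we[1], we[2], we[3]);
    }
    return 0;
}
```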

Analyzing the Results
Simulated the circuit in Cadence's Spectre.
Note the energy savings as the number of ways is reduced.
Concerns over access time are addressed: with transistor sizing, the delay is matched at the cost of about 1% extra area.

Way Concatenation Experiments
Example: Motorola PowerStone benchmark g3fax.
Considering dynamic power only: L1 access energy, CPU stall energy, and memory access energy.

Previous Method – Way Shutdown
Albonesi proposed a cache where ways could be shut down to save dynamic power.
The Motorola M*CORE has the same way-shutdown feature; its unified cache even allows setting each way as instruction, data, both, or off.
Way shutdown reduces dynamic power by accessing fewer ways, but it decreases the total cache size and so may increase the miss rate.

Experimental Results

Way Shutdown Can Be Good for Static Power
Shutting a way off prevents leakage, but costs roughly a 5% area overhead and an 8% performance penalty.
Static power (leakage) is increasingly important in nanoscale technologies.
We combine way shutdown with way concatenation, using the sleep-transistor method of Powell (ISLPED 2000).
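
As a rough illustration of how the two techniques compose, the following C sketch enumerates some combined configurations; the struct fields, sizes, and labels are hypothetical, not the paper's register encoding.

```c
/* Hypothetical enumeration of combined configurations. The struct fields,
 * bank sizes, and configuration names are illustrative only; they are not
 * the paper's register encoding. */
#include <stdio.h>

struct cache_cfg {
    unsigned banks_on;      /* 1, 2, or 4 banks powered (way shutdown gates the rest) */
    unsigned associativity; /* 1, 2, or 4 after concatenating the powered banks */
};

static void describe(const char *label, struct cache_cfg cfg) {
    unsigned size_kb = cfg.banks_on * 2u;  /* four 2 KB banks in the 8 KB base cache */
    printf("%-22s: %u KB, %u-way (shutdown cuts leakage; concatenation cuts dynamic power)\n",
           label, size_kb, cfg.associativity);
}

int main(void) {
    describe("conventional",       (struct cache_cfg){ 4, 4 });
    describe("concatenated to DM", (struct cache_cfg){ 4, 1 });
    describe("half shut down",     (struct cache_cfg){ 2, 2 });
    describe("minimal",            (struct cache_cfg){ 1, 1 });
    return 0;
}
```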

Way Concatenation Plus Way Shutdown
(In the results chart, 100% = the conventional four-way cache.)

Conclusions
We have introduced a novel configurable cache design method called way concatenation.
For dynamic power, way concatenation shows average energy savings of 40% compared to a conventional four-way set-associative cache.
Way concatenation proves superior to previously proposed way shutdown methods.
A configurable cache combining way concatenation and way shutdown can save substantial energy.

Questions?