A Self-Tuning Configurable Cache

Presentation transcript:

A Self-Tuning Configurable Cache
Ann Gordon-Ross and Frank Vahid*
Department of Computer Science and Engineering, University of California, Riverside
*Also with the Center for Embedded Computer Systems at UC Irvine
This work was supported by the U.S. National Science Foundation and by the Semiconductor Research Corporation.

Cache Power Consumption
Memory access accounts for 50% of an embedded processor's system power, and caches are power hungry (examples: ARM920T (Segars 01); M*CORE (Lee/Moyer/Arends 99)). Thus, caches are a good candidate for optimization.
Different applications have vastly different cache requirements in total size, line size, and associativity. Example configurations: 2 KB with 32-byte lines, direct-mapped; 4 KB with 16-byte lines, 2-way; 8 KB with 64-byte lines, 4-way.
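To make the three parameters concrete, the Python sketch below computes the geometry of the three example configurations: the number of sets, plus the address bits used for the byte offset within a line and for the set index. The `CacheConfig` class is a hypothetical helper for illustration, and it assumes power-of-two sizes, as in the examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    size_bytes: int  # total cache size
    line_bytes: int  # line (block) size
    ways: int        # associativity; 1 means direct-mapped

    def num_sets(self) -> int:
        # Each set holds `ways` lines, so sets = size / (line size * ways).
        return self.size_bytes // (self.line_bytes * self.ways)

    def offset_bits(self) -> int:
        # Bits that select a byte within a line (line size is a power of two).
        return self.line_bytes.bit_length() - 1

    def index_bits(self) -> int:
        # Bits that select the set (number of sets is a power of two).
        return self.num_sets().bit_length() - 1


EXAMPLES = [
    CacheConfig(2 * 1024, 32, 1),  # 2 KB, 32-byte lines, direct-mapped
    CacheConfig(4 * 1024, 16, 2),  # 4 KB, 16-byte lines, 2-way
    CacheConfig(8 * 1024, 64, 4),  # 8 KB, 64-byte lines, 4-way
]

for cfg in EXAMPLES:
    print(cfg, "sets:", cfg.num_sets(),
          "offset bits:", cfg.offset_bits(), "index bits:", cfg.index_bits())
```

The same total size can be split very differently between sets and ways, which is one reason a single fixed cache rarely suits every application.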

Cache Tuning
Cache tuning is the process of determining the appropriate cache parameters for an application, and it can be done during runtime. It requires a tunable cache, whose parameter values can be varied during runtime, and tuning hardware, which orchestrates cache tuning.
(Figure: the application is downloaded and executes in the base configuration; the tuning hardware (Tuning hw) then steps the tunable cache (TC) through configurations alongside the microprocessor while monitoring energy.)
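The slide does not say how the tuning hardware searches the configuration space. As a minimal sketch, assuming a hypothetical `measure_energy` callback that runs the application in a given configuration for one interval and reports its energy, the simplest strategy is exhaustive search. The size, line-size, and associativity values are placeholders, and real tuning hardware may use a cheaper heuristic that explores fewer configurations.

```python
from itertools import product
from typing import Callable, Iterable, Tuple

# A configuration is (total size in bytes, line size in bytes, associativity).
Config = Tuple[int, int, int]

# Hypothetical parameter values; the talk does not list the tunable cache's options.
SIZES = (2048, 4096, 8192)
LINE_SIZES = (16, 32, 64)
WAYS = (1, 2, 4)


def config_space() -> Iterable[Config]:
    """Every combination of size, line size, and associativity."""
    return product(SIZES, LINE_SIZES, WAYS)


def tune(measure_energy: Callable[[Config], float],
         configs: Iterable[Config]) -> Tuple[Config, float]:
    """Exhaustive search: try each configuration and keep the lowest-energy one."""
    best_cfg, best_energy = None, float("inf")
    for cfg in configs:
        energy = measure_energy(cfg)  # hypothetical: energy of one interval run in cfg
        if energy < best_energy:
            best_cfg, best_energy = cfg, energy
    return best_cfg, best_energy


if __name__ == "__main__":
    # Toy stand-in for a real measurement, lowest at 4 KB, 32-byte lines, direct-mapped.
    def toy_energy(cfg: Config) -> float:
        size, line, ways = cfg
        return abs(size - 4096) / 1024 + abs(line - 32) / 16 + ways

    print(tune(toy_energy, config_space()))
```

Each call to `measure_energy` stands for time spent executing in a possibly poor configuration, so the number of configurations explored is itself part of the tuning overhead.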

Online Cache Tuning
Online cache tuning reconfigures the cache dynamically to adapt to different phases of program execution, or to different applications in a multi-application environment.
(Figure: energy consumption over time, comparing base cache energy with a phase-tuned cache that changes configuration at each phase.)
In this talk, I describe research that addresses when to reconfigure the cache for a periodic system, using a feedback-control system for online cache tuning.

Online Cache Tuning Challenges
Online tuning needs a good tuning interval, the time between invocations of the tuning hardware. The tuning interval should closely match the phase interval, the length of time the system executes between phase changes.
(Figure: energy consumption over time, relative to base cache energy, for a tuning interval that matches the phase interval, one that is too long (energy wasted in a suboptimal configuration), and one that is too short (excess tuning energy).)
Previous methods use a fixed tuning interval and do not analyze the value chosen.
Problem: how does the tuning hardware determine when to invoke cache tuning? Obtaining optimal results would require knowledge of the future.
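The trade-off in the figure can be reproduced with a toy model. The Python sketch below is illustrative only: it assumes two alternating phases, a tuner that instantly picks the configuration suited to the current phase, and hypothetical per-step energy costs (6 units when the cache suits the phase, 15 when it does not, 10 for the base cache, and 20 per tuning invocation). None of these numbers come from the talk.

```python
def simulate(tuning_interval: int, phase_interval: int, total_steps: int,
             e_tuned: int = 6, e_mistuned: int = 15, e_base: int = 10,
             tuning_cost: int = 20) -> float:
    """Toy model: fractional energy savings vs. the base cache (negative = worse).

    All energy costs are hypothetical, per time step, and chosen for illustration.
    """
    energy = 0
    tuned_for = None  # phase the current cache configuration suits
    for t in range(total_steps):
        phase = (t // phase_interval) % 2  # two phases that alternate periodically
        if t % tuning_interval == 0:
            tuned_for = phase       # tuner picks the best configuration for now
            energy += tuning_cost   # overhead of each tuning invocation
        energy += e_tuned if tuned_for == phase else e_mistuned
    base = e_base * total_steps     # energy of the untuned base cache
    return (base - energy) / base


if __name__ == "__main__":
    for ti in (1, 10, 100, 200):
        print(f"tuning interval {ti:>3}: savings {simulate(ti, 100, 1000):+.1%}")
```

With these assumed costs and a phase interval of 100 steps, the sketch reports savings of 38% when the tuning interval matches the phase interval and 20% at an interval of 10, but negative savings at an interval of 1 (tuning overhead dominates) or 200 (the cache stays mistuned for half the run). Whether savings go negative in a real system depends on how much worse a mistuned configuration is than the base cache.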

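Periodic System: Fixed Phase Interval
With the phase interval fixed at 10 million cycles, energy savings are 32% (including 7% overhead due to tuning). A tuning interval that is too short or too long reduces the savings, and savings become negative if the tuning interval is greater than the phase interval!
(Figure: energy for a range of tuning intervals, from too short to too long, relative to the base line.)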

Online Algorithms
The tuning interval must be determined while the system is executing. Online algorithms process data piecemeal and cannot view the entire dataset, so an online tuner must determine the tuning interval from current and past events, with no knowledge of the future.
Goal: adjust the tuning interval (TI) to match the phase interval by observing the change in energy due to tuning.
(Figure: energy consumption over time with cache tuning invoked once per TI; a tuning invocation after a phase change produces a change in energy, while invocations within a phase produce no change in energy.)
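An online tuner sees one measurement at a time. The sketch below assumes, for illustration, that %∆E is the magnitude of the percent change between the energy measured in consecutive tuning intervals; the talk does not define the measurement precisely, and `EnergyChangeMonitor` is a hypothetical name. The monitor stores only the previous sample, so it never needs to look ahead.

```python
from typing import Optional


class EnergyChangeMonitor:
    """Online %∆E monitor: consumes one energy sample per tuning interval.

    Assumption for illustration: %∆E is the magnitude of the percent change
    between consecutive samples. Only the previous sample is stored.
    """

    def __init__(self) -> None:
        self._previous: Optional[float] = None

    def observe(self, energy: float) -> Optional[float]:
        """Return |%∆E| relative to the previous sample, or None if undefined."""
        prev, self._previous = self._previous, energy
        if prev is None or prev == 0:
            return None  # first sample, or no meaningful baseline
        return 100.0 * abs(energy - prev) / prev


if __name__ == "__main__":
    monitor = EnergyChangeMonitor()
    for e in (100.0, 100.0, 70.0, 70.0):  # toy samples: energy drops at the third
        print(monitor.observe(e))
```

A near-zero result means tuning changed nothing (no phase change occurred), while a large result signals that tuning found a better configuration for a new phase.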

Online Cache Tuner: Feedback Control System
We model our online cache tuner as a feedback control system. The controller logic is based on the attack/decay online algorithm and draws on fuzzy logic to stabilize the tuning interval: it changes the tuning interval according to how close to or far from “stable” the system is, using a two-part equation.

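Controller Logic
The controller computes a multiplicative change to the tuning interval (∆TI) from %∆E, the percent change in energy, averaged over the last W measurements to eliminate erratic behavior:
If %∆E < PoS, the energy change is small: the system tunes too frequently (tuning interval too short), so increase the tuning interval (∆TI above 1.0, up to U).
If %∆E >= PoS, the energy change is large: the system tunes too infrequently (tuning interval too long), so decrease the tuning interval (∆TI at or below 1.0, down to D).
PoS is the %∆E at which the system is stable. U, D, PoS, and W are determined through experimentation.
(Figure: ∆TI plotted against %∆E from 0 to 100%, with ∆TI equal to U at the left, 1.0 at PoS, and D at 100%.)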
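The slide gives the shape of the two-part equation but not its formula. The Python sketch below assumes each part is a straight line through the marked points: ∆TI falls from U at %∆E = 0 to 1.0 at PoS, then from 1.0 to D at %∆E = 100%. The class name and the default values for U, D, PoS, and W are hypothetical placeholders, since the talk determines these parameters through experimentation.

```python
from collections import deque


class TuningIntervalController:
    """Two-part controller sketch; the linear segments are an assumption."""

    def __init__(self, initial_ti: float, u: float = 2.0, d: float = 0.5,
                 pos: float = 10.0, w: int = 4) -> None:
        # Hypothetical defaults. Requires u > 1, 0 < d < 1, 0 < pos < 100, w >= 1.
        self.ti = initial_ti
        self.u, self.d, self.pos = u, d, pos
        self.history = deque(maxlen=w)  # last w measurements of %∆E

    def delta_ti(self, pct_delta_e: float) -> float:
        """Multiplicative change to the tuning interval for a given %∆E."""
        x = min(max(pct_delta_e, 0.0), 100.0)
        if x < self.pos:
            # Small change: tuning too often, so grow TI (factor from u down to 1.0).
            return self.u - (self.u - 1.0) * x / self.pos
        # Large change: tuning too rarely, so shrink TI (factor from 1.0 down to d).
        return 1.0 - (1.0 - self.d) * (x - self.pos) / (100.0 - self.pos)

    def update(self, pct_delta_e: float) -> float:
        """Record one %∆E measurement, apply the averaged ∆TI, return the new TI."""
        self.history.append(pct_delta_e)
        avg = sum(self.history) / len(self.history)
        self.ti *= self.delta_ti(avg)
        return self.ti


if __name__ == "__main__":
    ctrl = TuningIntervalController(initial_ti=1_000_000)
    for pct in (0.0, 0.0, 40.0, 90.0, 5.0):  # toy %∆E measurements
        print(round(ctrl.update(pct)))
```

Because ∆TI is multiplicative, a run of near-zero energy changes grows the interval by up to a factor of U per update, while large changes shrink it by up to a factor of D; the window W keeps a single outlier measurement from swinging the interval too far.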

Online Cache Tuner Energy Savings
The online cache tuner achieves 29% energy savings, within 8% of optimal.
(Figure: normalized energy relative to the base line.)
We observed similar results for less periodic systems and for systems with more applications; see the paper for details.

Conclusions
We observed that fixed tuning intervals may not reveal the available energy savings, so it is important to vary the tuning interval to match system needs. We developed a feedback control system for online cache tuning that achieves 29% energy savings on average, within 8% of optimal. Continuing work targets more random systems.