Improved Policies for Drowsy Caches in Embedded Processors. Junpei Zushi, Gang Zeng, Hiroyuki Tomiyama, Hiroaki Takada (Nagoya University), Koji Inoue (Kyushu University)

Presentation transcript:

1 Improved Policies for Drowsy Caches in Embedded Processors Junpei Zushi Gang Zeng Hiroyuki Tomiyama Hiroaki Takada (Nagoya University) Koji Inoue (Kyushu University)

2 Background (1/2) Caches are now used not only in general-purpose processors but also in embedded processors. Cache memory consumes a large amount of a processor's energy, so reducing cache energy consumption is both effective and important! e.g., in the embedded processor ARM920T, 44% of the total energy was consumed in the caches. [S. Segars, “Low Power Design Techniques for Microprocessors,” ISSCC Tutorial, Feb.]

3 Background (2/2) Energy consumption in caches: • Dynamic energy: consumed by switching activity during cache operations. • Leakage energy: consumed as long as the cache is powered on, whether or not it is accessed. Leakage increases significantly as the process feature size shrinks; e.g., at 70nm technology, cache leakage accounts for up to 70% of total cache energy [Kim02]. Reducing leakage is critical to decreasing overall cache energy!

4 Related Work Non-state-preserving caches: • DRI cache [Yang01]: resizes the cache dynamically according to the cache miss ratio. • Cache Decay [Kaxiras01]: powers off cache lines that have not been accessed during a given decay interval. State-preserving caches: • Drowsy cache: lowers the supply voltage of cache lines that are unlikely to be accessed in the near future. Original policies (Simple / Noaccess) [Flautner02]; policies exploiting temporal locality (MRO / TMRO / RMRO) [Petit05].
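As a concrete illustration of the non-state-preserving approach, the Cache Decay idea can be sketched as a per-line counter of idle time windows: once the counter reaches the decay interval, the line is powered off and its contents are lost. This is a minimal sketch of the idea, not the original implementation; the constant DECAY_WINDOWS and all names are illustrative:

```python
# Illustrative sketch of Cache Decay [Kaxiras01]: a line that stays idle
# for DECAY_WINDOWS consecutive time windows is powered off entirely.
# (Non-state-preserving: a later access must refetch the line.)

DECAY_WINDOWS = 4   # assumed decay interval, in time windows


class DecayLine:
    def __init__(self):
        self.idle_windows = 0
        self.powered = True

    def on_access(self):
        # Any access resets the counter; a miss refetches and re-powers.
        self.idle_windows = 0
        self.powered = True

    def on_window_end(self):
        # Called at every time-window boundary.
        if self.powered:
            self.idle_windows += 1
            if self.idle_windows >= DECAY_WINDOWS:
                self.powered = False   # power off: leakage drops to ~0
```

Because the line's state is discarded, a wrong decay decision costs a full cache miss, which is why the state-preserving drowsy schemes below only lower the voltage instead.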

5 Drowsy Policies [Flautner02] Simple policy: • move all cache lines into low-leakage mode at each regular time window. Noaccess policy: • move into low-leakage mode the cache lines that have not been accessed in the previous time window. The supply voltage is lowered in low-leakage (drowsy) mode; to access a drowsy line, the cache line must first be changed back into awake mode, which takes one or more cycles. (Figure legend: Awake line (Valid), Awake line (Invalid))

6 Drowsy Policies [Flautner02] Simple policy: • move all cache lines into low-leakage mode at each regular time window. Noaccess policy: • move into low-leakage mode the cache lines that have not been accessed in the previous time window. The supply voltage is lowered in low-leakage (drowsy) mode; to access a drowsy line, the cache line must first be changed back into awake mode, which takes one or more cycles. (Figure: Time window; legend: Drowsy line (Valid), Drowsy line (Invalid))

7 Drowsy Policies [Flautner02] Simple policy: • move all cache lines into low-leakage mode at each regular time window. Noaccess policy: • move into low-leakage mode the cache lines that have not been accessed in the previous time window. The supply voltage is lowered in low-leakage (drowsy) mode; to access a drowsy line, the cache line must first be changed back into awake mode, which takes one or more cycles. (Figure: Need to access)

8 Drowsy Policies [Flautner02] Simple policy: • move all cache lines into low-leakage mode at each regular time window. Noaccess policy: • move into low-leakage mode the cache lines that have not been accessed in the previous time window. The supply voltage is lowered in low-leakage (drowsy) mode; to access a drowsy line, the cache line must first be changed back into awake mode, which takes one or more cycles. (Figure: Transition into awake mode)
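At a time-window boundary, the two Flautner02 policies differ only in which lines they put to sleep. The following is our own minimal sketch, not the paper's simulator; the Line fields and the MCP default of 3 cycles (matching the experimental setup later in the talk) are assumptions:

```python
# Illustrative sketch of the Simple and Noaccess drowsy policies
# [Flautner02], applied at each time-window boundary.

from dataclasses import dataclass


@dataclass
class Line:
    drowsy: bool = False      # True = low-leakage (drowsy) mode
    accessed: bool = False    # accessed during the current window


def simple_policy(lines):
    """Simple: put every line into drowsy mode at the window boundary."""
    for ln in lines:
        ln.drowsy = True
        ln.accessed = False


def noaccess_policy(lines):
    """Noaccess: put only the lines NOT accessed in the previous window
    into drowsy mode; accessed lines stay awake."""
    for ln in lines:
        if not ln.accessed:
            ln.drowsy = True
        ln.accessed = False   # start a fresh window


def access(ln, mcp=3):
    """An access to a drowsy line first wakes it up, paying the
    mode change penalty (MCP) in cycles; an awake line costs nothing extra."""
    penalty = mcp if ln.drowsy else 0
    ln.drowsy = False
    ln.accessed = True
    return penalty
```

The trade-off the talk explores is visible here: Simple saves the most leakage but wakes up lines more often, while Noaccess pays less transition overhead at the cost of keeping recently used lines awake.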

9 Drowsy Policies [Petit05] MRO (Most Recently used On) policy: • all lines are put into drowsy mode except the MRU line in each cache set; • only one cache line per set is ever awake. TMRO (Two Most Recently used On) policy: • all lines are put into drowsy mode except the two MRU lines in each cache set; • two lines per set are always awake. RMRO (Reused Most Recently used On) policy: • a cache line that was not accessed during the previous time window goes to (or stays in) low-leakage mode; • if only a single line in a set was accessed during the previous time window, that line is kept awake; • if more than one line in a set was accessed during the previous time window, the two MRU lines are kept awake and the other lines are put into low-leakage mode.
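The per-set decisions of the Petit05 policies can be sketched as functions that, given the lines of one set in MRU-to-LRU order and the lines accessed in the previous window, return the lines to keep awake. This is a hypothetical illustration with our own names, not code from the paper:

```python
# Illustrative per-set sketches of the Petit05 policies.  `set_lines` is
# one cache set's lines ordered MRU -> LRU; `accessed` is the set of
# lines touched during the previous time window.  Each function returns
# the lines that stay awake (everything else goes drowsy).

def mro(set_lines, accessed):
    # Most Recently used On: only the MRU line stays awake.
    return {set_lines[0]}


def tmro(set_lines, accessed):
    # Two Most Recently used On: the two MRU lines stay awake.
    return set(set_lines[:2])


def rmro(set_lines, accessed):
    # Reused MRO: adapt to how much of the set was reused last window.
    hits = [ln for ln in set_lines if ln in accessed]
    if len(hits) == 0:
        return set()                  # nothing reused: all lines drowsy
    if len(hits) == 1:
        return {hits[0]}              # the single reused line stays awake
    return set(set_lines[:2])         # heavy reuse: keep the two MRU lines
```

Note that MRO and TMRO ignore the access history entirely, whereas RMRO uses it, which is why the talk groups the policies that way in the evaluation.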

10 Contributions of This Work • Propose four new policies that try to balance leakage energy savings against the performance and energy overheads of mode transitions. • Evaluate mode transition policies in the context of embedded processors: previous work assumed wide-issue out-of-order processors with non-blocking caches, where mode transition cycles can easily be hidden, whereas this paper assumes single-issue processors with blocking caches.

11 Proposed Policies (1/2) 1. PMRO (Periodic MRO): • move all cache lines into low-leakage mode at each time-window boundary, except for the MRU line in each cache set. 2. PTMRO (Periodic TMRO): • move all cache lines into low-leakage mode at each time-window boundary, except for the two MRU lines in each cache set.

12 Proposed Policies (1/2) 1. PMRO (Periodic MRO): • move all cache lines into low-leakage mode at each time-window boundary, except for the MRU line in each cache set. 2. PTMRO (Periodic TMRO): • move all cache lines into low-leakage mode at each time-window boundary, except for the two MRU lines in each cache set. (Figure: MRU way of each cache set; Time window)

13 Proposed Policies (1/2) 1. PMRO (Periodic MRO): • move all cache lines into low-leakage mode at each time-window boundary, except for the MRU line in each cache set. 2. PTMRO (Periodic TMRO): • move all cache lines into low-leakage mode at each time-window boundary, except for the two MRU lines in each cache set. (Figure: Time window; Leave them awake)

14 Proposed Policies (2/2) 3. AAM (Access And MRU): • all cache lines are put into low-leakage mode except for the MRU line, and the MRU line stays awake only if it has been accessed in the previous time window. 4. AOM (Access Or MRU): • all cache lines are put into low-leakage mode except for the MRU line and the lines accessed in the previous time window. Conditions for staying in awake mode: MRU; accessed in previous time window.

15 Proposed Policies (2/2) 3. AAM (Access And MRU): • all cache lines are put into low-leakage mode except for the MRU line, and the MRU line stays awake only if it has been accessed in the previous time window. 4. AOM (Access Or MRU): • all cache lines are put into low-leakage mode except for the MRU line and the lines accessed in the previous time window. Conditions for staying in awake mode: MRU; accessed in previous time window.
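At each window boundary, the four proposed rules can likewise be sketched per set: mru is the set's lines in MRU-to-LRU order and accessed holds the lines hit during the previous window. This is our own illustrative sketch of the rules as stated on the slides, not the authors' implementation:

```python
# Illustrative per-set sketches of the four proposed policies.  Each
# function returns the lines of one set that stay awake at the
# time-window boundary; all other lines go into low-leakage mode.

def pmro(mru, accessed):
    # Periodic MRO: the MRU line stays awake, regardless of use.
    return {mru[0]}


def ptmro(mru, accessed):
    # Periodic TMRO: the two MRU lines stay awake.
    return set(mru[:2])


def aam(mru, accessed):
    # Access And MRU: the MRU line stays awake only if it was also
    # accessed during the previous window.
    return {mru[0]} if mru[0] in accessed else set()


def aom(mru, accessed):
    # Access Or MRU: the MRU line plus every line accessed last window.
    return {mru[0]} | set(accessed)
```

AAM is the most aggressive of the four (it can put a whole set to sleep), while AOM is the most conservative, keeping every recently used line awake; PMRO and PTMRO sit in between and need no access-history bits at all.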

16 Experimental Setup (1/2) Cycle-accurate instruction simulator: • SimpleScalar/ARM, used to generate the access trace. Cache simulator developed in house: • input: access trace; • output: leakage energy and execution time, including mode transition overhead; • implemented policies for evaluation: policies not using access history (Simple, MRO, TMRO, PMRO, PTMRO) and policies using access history (Noaccess, RMRO, AAM, AOM). Benchmark programs: • MediaBench: encoding/decoding of adpcm, g721, and gsm; decoding of jpeg and mpeg2.

17 Experimental Setup (2/2) SimpleScalar/ARM configuration: • in-order, single-issue; • L1 cache only; • instruction cache size: 8KB. Cache simulator configuration: • cache line size: 32B; • cache size: 16KB / 32KB; • associativity: 2 / 4 / 8 ways; • Mode Change Penalty (MCP): 3 cycles; • time window: 4096 cycles. Policies evaluated: • conventional cache; • 5 previous Drowsy policies; • 4 proposed Drowsy policies.
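The simulator's two outputs, leakage energy and execution time with mode-transition overhead, combine into the energy-delay product used in the comparisons that follow. The sketch below is a rough model under assumed, illustrative values: p_awake, p_drowsy, and the stall model are ours, not the paper's technology parameters.

```python
# Illustrative energy-delay accounting for a drowsy cache.  The relative
# power numbers are placeholders; only the structure of the computation
# is meant to match the evaluation methodology.

def leakage_energy(awake_cycles, drowsy_cycles,
                   p_awake=1.0, p_drowsy=0.1):
    # Drowsy mode is assumed to cut per-line leakage power to a fraction
    # of the awake value (state is preserved, so no refetch is needed).
    return awake_cycles * p_awake + drowsy_cycles * p_drowsy


def exec_time(base_cycles, drowsy_accesses, mcp=3):
    # On a blocking, single-issue pipeline every access to a drowsy line
    # stalls execution for the full mode change penalty (MCP).
    return base_cycles + drowsy_accesses * mcp


def ed_product(energy, time):
    # Energy-delay product: lower is better; it penalizes a policy that
    # saves leakage only by slowing the program down.
    return energy * time
```

This is why a policy's ranking can change with associativity: more drowsy line-cycles lower the energy term, but more wake-ups inflate the delay term, and the ED product weighs both.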

18 Comparisons of Policies Not Using Access History 16KB, MCP=3 cycles

19 Comparisons of Policies Not Using Access History 16KB, MCP=3 cycles. In the 4-way and 8-way caches, the ED product is lowest with the PMRO policy.

20 Comparisons of Policies Not Using Access History 32KB, MCP=3 cycles

21 Results of Individual Programs 16KB, 4way Not using access history

22 Comparisons of Policies Using Access History 16KB, MCP=3 cycles

23 Conclusions Summary: • We have proposed four policies for the Drowsy cache. • The Simple and PMRO policies appear promising among those not using access history. • The Noaccess policy is promising among those using access history. • The Drowsy cache is effective not only in high-performance processors but also in embedded processors. Future Work: • Apply to instruction caches. • Explore application-specific policy optimization.

24 Thank you for your attention!