Improving Energy Efficiency of Configurable Caches via Temperature-Aware Configuration Selection Hamid Noori †, Maziar Goudarzi ‡, Koji Inoue ‡, and Kazuaki.

Presentation transcript:

Improving Energy Efficiency of Configurable Caches via Temperature-Aware Configuration Selection Hamid Noori †, Maziar Goudarzi ‡, Koji Inoue ‡, and Kazuaki Murakami ‡ Speaker: Tohru Ishihara ‡ † Institute of Systems & Information Technologies/KYUSHU, Japan ‡ Kyushu University, Japan

France Kyushu University 2/26 Outline
- Background
- Motivation
- Problem Definition
- Proposed Approach
  - Architecture
  - Reconfiguration Flow
- Experimental Results
- Conclusions

France Kyushu University 4/26 Background (1/2)
- The dynamic energy per cache access
- The leakage power of a cache memory

France Kyushu University 5/26 Background(2/2)

France Kyushu University 7/26 Motivational Example (1/3)

France Kyushu University 8/26 Motivational Example (2/3)
- Total dynamic energy for executing a program
- Total static energy for executing a program

France Kyushu University 9/26 Motivational Example (3/3) Minimum-energy cache size

France Kyushu University 11/26 Problem Definition (1/3)
Objective function: total memory energy
- Cache dynamic energy
- Cache static energy
- Off-chip memory access energy
- Energy consumption during processor stall
[Figure: CPU, I-$, D-$, main memory]

France Kyushu University 12/26 Problem Definition (2/3)
energy_memory(C, Temp, Tech) = energy_dynamic(C, Tech) + energy_static(C, Temp, Tech)   (1)
energy_dynamic(C, Tech) = cache_accesses(C) * energy_cache_access(C, Tech) + cache_misses(C) * energy_miss(C, Tech)   (2)
energy_miss(C, Tech) = energy_off_chip_stall + energy_cache_block_refill(C, Tech)   (3)
energy_static(C, Temp, Tech) = executed_clock_cycles(C) * clock_period * leakage_power(C, Temp, Tech)   (4)
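
Equations (1) to (4) translate almost line for line into code. The following is a minimal sketch, not the authors' tooling: the `CacheProfile` record and its field names are hypothetical, and the values would have to come from a simulator and a cache energy model.

```python
from dataclasses import dataclass


@dataclass
class CacheProfile:
    """Inputs for one cache configuration C (hypothetical container)."""
    cache_accesses: int               # cache_accesses(C)
    cache_misses: int                 # cache_misses(C)
    executed_clock_cycles: int        # executed_clock_cycles(C)
    energy_cache_access: float        # energy_cache_access(C, Tech), joules
    energy_cache_block_refill: float  # energy_cache_block_refill(C, Tech), joules
    energy_off_chip_stall: float      # energy_off_chip_stall, joules per miss
    clock_period: float               # seconds


def energy_miss(p: CacheProfile) -> float:
    # Equation (3)
    return p.energy_off_chip_stall + p.energy_cache_block_refill


def energy_dynamic(p: CacheProfile) -> float:
    # Equation (2)
    return (p.cache_accesses * p.energy_cache_access
            + p.cache_misses * energy_miss(p))


def energy_static(p: CacheProfile, leakage_power: float) -> float:
    # Equation (4); leakage_power stands for leakage_power(C, Temp, Tech) in watts
    return p.executed_clock_cycles * p.clock_period * leakage_power


def energy_memory(p: CacheProfile, leakage_power: float) -> float:
    # Equation (1)
    return energy_dynamic(p) + energy_static(p, leakage_power)
```

Temperature enters only through `leakage_power`, so the same profile can be re-evaluated at several temperatures without re-running the simulation.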

France Kyushu University 13/26 Problem Definition (3/3) “For a given application, processor architecture, technology, and valid configurations of the configurable cache, find a valid cache configuration that results in minimum energy consumption in a specific temperature over the entire execution of the given application.”
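
The problem statement amounts to an argmin over the valid configurations at a given temperature. A minimal sketch follows, assuming configurations are written as `(size_kb, line_bytes, ways)` tuples. The `toy_energy` function is an invented stand-in for the energy model and is not taken from the slides; it only mimics the qualitative trade-off (larger caches miss less but leak more, and leakage grows with temperature).

```python
def toy_energy(config, temperature_c):
    """Invented energy model for illustration only (arbitrary units)."""
    size_kb, _line_bytes, _ways = config
    miss_related = 64.0 / size_kb                    # fewer misses with a larger cache
    leakage = 0.05 * 2.0 ** (temperature_c / 25.0)   # assumed growth with temperature
    return miss_related + size_kb * leakage


def select_config(valid_configs, temperature_c, energy_fn):
    """Return the valid configuration with minimum energy at this temperature."""
    configs = list(valid_configs)
    if not configs:
        raise ValueError("no valid configurations")
    return min(configs, key=lambda c: energy_fn(c, temperature_c))


CONFIGS = [(16, 32, 4), (8, 32, 2), (4, 32, 1)]
cool_choice = select_config(CONFIGS, 0, toy_energy)
hot_choice = select_config(CONFIGS, 100, toy_energy)
```

With this toy model the selected cache shrinks as the temperature rises, which is the effect the motivational example points to.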

France Kyushu University 15/26 Architecture
TACC
- BCC (proposed by Zhang et al. [1])
  - Cache size (way shutdown)
  - Number of ways (way concatenation)
  - Line size
- Thermal sensor
- Accessible port for reading the thermal sensor
[1] C. Zhang, F. Vahid and W. Najjar, “A Highly Configurable Cache Architecture for Embedded Systems,” ACM Trans. on Embedded Computing Systems, vol. 4, no. 2, May 2005

France Kyushu University 16/26 Reconfiguration Flow

France Kyushu University 18/26 Experimental Setup (1/2)
- MiBench
- SimpleScalar
  - Cache hit: one clock cycle
  - Cache miss: 100 clock cycles
  - Clock frequency of the base processor: 200 MHz
- CACTI 4.2
  - Target technology: 70nm (Vdd = 0.9 V)
- BCC (16KB)
  - 16KB (4-, 2-, 1-way)
  - 8KB (2- and 1-way)
  - 4KB (1-way)
  - The line size for each configuration can be 8, 16, or 32 bytes.
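
The configuration space listed on this slide is small enough to enumerate. A minimal sketch, assuming each configuration is written as a `(size_kb, line_bytes, ways)` tuple (a representation chosen here for illustration):

```python
# Size -> allowed associativities, as listed for the 16KB BCC.
SIZE_WAYS = {16: (4, 2, 1), 8: (2, 1), 4: (1,)}
LINE_SIZES = (8, 16, 32)  # bytes

CLOCK_HZ = 200e6                   # base processor clock
CLOCK_PERIOD_S = 1.0 / CLOCK_HZ    # 5 ns per cycle
MISS_PENALTY_CYCLES = 100


def valid_configs():
    """Enumerate every (size_kb, line_bytes, ways) combination on the slide."""
    return [
        (size_kb, line_bytes, ways)
        for size_kb, way_options in SIZE_WAYS.items()
        for ways in way_options
        for line_bytes in LINE_SIZES
    ]
```

Six size and associativity pairs times three line sizes give 18 candidate configurations, so an exhaustive search over them is cheap.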

France Kyushu University 19/26 Experimental Setup (2/2)
Base Configurable Cache (BCC)
- It has the same architecture as the one proposed by Zhang et al. [1]
- It supports a limited set of configurations
- It is configured once per application for the corner case (i.e., leakage at 100°C)
Temperature-Aware Configurable Cache (TACC)
- TACC is configured for each execution of an application, considering the chip temperature at that time
[1] C. Zhang, F. Vahid and W. Najjar, “A Highly Configurable Cache Architecture for Embedded Systems,” ACM Trans. on Embedded Computing Systems, vol. 4, no. 2, May 2005
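
The difference between the two policies can be sketched as two lookups into a per-application table of pre-characterized configurations. This is an illustration under assumptions, not the presented reconfiguration flow: the table entries are made up, and rounding the sensor reading up to the next tabulated temperature is a conservative choice made here.

```python
import bisect


def bcc_config(table):
    """Corner-case policy: always use the entry tuned for the hottest temperature."""
    return table[max(table)]


def tacc_config(table, sensor_temp_c):
    """Temperature-aware policy: use the entry for the smallest tabulated
    temperature that is >= the sensor reading, clamped to the hottest entry
    (assumed rounding rule)."""
    temps = sorted(table)
    i = bisect.bisect_left(temps, sensor_temp_c)
    return table[temps[min(i, len(temps) - 1)]]


# Hypothetical table: temperature in °C -> (size_kb, line_bytes, ways).
EXAMPLE_TABLE = {0: (16, 32, 2), 40: (8, 32, 2), 100: (4, 32, 1)}
```

For a reading of 25°C, `tacc_config(EXAMPLE_TABLE, 25)` picks the 40°C entry, while `bcc_config(EXAMPLE_TABLE)` returns the 100°C entry regardless of the actual temperature.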

France Kyushu University 20/26 Energy & Performance Evaluation
Energy Saving = [fraction not preserved in transcript] × 100
Performance Enhancement = [fraction not preserved in transcript] × 100
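
The fractions on this slide did not survive in the transcript. One plausible reading, stated here purely as an assumption, is that both metrics are relative improvements of TACC over the BCC baseline. The helper name `percent_improvement` is hypothetical.

```python
def percent_improvement(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent.

    Assumed usage (not confirmed by the slides):
      energy saving           -> percent_improvement(energy_bcc, energy_tacc)
      performance enhancement -> percent_improvement(cycles_bcc, cycles_tacc)
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100.0
```

Under this reading, a positive value means TACC consumed less energy (or fewer cycles) than BCC for the same run.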

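France Kyushu University 21/26 Data and Instruction Cache
Selected configuration per temperature, written as (size, line size in bytes, ways). Some rows list fewer entries than benchmarks; entries are given in their original order.

D$ benchmarks: qsort, djpeg, lame, dijkstra, patricia, sha, adpcm, crc, fft
0°C:   (16K, 32, 2) | (16K, 32, 4) | (16K, 32, 2) | (8K, 32, 2) | (16K, 32, 4)
20°C:  (8K, 32, 2) | (16K, 32, 2) | (16K, 32, 4) | (16K, 32, 2) | (8K, 32, 1) | (8K, 32, 2) | (16K, 32, 4)
40°C:  (8K, 32, 2) | (16K, 32, 2) | (16K, 32, 4) | (8K, 32, 2) | (16K, 32, 2) | (4K, 32, 1) | (8K, 32, 2) | (16K, 32, 4)
60°C:  (8K, 32, 2) | (16K, 32, 2) | (8K, 32, 2) | (4K, 32, 1) | (4K, 16, 1) | (8K, 32, 2)
80°C:  (8K, 32, 2) | (16K, 32, 2) | (8K, 32, 2) | (4K, 32, 1) | (4K, 16, 1) | (4K, 32, 1) | (8K, 32, 2)
100°C: (4K, 32, 1) | (8K, 32, 2) | (4K, 32, 1) | (8K, 32, 2)

I$ benchmarks: basicmath, qsort, djpeg, lame, dijkstra, blowfish, rijndael, gsm, fft
0°C:   (16K, 8, 4) | (16K, 32, 1) | (16K, 32, 2) | (16K, 32, 1) | (16K, 16, 2) | (16K, 32, 1) | (16K, 16, 4) | (8K, 32, 1)
20°C:  (16K, 16, 4) | (16K, 32, 1) | (16K, 32, 2) | (16K, 32, 1) | (16K, 16, 2) | (16K, 32, 1) | (16K, 32, 2) | (8K, 32, 1)
40°C:  (16K, 16, 4) | (8K, 32, 2) | (16K, 32, 2) | (16K, 32, 1) | (16K, 32, 2) | (8K, 32, 1)
60°C:  (16K, 16, 4) | (8K, 32, 2) | (16K, 32, 2) | (16K, 32, 1) | (8K, 32, 2) | (8K, 32, 1)
80°C:  (16K, 32, 4) | (8K, 32, 2) | (16K, 32, 1) | (4K, 32, 1) | (8K, 32, 1)
100°C: (16K, 32, 4) | (8K, 32, 2) | (16K, 32, 2) | (4K, 32, 1) | (8K, 32, 1)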

France Kyushu University 22/26 Energy Saving

France Kyushu University 23/26 Performance Enhancement

France Kyushu University 25/26 Conclusions
1. A temperature-aware configurable cache is important for finer technologies. For the instruction cache, up to 61% (17% on average) of energy consumption can be saved in 70nm technology.
2. The data cache is more easily affected by temperature than the instruction cache. Using a configurable data cache, up to 77% (36% on average) of energy can be saved in 70nm technology.
3. The TACC improves performance by up to 28% (5% on average) for the instruction cache and by up to 17% (8.1% on average) for the data cache.

France Kyushu University 26/26 Thank you for your attention Please ask any questions to

France Kyushu University 27/26 Backup slides

France Kyushu University 30/26
                          ARM7TDMI   ARM966E-S
130nm  Power consumption  7.98 mW    62.5 mW
       Frequency          133 MHz    250 MHz
90nm   Power consumption  7.08 mW    51.7 mW
       Frequency          236 MHz    470 MHz