University of Michigan Electrical Engineering and Computer Science

Presentation transcript:

Archipelago: A Polymorphic Cache Design for Enabling Robust Near-Threshold Operation
HPCA-17, February 16, 2011
Amin Ansari, Shuguang Feng, Shantanu Gupta, and Scott Mahlke
University of Michigan, Ann Arbor

Matching Power Consumption and Utilization
[Roth et al.] More than 80% of the time idle
Large SRAM structures limit the minimum Vdd
Logic cells can operate close to Vth
More than 50% of all computers [Webber et al.]
DVS to improve battery life
Core i7 achieves 37% power reduction in idle state.

Bit-Error-Rate for an SRAM Cell
• Extremely fast growth in failure rate with decreasing Vdd
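A rough sketch of why a rising bit-error rate hurts large SRAM arrays first. It assumes bit failures are independent, which the slide does not state; the function name and the bit-error rates below are hypothetical and are not read off the slide's plot.

```python
def block_failure_probability(p_bit: float, n_bits: int) -> float:
    """Probability that at least one of n_bits independent cells is faulty."""
    return 1.0 - (1.0 - p_bit) ** n_bits


# Hypothetical bit-error rates (assumed values), applied to a 512-bit line.
for p_bit in (1e-6, 1e-4, 1e-3):
    print(p_bit, block_failure_probability(p_bit, 512))
```

Under this assumption, even a small per-bit failure probability turns into a large per-line failure probability, which is why whole-line disabling alone becomes expensive at low Vdd.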

Our Goal
• Enabling DVS to push the core's Vdd down to
  o Ultra-low voltage region (< 650 mV)
  o While preserving correct functionality of on-chip caches
• Proposing a highly flexible, fault-tolerant (FT) cache architecture that can efficiently tolerate these SRAM failures
• Minimizing our overheads in high-power mode

Archipelago (AP)
[Figure labels: data chunk, sacrificial line, Island 1, Island 2]
• This particular cache has only a single functional line.
• By forming autonomous islands, AP saves 6 out of 8 lines.

Baseline AP Architecture
[Figure labels: MUXing layer, First Bank, Second Bank, Functional Block, Fault Map, Memory Map, Input Address, Fault map address, Data line, Sacrificial line (S), G3]
Added modules:
+ Memory map
+ Fault map
+ MUXing layer
Two types of lines:
+ Data line
+ Sacrificial line
• Two lines collide if they have at least one faulty chunk in the same position (lines 10 and 15 are collision-free)
• There should be no collision between lines within a group [Group 3 (G3) contains lines 4, 10, and 15]
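The collision rule can be sketched with one bitmask per line, where bit i is set when chunk i is faulty. This is a minimal illustration, not the hardware fault-map format; the function names and the fault masks assigned to lines 4, 10, and 15 are hypothetical.

```python
def collide(fault_a: int, fault_b: int) -> bool:
    """Two lines collide if some chunk position is faulty in both."""
    return (fault_a & fault_b) != 0


def is_collision_free(group: dict) -> bool:
    """True if no two lines in the group share a faulty chunk position."""
    seen = 0  # union of the fault masks examined so far
    for mask in group.values():
        if seen & mask:
            return False
        seen |= mask
    return True


# Hypothetical fault masks for the lines of Group 3 (assumed, not from the slide).
g3 = {4: 0b0001, 10: 0b0100, 15: 0b1010}
print(collide(g3[10], g3[15]))  # lines 10 and 15
print(is_collision_free(g3))
```

Because the masks in a valid group are pairwise disjoint, checking each new line against the running union is equivalent to checking every pair.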

AP with Relaxed Group Formation
• Sacrificial lines do not contribute to the effective capacity
  o We want to minimize the total number of groups
[Figure: two First Bank / Second Bank layouts with sacrificial lines marked S]

Semi-Sacrificial Lines
[Figure labels: First Bank, Second Bank, Sacrificial line, Semi-sacrificial line, MUXing Layer]
• A semi-sacrificial line guarantees parallel access
• In contrast to a sacrificial line, it also contributes to the effective cache capacity

AP with Semi-Sacrificial Lines
[Figure: AP with semi-sacrificial lines; sacrificial line marked S]

AP Configuration
• We model the problem as a graph:
  o Each node is a line of the cache.
  o An edge connects two nodes when there is no collision between them
• A collision-free group forms a clique
  o Group formation: finding the cliques
• To maximize the number of functional lines, we need to minimize the number of groups.
  o Minimum clique cover (MCC).
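The graph formulation above can be sketched with a simple greedy heuristic. Minimum clique cover is NP-hard in general, and the slides do not describe the authors' modified MCC, so the first-fit strategy below is only a stand-in; the function name and the fault masks are hypothetical. A line joins a group only if its fault mask is disjoint from the union of the masks already in that group, which keeps every group a clique in the no-collision graph.

```python
def greedy_clique_cover(fault_masks):
    """Group lines so that no two lines in a group share a faulty chunk.

    fault_masks[i] is a bitmask for line i (bit c set = chunk c faulty).
    Returns a list of groups, each a list of line indices.
    """
    # Place the most heavily damaged lines first; they are hardest to fit.
    order = sorted(range(len(fault_masks)),
                   key=lambda i: bin(fault_masks[i]).count("1"),
                   reverse=True)
    groups = []       # line indices per group
    group_masks = []  # union of fault masks per group
    for i in order:
        mask = fault_masks[i]
        for g, union in enumerate(group_masks):
            if (union & mask) == 0:  # collision-free with every member
                groups[g].append(i)
                group_masks[g] = union | mask
                break
        else:
            groups.append([i])
            group_masks.append(mask)
    return groups


# Hypothetical fault masks for five faulty lines.
faults = [0b0001, 0b0010, 0b0001, 0b0100, 0b0011]
print(greedy_clique_cover(faults))
```

A greedy cover is not guaranteed to be minimal, so it can only bound the number of groups from above.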

AP Configuration Example
[Figure: lines in the First Bank and Second Bank labeled G1(1) to G1(3), G1(S), G2(1) to G2(4), G2(S), and D; G1 and G2 are Island (Group) 1 and Island (Group) 2, D is Disabled]

Operation Modes
High-power mode (AP is turned off)
• There are no non-functional lines in this case
• Clock gating to reduce dynamic power of SRAM structures
Low-power mode
o During boot time in low-power mode
  • BIST scans the cache for potentially faulty cells
  • Processor switches back to high-power mode
  • Forms groups and configures the HW

Evaluation Methodology
• Performance [DEC Alpha 21364]
  o SimAlpha, which is based on SimpleScalar OoO [SPEC2K]
• Delay, power, and area
  o Wattch and HotLeakage for processor power
  o Artisan memory compiler for our SRAM structures
  o CACTI for baseline on-chip caches (64KB, 2MB)
  o Synopsys Design Compiler and Power Compiler for
    • Miscellaneous logic (e.g., bypass MUXes and comparators)
• Given a set of cache parameters (e.g., Vdd)
  o Monte Carlo (with 1000 iterations) using our modified MCC
  o Determining the disabled portion of caches (for 99% yield)
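The Monte Carlo step can be sketched as follows. This is a simplified model under stated assumptions, not the authors' flow: chunk faults are drawn independently with a hypothetical probability `p_chunk`, a greedy first-fit grouping stands in for the modified MCC, each group of faulty lines is assumed to cost exactly one line, and bank placement constraints are ignored. All names and parameter values are illustrative.

```python
import math
import random


def count_groups(fault_masks):
    """Greedy first-fit grouping; returns the number of collision-free groups."""
    unions = []  # union of fault masks per group
    for mask in sorted(fault_masks, key=lambda m: bin(m).count("1"), reverse=True):
        for g, union in enumerate(unions):
            if (union & mask) == 0:
                unions[g] = union | mask
                break
        else:
            unions.append(mask)
    return len(unions)


def lost_fraction(n_lines, n_chunks, p_chunk, rng):
    """Fraction of lines lost in one randomly generated cache instance."""
    masks = []
    for _ in range(n_lines):
        mask = 0
        for c in range(n_chunks):
            if rng.random() < p_chunk:
                mask |= 1 << c
        masks.append(mask)
    faulty = [m for m in masks if m]  # fault-free lines need no group
    return count_groups(faulty) / n_lines


def capacity_loss_at_yield(n_lines, n_chunks, p_chunk,
                           iterations=1000, yield_target=0.99, seed=0):
    """Smallest loss budget that covers yield_target of the simulated caches."""
    rng = random.Random(seed)
    losses = sorted(lost_fraction(n_lines, n_chunks, p_chunk, rng)
                    for _ in range(iterations))
    idx = min(iterations - 1, max(0, math.ceil(yield_target * iterations) - 1))
    return losses[idx]


# Hypothetical parameters: 64 lines, 8 chunks per line, 5% chunk failure rate.
print(capacity_loss_at_yield(64, 8, 0.05, iterations=200))
```

Taking the loss at the 99th percentile of trials, rather than the mean, reflects the yield target: the reported disabled portion has to be sufficient for 99% of manufactured caches.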

Minimum Achievable Vdd

Overheads
• Overheads for L1 and L2 caches
  o 10T cells are used to protect the fault map, tag array, and memory map

Performance Loss
• One extra cycle of latency for L1 and two extra cycles for L2

Summary of Benefits
Larger leakage power savings for deeper technology nodes

Comparison with Alternative Methods
[Figure labels: Conventional, Recently Proposed]

Conclusion
• DVS is widely used to deal with high power dissipation
  o Minimum achievable voltage is bounded by SRAM structures
• We proposed a highly flexible cache architecture
  o To tolerate failures when operating in the near-threshold region
• Using our approach
  o Vdd of the processor can be reduced to 375 mV
  o 79% dynamic power saving and 51% leakage power saving
  o < 10% area and performance overheads

Thank You