Energy-Efficient Cache Design Using Variable-Strength Error-Correcting Codes Alaa R. Alameldeen, Ilya Wagner, Zeshan Chishti, Wei Wu,

Presentation transcript:

Energy-Efficient Cache Design Using Variable-Strength Error-Correcting Codes
Alaa R. Alameldeen, Ilya Wagner, Zeshan Chishti, Wei Wu, Chris Wilkerson, Shih-Lien Lu
Intel Corporation
Presenter: Gyu Seong, Kang

( 2 ) Contents
• Background
• Variable Strength ECC
• Three types of VS-ECC
• Cache characterization
• Cache operation in low-voltage mode
• ECC overhead
• Simulation
• Conclusion

( 3 ) Background
• Energy efficiency is the main design concern.
• Reducing the supply voltage is one of the most effective methods of improving it.
• Process variation restricts how far the voltage can be reduced.
• The limit is the minimum operating voltage, Vccmin.
• Failures in cache memory determine Vccmin.

( 4 ) Background
• The probability that a cache line has a multi-bit failure is significantly lower than the probability that it has zero or one failing bit.
[Figure: probability of a single-bit failure (pBitFail) and probability of e failures in a 64B cache line]
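
The relationship between pBitFail and the number of failing bits per line can be sketched with a binomial model. This is a minimal illustration, assuming each of the 512 bits in a 64B line fails independently with probability pBitFail; the function names and the sample pBitFail value are hypothetical and not taken from the slides.

```python
from math import comb

LINE_BITS = 512  # 64-byte cache line


def prob_exact_failures(p_bit_fail: float, e: int, n_bits: int = LINE_BITS) -> float:
    """P(exactly e of n_bits cells fail), assuming independent bit failures."""
    return comb(n_bits, e) * p_bit_fail**e * (1.0 - p_bit_fail) ** (n_bits - e)


def prob_at_least(p_bit_fail: float, e: int, n_bits: int = LINE_BITS) -> float:
    """P(at least e of n_bits cells fail)."""
    return 1.0 - sum(prob_exact_failures(p_bit_fail, k, n_bits) for k in range(e))


if __name__ == "__main__":
    p = 1e-4  # illustrative value only, not a measured pBitFail
    for e in range(4):
        print(f"P(e = {e}) = {prob_exact_failures(p, e):.3e}")
    print(f"P(e >= 2) = {prob_at_least(p, 2):.3e}")
```

Under this model, each additional failing bit makes a line far less likely for small pBitFail, which is the observation that motivates protecting only a few lines with strong ECC.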

( 5 ) Background
• Error-correcting code (ECC)
    • Recovers from errors using additional parity bits
    • The number of parity bits is proportional to the number of correctable errors
• With ECC, the operating voltage can be reduced even further, resulting in lower power consumption.
• Selective high-strength protection of a few cache lines
    • SECDED (single error correcting, double error detecting)
    • Multi-bit ECC for one or more failures
[Figure: probability of a single set persistent failure in a 16-way cache with DECTED or VS-ECC]

( 6 ) Variable strength ECC
• Protection strength is based on the number of failing bits
• Different strengths for different cache lines
    • SECDED
    • Multi-bit ECC
• Additional tag information distinguishes the class of each cache line
• Three types of VS-ECC are proposed
    • VS-ECC with a fixed number of regular and extended ECC
    • VS-ECC with line disable + a fixed number of regular and extended ECC
    • VS-ECC with a variable number of correction bits (1 to 4)

( 7 ) Variable strength ECC
• VS-ECC with a fixed number of regular and extended ECC
• Extended ECC bit
    • SECDED: set to 0
    • Multi-bit ECC: set to 1
• Additional parity data is stored in the Extended ECC array.

( 8 ) Variable strength ECC
• VS-ECC with line disable + a fixed number of regular and extended ECC
• Extended ECC bit
    • SECDED: 0
    • Multi-bit ECC: 1
    • Additional parity data is saved in the Extended ECC array
• Disable bit
    • Disables a cache line that has more than two persistent failures in low-voltage mode
    • Better soft error coverage
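
The tag bits described on this slide can be sketched as a small classification routine. This is one possible reading of the slide, not the authors' implementation: it assumes lines with zero or one persistent failure stay on SECDED, lines with exactly two use the Extended ECC array, and lines above the disable threshold are turned off. The names LineTag, classify, and protection are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LineTag:
    e_bit: int = 0        # 1 -> line uses multi-bit ECC from the Extended ECC array
    disable_bit: int = 0  # 1 -> line is not used in low-voltage mode


def classify(persistent_failures: int, disable_threshold: int = 2) -> LineTag:
    """Map a line's persistent-failure count to its tag bits (assumed policy)."""
    if persistent_failures > disable_threshold:
        return LineTag(e_bit=0, disable_bit=1)   # too many failures: disable
    if persistent_failures >= 2:
        return LineTag(e_bit=1, disable_bit=0)   # multi-bit failure: extended ECC
    return LineTag()                             # zero or one failure: SECDED


def protection(tag: LineTag) -> str:
    """Human-readable protection class for a tagged line."""
    if tag.disable_bit:
        return "disabled"
    return "multi-bit ECC" if tag.e_bit else "SECDED"


if __name__ == "__main__":
    for failures in range(5):
        print(failures, protection(classify(failures)))
```

The threshold is a parameter because the slide only states that lines with more than two persistent failures are disabled; other cut-offs would change which lines consume Extended ECC entries.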

( 9 ) Variable strength ECC
• VS-ECC with a variable number of correction bits (1 to 4)
• Number of ECC blocks
    • SECDED to 4EC5ED
• Pointer to Extended ECC block
    • ECC data address for the Extended ECC block

( 10 ) Variable strength ECC
• Cache lines need to be classified for low-voltage mode
• 4 eECC fields per cache set
    • Only 4 lines can be active and contain protected data.
    • The rest of the cache is inactive and under test.
• Cache characterization
    • Reset all E-bits and valid bits for inactive cache blocks.
    • Write back all dirty data.
    • Reduce the processor voltage to the target Vccmin.
    • Associate the 4 eECC fields with the first 4 ways.
    • Deactivate the rest of the ways (75% of cache capacity is lost).
    • Use the multi-bit ECC encoder/decoder for read/write operations during characterization.

( 11 ) Variable strength ECC
• Memory test for the inactive region
    • Uses a traditional memory test method
    • Write a pre-defined pattern and read it back
• If a bit failure is detected during the test
    • Multi-bit failure: set the E-bit
    • Single-bit failure: write its location into the line's tag array
• If a single-bit failure occurs again in the same line during the remainder of the test
    • Compare its location with the one stored in the tag.
    • Hit: use SECDED
    • Miss: multi-bit failure, set the E-bit
• The test continues until 5 or more E-bits are set to 1 or the algorithm completes.
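
The classification steps on this slide can be sketched as follows. This is a minimal model under stated assumptions, not the hardware algorithm itself: the memory test is abstracted as a sequence of passes, each reporting the set of failing bit positions observed per line, and the function name characterize_set and its parameters are hypothetical.

```python
from typing import Iterable, List, Optional, Sequence, Set, Tuple


def characterize_set(
    passes: Iterable[Sequence[Set[int]]],
    num_lines: int,
    e_bit_limit: int = 4,
) -> Tuple[List[bool], List[Optional[int]], bool]:
    """Classify the lines of one cache set from write/read-back test results.

    passes: one entry per test pattern; each entry lists, per line, the set of
            bit positions that failed on read-back.
    Returns (e_bits, stored_failure_location, ok). ok is False when more lines
    need multi-bit ECC than the set has eECC fields (assumed to be 4).
    """
    e_bit = [False] * num_lines
    fail_loc = [None] * num_lines  # single-bit failure location kept in the tag

    for observed in passes:
        for line, failing in enumerate(observed):
            if e_bit[line] or not failing:
                continue
            if len(failing) > 1:
                e_bit[line] = True            # multi-bit failure in one pass
            else:
                (loc,) = failing
                if fail_loc[line] is None:
                    fail_loc[line] = loc      # first single-bit failure: remember it
                elif fail_loc[line] != loc:
                    e_bit[line] = True        # miss: a second distinct failing bit
        if sum(e_bit) > e_bit_limit:
            return e_bit, fail_loc, False     # 5 or more E-bits: stop early
    return e_bit, fail_loc, True


if __name__ == "__main__":
    passes = [
        [set(), {3}, {1, 2}],
        [set(), {3}, set()],
        [{7}, {9}, set()],
    ]
    print(characterize_set(passes, num_lines=3))
```

Storing only one failure location per line is enough to tell a repeated single-bit failure from a second, distinct one, which is what the hit/miss comparison on the slide relies on.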

( 12 ) Variable strength ECC
• Cache characterization for VS-ECC-Variable
    • Same characterization as VS-ECC-Fixed
    • Additional step to determine the exact number of failing bits
    • ¼ of the cache is tested at a time
    • Requires higher testing accuracy
• Cache characterization for VS-ECC-Disable
    • Same characterization as VS-ECC-Fixed
    • Functions correctly with lower testing accuracy

( 13 ) Variable strength ECC – Operation
[Figure: flow chart of cache operation in low-voltage mode for VS-ECC-Fixed]

( 14 ) ECC Overhead
• Binary BCH code
• Parity bits for 64B (2^9 = 512 bits) of data
    • 10 bits per corrected bit
    • 1 additional bit for detection
• SECDED
    • 10 bits + 1 bit
    • 1-cycle decoding latency
• 4EC5ED
    • 40 bits + 1 bit
    • 15-cycle decoding latency
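
The parity-bit counts above can be reproduced with a short calculation. This sketch assumes the usual bound for a shortened binary BCH code, at most m·t check bits for t-bit correction over GF(2^m), plus one extra bit for the additional detection capability; the function names are hypothetical.

```python
def bch_field_degree(data_bits: int, t: int) -> int:
    """Smallest m such that a length-(2^m - 1) BCH code fits data plus m*t check bits."""
    m = 1
    while (1 << m) - 1 < data_bits + m * t:
        m += 1
    return m


def check_bits(t: int, data_bits: int = 512, extra_detect: bool = True) -> int:
    """Check bits for t-error correction, plus one bit for (t+1)-error detection."""
    m = bch_field_degree(data_bits, t)
    return m * t + (1 if extra_detect else 0)


if __name__ == "__main__":
    for name, t in [("SECDED", 1), ("DECTED", 2), ("4EC5ED", 4)]:
        bits = check_bits(t)
        print(f"{name}: {bits} check bits, {100.0 * bits / 512:.2f}% of a 64B line")
```

For 512 data bits the field degree comes out as m = 10, because 2^9 - 1 = 511 is too short to hold the data plus its check bits; that is where the 10 bits per corrected error come from.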

( 15 ) Simulation setup
• Cycle-accurate, execution-driven IA32 simulator
• Out-of-order model based on Intel Core i7
• 2 GHz
• 32 KB, 8-way set-associative instruction and data caches
• 2 MB, 16-way set-associative L2 cache
• 64-byte cache lines

( 16 ) Simulation setup
• ECC configuration
• Baseline: all SECDED
    • 12-cycle L2 hit + 1 cycle for SECDED
• Fixed-strength ECC
    • DECTED: 1 additional cycle for ECC
    • 4EC5ED: 15 additional cycles for ECC
• MS-ECC [Chishti et al., MICRO 2009]
    • 4-bit error correction per 64-bit segment
    • 1 MB, 8-way L2 cache with 1-cycle latency (Data : ECC = 1:1)
• VS-ECC-Fixed: 12 x SECDED + 4 x 4EC5ED
• VS-ECC-Disable: 12 x SECDED + 4 x 4EC5ED
• VS-ECC-Variable: 16 x SECDED + 12 x 10-bit ECC block
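
The per-set ECC storage implied by these configurations can be tallied with a few lines of arithmetic. This is a rough sketch under stated assumptions: it uses the check-bit sizes from the ECC overhead slide (11 bits for SECDED, 41 bits for 4EC5ED, 10 bits per extra ECC block), counts check bits only, and ignores tag overhead such as E-bits, disable bits, and pointers. The variable names are hypothetical.

```python
SECDED_BITS = 11    # 10 + 1, from the ECC overhead slide
FOUR_EC_BITS = 41   # 40 + 1, from the ECC overhead slide
BLOCK_BITS = 10     # one 10-bit ECC block
WAYS = 16           # 16-way L2 set
L2_HIT_CYCLES = 12  # L2 hit latency before ECC decoding

# Check bits per 16-way set, excluding tag metadata.
ecc_bits_per_set = {
    "All SECDED": WAYS * SECDED_BITS,
    "All 4EC5ED": WAYS * FOUR_EC_BITS,
    "VS-ECC-Fixed": 12 * SECDED_BITS + 4 * FOUR_EC_BITS,
    "VS-ECC-Variable": 16 * SECDED_BITS + 12 * BLOCK_BITS,
}


def hit_latency(decode_cycles: int) -> int:
    """L2 hit latency including ECC decoding, in cycles."""
    return L2_HIT_CYCLES + decode_cycles


if __name__ == "__main__":
    for name, bits in ecc_bits_per_set.items():
        print(f"{name}: {bits} check bits per set")
    print("SECDED hit:", hit_latency(1), "cycles")
    print("4EC5ED hit:", hit_latency(15), "cycles")
```

Under these assumptions the VS-ECC configurations need well under half the check bits of protecting every line with 4EC5ED, and only the few lines that use the strong code pay the longer decoding latency.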

( 17 ) Simulation result – Reliability
[Figure: failure probability as a function of supply voltage for different configurations]

( 18 ) Simulation result – Performance
[Figure: normalized IPC for different benchmarks]

( 19 ) Simulation result – Energy efficiency
[Figure: normalized Vccmin, frequency, power, and EPI]
• Baseline Vccmin: 830 mV
• Baseline frequency: 2000 MHz

( 20 ) Conclusion
• Low supply voltage condition
    • A few lines in the cache have multi-bit failures
    • The majority of lines exhibit zero or one error
• Variable-strength error-correcting codes
    • Selective high-strength protection of a few cache lines
    • Lines with no failures: SECDED to cover soft errors
    • Persistent failures: 4EC5ED
    • Additional bits to support multi-bit ECC control
• Three types of VS-ECC are proposed
    • VS-ECC-Fixed
    • VS-ECC-Variable
    • VS-ECC-Disable
• VS-ECC
    • Avoids significant decreases in cache capacity
    • Incurs minimal additional area overhead
    • VS-ECC-Disable even outperforms the previously published MS-ECC in terms of power and EPI

( 21 ) Q & A ?

( 22 ) Appendix
• ECC hardware overhead

( 23 ) Appendix
• MS-ECC