CML Smart Cache Cleaning: Energy Efficient Vulnerability Reduction in Embedded Processors Reiley Jeyapaul, and Aviral Shrivastava Compiler Microarchitecture.

Presentation transcript:

CML Smart Cache Cleaning: Energy Efficient Vulnerability Reduction in Embedded Processors Reiley Jeyapaul, and Aviral Shrivastava Compiler Microarchitecture Lab, Arizona State University, Tempe, Arizona, USA

CML Web page: aviral.lab.asu.edu

Scaling Drives Technology Advancement
• Smaller device dimensions improve performance and reduce power consumption.
• Processor device size shrinks rapidly every generation: 45nm (2008), 30nm (2010), 20nm (2011), 15nm (2013*), 10nm (2015*).
  *Expected

Reliability Is a Consequence: Transient Faults Induce Soft Errors
• Transient faults: electrical disturbances can disrupt circuit operation, causing transient faults.

Soft Errors: An Increasing Concern with Technology Scaling
• Charge-carrying particles induce soft errors:
  - Alpha particles
  - Neutrons: high energy (100 keV to 1 GeV) and low energy (10 meV to 1 eV)
• Soft error rate:
  - Is now 1 per year
  - Increases exponentially with technology scaling
  - Is projected to reach 1 per day within a decade
• Toyota Prius: SEUs (single-event upsets) were blamed as the probable cause of unintended acceleration.
• Performance is useless if it is not correct!

Agenda
• Why cache vulnerability?
• Cache cleaning to improve reliability
• Smart Cache Cleaning methodology
• Experimental evaluation and results

Caches Are Most Vulnerable
• Caches occupy the majority of chip area, and an even higher percentage of transistors: more than 80% of the transistors in Itanium 2 are in caches.
• Low operating voltages
• Frequent accesses
• Small, tight SRAM cell layout
• Caches are the majority contributor to the total soft errors in a system.
[Figure: contribution of each architectural block to soft errors. Configuration: split I/D caches = 32KB, I-TLB = 48 entries, D-TLB = 64 entries, LSQ = 64 entries, register file = 32 entries.]
• Even with cheap error detection, the cache is still the most susceptible architectural block.

How to Protect the L1 Cache?

Feature                 SECDED [1]                 Parity
Error detection         1-bit and 2-bit            1-bit
Error correction        1-bit                      No correction
Cache access latency    +95% (can be hidden)       No impact
Cache area increase     +22%                       <1%
Cache power increase    +22%                       <1%
Enabled processors      SPM of IBM Cell            ARM, Intel XScale, Intel Atom

• SECDED can detect and correct errors, but its overheads make it impractical for the L1 cache.
• Parity is practical, but it needs a supporting method to correct the errors it detects.

[1] L. Hung, H. Irie, M. Goshima, and S. Sakai. Utilization of SECDED for soft error and variation-induced defect tolerance in caches. In DATE '07.
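The parity column of the comparison above can be illustrated with a minimal Python sketch. It assumes a single even-parity bit per word, which is an illustrative simplification, and the helper names `even_parity_bit` and `parity_ok` are hypothetical. A single flipped bit changes the parity and is detected, but the check cannot tell which bit flipped, so it cannot correct it; two flipped bits cancel out and go undetected.

```python
def even_parity_bit(word: int) -> int:
    """Parity bit stored with the word so that the total number of 1s is even."""
    return bin(word).count("1") % 2


def parity_ok(word: int, stored_parity: int) -> bool:
    """Check performed on a read: does the word still match its parity bit?"""
    return even_parity_bit(word) == stored_parity


data = 0b1011_0010                       # four 1s
p = even_parity_bit(data)                # computed when the word is written
one_flip = data ^ (1 << 3)               # single-bit upset
two_flips = data ^ (1 << 3) ^ (1 << 6)   # double-bit upset

print(parity_ok(data, p))       # True: no error
print(parity_ok(one_flip, p))   # False: 1-bit error detected (but not located)
print(parity_ok(two_flips, p))  # True: 2-bit error goes undetected
```

This is why parity alone suffices only for clean data, which can be re-read from the next memory level once an error is detected.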

Cache Vulnerability
• Assume parity-based error detection that detects 1-bit errors.
• Non-dirty data is not vulnerable:
  - Non-dirty data can always be re-read from the lower level of memory.
  - So parity-based error detection is enough to correct soft errors in non-dirty data.
• Dirty data cannot be reloaded (recovered) after an error, so it is vulnerable.
• Data in the cache is vulnerable if
  - it will be read by the processor or committed to memory, AND
  - it is dirty.
[Figure: access timeline of one cache line: W, R, R, R, CE.]
• How can dirty L1 cache data be protected?
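The vulnerability rule above can be made concrete with a small Python sketch. It assumes a simplified, hypothetical trace format for a single cached word (event kinds "R", "W", "C" for a clean/write-back, and "E" for eviction) and counts the cycles during which the word is dirty and will later be read or committed to memory. It illustrates the definition only; it is not the authors' vulnerability model.

```python
from typing import List, Tuple

# One event on a single cached word: (cycle, kind), where kind is
#   "R" = processor read, "W" = processor write,
#   "C" = clean (write-back, word stays cached), "E" = eviction.
Event = Tuple[int, str]


def vulnerable_cycles(events: List[Event]) -> int:
    """Cycles during which the word is dirty AND will later be read by the
    processor or committed to memory (by a clean or an eviction)."""
    total = 0
    dirty = False
    prev_cycle = None
    for cycle, kind in sorted(events):
        # The interval ending here is vulnerable only if the word was dirty
        # and this event consumes (R) or commits (C/E) the dirty value.
        # An interval ending in "W" is not: the old value is overwritten.
        if dirty and prev_cycle is not None and kind in ("R", "C", "E"):
            total += cycle - prev_cycle
        if kind == "W":
            dirty = True
        elif kind in ("C", "E"):
            dirty = False
        prev_cycle = cycle
    return total


no_clean = [(0, "W"), (2, "R"), (5, "R"), (6, "W"), (10, "E")]
cleaned = [(0, "W"), (2, "R"), (5, "R"), (6, "W"), (7, "C"), (10, "E")]
read_only = [(0, "R"), (4, "R"), (8, "E")]

print(vulnerable_cycles(no_clean))   # 9  (2 + 3 + 4)
print(vulnerable_cycles(cleaned))    # 6  (2 + 3 + 1)
print(vulnerable_cycles(read_only))  # 0  (never dirty)
```

Cleaning right after the last write shortens the final vulnerable interval from 4 cycles to 1, which is the effect cache cleaning aims for.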

Agenda
• Why cache vulnerability?
• Cache cleaning to improve reliability
  - Write-through cache
  - Early write-back cache
  - Proposed Smart Cache Cleaning
• Smart Cache Cleaning methodology
• Experimental evaluation and results

Possible Solution 1: Write-Through Cache
• A copy of the cache data is written into memory on every write.
  - No dirty data in the cache, so no vulnerability.
  - High L1-to-memory traffic.
• Error recovery: if an error is detected on a subsequent access, the data is reloaded from memory.
[Figure: program timeline (cycles) of the read/write accesses to A[1], A[2], and A[3], with the resulting memory write-backs, for the loop
  for(i:1~3){ for(j:1~3){ A[i]+=B[j] } }]
• Vulnerability = 0; # write-backs = 9
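A minimal Python sketch of the trade-off on this slide, assuming each iteration of the loop reads and then writes A[i] (B[j] is only read, so it never becomes dirty and is omitted). The names `loop_trace` and `run` are hypothetical. Write-through sends every one of the 9 stores to memory and leaves nothing dirty; a write-back cache sends none during the loop but leaves all three words dirty, and therefore vulnerable.

```python
from typing import List, Set, Tuple


def loop_trace(n_i: int = 3, n_j: int = 3) -> List[Tuple[str, str]]:
    """Accesses to A for: for i in 1..n_i: for j in 1..n_j: A[i] += B[j].
    Each iteration reads A[i] and then writes it; B[j] is read-only."""
    trace = []
    for i in range(1, n_i + 1):
        for _ in range(n_j):
            trace.append(("R", f"A[{i}]"))
            trace.append(("W", f"A[{i}]"))
    return trace


def run(trace: List[Tuple[str, str]], write_through: bool) -> Tuple[int, Set[str]]:
    """Return (memory writes issued during the loop, words left dirty)."""
    memory_writes = 0
    dirty: Set[str] = set()
    for kind, addr in trace:
        if kind != "W":
            continue
        if write_through:
            memory_writes += 1  # every store also goes to memory: word stays clean
        else:
            dirty.add(addr)     # write-back: word stays dirty until cleaned or evicted
    return memory_writes, dirty


trace = loop_trace()
print(run(trace, write_through=True))    # (9, set())
writes, dirty = run(trace, write_through=False)
print(writes, sorted(dirty))             # 0 ['A[1]', 'A[2]', 'A[3]']
```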

Possible Solution 2: Early Write-back Cache
• Dirty data is written back periodically (every 4 cycles in this example).
• Hardware-only cleaning has no knowledge of the program's data access pattern:
  - Data is cleaned unnecessarily while it is still being reused.
  - Data that is no longer used stays vulnerable until the next periodic write-back.
[Figure: program timeline (cycles) of the same loop, for(i:1~3){ for(j:1~3){ A[i]+=B[j] } }, showing the periodic write-backs and the vulnerable intervals of A[1], A[2], and A[3].]
• Without cleaning: vulnerability = 48, # write-backs = 0.
• With early write-back: vulnerability = 13, # write-backs = 8. Vulnerability ≠ 0: what went wrong?

L. Li, V. Degalahal, N. Vijaykrishnan, M. Kandemir, and M. Irwin. Soft error and energy consumption interactions: a data cache perspective. In ISLPED '04.
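The two problems of hardware-only cleaning can be reproduced with a small, hypothetical Python model. It assumes one access per cycle (A[i] read on an even cycle, written on the next), that a periodic clean writes back every dirty word every `period` cycles, and that everything is evicted at an arbitrary cycle 25. The names `loop_events`, `replay`, and `simulate_periodic` are illustrative, and the numbers are properties of this assumed timing, not a reproduction of the slide's figure.

```python
from typing import Dict, List, Optional, Tuple

# (cycle, tie-break priority, kind); kind is "R", "W", "C" (clean) or "E" (evict).
Event = Tuple[int, int, str]


def loop_events(n_i: int = 3, n_j: int = 3) -> Dict[str, List[Event]]:
    """Per-word access events for: for i: for j: A[i] += B[j].
    Assumed timing: A[i] is read on an even cycle and written on the next.
    B[j] is read-only (never dirty), so it is left out."""
    events: Dict[str, List[Event]] = {}
    cycle = 0
    for i in range(1, n_i + 1):
        word = f"A[{i}]"
        for _ in range(n_j):
            events.setdefault(word, []).append((cycle, 0, "R"))
            events[word].append((cycle + 1, 0, "W"))
            cycle += 2
    return events


def replay(events: List[Event]) -> Tuple[int, int]:
    """(vulnerable cycles, write-backs) for one word. Dirty data is vulnerable
    over an interval that ends in a read, a clean or an eviction; a clean of
    dirty data counts as one write-back (evictions are not counted)."""
    vulnerable, writebacks, dirty, prev = 0, 0, False, None
    for cycle, _, kind in sorted(events):
        if dirty and prev is not None and kind in ("R", "C", "E"):
            vulnerable += cycle - prev
        if kind == "W":
            dirty = True
        elif kind in ("C", "E"):
            writebacks += 1 if (dirty and kind == "C") else 0
            dirty = False
        prev = cycle
    return vulnerable, writebacks


def simulate_periodic(period: Optional[int], end: int = 25) -> Tuple[int, int]:
    """Early write-back: every `period` cycles all dirty data is cleaned,
    regardless of reuse (None = no cleaning). Eviction happens at `end`."""
    total_v = total_wb = 0
    for word_events in loop_events().values():
        evs = list(word_events)
        if period:
            # Priority 1: a clean in the same cycle happens after the access.
            evs += [(c, 1, "C") for c in range(period, end, period)]
        evs.append((end, 2, "E"))
        v, wb = replay(evs)
        total_v += v
        total_wb += wb
    return total_v, total_wb


print(simulate_periodic(None))  # (48, 0)
print(simulate_periodic(4))     # (13, 6)
```

In this model the clean at cycle 4 writes A[1] back even though the program writes it again at cycle 5, and A[3] stays dirty from its last write at cycle 17 until the clean at cycle 20: the unnecessary cleaning and the "unused but vulnerable" interval listed on the slide.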

Proposed Solution: Smart Cache Cleaning
• For this program, clean data ONLY when it is no longer in use by the program.
  - Data is vulnerable only while it is being reused by the program.
  - Vulnerability = 0 for unused data.
[Figure: program timeline (cycles) of the same loop, for(i:1~3){ for(j:1~3){ A[i]+=B[j] } }, with cleaning applied to A[1], A[2], and A[3] only once each is no longer in use.]
• Vulnerability = 18; # write-backs = 3
• Smart program analysis can help perform cache cleaning only when required.
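The idea on this slide can be sketched in Python with an oracle that cleans each A[i] immediately after its last store, i.e. once the program has stopped reusing it. This is a hypothetical model under assumed timing (one access per cycle, eviction at an arbitrary cycle 25), not the authors' analysis: SCC has to approximate this oracle from profiled program behavior, and the absolute numbers below come from the assumed timing rather than from the slide's figure.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, int, str]  # (cycle, tie-break priority, kind: R/W/C/E)


def loop_events(n_i: int = 3, n_j: int = 3) -> Dict[str, List[Event]]:
    """Assumed timing: A[i] is read on an even cycle, written on the next."""
    events: Dict[str, List[Event]] = {}
    cycle = 0
    for i in range(1, n_i + 1):
        for _ in range(n_j):
            events.setdefault(f"A[{i}]", []).extend(
                [(cycle, 0, "R"), (cycle + 1, 0, "W")])
            cycle += 2
    return events


def replay(events: List[Event]) -> Tuple[int, int]:
    """(vulnerable cycles, write-backs caused by cleaning dirty data)."""
    vulnerable, writebacks, dirty, prev = 0, 0, False, None
    for cycle, _, kind in sorted(events):
        if dirty and prev is not None and kind in ("R", "C", "E"):
            vulnerable += cycle - prev
        if kind == "W":
            dirty = True
        elif kind in ("C", "E"):
            writebacks += 1 if (dirty and kind == "C") else 0
            dirty = False
        prev = cycle
    return vulnerable, writebacks


def simulate(clean_after_last_store: bool, end: int = 25) -> Tuple[int, int]:
    """Oracle cleaning: write each word back right after its last store."""
    total_v = total_wb = 0
    for evs in loop_events().values():
        extra: List[Event] = [(end, 2, "E")]  # hypothetical eviction cycle
        if clean_after_last_store:
            last_store = max(c for c, _, kind in evs if kind == "W")
            extra.append((last_store, 1, "C"))
        v, wb = replay(evs + extra)
        total_v += v
        total_wb += wb
    return total_v, total_wb


print(simulate(clean_after_last_store=False))  # (48, 0)
print(simulate(clean_after_last_store=True))   # (6, 3)
```

The oracle issues exactly one write-back per element, and the only remaining vulnerability is the time each element spends being reused inside the loop.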

Agenda
• Why cache vulnerability?
• Cache cleaning to improve reliability
• Smart Cache Cleaning methodology
  - When to clean data?
  - SCC hardware architecture
  - How to clean data?
  - Which data to clean?
• Experimental evaluation and results

How to Do Smart Cache Cleaning?
[Figure: targeted cache cleaning architecture]
• Pipeline: IF, ID, EX, MEM, WB. Cache reads/writes go through the LSQ to the L1 cache, which writes data back to memory.
• SCC analysis of the program, using memory profile data, determines:
  - Which data to clean? The SCC InsnAddr register
  - When to clean? The SCC Pattern
• How to clean? A controller monitors store instruction addresses and issues a clean signal to the L1 cache when required.

When to Clean Data?
[Figure: program timeline (cycles) of the loop for(i:1~3){ for(j:1~3){ A[i]+=B[j] } }, with the instantaneous vulnerability of each access and SCC_Threshold = 4.]
• For each store:
  - If the instantaneous vulnerability of the access > SCC_Threshold: execute store + clean, and assign 1 to SCC_Pattern.
  - Else: execute store only, and assign 0 to SCC_Pattern.
• Example with SCC_Threshold = 4: accesses to A[1] with instantaneous vulnerability 3 execute store only (SCC_Pattern bit 0), whereas the access with instantaneous vulnerability 19 executes store + clean (bit 1).
• If the end of the loop is not the end of the program, the instantaneous vulnerability of the last access extends until the subsequent cache eviction.
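The threshold rule above can be sketched in Python. The sketch assumes that a store's instantaneous vulnerability (IV) runs until the last read of that address before the next store to it, or, for the last store, until a hypothetical eviction at cycle 25; it also assumes the same one-access-per-cycle timing as a stand-in for the slide's timeline. `store_ivs` and `scc_pattern` are illustrative names. The resulting bit sequence, one bit per dynamic execution of the store, is the kind of cleaning sequence that later gets compressed into a k-bit SCC_Pattern.

```python
from typing import List, Tuple

Access = Tuple[int, str, str]  # (cycle, "R" or "W", address), sorted by cycle


def store_ivs(accesses: List[Access], evict_cycle: int) -> List[int]:
    """Instantaneous vulnerability (IV) of every store, in execution order.
    Assumption: a stored value stays vulnerable until the last read of that
    address before the next store to it; if no later store follows, it stays
    vulnerable until the (hypothetical) eviction cycle."""
    ivs = []
    for idx, (cycle, kind, addr) in enumerate(accesses):
        if kind != "W":
            continue
        end = cycle          # no later read: nothing consumes the value
        overwritten = False
        for later_cycle, later_kind, later_addr in accesses[idx + 1:]:
            if later_addr != addr:
                continue
            if later_kind == "W":
                overwritten = True
                break
            end = later_cycle    # a read consumes the dirty value
        if not overwritten:
            end = evict_cycle    # last store: dirty until eviction
        ivs.append(end - cycle)
    return ivs


def scc_pattern(ivs: List[int], threshold: int) -> List[int]:
    """1 = execute store + clean, 0 = execute store only."""
    return [1 if iv > threshold else 0 for iv in ivs]


# for(i:1~3){ for(j:1~3){ A[i]+=B[j] } }: read A[i] on an even cycle,
# write it on the next one (assumed timing; B[j] omitted).
accesses: List[Access] = []
cycle = 0
for i in range(1, 4):
    for _ in range(3):
        accesses += [(cycle, "R", f"A[{i}]"), (cycle + 1, "W", f"A[{i}]")]
        cycle += 2

ivs = store_ivs(accesses, evict_cycle=25)
print(ivs)                            # [1, 1, 20, 1, 1, 14, 1, 1, 8]
print(scc_pattern(ivs, threshold=4))  # [0, 0, 1, 0, 0, 1, 0, 0, 1]
```

As on the slide, stores whose values are quickly reused stay below SCC_Threshold = 4, while the last store to each element exceeds it and is marked for store + clean.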

How to Clean Data?
[Figure: execution of the loop for(i:1~3){ for(j:1~3){ A[i]+=B[j] } } on the targeted cache cleaning architecture (instruction pipeline, LSQ, controller, L1 cache, memory). The controller follows SCC_Pattern and sends clean signals to the L1 cache, which writes the cleaned data back to memory. Panel labels: "Cycle count", "No Cleaning".]
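A hypothetical Python model of the controller in the targeted cache cleaning architecture may help. It assumes one SCC InsnAddr register and one SCC Pattern register, and that the controller steps cyclically through the pattern each time the selected store executes; the class name, method name, store addresses, and the short 3-bit pattern are all illustrative assumptions, not the authors' hardware design.

```python
from typing import List


class SCCController:
    """Hypothetical SCC controller: one SCC InsnAddr register and one SCC
    Pattern register. Each time the selected store executes, the controller
    reads the next pattern bit (cycling through the pattern, an assumption)
    and issues a clean for that store when the bit is 1."""

    def __init__(self, insn_addr: int, pattern: List[int]) -> None:
        self.insn_addr = insn_addr
        self.pattern = pattern
        self.pos = 0

    def on_store(self, store_pc: int) -> bool:
        """Called for each store sent from the LSQ to the L1 cache.
        Returns True if a clean signal should accompany the store."""
        if store_pc != self.insn_addr:
            return False                  # not the selected store: store only
        bit = self.pattern[self.pos]
        self.pos = (self.pos + 1) % len(self.pattern)
        return bit == 1                   # 1: store + clean, 0: store only


# Hypothetical addresses: nine executions of A[i] += B[j], then another store.
ctrl = SCCController(insn_addr=0x400A10, pattern=[0, 0, 1])
stores = [0x400A10] * 9 + [0x400B20]
cleans = [ctrl.on_store(pc) for pc in stores]
print([n for n, c in enumerate(cleans) if c])  # [2, 5, 8]
```

Because the decision only needs an address comparison and a pattern bit, the per-store work in this model is constant; the cost is the register storage for the address and pattern.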

SCC Achieves Energy-Efficient Vulnerability Reduction
• Hardware-only cache cleaning trades off energy for vulnerability.
• Smart Cache Cleaning can achieve ≈ 0 vulnerability at ≈ 0 energy cost.

SCC_Pattern Generation: Weighted k-bit Compression
• SCC cleaning sequence with K = 8: the SCC Pattern is matched against a sliding window of 8 bits.
• To determine the matching bit value for position [0], count the bits in position 0 across the windows: # of 1s = 3, # of 0s = 1.
  - Cost of placing 0 in position [0] of the SCC Pattern (the cost of not cleaning when cleaning is required): cost_of_0 = (# of 1s) × 1 = 3 × 1 = 3
  - Cost of placing 1 in position [0] (the cost of cleaning when it is not required): cost_of_1 = (# of 0s) × 2 = 1 × 2 = 2
  - if (cost_of_1 ≤ cost_of_0) bit value [0] = 1; equivalently, choose bit value 1 iff # of 1s ≥ 2 × # of 0s.
• Here cost_of_1 = 2 ≤ cost_of_0 = 3, so bit value [0] = 1.

SCC_Pattern Generation: Weighted k-bit Compression (continued)
• SCC cleaning sequence with K = 8; the remaining 6 bits are 0-padded.
• For every position i: if (cost_of_1[i] ≤ cost_of_0[i]) bit value [i] = 1, else bit value [i] = 0.
  - Position [1]: cost_of_1[1] = 2, cost_of_0[1] = 3, so bit value [1] = 1.
  - Position [2] (greater # of 1s): cost_of_1[2] = 2, so bit value [2] = 1.
  - Position [4] (greater # of 0s): cost_of_1[4] = 6, so bit value [4] = 0.
  - Position [6] (equal # of 0s and 1s): cost_of_1[6] = 4, cost_of_0[6] = 2, so bit value [6] = 0.
  - Positions with all 0s: bit value = 0.
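The weighted k-bit compression described on these two slides can be sketched in Python. The sketch assumes the cleaning sequence is cut into consecutive k-bit windows with the last window 0-padded, and uses the slide's weights (1 for a missed clean, 2 for an unnecessary clean) as defaults. `compress_pattern` is a hypothetical name, and the input below is a made-up 26-bit sequence constructed so that its per-position counts give the costs shown in the example.

```python
from typing import List


def compress_pattern(sequence: List[int], k: int = 8,
                     miss_weight: int = 1, extra_weight: int = 2) -> List[int]:
    """Weighted k-bit compression of an SCC cleaning sequence.

    The sequence is cut into consecutive k-bit windows (the last window is
    0-padded). For each bit position i:
      cost_of_0[i] = (# of 1s at i) * miss_weight   # not cleaning when required
      cost_of_1[i] = (# of 0s at i) * extra_weight  # cleaning when not required
    and bit i is 1 iff cost_of_1[i] <= cost_of_0[i]."""
    if not sequence:
        return [0] * k
    padded = sequence + [0] * (-len(sequence) % k)
    windows = [padded[s:s + k] for s in range(0, len(padded), k)]
    pattern = []
    for i in range(k):
        ones = sum(w[i] for w in windows)
        zeros = len(windows) - ones
        cost_of_0 = ones * miss_weight
        cost_of_1 = zeros * extra_weight
        pattern.append(1 if cost_of_1 <= cost_of_0 else 0)
    return pattern


# Made-up 26-bit sequence: three full windows plus 2 bits (6 bits 0-padded).
sequence = ([1, 1, 1, 0, 1, 0, 1, 0] +
            [1, 1, 1, 0, 0, 0, 1, 0] +
            [1, 1, 1, 0, 0, 0, 0, 0] +
            [0, 0])
print(compress_pattern(sequence))  # [1, 1, 1, 0, 0, 0, 0, 0]
```

With weight 2 on unnecessary cleans, a position is set to 1 only when the 1s outnumber the 0s at least two to one, so the equal split at position [6] resolves to 0, favoring energy over a single missed clean.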

Accuracy of the Weighted Pattern-Matching Algorithm
• The weights used in the algorithm determine its accuracy.
• The size of k also affects accuracy.
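One way to see how k affects accuracy is to replay a compressed pattern against the ideal cleaning sequence and count the two kinds of error that the weights trade off. This Python sketch assumes the pattern is replayed cyclically and defines accuracy as the fraction of store executions handled as intended; both the metric and the names `replay_errors` and `accuracy` are illustrative assumptions, not the evaluation used in the presentation.

```python
from typing import List, Tuple


def replay_errors(sequence: List[int], pattern: List[int]) -> Tuple[int, int]:
    """Replay a k-bit SCC pattern cyclically against the ideal cleaning
    sequence. Returns (missed cleans, unnecessary cleans)."""
    missed = unnecessary = 0
    for n, wanted in enumerate(sequence):
        issued = pattern[n % len(pattern)]
        if wanted and not issued:
            missed += 1        # dirty data left uncleaned: vulnerability
        elif issued and not wanted:
            unnecessary += 1   # extra write-back: energy
    return missed, unnecessary


def accuracy(sequence: List[int], pattern: List[int]) -> float:
    """Assumed metric: fraction of store executions handled as intended."""
    missed, unnecessary = replay_errors(sequence, pattern)
    return 1 - (missed + unnecessary) / len(sequence)


ideal = [0, 0, 1] * 3                      # store + clean on every third execution
print(replay_errors(ideal, [0, 0, 1]))     # (0, 0): k matches the program's period
print(replay_errors(ideal, [0, 1]))        # (2, 3): k = 2 cannot express it
print(round(accuracy(ideal, [0, 1]), 3))   # 0.444
```

Missed cleans cost vulnerability and unnecessary cleans cost energy, which is why the weights in the compression step directly shape which kind of error dominates.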

Which Data to Clean?
• Overlapping accesses: choosing reference B precludes the choice of reference A. How should one be chosen over another?
• Instantaneous vulnerability (IV) of each access: reference A: A1 = 10, A2 = 20; reference B: B1 = 20.
• Profit = average vulnerability per access (V/A):

Parameter        Ref A           Ref B
Vulnerability    10 + 20 = 30    20
Access #         2               1
Profit (V/A)     15              20

• With one SCC InsnAddr register, the reference with the higher profit, B, is chosen.
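The selection by profit can be sketched in Python. The slide defines profit as average vulnerability per access (V/A) and states that overlapping references exclude each other; the greedy highest-profit-first loop, the explicit conflict set, and the name `choose_references` are assumptions made for illustration. The `num_registers` parameter also shows why more SCC registers can cover more vulnerable stores.

```python
from typing import Dict, FrozenSet, List, Set


def choose_references(ivs: Dict[str, List[int]],
                      conflicts: Set[FrozenSet[str]],
                      num_registers: int) -> List[str]:
    """Greedy choice of store references for the SCC InsnAddr registers.
    profit(ref) = total IV / number of accesses (average vulnerability per access).
    Assumption: two references whose accesses overlap are listed in `conflicts`,
    and choosing one precludes the other."""
    profit = {ref: sum(v) / len(v) for ref, v in ivs.items()}
    chosen: List[str] = []
    for ref in sorted(profit, key=profit.get, reverse=True):
        if len(chosen) == num_registers:
            break
        if any(frozenset((ref, c)) in conflicts for c in chosen):
            continue  # overlaps an already chosen reference
        chosen.append(ref)
    return chosen


ivs = {"A": [10, 20], "B": [20]}   # IV values from the slide: profit A = 15, B = 20
overlaps = {frozenset(("A", "B"))}
print(choose_references(ivs, overlaps, num_registers=1))  # ['B']

ivs["C"] = [8]                     # hypothetical third reference, no overlap
print(choose_references(ivs, overlaps, num_registers=2))  # ['B', 'C']
```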

Energy-Efficient Vulnerability Reduction with SCC

SCC: Better Results with More Hardware Registers
[Figure: results for increasing numbers of SCC registers]
• With more SCC registers, vulnerability is reduced further, at the cost of hardware overhead.

Summary
• We develop SCC, a hybrid compiler and microarchitecture technique for reliability.
• Soft errors are a major concern, and caches are the most vulnerable to transient errors caused by radiation particles.
• Cache cleaning can reduce vulnerability, at a possible cost in power overhead:
  - ECC achieves 0 vulnerability, but with 70× power overhead.
  - EWB (early write-back) achieves a 47% vulnerability reduction, with 6× power overhead.
• Our Smart Cache Cleaning technique:
  - performs cleaning on the right cache blocks at the right time, and
  - achieves energy-efficient reliability in embedded systems.

Future Work
• SCC hardware overhead can be eliminated through compiler-based instrumentation and loop unrolling.
• Compile-time SCC analysis and instrumentation can be performed using Cache Vulnerability Equations [LCTES'10].
• A pure software-only SCC solution, with NO hardware overhead.
• The accuracy of the k-bit pattern-matching algorithm can be improved by introducing methods to accurately calibrate the weights used in the algorithm.

Home Page:
CML Lab: