
Mitigating the Performance Degradation due to Faults in Non-Architectural Structures
Constantinos Kourouyiannis, Veerle Desmet, Nikolas Ladas, Yiannakis Sazeides
University of Cyprus, Ghent University
6th HiPEAC Industrial Workshop, Paris, 26/11/2008

Motivation
• Technology scaling: opportunities and challenges
• Reliability and computing tomorrow:
  - failures will not be exceptional
  - failures have various sources: soft errors, process variation, wear-out, hardware and software bugs
• Key challenge: provide correct operation, with little or no performance degradation in the presence of faults, using low-cost solutions

Architectural vs Non-Architectural Faults
• So far, research has mainly focused on correctness
• The emphasis has been on architectural structures, e.g. caches, registers, buses
• However, faults can also occur in non-architectural structures, e.g. predictor and replacement arrays
• Faults in non-architectural structures may degrade performance

Non-Architectural Faults: Why Care?
• Missed deadlines: unacceptable for real-time applications
• Non-architectural resources cover a significant fraction of the active area of modern cores, where temperature is higher; they are therefore more susceptible to wear-out and process-variation faults
• If architectural resources are protected, then as the number of faults per chip increases, non-architectural resources will eventually become a performance bottleneck

This talk…
• Quantifies the performance implications of faults in a non-architectural array structure, specifically a line predictor
• Introduces and evaluates a simple detection scheme and repair technique to protect it against faults

Outline
• Fault modeling
  - Arrays background
• Performance implications of faults in a line predictor
• Detection and repair mechanisms
• Results
• Conclusions and future direction
• Work in progress…

Array Fault Modeling: Key Parameters
• Number of faults: the more faults, the higher the potential for performance degradation
• Location of faults: frequently accessed entries are more critical, and a fault on an output bit is more serious
• Fault clustering: the granularity or "radius" of faults
• Model for each fault: e.g. a cell stuck-at-1 is more critical if the bits stored in that cell are biased towards zero
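The stuck-at-1 point can be illustrated with a minimal Python sketch. It assumes a single-bit stuck-at model and a hypothetical trace of values held in one cell over time; `apply_stuck_at` and `corrupted_fraction` are illustrative names, not taken from the talk. A stuck-at-1 fault only corrupts a read when the stored bit should have been 0, so a zero-biased cell is corrupted far more often than a one-biased one.

```python
def apply_stuck_at(value: int, bit: int, stuck_value: int) -> int:
    """Value read back when bit `bit` of the cell is stuck at `stuck_value`."""
    if stuck_value:
        return value | (1 << bit)
    return value & ~(1 << bit)


def corrupted_fraction(stored_values, bit: int, stuck_value: int) -> float:
    """Fraction of stored values that the stuck-at fault actually changes."""
    changed = sum(1 for v in stored_values
                  if apply_stuck_at(v, bit, stuck_value) != v)
    return changed / len(stored_values)


# Hypothetical contents of one cell over time (bit 0 only).
zero_biased = [0, 0, 0, 1, 0, 0, 0, 0]   # mostly stores 0
one_biased = [1, 1, 1, 0, 1, 1, 1, 1]    # mostly stores 1

print(corrupted_fraction(zero_biased, bit=0, stuck_value=1))  # 0.875
print(corrupted_fraction(one_biased, bit=0, stuck_value=1))   # 0.125
```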

Non-Architectural Resources
• Arrays: line predictor, branch direction predictor, return-address stack, indirect jump predictor, memory dependence prediction, replacement arrays (various caches), ...
• Non-arrays: branch target address adder, memory prefetch adder, ...

Worst-case performance (cell faults): up to 27%

Worst-case: hit rate

Detection and Repair
• Previously proposed techniques for architectural arrays could be considered
• BUT detection and correction for non-architectural arrays need not be exact or provide full repair
• It is sufficient to minimize the performance effects of faults
• Our proposal: address remapping
  - Exploits the non-uniformity of accesses: we observed experimentally that only a few entries in the line predictor are accessed, so the remapping has a wide range of entries to move accesses to

(Sorted) Access Distributions for the LP
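A sorted access distribution like the one on this slide can be computed from a trace of predictor indices. The sketch below assumes a hypothetical index trace and hypothetical helper names; it reports per-entry access counts in decreasing order and how many entries are needed to cover a given fraction of all accesses, which is the non-uniformity that remapping exploits.

```python
from collections import Counter


def sorted_access_distribution(index_trace):
    """Per-entry access counts, sorted from most to least accessed."""
    return sorted(Counter(index_trace).values(), reverse=True)


def entries_covering(index_trace, fraction):
    """Smallest number of entries that together receive `fraction` of all accesses."""
    counts = sorted_access_distribution(index_trace)
    total = sum(counts)
    covered = 0
    for n, c in enumerate(counts, start=1):
        covered += c
        if covered >= fraction * total:
            return n
    return len(counts)


# Hypothetical line-predictor index trace.
trace = [3, 3, 3, 3, 7, 7, 7, 1, 3, 7, 0, 3]
print(sorted_access_distribution(trace))  # [6, 4, 1, 1]
print(entries_covering(trace, 0.8))       # 2
```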

Proposed Approach for Remapping
(Figure legend: accessed cells, accessed defective cells, not-accessed cells, not-accessed defective cells)
• Original access-fault map
• Rotate accesses down by 1 row: 1 instead of 3 accessed faulty cells
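The rotation idea can be checked on a toy access-fault map. This Python sketch uses a hypothetical 8-row map (1 = accessed or defective) chosen to mirror the slide's "3 accessed faulty cells before, 1 after"; the function names are illustrative, not from the talk.

```python
def accessed_faulty(access_map, fault_map):
    """Count rows that are both accessed and defective."""
    return sum(1 for a, f in zip(access_map, fault_map) if a and f)


def rotate_down(access_map, k=1):
    """Move the accesses of row i to row (i + k) mod N."""
    n = len(access_map)
    rotated = [0] * n
    for i, a in enumerate(access_map):
        rotated[(i + k) % n] = a
    return rotated


# Hypothetical 8-row maps: 1 = accessed / defective.
access_map = [1, 1, 0, 1, 0, 1, 0, 0]
fault_map = [1, 0, 1, 1, 0, 1, 0, 0]

print(accessed_faulty(access_map, fault_map))               # 3
print(accessed_faulty(rotate_down(access_map), fault_map))  # 1
```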

Proposed Approach (for cell faults)
(Figure legend: accessed cells, accessed defective cells, not-accessed cells, not-accessed defective cells)
• Original access-fault map
• Remap row accesses: 1 instead of 3 accessed faulty cells

Detection and Repair Scheme

Index Remapping Unit
• Remapped index = original index XOR a value (e.g. 1) decided by the search engine
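A minimal sketch of the XOR index remapping, assuming a power-of-two number of predictor entries (an assumption, not stated on the slide). XOR with an in-range value is a permutation of the indices and is its own inverse, which is one reason it is cheap enough for the critical path.

```python
def remap_index(original_index: int, xor_value: int, num_entries: int) -> int:
    """Remapped index = original index XOR a value chosen by the search engine."""
    if num_entries <= 0 or num_entries & (num_entries - 1):
        raise ValueError("sketch assumes a power-of-two number of entries")
    if not (0 <= original_index < num_entries and 0 <= xor_value < num_entries):
        raise ValueError("index and XOR value must fit in the table")
    # With a power-of-two table, XOR with an in-range value stays in range and
    # maps every original index to a distinct physical entry.
    return original_index ^ xor_value


print(remap_index(6, 1, 8))                      # 7
print([remap_index(i, 1, 8) for i in range(8)])  # [1, 0, 3, 2, 5, 4, 7, 6]
```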

Remapping Search Engine
(Figure: access map and fault map)

Remapping Search Engine
(Figure: access map and fault map)
$\mathrm{Defective\_accessed}_A = \sum_i \mathrm{AccessMap}_i \cdot \mathrm{FaultMap}_i = 143$

Remapping Search Engine
(Figure: remapped accesses and fault map)
Best remapping = XOR 1 (fewer defective accessed entries):
$\mathrm{Defective\_accessed}_B = \sum_i \mathrm{RemappedAccessMap}_i \cdot \mathrm{FaultMap}_i = 20 + 50 = 70$
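The search on these slides can be sketched in software as an exhaustive search over XOR values, assuming the access map holds per-entry access counts (by original index) and the fault map holds 1 for a defective entry. Under XOR value x, original index i lands on physical entry i XOR x, so the defective-access count is the sum above with the access map remapped. The data and names below are hypothetical, and how the hardware search engine actually enumerates candidates is not described here.

```python
def defective_accesses(access_map, fault_map, xor_value=0):
    """Accesses landing on defective entries when index i is remapped to i ^ xor_value."""
    return sum(a * fault_map[i ^ xor_value] for i, a in enumerate(access_map))


def best_xor_remapping(access_map, fault_map):
    """Try every XOR value; return (value, defective accesses) with the fewest.

    Ties go to the smallest value, so no remapping (0) is preferred when it is as good.
    """
    candidates = ((x, defective_accesses(access_map, fault_map, x))
                  for x in range(len(access_map)))
    return min(candidates, key=lambda pair: pair[1])


# Hypothetical 8-entry example: access counts and fault bits.
access_map = [60, 40, 30, 10, 0, 0, 0, 0]
fault_map = [1, 0, 1, 0, 0, 0, 1, 0]

print(defective_accesses(access_map, fault_map))  # 90 (no remapping)
print(best_xor_remapping(access_map, fault_map))  # (5, 10)
```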

Simulator
• sim-alpha simulator
• EV6 processor with a 15-stage pipeline
• Baseline configuration: no hard faults, no remapping
• SPEC CPU2000 benchmarks: 100M instructions from representative regions
• We compare performance without and with remapping for random fault maps

Random results without and with remapping

Summary and Conclusions
• Reliability should not be limited to correctness; it should also consider performance
• Faults in non-architectural resources can degrade the performance of a processor, which may make them important to deal with
• Proposed framework for detection and repair:
  - detects the case where there are many defective accessed entries
  - finds the best possible remapping
  - applies the remapping
• Remapping works very well in almost all cases
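The three framework steps (detect, search, apply) can be tied together in a small Python sketch. Everything here is an assumption for illustration: the class name, the per-interval access counts, a fault map known in advance, and a fixed `threshold` on defective accesses as the detection trigger (the talk does not specify how "many" is decided).

```python
from dataclasses import dataclass


@dataclass
class RemapController:
    """Hypothetical detect/search/apply loop around an XOR-remapped array."""
    fault_map: list      # 1 = defective entry (assumed known in advance)
    threshold: int       # hypothetical trigger: defective accesses per interval
    xor_value: int = 0   # currently applied remapping (0 = identity)

    def defective_accesses(self, access_map, x):
        """Accesses landing on defective entries if index i is remapped to i ^ x."""
        return sum(a * self.fault_map[i ^ x] for i, a in enumerate(access_map))

    def end_of_interval(self, access_map):
        """Called with per-entry access counts (by original index) for one interval."""
        # 1. Detect: are too many accesses hitting defective entries right now?
        if self.defective_accesses(access_map, self.xor_value) <= self.threshold:
            return self.xor_value
        # 2. Search: the XOR value with the fewest defective accesses.
        best = min(range(len(access_map)),
                   key=lambda x: self.defective_accesses(access_map, x))
        # 3. Apply the remapping.
        self.xor_value = best
        return best


ctrl = RemapController(fault_map=[1, 0, 1, 0, 0, 0, 1, 0], threshold=20)
accesses = [60, 40, 30, 10, 0, 0, 0, 0]  # hypothetical interval counts
print(ctrl.end_of_interval(accesses))  # 5 (90 defective accesses triggered a search)
print(ctrl.end_of_interval(accesses))  # 5 (now 10, below the threshold)
```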

Future Work
• Experiments with other non-architectural structures, such as direction and indirect predictors and replacement arrays for the I-cache, D-cache and TLB
• Applicability of the ideas to architectural structures

Acknowledgments
• Elli Demetriou and Costas Vrionis
• Funding: University of Cyprus, Ghent University, SARC, HiPEAC, Intel

Thanks!

BACKUP SLIDES

Processor Pipeline

Line Predictor Structure

Remapping Issues
• Remapping overhead: the time needed to find the best remapping costs some performance, but this is acceptable because:
  - remapping is performed only every 100K intervals
  - once the best remapping is found, the problem is solved and there is no need to remap again
• Design space: the remapping function is XOR
  - Because remapping is on the critical path, we use a simple remapping function to minimize the hardware overhead

Methodology: Performance Implications of Faults
• Determine the performance implications of faults in the LP and RAS for different scenarios
• Worst case: faults are injected on the most frequently used entries
  - Most-used entry: the entry that provided the most correct predictions in the fault-free execution
• Average case: impossible to determine experimentally, as there are too many combinations
  - Random: faults are injected at random entries
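The two injection scenarios can be sketched as fault-map generators. This assumes per-entry counts of correct predictions from a fault-free run are available (the data below is hypothetical) and uses a seeded random generator so random fault maps are reproducible; the function names are illustrative only.

```python
import random


def worst_case_faults(correct_predictions, num_faults):
    """Fault the entries that supplied the most correct predictions without faults."""
    ranked = sorted(range(len(correct_predictions)),
                    key=lambda i: correct_predictions[i], reverse=True)
    fault_map = [0] * len(correct_predictions)
    for i in ranked[:num_faults]:
        fault_map[i] = 1
    return fault_map


def random_faults(num_entries, num_faults, seed=None):
    """Fault `num_faults` distinct entries chosen uniformly at random."""
    rng = random.Random(seed)
    fault_map = [0] * num_entries
    for i in rng.sample(range(num_entries), num_faults):
        fault_map[i] = 1
    return fault_map


correct = [5, 90, 12, 40, 0, 3, 70, 8]  # hypothetical per-entry counts
print(worst_case_faults(correct, 2))     # [0, 1, 0, 0, 0, 0, 1, 0]
print(sum(random_faults(8, 2, seed=1)))  # 2
```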

Random results without and with remapping

Faults and Arrays
• Faults may occur in different parts of an array
• It is not practical to study faults at the physical level

Functional Faults and Array Logical View
• Abstractions that ease the study of faults
• Fault locations: cell, input address, output data
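The three functional fault locations can be modeled at the logical level as different read functions over a small table. This sketch assumes single-bit stuck-at faults in each location; the table contents and function names are hypothetical. A cell fault affects one entry, an input-address fault makes pairs of entries alias, and an output-data fault corrupts the same bit on every read.

```python
def stuck(value: int, bit: int, v: int) -> int:
    """Force bit `bit` of `value` to `v` (0 or 1)."""
    return value | (1 << bit) if v else value & ~(1 << bit)


def read_cell_fault(table, index, entry, bit, v):
    """Cell fault: only entry `entry` has a stuck bit."""
    value = table[index]
    return stuck(value, bit, v) if index == entry else value


def read_address_fault(table, index, bit, v):
    """Input-address fault: an index bit is stuck, so pairs of entries alias."""
    return table[stuck(index, bit, v)]


def read_output_fault(table, index, bit, v):
    """Output-data fault: the same data bit is stuck on every read."""
    return stuck(table[index], bit, v)


table = [0b00, 0b01, 0b10, 0b11]  # hypothetical 4-entry array
print([read_cell_fault(table, i, entry=2, bit=0, v=1) for i in range(4)])  # [0, 1, 3, 3]
print([read_address_fault(table, i, bit=1, v=0) for i in range(4)])       # [0, 1, 0, 1]
print([read_output_fault(table, i, bit=1, v=1) for i in range(4)])        # [2, 3, 2, 3]
```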