Static Analysis to Mitigate Soft Error Failures in Processors


Static Analysis to Mitigate Soft Error Failures in Processors
Master’s Thesis Presentation by Reiley Jeyapaul
Advisory Committee: Dr. Aviral Shrivastava, Dr. Lawrence Clark, Dr. Yu Cao
Compiler Microarchitecture Laboratory

Soft Errors
Radiation-induced transient faults, known as soft errors, result in random erroneous program states and can cause system failure. Soft errors are a rapidly increasing threat to the dependability of future laptops and handheld devices. Rapid reduction in device dimensions and growing circuit complexity will only make things worse.
Documented soft error instances at sea level: Sun server crashes of November 2000; Cisco 12000 series routers experiencing unexpected resets.

The Path To a Solution
Circuit-level techniques: TMR using a majority voter; error masking using the I/O propagation delay of circuits; SEU-hardened CMOS circuits. Drawback: area, power, and implementation cost overhead.
Microarchitecture-level techniques: selective re-fetching and store-through caches; partially protected caches; SEC-DED techniques. Drawback: requires modification of the existing architecture and adds design and verification complexity.
Software-level techniques: SWIFT (reclaiming unused instruction resources and control flow checking); SMT thread for redundancy-based error detection and correction. Drawback: performance overhead because of additional resource usage.
No compiler technique to reduce the impact of soft errors in caches has been proposed to date.

Soft Errors and the Cache
Caches occupy more than 50% of the processor chip area, and 90% of the chip's transistors are in caches. Caches operate at low voltages for improved performance, and SRAM cells have low masking capabilities. Caches are therefore the most susceptible to radiation impact, and errors in them translate directly to system failure.
The majority of overall soft errors occur in memories. The probability of multi-bit errors is greater in memories, and the high transistor density increases the probability of neutron impact and secondary emissions. ECC in the L1 cache carries a performance overhead because the L1 access latency is so small.
Masking in SRAM cells: The timing window is very small, making temporal masking nearly impossible. There is no logic depth, and therefore no logical masking. The cells use reinforcing inverters that are critically designed for maximum performance, and therefore offer no electrical masking. No microarchitectural masking is possible.
Refs: 1) Graph: S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim. Robust system design with built-in soft-error resilience. Computer, 38(2):43–52, 2005. 2) Processor cache specifications are taken from the Intel Itanium II processor at 0.18 um technology.

Measuring Soft Errors in Cache
Vulnerability is a measure of the “susceptibility of data in the cache”.
A datum in the cache is vulnerable only if it will be read by the processor, or it will be committed to memory after a write operation (dirty data).
A datum is not vulnerable if it will be overwritten, or it will be evicted from the cache and not written back (when not dirty).
[Figure: timeline of R and W accesses and CE events over time, with the WV and RV intervals marked.]
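
The definition above can be sketched as a small trace walker. This is only an illustration, not the thesis's analytical method: the event encoding ('I' for fill, 'R' for read, 'W' for write, 'E' for eviction) and the function name are assumptions made for this example, and a write is assumed to overwrite the whole datum.

```python
def vulnerable_time(events):
    """Return the vulnerable time of one cached datum.

    `events` is a time-ordered list of (time, op) pairs. The op codes are an
    assumption of this sketch: 'I' = brought into the cache, 'R' = read,
    'W' = write, 'E' = evicted.
    """
    total = 0
    dirty = False
    prev_time = None  # time of the previous event while the datum is cached
    for time, op in events:
        if prev_time is not None:
            interval = time - prev_time
            if op == 'R':
                total += interval      # the datum will be read: vulnerable
            elif op == 'E' and dirty:
                total += interval      # the datum will be written back: vulnerable
            # 'W' (overwritten) and clean 'E' (discarded) add nothing
        if op == 'E':
            dirty = False
            prev_time = None           # datum has left the cache
        else:
            if op == 'W':
                dirty = True
            elif op == 'I':
                dirty = False
            prev_time = time
    return total
```

For a trace such as fill, read, write, read, evict, the intervals ending in a read and the final interval before the dirty eviction are counted, while the interval ending in the write is not.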

Motivation for Compiler Technique
The performance trend is irregular when compared to the variation in vulnerability. An optimal loop order exists, with reduced vulnerability and low runtime: vulnerability varies by 13X for less than 30% variation in runtime. Exploiting this “Performance – Vulnerability” tradeoff is required for an optimal robust application. In the compiler, such tradeoffs can be identified through static estimation of vulnerability and performance.
Our principal motive: an efficient analytical methodology to evaluate vulnerability statically.

Outline Motivation Overview Vulnerability Estimation Reuse Vectors Vulnerability Modes Program Analysis Read vulnerability Write vulnerability Reuse Vectors Experiments Conclusion

Vulnerability Modes
RRV (Read Reuse Vulnerability): Data is vulnerable to corruption for the time it is present in the cache before any read operation.
WBV (Write Back Vulnerability): Data is vulnerable for the time it is present in the cache from the last write operation to the point of eviction, because the data present in the cache before eviction is updated in memory.
Can we know statically (without simulation) how long a datum will remain in the cache?
For example, consider an array with a read and a write to the data on each access.
[Figure: array elements a1 ... an against iterations, with the RRV and WBV intervals marked between R, W, I, E and CE events.]

Modeling A Cache Access
for (i=0; i < N; i++)
  for (j=0; j < N; j++)
    for (k=0; k < N; k++)
      A[i][k] += B[i][j] * C[j][k]
endFor
Iteration Space: Every node is an iteration point of the loop. An iteration point is represented by the loop indices.
Data Space: The array element accessed in any iteration is represented by the access function on the loop indices.
Data Space in the cache: CacheAddr(4,2) = mapping of an array datum to a cache location.
[Figure: iteration space with axes i(1,0,0), j(0,1,0), k(0,0,1) from origin (0,0,0); iterations i1(0,4,2), i2(1,4,2), ..., iN(N,4,2) all access element C(4,2) in the data space, which maps to one location in the cache.]
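
As a rough illustration of the mapping from an array element to a cache location, the sketch below assumes a row-major array layout and a direct-mapped cache. The function name, parameters, and the example sizes are hypothetical and are not taken from the slides.

```python
def cache_location(base_addr, index, dims, elem_size, line_size, num_lines):
    """Map an array element to (cache_line, byte_offset_in_line).

    Assumes row-major layout and a direct-mapped cache; all parameter names
    are illustrative choices for this sketch.
    """
    linear = 0
    for idx, dim in zip(index, dims):
        linear = linear * dim + idx          # row-major linearisation
    addr = base_addr + linear * elem_size    # byte address of the element
    mem_line = addr // line_size             # memory line holding the element
    return mem_line % num_lines, addr % line_size
```

With a 16 x 16 array of 4-byte elements, 32-byte lines, and 64 cache lines, element (4,2) of an array at address 0 and element (4,2) of an array placed exactly one cache size (2048 bytes) later map to the same cache line, which is the kind of conflict the following slides analyse.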

Data Reuse and Cache Miss
for (i=0; i < N; i++)
  for (j=0; j < N; j++)
    for (k=0; k < N; k++)
      A[i][k] += B[i][j] * C[j][k]
endFor
Iteration Space: Every node is an iteration of the loop.
Cache Space: Every data element is directly mapped to a location in the cache.
Reuse Vector: The direction of reuse of the data element at iteration (i), last accessed at iteration (p), is represented by r = i - p.
Cache Miss: Another iteration accesses data of array B that is mapped to the same cache location. The element of array C is evicted from the cache and replaced with the one from array B, so the reusing iteration is a cache miss iteration.
[Figure: iterations p(0,4,2) and i(1,4,2) access C(4,2) along the reuse vector (1,0,0); the intermediate iteration p(0,7,4) accesses B(0,7), which maps to the same cache location as C(4,2).]
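
A small sketch of the reuse vector r = i - p, together with the number of loop iterations that separate the two accesses in execution order. The helper names are assumptions of this example, and the iteration count assumes a rectangular loop nest with the first index outermost.

```python
def reuse_vector(i, p):
    """Reuse vector r = i - p between the reusing iteration i and the
    earlier iteration p that touched the same data."""
    return tuple(a - b for a, b in zip(i, p))


def execution_rank(point, bounds):
    """Position of an iteration point in execution order for a rectangular
    loop nest whose first index is the outermost loop."""
    rank = 0
    for x, n in zip(point, bounds):
        rank = rank * n + x
    return rank


def iterations_between(p, i, bounds):
    """Number of iterations executed from p (exclusive) to i (inclusive)."""
    return execution_rank(i, bounds) - execution_rank(p, bounds)
```

For the slide's example, p(0,4,2) and i(1,4,2) give r = (1,0,0); with all loop bounds equal to 16, one full sweep of the two inner loops (256 iterations) lies between the two accesses.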

Read Reuse Vulnerability
Reuse Direction: The direction along which the data element is reused.
Access Iterations: The iterations accessing the array element.
Cache Miss Iterations: The iterations at which the reuse vector is not realized.
Vulnerable Accesses (Cache Hits): The iterations at which the reuse is realized.
Vulnerable Iterations (Read Reuse Vulnerability): The number of iterations between successive reuses.
[Figure: iterations i0(0,4,2) ... iN(N,4,2) accessing C(4,2) along the reuse direction i(1,0,0); below, array elements a1 ... an against iterations with the read vulnerability intervals marked.]

Vulnerability Equations (RRV) Cache Miss Iterations on array R are due to interference by any array accessed within the program. Vulnerability Calculation: Cache Hit Iterations, Vulnerability =

Cache-Interference Analysis
for (i=0; i < N; i++)
  for (j=0; j < N; j++)
    for (k=0; k < N; k++)
      A[i][k] += B[i][j] * C[j][k]
endFor
Iteration Space: Every node is an iteration of the loop.
Cache Space: Every data element is directly mapped to a location in the cache.
Cache-Interference Point (CIP): The iteration at which the data of array C is evicted from the cache. An iteration accessing data of array B, mapped to the same cache location, evicts the element of array C and replaces it with the one from array B.
Vulnerable Iterations (VI): Iterations between the last write access and the point of eviction from the cache.
The iteration at (i) accessing C(4,2) cannot reuse the data from iteration (p), and therefore experiences a cache miss along (r).
[Figure: iterations p(0,4,2) and i(1,4,2) access C(4,2) along (1,0,0); the intermediate iteration p(0,7,4) accesses B(0,7) at the same cache location; VI marks the interval from the last write to the eviction.]

Cache-Interference Point (CIP)
For every cache miss, there exist many possible interference points: { i, j }. The cache line is evicted at the first interference point.
Calculating the first CIP: v denotes the iterations between i and any existing j point. The set of Intermediate Iterations between a possible CIP and i is { v }. This guarantees that all “v” points isolated for a cache-miss iteration “i” are greater than the first cache-interference point “q”.
[Figure: candidate interference points j1, j2, j3, j4, the first CIP q, and the VI interval.]
Vulnerable Iterations (VI) for i is given by,
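
The idea of the first cache-interference point can be illustrated by brute-force enumeration. This is not the closed-form derivation used in the presentation; it is a sketch that assumes a rectangular loop nest and takes a hypothetical predicate `interferes(q)` that says whether iteration q touches another memory line mapped to the same cache location.

```python
from itertools import product


def execution_rank(point, bounds):
    """Position of an iteration point in execution order (first index outermost)."""
    rank = 0
    for x, n in zip(point, bounds):
        rank = rank * n + x
    return rank


def first_cip(p, i, bounds, interferes):
    """Return the earliest iteration q with p < q < i (execution order) for
    which the caller-supplied predicate interferes(q) holds, or None."""
    for q in product(*(range(n) for n in bounds)):  # yields execution order
        if p < q < i and interferes(q):
            return q
    return None


def vulnerable_iterations(p, q, bounds):
    """Iterations from the last write at p up to the eviction at q."""
    return execution_rank(q, bounds) - execution_rank(p, bounds)
```

The result agrees with subtracting the intermediate iterations (those between q and i) from the full reuse distance between p and i, which is how the following slide phrases the calculation.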

Vulnerability Equations ( WV ) Determining Intermediate Iterations (II) Identifying the first CIP at which cache evictions occur. Isolating the Intermediate Iterations for every i due to array x: The set II for the array R : Vulnerability Calculation: Subtracting the II iterations from |r| iterations for every accessed iteration i, Vulnerability =

Outline Motivation Overview Vulnerability Estimation Reuse Vectors Types of Reuse Vectors Smallest Valid Reuse Vector Derived Reuse Vector Experiments Conclusion

Types of Reuse Vectors
for (i=0; i < N; i++)
  for (j=0; j < N; j++)
    for (k=0; k < N; k++)
      A[i][k] += B[i][j] * C[j][k]
endFor
Spatial Reuse: When a reference accesses a data element on the same cache line in different iterations, it is spatial reuse, denoted by rs.
Temporal Reuse: When a reference accesses the same data in different iterations, it is temporal reuse, denoted by rt.
Group Reuse: Multiple references to the same array with the same array index, distinguished only by the constant offset, demonstrate group reuse. For example C[j+3][k], C[j+5][k] forms a group temporal reuse along r(1,0,0).
Only the smallest reuse vector guarantees a cache interference at iteration i. However, not all reuse vectors are valid over all the access iterations of the array, so the smallest reuse vector cannot be identified globally for the entire iteration space.
[Figure: iterations p(0,4,2) and i(1,4,2) access C(4,2) along (1,0,0); the vector (0,0,1) is drawn along k.]
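
One standard way to test a temporal reuse direction for an affine reference is to write the access function as an integer matrix H and check that H applied to the vector is zero. The sketch below uses that textbook criterion; the matrix encoding and function names are assumptions of this example and are not taken from the slides.

```python
def apply_access(H, iteration):
    """Apply the linear part of an affine access function (one matrix row per
    array subscript) to an iteration point or vector."""
    return tuple(sum(h * x for h, x in zip(row, iteration)) for row in H)


def is_temporal_reuse(H, r):
    """A non-zero vector r is a temporal reuse direction when moving by r
    leaves every subscript unchanged, i.e. H * r = 0."""
    return any(r) and all(v == 0 for v in apply_access(H, r))


# C[j][k] inside loops (i, j, k): first subscript reads j, second reads k.
H_C = ((0, 1, 0),
       (0, 0, 1))
```

For `H_C`, the vector (1,0,0) passes the test because iterations (0,4,2) and (1,4,2) both access C(4,2), while (0,0,1) fails it because it moves to the neighbouring element (a candidate for spatial rather than temporal reuse).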

Determining Smallest Valid Reuse Vector
for (i=0; i < 16; i++)
  for (j=0; j < 16; j++)
    for (k=0; k < 16; k++)
      A[i][k] += B[i][j] * C[j][k]
endFor
The iteration space of the loop can be partitioned into domains, in each of which a reuse vector of the array is valid.
Spatial Reuse: The first element of a memory line does not have a preceding element in the same line, so the spatial reuse vector is not valid for those data.
Temporal Reuse: First accesses to data elements do not have a preceding iteration that accesses the same element, so the temporal reuse vector is not valid for the first accesses to the array elements.
Disjoint domains are formed from the overlapping domains. The smallest reuse vector identified in each disjoint domain is used in the vulnerability equations for that domain.
[Figure: two views of the iteration space from (0,0,0) to (15,15,15) with axes i(1,0,0), j(0,1,0), k(0,0,1); the spatial reuse view marks k = 1, k = 8 and k = 15, the temporal reuse view marks i = 1 and i = 15.]
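
The partitioning into disjoint domains can be sketched by enumeration for the reference C[j][k] in the 16 x 16 x 16 loop. The line size of 8 elements, the alignment of each row of C to a line boundary, and all names below are assumptions made for this illustration only.

```python
from collections import Counter
from itertools import product

SPATIAL = (0, 0, 1)    # previous element on the same line
TEMPORAL = (1, 0, 0)   # same element, previous i iteration


def distance(r, n):
    """Number of iterations spanned by reuse vector r in an n*n*n loop nest."""
    return (r[0] * n + r[1]) * n + r[2]


def smallest_valid_reuse(iteration, n, line_elems):
    """Pick the valid reuse vector with the shortest reuse distance, or None
    when neither vector is valid at this iteration (a first access)."""
    i, _, k = iteration
    valid = []
    if k % line_elems != 0:     # not the first element of its memory line
        valid.append(SPATIAL)
    if i >= 1:                  # the element was accessed in an earlier i
        valid.append(TEMPORAL)
    return min(valid, key=lambda r: distance(r, n)) if valid else None


def partition(n=16, line_elems=8):
    """Count the iterations that fall in each disjoint domain."""
    return Counter(smallest_valid_reuse(it, n, line_elems)
                   for it in product(range(n), repeat=3))
```

Under these assumptions the iteration space splits into three disjoint domains: iterations served by spatial reuse, iterations at the start of a line that fall back to temporal reuse, and first accesses with no valid reuse vector.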

Derived Reuse Vectors
for (i=0; i < N; i++)
  for (j=0; j < N; j++)
    for (k=0; k < N; k++)
      A[i][k] += B[i][j] * C[j][k]
endFor
There exists a reuse pattern between the last access on a cache line and the first access to the same cache line (on a subsequent iteration).
Derived Reuse Vector: The vector that defines this reuse pattern is the derived reuse vector, rd = i - p. For example, p(0,4,9) accessing C(4,9) and i(1,4,2) accessing C(4,2) give rd = (1,0,-7).
Derivation of the Derived Reuse Vector: The difference between the temporal and spatial reuse vectors, offset by the cache line size or the loop bound, gives the derived reuse vector.
If rt > rs: rd = rt - (CL - 1) * rs, where CL = size of the cache line.
If rs > rt: rd = rs - (Nk - 1) * rt, where Nk = loop bound along k.
[Figure: iterations i0(0,4,2), p(0,4,9), i(1,4,2) and iN(N,4,2) in the iteration space with axes i(1,0,0), j(0,1,0), k(0,0,1).]
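
The two derivation rules can be written out directly. In this sketch the comparison "rt > rs" is interpreted as lexicographic comparison of the vectors, which is an assumption; the function and parameter names are also illustrative.

```python
def derived_reuse_vector(r_t, r_s, line_elems, n_k):
    """Derived reuse vector from the temporal (r_t) and spatial (r_s) reuse
    vectors, following the two rules on the slide.

    line_elems is the cache line size in elements (CL) and n_k is the loop
    bound along k (Nk). Vector comparison is lexicographic (an assumption).
    """
    if r_t > r_s:
        return tuple(t - (line_elems - 1) * s for t, s in zip(r_t, r_s))
    return tuple(s - (n_k - 1) * t for s, t in zip(r_s, r_t))
```

With rt = (1,0,0), rs = (0,0,1) and a line of 8 elements, the first rule yields (1,0,-7), matching i - p for i(1,4,2) and p(0,4,9) in the slide's example.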

Outline Motivation Overview Cache Vulnerability Calculating Vulnerability Reuse Vectors Optimizing Vulnerability Equations Experiments Experimental setup Program Model Validation experiments Code transformation experiments Conclusion

Experiment Setup
Benchmarks: Loop kernels from the MiBench benchmark suite, compiled using the -O3 option.
Analytical Modelling: Vulnerability equations were generated by hand. The equations were solved using the Omega library, and the solved closed-form polyhedra were enumerated using the Barvinok library. The results were validated against simulation results for the same kernel.
Simulation Environment: SimpleScalar 3.0 toolset. Architecture configuration: 5-stage uniprocessor model with a direct-mapped L1 cache in write-back mode.

Program Model
Only nested loops of the program are considered to estimate the vulnerability of the application. The loop characteristics:
Perfectly nested loops with well-defined loop bounds.
Array references whose access functions are affine functions of the loop indices.
Multiple references to the same array should have the same indices.
No conditional statements exist within the basic block.
S. Ghosh et al. determined in their work that 72% of the loop kernels of the SPECfp suite satisfy the above restrictions.
Vulnerability is calculated in iterations of the nested loop, which has a nearly constant relation to the number of processor cycles.

Validation Experiments Loop kernels were validated for different cache sizes against simulation values of vulnerability.

Validation Experiments Validation of the vulnerability equations for different array placement configurations.

Application of Vulnerability Equations
Impact of Loop Interchange: The order of the loop indices accessing the data is varied across all combinations. Vulnerability reduction (14X). Performance tradeoff (25%).
Impact of Loop Fission/Fusion: Independent instructions within the loop nest are executed as separate loops. Increase in runtime (32%). Reduced runtime during fusion (-49%). Reduced vulnerability due to reduced reuse capabilities (18X).
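
The loop interchange exploration can be mimicked with a small brute-force simulation. This is not the analytical model of the presentation: it is a sketch that replays the matrix-multiply kernel for each loop order on a direct-mapped, write-back, write-allocate cache and counts vulnerability at line granularity in units of memory accesses. The array placement, cache parameters, and all names are assumptions, and dirty lines still cached at the end are not counted.

```python
from itertools import permutations, product


def simulate(order, n=8, line_elems=4, num_lines=16):
    """Replay A[i][k] += B[i][j] * C[j][k] with loops nested in `order`
    (outermost first) and return (misses, vulnerability)."""
    base = {'A': 0, 'B': n * n, 'C': 2 * n * n}   # element addresses
    tag = [None] * num_lines      # memory line held by each cache line
    last = [0] * num_lines        # time of the last access to each cache line
    dirty = [False] * num_lines
    misses = vuln = t = 0
    for point in product(range(n), repeat=3):
        idx = dict(zip(order, point))
        i, j, k = idx['i'], idx['j'], idx['k']
        accesses = [('B', i, j, 'R'), ('C', j, k, 'R'),
                    ('A', i, k, 'R'), ('A', i, k, 'W')]
        for name, row, col, op in accesses:
            t += 1
            mem_line = (base[name] + row * n + col) // line_elems
            slot = mem_line % num_lines
            if tag[slot] != mem_line:             # miss: evict and refill
                misses += 1
                if tag[slot] is not None and dirty[slot]:
                    vuln += t - last[slot]        # write-back vulnerability
                tag[slot] = mem_line
                dirty[slot] = False
            elif op == 'R':
                vuln += t - last[slot]            # read reuse vulnerability
            last[slot] = t
            if op == 'W':
                dirty[slot] = True
    return misses, vuln


def explore():
    """Evaluate all six loop orders of the kernel."""
    return {''.join(o): simulate(o) for o in permutations('ijk')}
```

Calling `explore()` returns a (misses, vulnerability) pair per loop order, which can be compared to look for orders that lower vulnerability at a small cost in misses.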

Application of Vulnerability Equations
Impact of Array Interleaving: Arrays accessed within the same nested loop are interleaved. Improved performance (41%). Vulnerability tradeoff (1.5X).
Impact of Relative Array Placement: A distance of multiples of the cache line size is introduced between array memory locations. No defined variation pattern; extensive exploration required. Analytically, an optimal placement can be determined efficiently.

Conclusion
A novel static analysis methodology has been proposed for the accurate evaluation of data cache vulnerability. The worst-case time complexity of the analytical technique is polynomial (comparable to existing compiler optimizations). The model has been validated through experiments on benchmark loops across code transformations. The application of the vulnerability model in optimizing for robustness and performance across various code transformations has been demonstrated.
The total number of vulnerability equations generated = O(narr^2 x nreuse). Worst case time to calculate numerical values for the closed convex polyhedrons is polynomial time for

Future Work To incorporate versatility in the analytical model accommodating nested loops with more complex access functions. To model the vulnerability of data in cache architectures of arbitrary associativity. To model vulnerability for multi-core architectures.

Related Publication
“Code Transformations for TLB Power Reduction”, Reiley Jeyapaul, Sandeep Marathe, Aviral Shrivastava [VLSI’09]
Proposed compiler techniques to reduce page switches: page-switch aware instruction and operand reordering; page-switch aware array interleaving; page-switch aware loop unrolling.
Implemented the technique for the use-last TLB architecture design.
The comprehensive page-switch reduction algorithm results in a 39% reduction in the data-TLB page switching energy, with negligible variation in performance.

Thank you and God Bless !

Backup Slides

Application of Vulnerability Equations Vulnerability variation on Cache Configurations

The Path To a Solution
Circuit-level techniques: TMR using a majority voter, Nieuwland et al [IOLTS’06]; error masking using the I/O propagation delay of circuits, Krishnamohan et al [SOC’04]. Area, power, and implementation cost overhead.
Microarchitecture-level techniques: selective re-fetching and store-through caches, Sridharan et al [IEEE Trans’06]; partially protected caches, Shrivastava et al [CASES’06]. Require modification of the existing architecture and include design and verification complexity.
System-level techniques: SWIFT, reclaiming unused resources during the execution, Reis et al [CGO’05]; SMT thread for redundancy-based error detection and correction, Gomaa et al [SIGARCH’05].
No compiler technique to reduce the impact of soft errors on applications has been proposed to date.