Low Cost Control Flow Protection Using Abstract Control Signatures. Daya S Khudia and Scott Mahlke, University of Michigan, Electrical Engineering and Computer Science.

Presentation transcript:

Slide 1: Low Cost Control Flow Protection Using Abstract Control Signatures
Daya S Khudia and Scott Mahlke
University of Michigan, Electrical Engineering and Computer Science

Slide 2: Soft Errors
- Soft errors, also called single-event upsets (SEUs)
  - Occur because of high-energy particle strikes or electrical noise
- Parameters affecting soft error rates
  - Shrinking dimensions, voltage scaling
- 100 times increase from 180nm to 16nm (Borkar, Micro'05)
- One failure per day for every chip at 16nm (Feng et al., ASPLOS'10)
Image credit: Certichip

Slide 3: Soft Error Detection
- Traditional dual/triple-modular redundancy (DMR, TMR)
  - Mission-critical reliability
- Instruction duplication (SWIFT, EDDI)
  - Redundant execution in a single-threaded context
  - Compiler interleaves original and redundant instructions
- Instruction duplication + hardware symptoms (Shoestring, profileBased)
  - Combine duplication and symptoms
  - Improved by using profiling
- Signature/assertion based (CFCSS, ACFC)
  - Software-based control flow protection
  - Usually by embedding signatures/assertions in basic blocks
- Target solution
  - Our target is a low-overhead control flow protection solution
  - Comparable coverage
[Figure: techniques ordered by increasing overhead. Data flow: DMR, TMR (~ %); instruction duplication (~30-70%); instruction duplication + hardware symptoms (~10-30%). Control flow: DMR, TMR; signature/assertion based; target solution.]

Slide 4: Why Control Flow Errors?
- More than 70% of the transient faults lead to control flow errors (Vahdatpour et al.)
- Faults in hardware components manifest as control flow errors
  - Program counter
  - Address circuitry
- Errors in branch targets are 2.5x more likely to result in incorrect executions

Slide 5: Outline
- Background
- Software-based control flow checking
- Abstract Control Signatures (ACS)
- Experimental evaluation
- Conclusions

Slide 6: Control Flow Checking
- Two steps for control flow checking
  - Compute signature at runtime
  - Compare with an expected correct value
- In case of illegal control flow transfer, the signature check fails
[Figure: BB1 and BB2, each with "update sig var" followed by "check sig var".]

Slide 7: Signature-Based Control Flow Checking
- Software-based control flow checking
  - Update signature in each basic block
  - Check signature in each basic block
- Can only handle errors in branch targets
  - Errors in branch directions (conditions) are not covered
[Figure: BB1 (signature s1) runs G = G xor d1, then checks G == s1?. BB2 (signature s2) runs G = G xor d2, then checks G == s2?, where d2 = s1 xor s2. On the transfer BB1 -> BB2: G = G xor d2 = s1 xor s1 xor s2 = s2.]
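The XOR update on this slide can be sketched in Python. This is a minimal illustration, not the authors' implementation: the signature values, the `enter_block` helper, and the `detects_illegal_entry` helper are assumptions made for the example.

```python
# Illustrative signatures (arbitrary distinct values; not from the slides).
SIG = {"BB1": 0b0101, "BB2": 0b0011}

# Compile-time difference for BB2, assuming BB1 is its only legal
# predecessor: d2 = s1 xor s2.
D = {"BB2": SIG["BB1"] ^ SIG["BB2"]}


def enter_block(g, block):
    """Update the run-time signature G on entry to `block` and check it.

    Returns the new G; raises RuntimeError when the check fails.
    """
    g ^= D[block]              # G = G xor d
    if g != SIG[block]:        # G == s ?
        raise RuntimeError(f"control flow error detected entering {block}")
    return g


def detects_illegal_entry(stale_g):
    """True if entering BB2 with signature value `stale_g` is flagged."""
    try:
        enter_block(stale_g, "BB2")
    except RuntimeError:
        return True
    return False


# Legal transfer BB1 -> BB2: G holds s1, so G xor d2 = s1 xor s1 xor s2 = s2.
g_after = enter_block(SIG["BB1"], "BB2")
```

A jump into BB2 from a block whose signature is not s1 leaves G different from s2, so the check fails. A wrong branch direction that still lands on a legal successor passes, which matches the limitation stated on the slide.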

Slide 8: Signature-Based Control Flow Checking
- For branch fan-in nodes
  - Extra updates with a dynamically adjusting signature are required
[Figure: BB1 (s1), BB2 (s2, d2 = s1 xor s2) and BB3 (s3, d3 = s- xor s3). Each block updates with G = G xor d_i and checks G == s_i?. The adjustment is applied with G = G xor D1 and G = G xor D2; the labels D1 = s1 xor s3 and D1 = 0 appear on the blocks.]

Slide 9: Abstract Control Signatures
- Sources of overhead
  - Signature updates
  - Signature checks
- Form regions
  - Abstract away the details of control flow inside a region
[Figure: BB1 to BB5, each with a signature update (G = G xor d_i) and a check (G == s_i?), plus the adjustment labels D1 = 0, D1 = s1 xor s3, D2 = s2 xor s6, D3 = s4 xor s7.]

Slide 10: Abstract Control Signatures
- Sources of overhead
  - Signature updates
  - Signature checks
- Form regions
  - Abstract away the details of control flow inside a region
- Optimize signature updates
  - Check simple run-time properties
- Optimize checks
  - Insert checks at region boundaries
[Figure: the BB1 to BB5 graph from the previous slide, with the per-block updates and checks replaced by "Sig update" and "Sig check" markers.]

Slide 11: Insight 1: Optimized updates
- Signature checking
  - Makes sure that control flow transfer took place from a legal predecessor
- Check counters (path length)
  - Makes sure that the proper number of BBs in the predecessor region were visited
[Figure: region bb1 to bb6. Counter C1 is set to 1 in bb1 and updated in later blocks; the exits check C1 == 4? and C1 == 5?.]
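The path-length idea can be sketched as follows. This is a simplified model under stated assumptions: the block names, the straight-line region, and the `run_region` helper are hypothetical and only illustrate how a counter check catches a skipped block.

```python
def run_region(path, expected_length):
    """Walk `path` (a list of basic-block names), counting visited blocks.

    The first block sets the counter to 1 and every later block increments
    it. The check at the region exit passes only if the counter equals the
    path length the compiler expected.
    """
    counter = 0
    for i, _block in enumerate(path):
        counter = 1 if i == 0 else counter + 1
    return counter == expected_length


legal = ["bb1", "bb2", "bb3", "bb4"]   # fault-free path through the region
faulty = ["bb1", "bb4"]                # erroneous jump that skips bb2 and bb3

legal_ok = run_region(legal, expected_length=4)
faulty_ok = run_region(faulty, expected_length=4)
```

The counter replaces per-block XOR signatures inside the region, so each block pays for one increment and the comparison is deferred to the region exit.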

Slide 12: Insight 2: Optimized checks
- Sufficient to have a single check for a group of basic blocks
- Requirements on regions
  - The header block of a region should dominate all the BBs in that region (single entry point)
  - Nested loops should not be contained in a region
[Figure: Interval 1 and Interval 2 over bb1, bb2, bb3, bb4, bb_latch1 and bb_latch2.]
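The single-entry requirement can be tested with a short check over the CFG edges. This is a sketch, assuming the function entry block is either the header or outside the region; the edge list and the helper name `is_single_entry_region` are illustrative. It does not test the nested-loop requirement.

```python
def is_single_entry_region(edges, region, header):
    """True if every edge entering `region` from outside targets `header`.

    Under the assumption above, this implies that `header` dominates all
    other blocks in the region.
    """
    region = set(region)
    if header not in region:
        return False
    return all(v == header for u, v in edges if v in region and u not in region)


# Hypothetical CFG: bb0 feeds a diamond-shaped region headed by bb1.
EDGES = [("bb0", "bb1"), ("bb1", "bb2"), ("bb1", "bb3"),
         ("bb2", "bb4"), ("bb3", "bb4")]
REGION = {"bb1", "bb2", "bb3", "bb4"}

ok = is_single_entry_region(EDGES, REGION, "bb1")
# A side entrance bb0 -> bb3 breaks the single-entry property.
broken = is_single_entry_region(EDGES + [("bb0", "bb3")], REGION, "bb1")
```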

Slide 13: Balancing Increments
- Naively inserting checks
  - Multiple counter value checks would be required at exits
- Insert extra increments along these edges
- Developed an algorithm to get (details are in paper)
  - increment edges
  - increment amounts
[Figure: bb1 to bb5 with C1 = 1 in bb1. Checks shown: C1 == 3 or 4? and C1 == 4 or 5? (naive), and C1 == 5? (after the extra increment C1 = C1 + 1).]
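One simple way to balance the counters in an acyclic region is shown below. It is not the algorithm from the paper, which the slide only references; it is a longest-path sketch in which the extra increment on edge (u, v) is depth[v] - depth[u] - 1. The CFG and the helper names are assumptions for illustration.

```python
from graphlib import TopologicalSorter


def balance_increments(edges, entry):
    """Return ({node: depth}, {(u, v): extra}) for an acyclic region.

    depth[n] is the longest path from `entry` to n, counted in blocks.
    Adding `extra` to the counter on each edge makes every path that
    reaches n arrive with the counter equal to depth[n].
    """
    preds = {}
    nodes = {entry}
    for u, v in edges:
        preds.setdefault(v, set()).add(u)
        nodes.update((u, v))
    graph = {n: preds.get(n, set()) for n in nodes}  # node -> predecessors
    depth = {}
    for n in TopologicalSorter(graph).static_order():
        depth[n] = 1 if n == entry else 1 + max(depth[p] for p in preds[n])
    extra = {(u, v): depth[v] - depth[u] - 1 for u, v in edges}
    return depth, extra


def counter_along(path, extra):
    """Counter value at the end of `path`: 1 at entry, +1 per block, plus extras."""
    c = 1
    for u, v in zip(path, path[1:]):
        c += 1 + extra[(u, v)]
    return c


# Hypothetical region with two short-cut edges.
EDGES = [("bb1", "bb2"), ("bb1", "bb3"), ("bb2", "bb3"),
         ("bb3", "bb4"), ("bb3", "bb5"), ("bb4", "bb5")]
depth, extra = balance_increments(EDGES, "bb1")
```

With the extra increments in place, a single comparison (here against 5 at bb5) covers every legal path, instead of one comparison per possible path length.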

Slide 14: Optimization for Loops
- Move checks out of the loop
- Insert increments
  - Such that counter value is a power of two (facilitates remainder operation instead of costly division)
[Figure: loop over bb1 to bb4 (bb2 ... bbN). Labels: C1 = 0, C1 = 1, counter updates in the loop blocks, and the checks C1 / 3 == 0?, C1 == 3? and C1 % 4 == 0?.]
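The power-of-two padding can be sketched like this. It is an illustrative model only: `run_loop`, its parameters, and the way a fault is modeled (subtracting skipped blocks) are assumptions. The point is that a power-of-two stride lets the check use a bit mask rather than a division.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def run_loop(iterations, blocks_per_iter, skipped_blocks=0):
    """Simulate the counter for a loop and run the check once, after the loop.

    Each iteration adds one increment per block plus padding, so that the
    per-iteration total is a power of two. `skipped_blocks` models a fault
    that bypasses some increments. Returns True when the check passes.
    """
    per_iter = next_power_of_two(blocks_per_iter)
    padding = per_iter - blocks_per_iter
    counter = 0
    for _ in range(iterations):
        counter += blocks_per_iter + padding
    counter -= skipped_blocks
    # counter % per_iter == 0, computed with a mask since per_iter is 2**k
    return (counter & (per_iter - 1)) == 0


clean = run_loop(iterations=10, blocks_per_iter=3)
faulty = run_loop(iterations=10, blocks_per_iter=3, skipped_blocks=1)
```

In this model a fault that skips an exact multiple of the padded stride goes unnoticed, which is the price of checking only the remainder.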

Slide 15: Handling Call and Return Insts
- Every function in the program is assigned a unique path length
- Global signature variable is
  - Updated before and inversely updated after call
  - Inversely updated and updated inside callee
[Figure: the caller does "update sig var with call specific length", then "call foo;", then "inverse update sig var". In foo, Entry_BB does "inverse update with call specific length" and "check sig var"; Ret_BB does "update sig var" and "return;".]

Slide 16: System Overview
- Compilation
  - Analyze program structure
  - Collect required program information
  - Insert signature updates and checks
- Runtime (operating system, physical hardware)
  - Trigger lightweight recovery based on
    - selective symptoms (hardware exceptions)
    - signature comparison fails

Slide 17: Evaluation Methodology
- Program analysis and signature updates/checks
  - Implemented as a compiler pass in the LLVM compiler
- SPECINT2K benchmarks
- Statistical fault injection (SFI) experiments
  - GEM5 simulator in ARM syscall emulation mode
- Random (single) bit flip in control flow target
  - Simulated entire benchmarks after fault injection
  - Log files analyzed for results classification
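The fault model (one random bit flipped in a control flow target) can be sketched as below. This is not the authors' GEM5 injector; the function name, the 32-bit address width, and the example address are assumptions for illustration.

```python
import random


def flip_random_bit(target, width=32, rng=random):
    """Flip one uniformly chosen bit of `target`.

    `width` is the assumed address width in bits. Returns the corrupted
    target and the index of the flipped bit.
    """
    bit = rng.randrange(width)
    return target ^ (1 << bit), bit


rng = random.Random(0)          # seeded so a campaign is repeatable
original = 0x0000_8000          # hypothetical branch target address
corrupted, bit = flip_random_bit(original, rng=rng)
```

Seeding the generator keeps a statistical fault injection campaign reproducible, so a run that produced an interesting outcome can be replayed.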

Slide 18: Performance Overhead
- The performance overhead is down from 75% to 11%

Slide 19: Fault Coverage
- On average, fault coverage of ACS is comparable to CFCSS with almost 7x reduction in overhead

Slide 20: Fault Detection Latency
- Fault detection latency is affected by a maximum of 5%

Slide 21: Conclusions
- We propose Abstract Control Signatures (ACS)
  - Signature checking at coarse grain
  - Simplified signature updates
- In comparison to a traditional signature-based scheme (CFCSS)
  - Reduces performance overhead from 75% down to 11%
  - Fault coverage is comparable


Slide 23: Fault Injection Outcome Classification
- Masked
  - No corruption in the program output
- CFDetects
  - Detected by control flow checking
- Covered by symptoms (HWDetects)
  - Produces a symptom such as a page fault within 2000 cycles of fault injection
- Failures
  - Fail status on program termination or infinite loop
- SDCs (Silent Data Corruptions)
  - Fault injections which result in user-visible corruptions
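A log classifier for these five outcomes might look like the sketch below. The `RunLog` fields and the precedence between categories are assumptions; the slide defines the categories but not the log format or the order in which they are tested.

```python
from dataclasses import dataclass
from typing import Optional

SYMPTOM_WINDOW = 2000  # cycles after injection, as stated on the slide


@dataclass
class RunLog:
    """Hypothetical summary of one fault-injection run."""
    cf_check_failed: bool = False          # a signature check fired
    symptom_cycle: Optional[int] = None    # cycles from injection to first symptom
    failed_or_hung: bool = False           # fail status or infinite loop
    output_matches_golden: bool = True     # output equals the fault-free output


def classify(log):
    """Map a run to one outcome; the precedence order is an assumption."""
    if log.cf_check_failed:
        return "CFDetects"
    if log.symptom_cycle is not None and log.symptom_cycle <= SYMPTOM_WINDOW:
        return "HWDetects"
    if log.failed_or_hung:
        return "Failure"
    if log.output_matches_golden:
        return "Masked"
    return "SDC"


outcomes = [
    classify(RunLog()),
    classify(RunLog(cf_check_failed=True)),
    classify(RunLog(symptom_cycle=150)),
    classify(RunLog(failed_or_hung=True, output_matches_golden=False)),
    classify(RunLog(output_matches_golden=False)),
]
```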