Compiler Optimization to Reduce Soft Errors in Register Files. Jongeun Lee, Aviral Shrivastava*. Compiler Microarchitecture Lab, Department of Computer Science and Engineering, Arizona State University.

Presentation transcript:

Compiler Optimization to Reduce Soft Errors in Register Files
Jongeun Lee, Aviral Shrivastava*
Compiler Microarchitecture Lab, Department of Computer Science and Engineering, Arizona State University
6/1/2016

Reliability Problem
What is a soft error?
  – Transient error, or bit flip
  – Causes: energetic particle strikes, voltage fluctuation, signal interference
How often does it occur?
  – Currently: ~1 per year
  – Soft error rate is increasing exponentially with technology scaling
  – Can be 1 per day in a decade

Reliability Problem
Not all errors are visible
  – Logical masking
  – Temporal masking
  – Electrical masking
Register file needs protection
  – Large memory structures: typically HW protected
  – Combinatorial circuits: errors can be masked
  – Register file: has most of the architecturally visible errors for ARM926EJ [Blome '06] [Mitra '05]
[Figure: logical masking example]

RF Protection – HW Approaches
Full HW protection
  – Protect registers through ECC, parity, or duplication
  – Very costly in terms of power and area [Blome'06] [Kandala'07] [Memik'05] [Montesinos'07] [Slegel'99]
  – Increased power aggravates the temperature problem
  – Increased temperature decreases reliability
Proposed: Partially Protected Register File
  – Runtime decision by hardware to select the registers to be protected
  – [Lee DATE 2009] demonstrated that the compiler can decide which variables to protect
  – Power-efficient protection, but still requires HW modification

RF Protection – SW Approaches
Software schemes
  – Code duplication [Oh'02b] [Reis'05]
  – Control flow checking [Oh'02a]
  – Very high overhead in code size and performance
Compiler techniques
  – Can be very effective at very little overhead: no hardware overhead, and minimal power overhead
  – [Yan and Zhang 2005] Instruction scheduling: reduces the distance between loads and stores; local effect
This work: compiler technique
  – Explicitly saves and restores long-lifetime variables by adding extra loads and stores

Outline
Soft Error Problem
RF susceptible to soft errors
Previous schemes to reduce soft errors in RF
  – HW, SW, compiler approaches
RF Vulnerability

RFV: Register File Vulnerability
Register File Vulnerability
  – Captures the failure rate due to soft errors in the RF
  – Based on AVF (Architectural Vulnerability Factor)
  – Length of intervals with useful data
  – Unit: byte * cycle
[Figure: timeline of writes (W) and reads (R) to a register, marking vulnerable and non-vulnerable intervals]
Any interval that ends in a read is vulnerable.
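The RFV definition above can be illustrated with a minimal Python sketch. It assumes a per-register access trace of (cycle, kind) pairs and 4-byte registers; the trace format and the names `vulnerable_cycles` and `rfv` are hypothetical, and the time before a register's first access is ignored for simplicity. This is not the static RFV model cited later in the talk.

```python
from typing import Dict, Iterable, Tuple

Access = Tuple[int, str]  # (cycle, "W" for write or "R" for read); assumed format


def vulnerable_cycles(accesses: Iterable[Access]) -> int:
    """Sum the lengths of the intervals that end in a read.

    An interval runs from one access to the next access of the same
    register. Intervals ending in a write are not vulnerable, because
    the old value is overwritten before anyone consumes it.
    """
    total = 0
    prev_cycle = None
    for cycle, kind in sorted(accesses):
        if kind not in ("R", "W"):
            raise ValueError(f"unknown access kind: {kind!r}")
        if prev_cycle is not None and kind == "R":
            total += cycle - prev_cycle
        prev_cycle = cycle
    return total


def rfv(trace: Dict[str, Iterable[Access]], reg_bytes: int = 4) -> int:
    """RFV in byte * cycle, assuming every register is reg_bytes wide."""
    return reg_bytes * sum(vulnerable_cycles(a) for a in trace.values())


# Hypothetical trace: W@0, R@10, R@25, W@40, W@50, R@55
example = {"s1": [(0, "W"), (10, "R"), (25, "R"), (40, "W"), (50, "W"), (55, "R")]}
print(vulnerable_cycles(example["s1"]))  # 30 (10 + 15 + 5)
print(rfv(example))                      # 120
```

In this trace the intervals 25 to 40 and 40 to 50 end in writes, so they do not contribute.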

Scope of Compiler Approach
[Figure: number of vulnerable intervals by length (simulation, jpeg)]
Non-zero counts up to ~16M cycles

Scope of Compiler Approach
[Figure: RFV contribution of vulnerable intervals (simulation, jpeg)]
More than 40% of total RFV is contributed by very few, but long, live ranges
Scope for a compiler

Research Problem
Goal
  – To reduce RFV with no hardware modification
Idea
  – In most architectures, the memory is already protected with hardware ECC
  – Saving a variable in memory can reduce RFV
Issues
  – Additional loads/stores can increase runtime
  – Increased runtime is generally bad
  – Increased runtime generally increases RFV

Outline
Soft Error Problem
RF susceptible to soft errors
Previous schemes to reduce soft errors in RF
RF Vulnerability
  – Variable lifetime ending in a read
Scope to reduce RF vulnerability
  – Much of the vulnerability is caused by a few long lifetimes
Overall Research Problem
  – Explicitly spill and restore long-lifetime variables
Solutions

Starting Point
A simple solution
  – Find heavily executed loop kernels
  – Identify the registers that are unused in them
  – Protect the unused registers by saving them before the loop starts and restoring them after the loop ends
Problem
  – Local transformation
  – Whether a variable is vulnerable or not is not a local decision
  – Inter-procedural analysis is required
  – Difficult to achieve an efficient solution

Save and Restore unused registers

function-main() {
    save register s1, s2;
    use register s1, s2;
    function-foo();
    s2 = function-bar();    // writing to s2
    s1 = s1 + s2;
    restore register s1, s2;
}

function-foo() {
    loop1 {
        use register t1;
    }
    use register t1, t2;
}

function-bar() {
    save register s1;
    loop2 {
        use register s1, t1, t2;
    }
    restore register s1;
}

Loop1: uses local register t1 → save s1, s2, and t2
Loop2: uses s1, t1, and t2 → save s2

Need inter-procedural analysis

function-main() {
    save register s1, s2;
    use register s1, s2;
    function-foo();
    s2 = function-bar();    // writing to s2
    s1 = s1 + s2;
    restore register s1, s2;
}

function-foo() {
    loop1 {
        use register t1;
    }
    use register t1, t2;
}

function-bar() {
    save register s1;
    loop2 {
        use register s1, t1, t2;
    }
    restore register s1;
}

Outline
Soft Error Problem
RF susceptible to soft errors
Previous schemes to reduce soft errors in RF
RF Vulnerability
Scope to reduce RF vulnerability
Overall Research Problem
  – Explicitly spill and restore long-lifetime variables
Solutions
  – Simple Strategy
  – ILP

Problem
"For a given performance bound, what is the set of program points at which to insert save/restore operations, such that the transformed program will have minimum RFV?"
Should also minimize code size overhead
Challenges
  – Inter-procedural analysis
  – How to accurately estimate the effect on RFV and performance?
  – How to devise a simple, yet effective, save/restore operation?
  – Huge design space

Problem Analogy
Dynamic dual-mode system
  – The processor has a Boolean state for each register
  – State is determined at runtime, by the execution path of the program
  – Difficult to guarantee correctness of program transformation
Static dual-mode system
  – A program point has a Boolean state for each register
  – State is determined at compile time
  – Appropriate for static analysis
The problem is to partition program points or blocks into two modes → ILP formulation

Overview of Proposed Solution
Definitions
  – Access-free block (AFB)
  – Access-free region (AFR): connected subgraph of the ICFG consisting of AFBs only
  – Maximal AFR
Proposed method
  – Find all maximal AFRs
  – Evaluate all maximal AFRs for benefit/cost
  – Select the most profitable ones
Mode change ops will be inserted
  – Along the boundaries of the selected maximal AFRs
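As a minimal sketch of the "find all maximal AFRs" step for one register, the Python below treats the control-flow graph as a dictionary of successor lists and grows each region over predecessor and successor edges until it hits a non-AFB. The graph encoding, the block names, and the function name `maximal_afrs` are assumptions made for illustration; the talk does not show the actual data structures.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


def maximal_afrs(successors: Dict[str, List[str]],
                 access_free: Set[str]) -> List[Set[str]]:
    """Return the maximal access-free regions for one register.

    successors:  CFG as block -> list of successor blocks (assumed encoding).
    access_free: blocks that never read or write the register (AFBs).
    A maximal AFR is a connected component of the subgraph induced by
    the AFBs, where connectivity ignores edge direction.
    """
    # Direction does not matter for connectivity, so build an undirected view.
    neighbors = defaultdict(set)
    for src, dsts in successors.items():
        for dst in dsts:
            neighbors[src].add(dst)
            neighbors[dst].add(src)

    seen: Set[str] = set()
    regions: List[Set[str]] = []
    for start in sorted(access_free):
        if start in seen:
            continue
        region = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            block = queue.popleft()
            for nxt in neighbors[block]:
                # Stop growing at any block that accesses the register.
                if nxt in access_free and nxt not in seen:
                    seen.add(nxt)
                    region.add(nxt)
                    queue.append(nxt)
        regions.append(region)
    return regions


# Hypothetical CFG: A -> B -> {C, D} -> E -> F -> G
cfg = {"A": ["B"], "B": ["C", "D"], "C": ["E"], "D": ["E"], "E": ["F"], "F": ["G"]}
print(maximal_afrs(cfg, {"B", "C", "D", "F"}))
```

Here B, C, and D form one region, while F is a separate region because E accesses the register.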

Mode Change Operation Issues
What memory address to use?
  – Options: stack-relative or absolute
  – Stack-relative: use the existing Stack Pointer register
  – Absolute: use either the Global Pointer or a constant register
  – The register used in address calculation cannot be protected using our scheme
  – Stack-relative addressing requires the AFR to be intra-procedural
Where to put mode change ops?
  – Option 1: in basic blocks (nodes). Requires only one instruction (store/load); can reduce the static number of mode change ops
  – Option 2: on edges between basic blocks. Minimizes the dynamic number of mode change ops; usually requires two instructions (unconditional jump)

Evaluating AFR
Benefit
  – RFV reduction: RFV contributed by the AFR
Cost
  – Runtime increase: proportional to the number of dynamic instructions due to mode change ops
  – Code size increase: proportional to the number of static instructions due to mode change ops
Two questions
  – What is the RFV contribution of an AFR? Use the static RFV model in [Lee'09b]
  – Where must we insert mode change ops? No need to insert a mode change op if we know the next access to the register is a write
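The cost side of this evaluation can be sketched as follows, under several assumptions that are not in the talk: mode change ops are placed on boundary edges, each op counts as one instruction, per-edge execution counts come from a profile, and a restore is skipped only when the first access in the block after an exit edge is a write. The names `boundary_edges` and `mode_change_cost` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def boundary_edges(successors: Dict[str, List[str]],
                   region: Set[str]) -> Tuple[List[Edge], List[Edge]]:
    """Split the edges crossing the region boundary into entries and exits."""
    entries: List[Edge] = []
    exits: List[Edge] = []
    for src, dsts in successors.items():
        for dst in dsts:
            if src not in region and dst in region:
                entries.append((src, dst))
            elif src in region and dst not in region:
                exits.append((src, dst))
    return entries, exits


def mode_change_cost(successors: Dict[str, List[str]],
                     region: Set[str],
                     first_access: Dict[str, str],
                     edge_count: Dict[Edge, int]):
    """Return (saves, restores, static_ops, dynamic_ops) for one AFR.

    first_access maps a block to "R" or "W", the kind of its first access
    to the register (assumed to be known from a prior analysis).
    edge_count maps an edge to its profiled execution count (assumed).
    """
    saves, exits = boundary_edges(successors, region)
    # A restore is unnecessary when the next access overwrites the register.
    restores = [e for e in exits if first_access.get(e[1]) != "W"]
    ops = saves + restores
    static_ops = len(ops)                       # drives code size increase
    dynamic_ops = sum(edge_count[e] for e in ops)  # drives runtime increase
    return saves, restores, static_ops, dynamic_ops


# Hypothetical example: region {B, C, D}; E reads the register, F writes it.
cfg = {"A": ["B"], "B": ["C", "D"], "C": ["E"], "D": ["F"]}
counts = {("A", "B"): 10, ("C", "E"): 7, ("D", "F"): 3}
print(mode_change_cost(cfg, {"B", "C", "D"}, {"E": "R", "F": "W"}, counts))
# ([('A', 'B')], [('C', 'E')], 2, 17)
```

The exit edge into F needs no restore because F overwrites the register, so only two ops are inserted.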

Analysis & Selection
Finding all maximal AFRs
  – Keep adding neighbors (predecessor or successor) until reaching a non-AFB
Selection problem
  – Given, for each maximal AFR k: v_k (RFV reduction), c_k (code size increase), t_k (runtime increase)
  – Binary variables: x_k (1 if selected)
  – Determine {x_k}
  – Objective and constraint (equations not preserved in the transcript); α: weighting parameter, τ: performance tolerance
  – Knapsack problem
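The slide's objective and constraint are not preserved in this transcript, so the sketch below assumes one plausible knapsack form: maximize the sum of (v_k − α·c_k)·x_k subject to the sum of t_k·x_k staying within a runtime budget derived from τ. Both the formulation and the function name `select_afrs` are assumptions. The search is exhaustive, which is exact but only practical for a handful of AFRs; a real flow would use an ILP solver or a heuristic.

```python
from itertools import product
from typing import Sequence, Tuple


def select_afrs(v: Sequence[float], c: Sequence[float], t: Sequence[float],
                alpha: float, budget: float) -> Tuple[Tuple[int, ...], float]:
    """Pick x_k in {0, 1} under the assumed knapsack formulation.

    v[k]: RFV reduction, c[k]: code size increase, t[k]: runtime increase.
    alpha:  weight trading RFV reduction against code size (assumed role).
    budget: allowed total runtime increase, e.g. tau * original runtime.
    Returns the best selection and its objective value.
    """
    best_choice = tuple(0 for _ in v)
    best_value = 0.0  # selecting nothing is always feasible
    for choice in product((0, 1), repeat=len(v)):
        runtime = sum(tk for tk, xk in zip(t, choice) if xk)
        if runtime > budget:
            continue  # violates the performance tolerance
        value = sum(vk - alpha * ck for vk, ck, xk in zip(v, c, choice) if xk)
        if value > best_value:
            best_choice, best_value = choice, value
    return best_choice, best_value


# Hypothetical numbers for three maximal AFRs.
choice, value = select_afrs(v=[100, 60, 45], c=[4, 2, 2], t=[8, 3, 3],
                            alpha=1.0, budget=10)
print(choice, value)  # (0, 1, 1) 101.0
```

In this example the largest AFR alone fits the budget, but the two smaller ones together give a higher objective, which is the kind of trade-off a knapsack formulation captures.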

Pre- and Post-Optimization
Goal: to convert edge insertion points into node insertion points
Inward move: before selection (pre-optimization)
Outward move: after selection (post-optimization)
[Figure: inward move and outward move of insertion points]

Overall Flow
[Figure: flowchart. Stages: Original Binary; Inter-procedural CFG Analysis; for all registers: Find all maximal AFRs, Set of Maximal AFRs, Pre-Optimization, Evaluation (RFV, runtime, code size), Selection (ILP or heuristic), Post-Optimization; Modified Binary; Cycle-Accurate Simulation (runtime, RFV)]

Experiments
Setting
  – MiBench benchmark suite
  – SimpleScalar simulator with MIPS instruction set
  – Performance tolerance: 1% or 2%
Comparisons
  – Potential (512 cycle): if every vulnerable interval at least 512 cycles long is protected
  – Naïve approach: similar to the Simple Solution; restricted to intra-procedural opportunity
  – Global-gp, Global-r0: our method based on inter-procedural analysis; GP vs. R0 is the register used in the mode change instruction

RFV Reduction
[Figure: RFV reduction compared to original RFV]
Our techniques can reduce RFV by up to 66%, and 33~37% on average
Naïve method works well only on simple benchmarks
  – In susan, 95% of runtime is spent in one function, in one stretch

Runtime & Code Size Increase
[Figure: runtime overhead compared to original]
[Figure: code size overhead compared to original]
Pre- and post-optimizations can reduce code size overhead by 40%

RFV Distributions
RFV contributions by long vulnerable intervals are effectively suppressed

Conclusion
Motivated a compiler approach to soft errors
  – A pure compiler approach can also be effective
  – No hardware modification is necessary
Proposed optimization framework
  – Model the problem as a binary partitioning problem
  – Propose an efficient heuristic based on access-free regions
  – Propose optimizations to reduce code size overhead
Our techniques can be very effective
  – Can reduce RFV by up to 66%, and 33~37% on average
  – Can explicitly control runtime overhead
  – Naïve method without inter-procedural analysis can be very ineffective