Compiler-Managed Protection of Register Files for Energy-Efficient Soft Error Reduction Jongeun Lee, Aviral Shrivastava* Compiler Microarchitecture.

Presentation transcript:

Compiler-Managed Protection of Register Files for Energy-Efficient Soft Error Reduction
Jongeun Lee, Aviral Shrivastava*
Compiler Microarchitecture Lab, Department of Computer Science and Engineering, Arizona State University
10/19/2015

Reliability Problem
Soft Errors
  – Transient errors caused by voltage and signal fluctuations and interference
  – Radiation strikes cause the majority of soft errors
Soft Error Rate
  – Current soft error rate is about 1 per year
  – Soft error rate is increasing exponentially with technology
  – Will be 1 per day in a decade
[ ??? ]

Soft Errors in Processor Core
Masking Effect
  – Logical masking
  – Temporal masking
  – Electrical masking
Visible Errors
  – Faults in combinational circuits are far less visible
  – For ARM926EJ, most architecturally visible errors within a processor core actually occur in register files [Blome ’06]
[Figure: logical masking example, from Mitra ’05]

Mitigating Soft Errors in RF
Microarchitectural Techniques
  – Shield [Montesinos ’07]: ECC table for a fraction of registers chosen dynamically
  – Replication in unused physical registers [Memik ’05]: for superscalar processors
  – Register value cache [Blome ’06]: replicating recent values in a tiny cache
  – In-register replication [Kandala ’07]: for register values fitting in 16 bits or less
Partial protection reduces the area overhead, but not necessarily the power overhead!

Hardware Partial Protection [Montesinos ’07]
  – Write: To protect or not?
  – Write: Where to put it?
  – Write: Generate ECC

Hardware Partial Protection [Montesinos ’07]
  – Read: Is this value protected or not?
  – Read: Where to get it?
  – Read: Check ECC

Compiler Approach

Action                             Hardware Approach         Compiler Approach
                                   Protected   Unprotected   Protected   Unprotected
Write   Decide to protect          √           √             X           X
Write   Which entry to use         √           X             X           X
Write   Generate ECC               √           X             √           X
Read    Decide if it was protected √           √             X           X
Read    Which entry to find        √           X             X           X
Read    Check ECC                  √           X             √           X

Hardware Approach
  – Non-zero overhead even for unprotected values!
Compiler Approach
  – Removes power overhead in Decision / Selection
  – Could make better decisions by using program information

Compiler Approach Issues
Compiler Approach
  – Protection decision is made at compile time and embedded in instructions
Issues
  – How to embed the protection decision in instructions?
      ISA incompatibility is a great disadvantage
  – How to make the optimal protection decision?
      Finding the global optimum is likely to be NP-complete; a local optimum may not be good
  – What is the right metric to use for optimization?
      Soft error rate or energy, or a combination of the two?
  – Runtime should not be increased
      How to ensure little or no runtime increase?

Our Compiler Approach
Architecture: Register Number Based Protection
  – Protection for only K highest-numbered registers
  – No ISA modification
  – No decision/selection logic
Compiler Optimization Method: Register Swapping
  – After usual compilation, swap register allocation
  – So that important variables are in protected registers
  – No runtime increase
  – Two versions: ARS, FRS (can be combined)
[Figure: partially protected RF, with R0 … R24 unprotected and R25 … R31 protected; assembly code before and after register swapping, in which R9 and R25 trade places]
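
The register-number-based scheme and the swap step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the register file is assumed to have 32 registers (R0 to R31), the assembly syntax and the helper names `is_protected` and `swap_registers` are hypothetical, and a real pass would operate on the compiler's output rather than on text.

```python
import re

NUM_REGS = 32  # assumed register file size (R0..R31)


def is_protected(reg: int, k: int, num_regs: int = NUM_REGS) -> bool:
    """True if `reg` is one of the K highest-numbered registers."""
    return reg >= num_regs - k


def swap_registers(asm: str, rule: dict) -> str:
    """Rename registers in `asm`; `rule` lists pairs to exchange, e.g. {9: 25}."""
    perm = dict(rule)
    perm.update({new: old for old, new in rule.items()})  # make the swap symmetric

    def rename(match):
        n = int(match.group(1))
        return f"R{perm.get(n, n)}"

    # A single pass over the text, so R9 -> R25 and R25 -> R9 happen simultaneously.
    return re.sub(r"\bR(\d+)\b", rename, asm)


# Hypothetical two-instruction fragment: the variable held in R9 moves to R25.
before = "add R9, R1, R2\nsub R3, R9, R25"
after = swap_registers(before, {9: 25})
```

Because the swap only renames registers, the instruction count is unchanged, which is why the approach adds no runtime.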

Optimization Metric
Vulnerability
  – Combined length of live ranges (from write to last read)
  – Directly proportional to soft error rate
Energy Overhead
  – Approximately proportional to access count to protected registers
Energy Efficiency Metric
  – Weighted sum of vulnerability and energy overhead
  – Minimizing for both ensures high energy-efficiency
[Figure: example register access timelines (W = write, R = read). Good candidate for protection: high V, seldom accessed. Bad candidate: low V, frequently accessed.]
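
As a rough illustration of the two quantities above, the following Python sketch computes V and E for one register from an access trace. The trace format (a time-sorted list of `(time, op)` pairs), the function names, and the choice to ignore reads that precede the first write are assumptions made for this example, not details from the slides. The score `V - beta * E` is the quantity by which the slides later rank registers.

```python
def vulnerability(trace):
    """Sum of live-range lengths: each range runs from a write to the
    last read that occurs before the next write."""
    total = 0
    start = None      # time of the most recent write
    last_read = None  # time of the latest read since that write
    for t, op in trace:
        if op == "W":
            if start is not None and last_read is not None:
                total += last_read - start
            start, last_read = t, None
        elif op == "R" and start is not None:
            last_read = t
    if start is not None and last_read is not None:
        total += last_read - start
    return total


def access_count(trace):
    """Energy overhead proxy: number of accesses to the register."""
    return len(trace)


def protection_score(trace, beta):
    """V - beta * E: larger means protecting this register pays off more."""
    return vulnerability(trace) - beta * access_count(trace)


# Long-lived, seldom accessed value: a good candidate for protection.
good = [(0, "W"), (100, "R")]
# Short-lived, frequently accessed value: a poor candidate.
bad = [(0, "W"), (1, "W"), (2, "W"), (3, "R"), (4, "R"), (5, "R")]
```

With `beta = 1`, the first trace scores far higher than the second, matching the good and bad examples on the slide.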

Register Swapping
ARS (Application-level Register Swapping)
  – All registers can be swapped
  – Except for architecturally distinguished registers: e.g., R31 in MIPS (implicitly accessed by JAL instruction)
  – Globally one register swap rule
FRS (Function-level Register Swapping)
  – Register swap rule for each function
  – Must respect calling convention: e.g., a caller-saved register can be swapped with another caller-saved register
  – FRS/t: swapping between caller-saved registers (t-registers)
      Live range is limited to one function
  – FRS/s: swapping between callee-saved registers (s-registers)
      Live range may extend over multiple functions
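
The calling-convention constraint on FRS can be expressed as a small validity check. The sketch below is hypothetical: the register classes are the t- and s-register sets given in the experimental setting of this deck, and the function name and the `rule` format (a dict mapping old register number to new register number) are assumptions for illustration.

```python
# Register classes as listed in the experimental setting of the slides.
T_REGS = {1, *range(8, 16), 24, 25}   # caller-saved (t-registers)
S_REGS = {*range(16, 24), 30}         # callee-saved (s-registers)


def is_valid_frs_rule(rule):
    """Check that `rule` (old -> new) is a permutation that only moves a
    register to another register of the same class."""
    # A swap rule must be a permutation of the registers it mentions.
    if set(rule.keys()) != set(rule.values()):
        return False
    for old, new in rule.items():
        both_t = old in T_REGS and new in T_REGS
        both_s = old in S_REGS and new in S_REGS
        if not (both_t or both_s):
            return False
    return True
```

For example, exchanging R9 and R25 is accepted because both are t-registers, while exchanging R9 and R16 is rejected because it crosses the caller-saved/callee-saved boundary.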

T-register vs. S-register
[Figure: call depth over time for functions f1 to f5, showing t-register live ranges (var1 to var5) and s-register live ranges (var1 to var4)]
The live range of a t-register variable does not cross any function transition.
The live range of an s-register variable is limited to one function instance but may cross function transitions.

Optimal ARS, FRS/t
ARS
  – ARS is a special case of FRS/t with only one function
FRS/t
  – Each function can be independently optimized
  – Input: V and E of each register (before swapping) for each function
  – Sort registers in increasing order of (V - βE) and assign register numbers in that order, so that the K highest-numbered (protected) registers hold the values with the largest (V - βE)
  – Very efficient: O(R ∙ N)
      R: number of registers
      N: number of functions
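
The per-function sorting step can be sketched as follows. This is a minimal illustration, assuming a 32-entry register file and a hypothetical `stats` dict that maps each swappable register number to its `(V, E)` pair before swapping; the function names are invented for this example and ties are broken by register number only to keep the result deterministic.

```python
def frs_t_rule(stats, beta):
    """Return a swap rule (old -> new) for one function.

    Registers are ranked by V - beta * E in increasing order and mapped,
    in that order, onto the available register numbers in increasing
    order, so the best candidates land in the highest-numbered registers.
    """
    numbers = sorted(stats)
    ranked = sorted(numbers, key=lambda r: (stats[r][0] - beta * stats[r][1], r))
    return dict(zip(ranked, numbers))


def protected_after_swap(rule, k, num_regs=32):
    """Original registers whose contents end up in the K protected registers."""
    return sorted(old for old, new in rule.items() if new >= num_regs - k)


# Hypothetical (V, E) values for three t-registers of one function.
stats = {8: (100, 2), 9: (3, 6), 25: (10, 50)}
rule = frs_t_rule(stats, beta=1)
```

In this example the variable originally in R8 has the largest `V - beta * E`, so it is moved into R25, the only one of the three registers that is protected when K = 7.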

Challenges in Optimizing FRS/s
Can we find the vulnerability of s-register in a function?
  – Vulnerability in F2 (callee) depends on F1 (caller)
  – Vulnerability in F3 (caller) depends on F4 (callee)
  – Potentially every caller-callee pair has inter-dependence
Finding optimal FRS for s-registers
  – Finding global optimum is intractable -> simple heuristic
[Figure: two call/return timelines. F1 calls F2 and F2 returns, with accesses at t1 to t4; F3 calls F4 and F4 returns, with accesses at t5 to t7. Annotations: "Vulnerable if t4 is R", "Vulnerable if reg is accessed in F4", "Vulnerable if t7 is R".]

Heuristic
Observation
  – Next access after current basic block is almost always a read (~90%)
  – Our heuristic assumes s-registers are always “read” afterwards
  – Thus we can optimize each function separately
[Figure: chances of s-registers being first read after a basic block]

Experiments
Comparisons
  – Compiler approach vs. Hardware approach
  – Optimizing for energy-efficiency vs. Optimizing for vulnerability only
Setting
  – SimpleScalar simulator (MIPS instruction set), in-order execution
      T-registers: R1, R8~R15, R24, R25
      S-registers: R16~R23, R30
  – Application benchmarks from MiBench
  – Design parameter (β): RF vulnerability-to-energy ratio of the entire program

V-K Plot
[Figure: V (x10^6) versus K (s-registers), comparing optimizing for vulnerability only with optimizing for energy-efficiency]

V-E Tradeoff
[Figure: V (x10^6) versus E (x10^6), with points labeled K=5 and K=6]
Optimizing for energy-efficiency may cut energy overhead to 50% compared to optimizing for vulnerability only.

Energy Efficiency of Our Technique
[Figure: weighted sum of vulnerability and energy, normalized to vulnerability only; 24% on average]

HW vs. Compiler Approach
Ideal HW Case
  – Consider the ideal HW case rather than a particular HW algorithm/implementation
  – Assume only the most profitable registers are protected (we use an offline algorithm to find this out)
  – Could be better at making what-to-protect decisions, but with significant energy cost
Power Model
  – What is important is the relative power dissipation between
      Decision making
      Selecting an entry
      Creating/checking signature (e.g., ECC, parity, duplicate)
Compiler Approach
  – Apply FRS followed by ARS

V-E Tradeoff Comparison
[Figure: V (x10^6) versus E (x10^6); legend labels include "vulner. only" and "energy effic."; annotation: "Energy overhead even for unprotected variables"]
  – Compiler approach is much more energy efficient than ideal hardware case
  – Proposed technique is more energy efficient than simple vulnerability optimization

Conclusion
Motivated Compiler Approach to soft errors
  – Requires hardware protection mechanism (partially protected RF)
  – Optimal use of hardware feature by compiler
Proposed ARS, FRS
  – ARS is easier to apply, optimize
  – FRS is challenging to optimize but gives more energy reduction
  – Can be combined for highest energy efficiency
Encouraging Results
  – Much more energy efficient than hardware approaches
  – Can reduce energy overhead by 24% compared to simple vulnerability optimization