Advanced Computer Architecture Lab The University of Michigan Razor DVS Dan Ernst – 12/3/2003 Razor: Dynamic Voltage Scaling Based on Circuit-Level Timing.


Razor: Dynamic Voltage Scaling Based on Circuit-Level Timing Speculation
Advanced Computer Architecture Laboratory, The University of Michigan
Dan Ernst, Nam Sung Kim, Shidhartha Das, Sanjay Pant, Rajeev Rao, Toan Pham, and Conrad Ziesler
Faculty members: David Blaauw, Todd Austin, and Trevor Mudge
Krisztián Flautner, ARM Ltd.
December 3rd, 2003

Dynamic Voltage Scaling and Design Uncertainty
- DVS: adapting voltage/frequency to meet the performance demands of the workload
  - Lower processor voltage during periods of low utilization
  - Lower Voltage is a Good Thing™ for power
- Minimum voltage is limited by safety margins
  - Error-free operation must be guaranteed!
- Technology trends are "Maximizing the Minimums"
  - Process and temperature variation
  - Capacitive and inductive noise
- Key observation: worst-case conditions are also highly improbable
  - Significant gain for circuits optimized for the common case
  - Efficient mechanisms needed to tolerate infrequent worst-case scenarios
[Figure: intra-die variations in ILD thickness]

Shaving Voltage Margins with Razor
- Goal: reduce voltage margins with in-situ error detection and correction for delay failures
- Proposed approach:
  - Remove safety margins and tolerate occasional errors
  - Tune processor voltage based on error rate
  - Purposely run below critical voltage
- Trade-off: voltage power savings vs. overhead of correction
  - Analogous to wireless power modulation
[Figure labels: Traditional DVS, Zero margin, Sub-critical, Data-dependent latency margins]

Razor Timing Error Detection
- Second sample of logic value used to validate earlier sample
- Key design issues:
  - Maintaining pipeline forward progress
  - Meta-stable results in main flip-flop
  - Short path impact on shadow latch
  - Recovering pipeline state after errors
  - Power overhead of error detection and correction
[Figure: main flip-flops on clk, shadow latch on clk_del, MEM]

Razor Short Path Constraint
- Second sample of logic value used to validate earlier sample
- Key design issues:
  - Maintaining pipeline forward progress
  - Meta-stable results in main flip-flop
  - Short path impact on shadow latch
  - Recovering pipeline state after errors
  - Power overhead of error detection and correction
[Figure: main flip-flops on clk, shadow latch on clk_del, MEM; annotation: hold constraint (~1/2 cycle)]

Centralized Razor Pipeline Error Recovery
[Figure: pipeline IF, ID, EX, MEM, WB (reg/mem) with Razor FFs between stages and on the PC; signals error, recover, clock; cycle-by-cycle timeline for inst1 through inst6]
- One-cycle penalty for timing failure
- Global synchronization may be difficult for fast, complex designs

Distributed Razor Pipeline Error Recovery
[Figure: pipeline IF, ID, EX, MEM (read-only), WB (reg/mem) with Razor FFs, a stabilizer FF, and the PC; per-stage error, bubble, recover, and flushID signals feeding Flush Control; cycle-by-cycle timeline for inst1 through inst8]
- Multiple-cycle penalty for timing failure
- Scalable design since all recovery communication is local
- Builds on existing branch / data speculation recovery framework

Error-Rate Studies: Hardware Measurement

Error-Rate Studies: Empirical Results
[Figure annotations]
- 35% energy savings with 1.3% error
- 22% saving at one error every 20 seconds!

Error-Rate Studies: SPICE-Level Simulations
- Based on SPICE-level simulations of a Kogge-Stone adder
[Figure annotation: 200 mV]

Razor I: Prototype Razor Implementation
- 4-stage 64-bit Alpha pipeline:
  - 200 MHz expected operation in 0.18 µm technology, 1.8 V, ~500 mW
  - Tunable via software from MHz, V
  - Razor applied to combinational logic
- Razor overhead:
  - Total of 192 Razor flip-flops out of 2408 total (9%)
  - Error-free power overhead: ~3%
[Figure: die layout, 3.3 mm × 3 mm, with blocks D-Cache, I-Cache, Register File, IF, ID, EX, MEM, WB]

Effects of Razor DVS
[Figure: energy and pipeline throughput (IPC) versus decreasing supply voltage]
- Energy of processor operations, E_proc
- Energy of pipeline recovery, E_recovery
- Total energy, E_total = E_proc + E_recovery, with the optimal E_total marked
- Energy of processor w/o Razor support
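
The shape of the E_total curve can be illustrated with a toy sweep. This is only a sketch under invented assumptions: the V² energy model, the exponential error-rate curve, and the constants `V0` and `K` are hypothetical placeholders, not measurements from the talk. Only the decomposition E_total = E_proc + E_recovery and the 18x recovery cost (quoted on the EX-stage slide) come from the slides.

```python
import math

RECOVERY_COST = 18.0  # recovery energy in units of one operation (from the EX-stage slide)
V0 = 1.2              # hypothetical voltage at which every operation fails
K = 30.0              # hypothetical steepness of the error-rate curve (1/V)


def e_proc(v: float) -> float:
    """Normalized energy per operation, assumed proportional to V^2."""
    return v * v


def error_rate(v: float) -> float:
    """Assumed error probability per operation; rises sharply as V drops."""
    return min(1.0, math.exp(-K * (v - V0)))


def e_recovery(v: float) -> float:
    """Expected recovery energy per operation at voltage v."""
    return error_rate(v) * RECOVERY_COST * e_proc(v)


def e_total(v: float) -> float:
    """E_total = E_proc + E_recovery."""
    return e_proc(v) + e_recovery(v)


def best_voltage(v_min_mv: int = 1200, v_max_mv: int = 1800, step_mv: int = 10) -> float:
    """Sweep the supply voltage in millivolt steps and return the energy minimum."""
    candidates = [mv / 1000.0 for mv in range(v_min_mv, v_max_mv + 1, step_mv)]
    return min(candidates, key=e_total)


if __name__ == "__main__":
    v = best_voltage()
    print(f"toy optimum: {v:.2f} V, E_total = {e_total(v):.3f}")
```

With these placeholder constants the minimum falls strictly between the sweep endpoints: lowering the voltage saves energy until recovery overhead starts to dominate.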

EX-Stage Analysis: Optimal Voltage Sweep
- Recovery cost includes energy to recover entire pipeline (18x an add)

EX-Stage Analysis: Optimal Voltage Sweep

Simulation Analysis: Energy-Optimal Voltage

Simulation Analysis: Razor DVS Execution

Simulation Analysis: Razor DVS Performance

Conclusions
- In-situ detection/correction of timing errors
  - Eliminate process, temperature, and safety margins
  - Tune processor voltage based on error rate
  - Purposely run below critical voltage to capture data-dependent latency margins
- Implemented with architecture/circuit support
  - Double-sampling metastability-tolerant Razor flip-flops validate logic results
  - Pipeline initiates recovery after circuit timing errors, no voltage/clock re-tuning needed
- Trade-off: supply voltage power savings vs. overhead of correction
  - Running with error is good!
[Figure: Razor flip-flop with main flip-flop (clk), shadow latch (clk_del), error comparator, and Error_L output]
[Figure: distributed recovery pipeline with Razor FFs, stabilizer FF, and Flush Control]

Future Directions
- Research opportunities
  - Razor for caches/memory and control logic
  - Voltage control algorithms, especially per-stage tuning
  - Typical-case energy optimized designs (instead of worst-case latency optimized)
  - Turnkey application of Razor technology
- Prototype design, fabrication, evaluation
  - Razor I – Q – Razor-ized combinational logic, global tuning
  - Razor II – Q – Razor-ized caches and control logic, per-stage tuning
- Other applications
  - Single-event upset (SEU) protection using Razor error detection/re-execution
  - Over-clocking for performance improvement (large gains among hobbyists)

Questions?

Back-up Slides

Other Approaches to Dynamic Voltage Scaling
- Traditional DVS
  - Valid voltage / delay combinations "blessed" at design time
  - Approach leaves a significant amount of energy "on the table"
  - Temperature, process, data, and safety margins placed on voltage
- Other approaches miss some margins
  - Slack detector: automatic tuning
    - ARM's Intelligent Energy Manager (IEM)
    - Processor voltage automatically tuned to external ambient conditions
    - Inverter chain designed to track most restrictive critical path, margin still required
[Figure: chip floorplan with blocks L2 cache control, floating point and graphics, data cache, cache control, L2 tags, Ex unit, control unit, IO unit, mem control]

Razor Flip-Flop Implementation
- Compare latched data with shadow latch on delayed clock
- Upon failure: place data from shadow latch in main latch
  - Ensure shadow latch always correct using conservative design techniques
  - Correct value in shadow latch guarantees forward progress
- Recover pipeline using microarchitectural recovery mechanism
[Figure: logic stage L1 feeding the Razor FF (main flip-flop on clk, shadow latch on clk_del, error comparator, Error_L, restore multiplexer with inputs 0/1 on D), whose output Q feeds logic stage L2]
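
The double-sampling idea can be sketched as a small behavioral model. This is an illustrative abstraction, not the circuit from the talk: the function names, the step-like `stage_output` model, and the time units are all assumptions made for the example.

```python
def stage_output(t: float, arrival: float, stale: int, fresh: int) -> int:
    """Assumed logic-stage model: the output shows the stale value until the
    new result arrives at time `arrival`, and the fresh value afterwards."""
    return fresh if t >= arrival else stale


def razor_cycle(arrival: float, stale: int, fresh: int,
                t_clk: float, t_delay: float) -> tuple[int, bool]:
    """Sample the stage output twice and report (q, error).

    The main flip-flop samples at t_clk, the shadow latch at t_clk + t_delay.
    The shadow latch is assumed to be always correct, which the slide says is
    ensured by conservative design; here that is checked as a precondition.
    """
    if arrival > t_clk + t_delay:
        raise ValueError("shadow latch would miss the result; model assumption violated")
    main = stage_output(t_clk, arrival, stale, fresh)
    shadow = stage_output(t_clk + t_delay, arrival, stale, fresh)
    error = main != shadow
    # On a mismatch the shadow value is placed in the main latch.
    q = shadow if error else main
    return q, error


if __name__ == "__main__":
    print(razor_cycle(arrival=0.8, stale=0, fresh=1, t_clk=1.0, t_delay=0.5))
    print(razor_cycle(arrival=1.2, stale=0, fresh=1, t_clk=1.0, t_delay=0.5))
```

In this model a late result only raises the error flag when it actually changes the sampled value; a late arrival that happens to equal the stale value is indistinguishable from an on-time one.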

Razor Flip-Flop Circuit
[Figure: circuit schematic with Inv_n, Inv_p, meta-stability detector, and Error_L]

Overcoming Short Path Constraints
- Delayed clock imposes a short-path constraint
  - Razor necessary only for latches on slow paths
  - Pad fast path for latches with mixed path delays
  - Trade-off between DVS headroom and short path constraints
- Constraint: min. path delay > t_delay + t_hold
[Figure: long paths and short paths into Razor_ff and ff, with the short path padded with extra delay; timing diagram of clock and clock_del showing t_delay, t_hold, min. path delay, intended path, and short path]
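
The constraint above (min. path delay > t_delay + t_hold) is easy to check numerically. The helper names and the example numbers below are hypothetical and only illustrate the inequality; times are in arbitrary but consistent units.

```python
def short_path_ok(min_path_delay: float, t_delay: float, t_hold: float) -> bool:
    """True if the fastest path cannot corrupt the shadow latch, i.e.
    min_path_delay > t_delay + t_hold."""
    return min_path_delay > t_delay + t_hold


def padding_shortfall(min_path_delay: float, t_delay: float, t_hold: float) -> float:
    """Delay by which the fastest path falls short of t_delay + t_hold.

    Because the constraint is a strict inequality, the padding actually
    added must be larger than this value. Returns 0.0 if the path already
    meets or exceeds the bound.
    """
    return max(0.0, t_delay + t_hold - min_path_delay)


if __name__ == "__main__":
    # Hypothetical numbers: clock delay of 0.5, hold time of 0.25.
    for d in (1.0, 0.5):
        print(d, short_path_ok(d, 0.5, 0.25), padding_shortfall(d, 0.5, 0.25))
```

A larger t_delay widens the window in which late results can still be caught, but it also raises the bound every short path must clear, which is the trade-off between DVS headroom and short-path constraints named on the slide.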

Hardware Measurement Setup
[Figure: 48-bit LFSR driving Slow Pipeline A and Slow Pipeline B (clk/2) and a Fast Pipeline (clk); outputs compared (!=) and mismatches counted in a 40-bit error counter; other labels: x18, stabilize]

Simulation Methodology
- Challenge: instruction latency depends on circuit evaluation latency
  - May vary with changes in stage inputs, stage logic, voltage, temperature…
- Dynamic timing simulation combines architectural/circuit simulation
- Initial implementation utilized a hand-generated EX-stage circuit model
  - Effort ongoing to automate extraction/decomposition/integration into SimpleScalar

Supply Voltage Control System
[Figure: control loop. Pipeline error signals are summed (Σ) into E_sample; E_diff = E_ref - E_sample feeds the voltage control function, which drives the voltage regulator that sets V_dd for the pipeline; reset input]
- Current design utilizes a very simple proportional control function
  - Control algorithm implemented in software
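
A proportional control step built on E_diff = E_ref - E_sample might look like the sketch below. The slide only states that the control function is a simple proportional one implemented in software; the gain, the voltage limits, the sign convention, and the function name are assumptions introduced for illustration.

```python
def next_vdd(vdd: float, e_ref: float, e_sample: float,
             gain: float, v_min: float, v_max: float) -> float:
    """One step of a hypothetical proportional voltage controller.

    e_ref    -- target error rate
    e_sample -- error rate measured over the last sampling interval
    gain     -- assumed proportional gain, in volts per unit of error rate
    """
    e_diff = e_ref - e_sample
    # Positive e_diff: fewer errors than targeted, so the supply can be lowered.
    # Negative e_diff: too many errors, so the supply is raised.
    candidate = vdd - gain * e_diff
    # Clamp to the (assumed) range the regulator can deliver.
    return min(v_max, max(v_min, candidate))


if __name__ == "__main__":
    vdd = 1.8
    # Hypothetical trace of sampled error rates against a 1% target.
    for e_sample in (0.0, 0.0, 0.005, 0.02, 0.01):
        vdd = next_vdd(vdd, e_ref=0.01, e_sample=e_sample,
                       gain=2.0, v_min=1.2, v_max=1.8)
        print(f"e_sample={e_sample:.3f} -> Vdd={vdd:.3f} V")
```

When the sampled error rate equals the target, E_diff is zero and the voltage holds steady.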

Pipeline Recovery
[Figure: timing diagram for stages IF, ID, EX, MEM, WB with signals clk, clk_d, error, ID.d, EX.d, MEM.d; cases labeled Error and No Error; annotation: redo instruction in MEM]

Voltage Scaling under Dynamic Workloads
- Adapt frequency/voltage to performance demands of workload
  - Software controlled processor speed
  - Lower processor voltage during periods of low operating frequency
    - Quadratic reduction in dynamic power and energy
    - Super-quadratic reduction in leakage
[Figure: utilization, frequency, and V_dd versus time]
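
The quadratic claim can be made concrete with the standard first-order CMOS approximation that dynamic energy per operation scales with V². The function name and the voltages below are illustrative assumptions; leakage is not modeled.

```python
def dynamic_energy_ratio(v: float, v_nominal: float) -> float:
    """Dynamic energy per operation at supply v, relative to v_nominal,
    under the first-order assumption that dynamic energy scales as V^2."""
    return (v / v_nominal) ** 2


if __name__ == "__main__":
    # Hypothetical supply points relative to a 1.8 V nominal supply.
    for v in (1.8, 1.5, 1.2, 0.9):
        print(f"{v:.1f} V -> {dynamic_energy_ratio(v, 1.8):.2f}x dynamic energy per operation")
```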

Simulation Flow
- Automatic creation of very detailed power/delay C-models
[Figure: flow from high-level HDL specification through circuit extraction with parasitics and variable-voltage SDF generation to a power/delay C-model; the C-model, architecture specification, and voltage control algorithm feed SimpleScalar + DTA, producing detailed power/delay analysis; pipeline IF, ID, EX, MEM, WB with FFs and PC]

Simulation Methodology
- Dynamic timing simulation combines architectural/circuit simulation
  - Contrast to static timing simulation, which is only concerned with the critical path
  - SimpleScalar/Alpha architectural-level simulation
  - Gate-level simulation of per-stage logic blocks
    - Logic block model describes cells, local and global interconnect
    - Cells characterized with SPICE at varied slew/cap-load/voltage
    - Each cycle, circuit simulator evaluates delay of each stage's logic block

Simulation Analysis: Razor DVS Execution

Razor Demo

More Details on Meta-Stability
- Sub-critical operation invites meta-stability
  - Meta-stability detector itself can become meta-stable
  - Double-latch the error signal to obtain a sufficiently small probability
[Figure: latches on clk, clk_b, clk_del, clk_del_b (D, Q); dynamic OR / latch; outputs restore, bubble, flush]
  - Flush entire pipe
  - No forward progress
  - Reduce frequency
[Figure labels: restore, bubble, flush, pos, neg, fail, error]

Short Path Failure
[Figure: timing diagram for stages IF, ID, EX, MEM, WB with signals clk, clk_d, error, ID.d, EX.d, MEM.d; inst1 (I1) and inst2 (I2), with the short path marked]