Advanced Computer Architecture Lab The University of Michigan Razor DVS Dan Ernst – 12/3/2003
Razor: Dynamic Voltage Scaling Based on Circuit-Level Timing Speculation
Advanced Computer Architecture Laboratory, The University of Michigan
Dan Ernst, Nam Sung Kim, Shidhartha Das, Sanjay Pant, Rajeev Rao, Toan Pham, and Conrad Ziesler
Faculty Members: David Blaauw, Todd Austin, and Trevor Mudge
Krisztián Flautner, ARM Ltd.
December 3rd, 2003

Dynamic Voltage Scaling and Design Uncertainty
DVS: adapting voltage/frequency to meet performance demands of workload
– Lower processor voltage during periods of low utilization
– Lower Voltage is a Good Thing™ for power
Minimum voltage is limited by Safety Margins
– Error-free operation must be guaranteed!
Technology trends are Maximizing the Minimums
– Process and temperature variation
– Capacitive and inductive noise
Key Observation: worst-case conditions also highly improbable
– Significant gain for circuits optimized for common case
– Efficient mechanisms needed to tolerate infrequent worst-case scenarios
[Figure: intra-die variations in ILD thickness]

Shaving Voltage Margins with Razor
Goal: reduce voltage margins with in-situ error detection and correction for delay failures
Proposed Approach:
– Remove safety margins and tolerate occasional errors
– Tune processor voltage based on error rate
– Purposely run below critical voltage to capture data-dependent latency margins
Trade-off: voltage power savings vs. overhead of correction
– Analogous to wireless power modulation
[Figure: operating regions labeled Traditional DVS, Zero margin, Sub-critical]

Razor Timing Error Detection
Second sample of logic value used to validate earlier sample
Key design issues:
– Maintaining pipeline forward progress
– Meta-stable results in main flip-flop
– Short path impact on shadow-latch
– Recovering pipeline state after errors
– Power overhead of error detection and correction
[Figure: main FF and shadow latch, clocked by clk and clk_del]

Razor Short Path Constraint
Hold Constraint (~1/2 cycle)
[Figure: main FF and shadow latch, clocked by clk and clk_del, with the hold constraint marked]

Centralized Razor Pipeline Error Recovery
[Figure: pipeline stages IF, ID, EX, MEM, WB (reg/mem) separated by Razor FFs, with error and recover signals, PC, and clock; cycle-by-cycle diagram of inst1 through inst6]
– One cycle penalty for timing failure
– Global synchronization may be difficult for fast, complex designs

Distributed Razor Pipeline Error Recovery
[Figure: pipeline stages IF, ID, EX, MEM (read-only), WB (reg/mem) separated by Razor FFs, with a stabilizer FF, PC, per-stage error, bubble, recover, and flushID signals, and a Flush Control block; cycle-by-cycle diagram of inst1 through inst8]
– Multiple cycle penalty for timing failure
– Scalable design since all recovery communication is local
– Builds on existing branch / data speculation recovery framework

Error-Rate Studies – Hardware Measurement

Error Rate Studies – Empirical Results
[Figure annotations: 35% energy savings with 1.3% error; 22% saving at one error every 20 seconds!]

Error Rate Studies – SPICE-Level Simulations
Based on SPICE-level simulations of a Kogge-Stone adder
[Figure annotation: 200 mV]

Razor I - Prototype Razor Implementation
4-stage 64-bit Alpha pipeline:
– 200 MHz expected operation in 0.18 µm technology, 1.8 V, ~500 mW
– Tunable via software from MHz, V
– Razor applied to combinational logic
Razor overhead:
– Total of 192 Razor flip-flops out of 2408 total (9%)
– Error-free power overhead: ~3%
[Figure: chip layout, 3.3 mm x 3 mm, labeled I-Cache, D-Cache, Register File, IF, ID, EX, MEM, WB]

Effects of Razor DVS
[Figure: energy plotted against decreasing supply voltage. Curves: energy of processor operations, E_proc; energy of pipeline recovery, E_recovery; total energy, E_total = E_proc + E_recovery, with the optimal E_total marked; pipeline throughput (IPC); energy of processor without Razor support]
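The shape of the E_total curve can be explored with a toy model. The sketch below is not the authors' simulator. It assumes energy per operation scales with V^2 and that the timing-error rate grows exponentially once the supply drops below a critical voltage. The 1.8 V nominal supply and the recovery cost of 18 operations come from the slides; V_CRITICAL, ERROR_SLOPE, the 1e-6 prefactor, and all function names are illustrative assumptions.

```python
import math

V_NOMINAL = 1.8        # volts, nominal supply quoted for the prototype
V_CRITICAL = 1.5       # volts, ASSUMED point where timing errors begin
ERROR_SLOPE = 40.0     # 1/volt, ASSUMED steepness of the error-rate curve
RECOVERY_COST = 18.0   # recovery energy in units of one operation ("18x an add")


def error_rate(v):
    """ASSUMED model: fraction of operations that fail timing at supply v."""
    if v >= V_CRITICAL:
        return 0.0
    return min(1.0, 1e-6 * math.exp(ERROR_SLOPE * (V_CRITICAL - v)))


def e_proc(v):
    """Energy of processor operations, normalized to 1.0 at V_NOMINAL (E ~ V^2)."""
    return (v / V_NOMINAL) ** 2


def e_recovery(v):
    """Energy of pipeline recovery: each error costs RECOVERY_COST operations."""
    return error_rate(v) * RECOVERY_COST * e_proc(v)


def e_total(v):
    """E_total = E_proc + E_recovery, as on the slide."""
    return e_proc(v) + e_recovery(v)


def optimal_voltage(v_min=1.0, v_max=V_NOMINAL, step_mv=10):
    """Sweep the supply in millivolt steps and return the voltage minimizing E_total."""
    lo, hi = round(v_min * 1000), round(v_max * 1000)
    candidates = [mv / 1000.0 for mv in range(lo, hi + 1, step_mv)]
    return min(candidates, key=e_total)
```

Under these assumed parameters the minimum of `e_total` lies below `V_CRITICAL`: a small, nonzero error rate costs less in recovery than it saves in supply energy, which is the point the figure makes.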

EX-Stage Analysis – Optimal Voltage Sweep
Recovery cost includes energy to recover entire pipeline (18x an add)

EX-Stage Analysis – Optimal Voltage Sweep

Simulation Analysis – Energy-Optimal Voltage

Simulation Analysis – Razor DVS Execution

Simulation Analysis – Razor DVS Performance

Conclusions
In-situ detection/correction of timing errors
– Eliminate process, temperature, and safety margins
– Tune processor voltage based on error rate
– Purposely run below critical voltage to capture data-dependent latency margins
Implemented with architecture/circuit support
– Double-sampling metastability-tolerant Razor flip-flops validate logic results
– Pipeline initiates recovery after circuit timing errors, no voltage/clock re-tuning needed
Trade-off: supply voltage power savings vs. overhead of correction
– Running with error is good!
[Figures: Razor FF (main flip-flop on clk, shadow latch on clk_del, error comparator producing Error_L); distributed recovery pipeline (IF, ID, EX, MEM (read-only), WB (reg/mem) with Razor FFs, stabilizer FF, PC, error, bubble, recover, and flushID signals, Flush Control)]

Future Directions
Research opportunities
– Razor for caches/memory and control logic
– Voltage control algorithms, especially per-stage tuning
– Typical-case energy optimized designs (instead of worst-case latency optimized)
– Turnkey application of Razor technology
Prototype design, fabrication, evaluation
– Razor I – Q – Razor-ized combinational logic, global tuning
– Razor II – Q – Razor-ized caches and control logic, per-stage tuning
Other applications
– Single-event upset (SEU) protection using Razor error detection/re-execution
– Over-clocking for performance improvement (large gains among hobbyists)

Questions?

Back-up Slides

Other Approaches to Dynamic Voltage Scaling
Traditional DVS
– Valid voltage / delay combinations “blessed” at design time
– Approach leaves a significant amount of energy “on the table”
– Temperature, process, data, and safety margins placed on voltage
Other approaches miss some margins
– Slack detector – automatic tuning
  ARM’s Intelligent Energy Manager (IEM)
  Processor voltage automatically tuned to external ambient conditions
  Inverter chain designed to track most restrictive critical path, margin still required
[Figure: chip floorplan labeled L2 Cache control, Floating point and graphics, Data cache, Cache control, L2 tags, Ex Unit, Control Unit, IO Unit, Mem Control]

Razor Flip-Flop Implementation
Compare latched data with shadow-latch on delayed clock
Upon failure: place data from shadow-latch in main latch
– Ensure shadow latch always correct using conservative design techniques
– Correct value in shadow latch guarantees forward progress
Recover pipeline using microarchitectural recovery mechanism
[Figure: Razor FF between Logic Stage L1 and Logic Stage L2; main flip-flop (D to Q) on clk, shadow latch on clk_del, a 0/1 input multiplexer, and an error comparator producing Error_L]
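The compare-and-restore behaviour described on this slide can be written as a small behavioural model. This is a sketch at the logic level only, not a circuit-accurate description: it ignores meta-stability and assumes, as the slide does, that the shadow latch always holds the correct value. The class and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RazorFlipFlop:
    """Behavioural sketch of one Razor flip-flop (hypothetical model)."""
    q: int = 0        # value held by the main flip-flop
    shadow: int = 0   # value held by the shadow latch

    def clock(self, d_at_clk, d_at_clk_del):
        """Sample D on clk (main) and on clk_del (shadow); return True on error.

        d_at_clk     -- logic value seen at the clk edge (may be sampled too early)
        d_at_clk_del -- settled logic value seen at the delayed clock
        """
        self.q = d_at_clk               # speculative sample
        self.shadow = d_at_clk_del      # ASSUMED always correct
        error = self.q != self.shadow   # error comparator
        if error:
            self.q = self.shadow        # restore correct value into the main flip-flop
        return error
```

A call such as `RazorFlipFlop().clock(0, 1)` models a late-arriving 1: the comparator flags the mismatch and the main flip-flop is reloaded from the shadow latch, after which the pipeline-level recovery mechanism takes over.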

Razor Flip-Flop Circuit
[Figure: circuit schematic labeled Inv_n, Inv_p, Meta-stability detector, Error_L]

Overcoming Short Path Constraints
Delayed clock imposes a short-path constraint
– Razor necessary only for latches on slow paths
– Pad fast path for latches with mixed path delays
– Trade-off between DVS headroom and short path constraints
Constraint: Min. Path Delay > t_delay + t_hold
[Figures: (1) long paths and short paths into a Razor_ff, with the short path padded with extra delay; (2) timing of clock and clock_del showing t_delay, t_hold, and the minimum path delay for the intended path and the short path]
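The constraint Min. Path Delay > t_delay + t_hold can be checked mechanically for each path into a Razor flip-flop. The sketch below is illustrative only: the function names are hypothetical, and delays are given as integers in picoseconds so that comparisons are exact.

```python
def violates_short_path(min_path_delay, t_delay, t_hold):
    """Return True if the path breaks min_path_delay > t_delay + t_hold."""
    return min_path_delay <= t_delay + t_hold


def padding_needed(min_path_delay, t_delay, t_hold, margin):
    """Extra delay to pad onto a short path (hypothetical helper).

    The padded path ends up `margin` above t_delay + t_hold; a positive
    margin is needed because the constraint is a strict inequality.
    """
    return max(0, t_delay + t_hold + margin - min_path_delay)
```

For example, with an assumed clock delay of 2500 ps (half of a 5 ns cycle at 200 MHz) and an assumed hold time of 100 ps, a 300 ps path violates the constraint, and `padding_needed(300, 2500, 100, margin=50)` reports how much buffer delay would fix it. A larger clock delay gives more room for voltage scaling but forces more padding, which is the trade-off named on the slide.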

Hardware Measurement Setup
[Figure: 48-bit LFSR driving Slow Pipeline A and Slow Pipeline B (clk/2) and Fast Pipeline (clk); stabilize stage; != comparator; 40-bit Error Counter]

Simulation Methodology
Challenge: instruction latency depends on circuit evaluation latency
– May vary with changes in stage inputs, stage logic, voltage, temperature…
Dynamic timing simulation combines architectural/circuit simulation
Initial implementation utilized a hand-generated EX-stage circuit model
– Effort ongoing to automate extraction/decomposition/integration into SimpleScalar

Supply Voltage Control System
[Figure: control loop. Pipeline error signals give E_sample; E_diff = E_ref - E_sample enters the Voltage Control Function, which sets the Voltage Regulator output V_dd for the pipeline; a reset signal is also shown]
Current design utilizes a very simple proportional control function
– Control algorithm implemented in software
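A proportional controller built around E_diff = E_ref - E_sample might look like the sketch below. This is an assumption-laden illustration, not the authors' software: the function name, the gain, and the voltage limits are hypothetical, and the sign convention assumes that too few errors (positive E_diff) means the supply can be lowered while too many errors (negative E_diff) means it must be raised.

```python
def proportional_step(v_dd, e_ref, e_sample, gain, v_min, v_max):
    """One update of a proportional supply-voltage controller (sketch).

    e_ref    -- target error rate
    e_sample -- error rate measured over the last sampling window
    gain     -- ASSUMED proportional gain, in volts per unit of error rate
    """
    e_diff = e_ref - e_sample
    v_next = v_dd - gain * e_diff           # positive e_diff lowers the supply
    return min(v_max, max(v_min, v_next))   # keep the request within regulator limits
```

Calling the function once per sampling window with the latest measured error rate moves V_dd toward the point where the observed error rate matches the target.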

Pipeline Recovery
Redo instruction in MEM
[Figure: timing diagram of an instruction moving through IF, ID, EX, MEM, WB, with signals clk, clk_d, error, ID.d, EX.d, MEM.d, comparing the Error and No Error cases]

Voltage Scaling under Dynamic Workloads
Adapt frequency/voltage to performance demands of workload
– Software controlled processor speed
– Lower processor voltage during periods of low operating frequency
  Quadratic reduction in dynamic power and energy
  Super-quadratic reduction in leakage
[Figure: utilization, voltage (V_dd), and frequency plotted against time]
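The quadratic claim follows from the usual first-order CMOS approximation that switching energy per operation is proportional to C * V^2. The helper below is a minimal sketch under that assumption; its name is hypothetical, and it deliberately does not model leakage, which the slide says falls faster than quadratically.

```python
def dynamic_energy_ratio(v_new, v_nominal):
    """Dynamic switching energy at v_new relative to v_nominal, assuming E ~ C * V^2."""
    return (v_new / v_nominal) ** 2
```

Under this model, halving the supply voltage cuts dynamic energy per operation to one quarter.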

Simulation Flow
Automatic creation of very detailed power/delay C-models
[Figure: flow diagram with blocks High-level HDL Specification, Circuit Extraction with Parasitics, Variable Voltage SDF generation, Power/Delay C-model, Architecture Specification, Voltage Control Algorithm, SimpleScalar + DTA, Detailed Power/Delay Analysis; pipeline sketch with IF, ID, EX, MEM, WB, PC, and FFs]

Simulation Methodology
Dynamic timing simulation combines architectural/circuit simulation
– Contrast to static timing simulation which is only concerned with critical path
– SimpleScalar/Alpha architectural-level simulation
– Gate-level simulation of per-stage logic blocks
  Logic block model describes cells, local and global interconnect
  Cells characterized with SPICE at varied slew/cap-load/voltage
  Each cycle, circuit simulator evaluates delay of each stage's logic block

Simulation Analysis – Razor DVS Execution

Razor Demo

More Details on Meta-Stability
Sub-critical operation invites meta-stability
– Meta-stability detector itself can become meta-stable
– Double-latch the error signal to obtain a sufficiently small probability
[Figure: flip-flop (D, Q) with clocks clk, clk_b, clk_del, clk_del_b; Dynamic OR / Latch; control signals restore, bubble, flush; detector signals pos, neg, fail, error]
– Flush entire pipe
– No forward progress
– Reduce frequency

Short Path Failure
[Figure: timing diagram of inst1 (I1) and inst2 (I2) moving through IF, ID, EX, MEM, WB, with signals clk, clk_d, error, ID.d, EX.d, MEM.d and the short path marked]