University of Michigan Electrical Engineering and Computer Science

Online Timing Analysis for Wearout Detection
Jason Blome, Shuguang Feng, Shantanu Gupta, Scott Mahlke

Wearout Mechanisms
There are many of them:
► Electromigration (EM)
► Time-dependent dielectric breakdown (TDDB)
► Negative-bias temperature instability (NBTI)
► Hot carrier injection (HCI)
► …
All are highly dependent on temperature and current density
► Both are increasing fast!

Goals of this Research
Low-cost reliable system design
► How do physical wearout mechanisms progress?
► How do we determine that a device has failed?
► How do we maintain operation given failed components?

Traditional and Recent Approaches
Traditional detection techniques are expensive
► Redundant checking structures
Predictive techniques
► Canary circuits
► RAMP

Proposed Technique
Key Insight:
► Degradation in silicon → decrease in performance
► Long incubation time followed by rapid deterioration
Examples:
► TDDB: increases leakage, shifting voltage curves
► EM: increases resistance
► NBTI: shifts threshold voltage

Outline
► Microprocessor model
► Wearout simulation methodology
► Wearout simulation results
► The wearout detection unit (WDU)
► WDU analysis
► Conclusion

Simulation Setup
OpenRISC 1200
► Area: 1.28 mm²
► Power: 92.2 mW
► Clock Frequency: 200 MHz
► Data Cache: 8 KB
► Instruction Cache: 8 KB

Simulation Flow Step 1: Temperature and Activity Analysis
[Figure: flow diagram with blocks Benchmark, Netlist, Timing, Synopsys VCS, Activity Trace, Parasitics, PrimePower, Power Trace, HotSpot, Temperature Trace]

Simulation Flow Step 2: Wearout Simulation
[Figure: flow diagram with blocks Benchmark, Netlist, Timing, Temperature, Activity, MTTF Calculation, Relative Wearout Factors, Age Index, Wearout Simulation, Synopsys VCS, Signal Latency Data]
Device Delay = Original Delay * RWF * AI * RV
► RWF: Relative amount of wearout for a device
► AI: Performance degradation parameterized by age
► RV: Random variable
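
The delay equation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shape of the age index curve (flat for most of the lifetime, then rising sharply), the parameters `max_slowdown` and `sharpness`, and the choice of a normal distribution for RV are all hypothetical and are not given on the slide.

```python
import random


def age_index(age_years, mttf_years=30.0, max_slowdown=0.10, sharpness=8.0):
    """Hypothetical AI curve: close to 1.0 early in life, rising quickly near the MTTF.

    The functional form and the default parameters are assumptions chosen only to
    mimic "long incubation followed by rapid deterioration".
    """
    frac = min(max(age_years / mttf_years, 0.0), 1.0)
    return 1.0 + max_slowdown * frac ** sharpness


def device_delay(original_delay_ps, rwf, ai, rv=1.0):
    """Device Delay = Original Delay * RWF * AI * RV (the slide's equation)."""
    return original_delay_ps * rwf * ai * rv


def sample_delay(original_delay_ps, rwf, age_years, rng, sigma=0.02):
    """Draw one aged delay; RV ~ Normal(1, sigma) is an assumed distribution."""
    rv = rng.gauss(1.0, sigma)
    return device_delay(original_delay_ps, rwf, age_index(age_years), rv)


if __name__ == "__main__":
    rng = random.Random(0)
    for years in (0, 10, 20, 30):
        print(years, round(sample_delay(100.0, 1.0, years, rng), 2))
```

Keeping `device_delay` a pure product makes the slide's formula easy to check on its own, while the random draw stays isolated in `sample_delay`.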

Simulation Flow Step 2: Wearout Simulation
[Figure]

Wearout Simulation Results
[Figure: signal latency (ps) and sample mean latency (ps) versus time (years)]

Exploiting Performance Degradation
Exponential moving average:
► EMA = α(sample - EMA_previous) + EMA_previous
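
As a minimal sketch, the EMA update above translates directly into Python. The function names and the choice to seed the average with the first sample are assumptions made for illustration; the slide does not specify an initial value.

```python
def ema_update(ema_previous, sample, alpha):
    """One step of EMA = alpha * (sample - EMA_previous) + EMA_previous."""
    return alpha * (sample - ema_previous) + ema_previous


def ema_series(samples, alpha):
    """Running EMA over a list of latency samples.

    Assumption: the EMA is seeded with the first sample.
    """
    ema = samples[0]
    smoothed = []
    for sample in samples:
        ema = ema_update(ema, sample, alpha)
        smoothed.append(ema)
    return smoothed


if __name__ == "__main__":
    latencies_ps = [100.0, 102.0, 101.0, 105.0, 110.0]
    print(ema_series(latencies_ps, alpha=0.5))
```

A smaller `alpha` weights history more heavily and so smooths out short-lived variation, at the cost of reacting more slowly to a real shift in latency.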

Trend Analysis
TRIX can be used to accurately track both local and long-term latency trends
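
For readers unfamiliar with TRIX, the sketch below uses its usual definition: the one-step rate of change of a triple-smoothed exponential moving average. How the WDU computes it in hardware is not shown here; the smoothing constants used for the "local" and "long-term" variants, and the seeding of each EMA with the first sample, are assumptions for illustration only.

```python
def ema_series(samples, alpha):
    """Running EMA, seeded with the first sample (an assumption)."""
    ema = samples[0]
    smoothed = []
    for sample in samples:
        ema = alpha * (sample - ema) + ema
        smoothed.append(ema)
    return smoothed


def trix(samples, alpha):
    """Rate of change of a triple-smoothed EMA; returns len(samples) - 1 values."""
    triple = ema_series(ema_series(ema_series(samples, alpha), alpha), alpha)
    return [(cur - prev) / prev for prev, cur in zip(triple, triple[1:])]


if __name__ == "__main__":
    # Synthetic latency trace in ps: flat at first, then slowly rising.
    latencies_ps = [100.0] * 10 + [100.0 + 0.5 * i for i in range(1, 21)]
    trix_local = trix(latencies_ps, alpha=0.5)     # assumed fast smoothing constant
    trix_global = trix(latencies_ps, alpha=0.05)   # assumed slow smoothing constant
    print(trix_local[-1], trix_global[-1])
```

A flat latency trace yields a TRIX of zero, and a sustained rise yields positive values, which is what makes the sign of the indicator usable as a trend signal.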

Wearout Analysis Circuit
[Figure: block diagram with an input signal and blocks Latency Sampling, TRIX_l Calculation, TRIX_g Calculation, Prediction]

System Integration
[Figure: block diagram with blocks Latency Sampling, TRIX_l Calculation, TRIX_g Calculation, Prediction]

Dynamic Variation
Temperature
► 50 °C: ~4% increase in latency at 130 nm
Clock jitter
► Impact on latency varies
► Mean jitter typically modeled as 0
Worst-case variation would need to be sampled 12 times over 4 days

WDU Implementation
[Table: Area (mm²) and Power (mW) for WDU (1 Signal), WDU (8 Signals), and OR1200 Core]

WDU Prediction Results
► Each unit calibrated for a 30-year MTTF
► The WDU flagged at least one output from each module prior to the MTTF

Lifetime Enhancement
[Figure]

Conclusion
Low-cost reliable system design
► Physical wearout mechanisms affect timing
► Failure prediction can be much cheaper than detection
Wearout detection unit:
► Online timing analysis is a good detector of wearout and predictor of failure
► Generic and self-calibrating

Simulation Results: Temperature and MTTF
[Figure]

Technology Scaling
► Quickly shrinking feature sizes
► Sharp increase in frequency
► Slow decrease in supply voltage
[Figure: OR1200 Power Densities]