A Unified Model for Timing Speculation: Evaluating the Impact of Technology Scaling, CMOS Design Style, and Fault Recovery Mechanism
  Marc de Kruijf, Shuou Nomura, Karu Sankaralingam
  UW-Madison Computer Sciences, Vertical Research Group
  © 2010
DSN From Hard to Harder
  Technology node, from hard to harder: 10000nm, 4000nm, 1500nm, 720nm, 360nm, 180nm, 90nm, 45nm & beyond
What is the Problem?
  Non-ideal transistor scaling
  Transistor wear-out
  Process, voltage, and temperature (PVT) variations
  Errors due to particle interference
  Noise coupling & crosstalk
What is the Problem?
  Performance Toolbox: Multi-core, Coherence & consistency, On-chip network, Out-of-order, Branch prediction
  Reliability Toolbox: DMR, Timing speculation, RMT, HW checkpoints, TMR, ECC, Watchdog, Dynamic verification
  NEED HIGH-LEVEL ANALYSIS TOOLS
Our Contribution
  A model for timing speculation
    Unifies hardware + system
    Small set of high-level inputs
    Figure label: processor designer
  Also...
    Q. What is the impact of technology scaling?
    A. Further benefits are small to none.
    Q. What is the impact of CMOS design style?
    A. Very low power designs benefit most.
    Q. What is the impact of the fault recovery mechanism?
    A. Fine-grained recovery is key to high efficiencies.
Outline
  Timing Speculation
  Model Overview
  Hardware Efficiency Model
  System Recovery Model
  Results
  Conclusion
Timing Speculation
  [Figure: clock waveform with circuit delay measured against the clock period (= 1/frequency). Labels: "variations", "Timing failure!", "slower clock", "OK!", "detect & recover".]
Model Overview
  [Diagram: Hardware Efficiency (energy vs. error rate) and System Recovery (time vs. error rate) combine into Overall Efficiency]
  Model Inputs
    1. A hardware path delay distribution
    2. Effect of variations on path delay as N(μ,σ)
    3. The time between recovery checkpoints
    4. The time to restore a checkpoint
Hardware Efficiency Model
  Input 1: Path delay distribution
  Input 2: Path delay variation (σ)
  [Diagram: plots of # paths vs. path delay, error prob. vs. clock period, and energy vs. error prob. (e.g. frequency scaling); outputs: error rate and energy]
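The step from the two hardware inputs to an error probability at a given clock period can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each path's delay is independently normal with mean equal to its nominal delay and σ given as a fraction of that mean, and that a path fails when its delay exceeds the clock period. The function names and the histogram values are hypothetical.

```python
import math


def path_failure_prob(nominal_delay, sigma_frac, clock_period):
    """P(actual delay > clock period) for one path, with delay ~ N(mu, sigma)."""
    sigma = sigma_frac * nominal_delay  # sigma expressed as a fraction of mu
    z = (clock_period - nominal_delay) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)  # upper-tail probability of a normal distribution


def error_prob(path_histogram, sigma_frac, clock_period):
    """Average failure probability over all paths, weighted by path count."""
    total = sum(count for _, count in path_histogram)
    failing = sum(
        count * path_failure_prob(delay, sigma_frac, clock_period)
        for delay, count in path_histogram
    )
    return failing / total


# Hypothetical histogram of (nominal path delay, number of paths), with delays
# normalized so the longest path has delay 1.0. This is not data from the talk.
HISTOGRAM = [(0.60, 500), (0.80, 300), (0.95, 150), (1.00, 50)]

# Shrinking the clock period (raising frequency) raises the error probability.
for period in (1.20, 1.10, 1.00, 0.90):
    p = error_prob(HISTOGRAM, sigma_frac=0.046, clock_period=period)
    print(f"clock period {period:.2f}: error prob. {p:.3e}")
```

Weighting by path count treats every path as equally likely to be exercised. A workload-driven model would weight by how often each path is actually activated.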
System Recovery Model
  System Recovery Model Inputs
    1. The time between recovery checkpoints (cycles)
    2. The time to restore a checkpoint (restore)
  overhead(rate) = failures(rate) × (waste(rate) + restore)
  (applies to all backward error recovery systems)
  [Plot: time vs. error rate]
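The overhead formula can be made concrete with a short sketch. The slide gives only the top-level expression, so the closed forms for failures(rate) and waste(rate) below are assumptions: errors strike each cycle independently with probability `rate`, an error is detected in the cycle where it occurs, and execution restarts from the last checkpoint. All function names are hypothetical.

```python
def expected_failures(rate, cycles):
    """Expected number of failed attempts before one checkpoint interval succeeds."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    success = (1.0 - rate) ** cycles  # probability of an error-free interval
    return (1.0 - success) / success  # mean of a geometric distribution


def expected_waste(rate, cycles):
    """Expected cycles executed in an attempt, given that the attempt fails."""
    fail = 1.0 - (1.0 - rate) ** cycles
    if fail == 0.0:
        return 0.0  # no failures are possible, so nothing is wasted
    # The first error lands in cycle k with probability rate * (1 - rate)^(k - 1).
    weighted = sum(k * rate * (1.0 - rate) ** (k - 1) for k in range(1, cycles + 1))
    return weighted / fail


def overhead(rate, cycles, restore):
    """overhead(rate) = failures(rate) x (waste(rate) + restore), in cycles."""
    return expected_failures(rate, cycles) * (expected_waste(rate, cycles) + restore)


def normalized_time(rate, cycles, restore):
    """Execution time per checkpoint interval relative to error-free execution."""
    return (cycles + overhead(rate, cycles, restore)) / cycles


for rate in (1e-6, 1e-4, 1e-2):
    t = normalized_time(rate, cycles=100, restore=100)
    print(f"error rate {rate:.0e}: normalized time {t:.4f}")
```

Under these assumptions, a longer checkpoint interval hurts twice: each attempt is more likely to fail, and each failure discards more work.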
Results
  Is the model useful? What can we learn?
  CMOS Design Style: High Performance CMOS, Low Power CMOS, Ultra-low Power CMOS
  Technology Node: 45nm, 11nm
  Recovery System: Razor, Reunion, Paceline
Results
  [Diagram: Hardware Efficiency (energy vs. error rate) and System Recovery (time vs. error rate) combine into Overall Efficiency]
Hardware Model Inputs
  1. Path delay distribution
       Application: H.264 decoding
       Hardware: OpenRISC processor
  2. Effect of process variations as N(μ,σ) using ITRS data
       CMOS design style       45nm          11nm
       High Performance CMOS   σ = 0.046μ    σ = 0.051μ
       Low Power CMOS          σ = 0.029μ    σ = 0.042μ
       Ultra-low Power CMOS    σ = 0.196μ    -
Hardware Efficiency
  Results for High Performance CMOS
  Energy = Power × Time
  EDP = Power × Time²
  [Plots: energy vs. error rate; normalized EDP vs. error rate]
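The two definitions above translate directly into code. The sketch below computes energy, energy-delay product (EDP), and EDP normalized to a baseline operating point. The function names and the example operating point are hypothetical and are not results from the talk.

```python
def energy(power, time):
    """Energy = Power x Time."""
    return power * time


def edp(power, time):
    """Energy-delay product: EDP = Energy x Time = Power x Time^2."""
    return energy(power, time) * time


def normalized_edp(power, time, base_power=1.0, base_time=1.0):
    """EDP relative to a baseline operating point (values below 1.0 are better)."""
    return edp(power, time) / edp(base_power, base_time)


# Hypothetical operating point: 20% less power at the cost of 5% more time.
print(f"normalized EDP: {normalized_edp(power=0.8, time=1.05):.3f}")
```

Because time enters squared, EDP penalizes slowdowns more heavily than energy alone does, which is why recovery time matters to the overall result.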
Recovery Model Inputs
  1. The time between recovery checkpoints & 2. The time to restore a checkpoint
       System     Detection and recovery                       Checkpoint size   Recovery cost
       Razor      Latch-level detection + pipeline rollback    1 cycle           5 cycles
       Reunion    DMR detection + checkpoint                   100 cycles        100 cycles
       Paceline   DMR detection + checkpoint + flush           100 cycles        1000 cycles
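The three parameter sets can be captured as data and compared with a simple bound. This sketch assumes a failure is detected before the next checkpoint is taken, so the work lost to one failure is at most one checkpoint interval plus the restore time. The class and method names are hypothetical. The cycle counts are the ones listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoverySystem:
    name: str
    checkpoint_cycles: int  # time between recovery checkpoints
    restore_cycles: int     # time to restore a checkpoint

    def worst_case_cost_per_failure(self):
        """Upper bound on cycles lost to one failure (assumed bound, see lead-in)."""
        return self.checkpoint_cycles + self.restore_cycles


SYSTEMS = [
    RecoverySystem("Razor", checkpoint_cycles=1, restore_cycles=5),
    RecoverySystem("Reunion", checkpoint_cycles=100, restore_cycles=100),
    RecoverySystem("Paceline", checkpoint_cycles=100, restore_cycles=1000),
]

for system in SYSTEMS:
    cost = system.worst_case_cost_per_failure()
    print(f"{system.name}: at most {cost} cycles lost per failure")
# Razor: at most 6 cycles lost per failure
# Reunion: at most 200 cycles lost per failure
# Paceline: at most 1100 cycles lost per failure
```

The spread of roughly two orders of magnitude between Razor and Paceline illustrates why the granularity of recovery has such a large effect on the tolerable error rate.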
System Recovery
  [Plot: normalized time vs. error rate]
Overall Efficiency
  [Plot: EDP vs. error rate]
  1. High Performance CMOS
  2. Low Power CMOS
  3. Ultra-low Power CMOS
Overall Efficiency: High Performance CMOS
  23% PEAK, [...]% TYPICAL
  [Plot: normalized EDP vs. error rate]
Overall Efficiency: Low Power CMOS
  18% PEAK, [...]% TYPICAL
  [Plot: normalized EDP vs. error rate]
Overall Efficiency: Ultra-low Power CMOS
  47% PEAK, [...]% TYPICAL
  [Plot: normalized EDP vs. error rate]
Conclusions
  A High-level Model
  Results
    Efficiency gains improve only minimally with scaling
    Ultra-low power (sub-threshold) CMOS benefits most
    Fine-grained recovery is key
  Future Work
    Incorporate more sources of variation
    A tool for processor designers? Under development at
Questions?
  Timing speculation, Multi-core, Coherence & consistency, On-chip network, Out-of-order, Branch prediction
Timing Speculation
  Source of Timing Variation: Manufacturing Process, Runtime, Application
  Techniques shown: Speed Binning, Online Timing Analysis, Timing Speculation
  Figure adapted from Greskamp et al., Paceline: [...]. In PACT ’07.
System Recovery Model
  System Recovery Model Inputs
    1. The time between recovery checkpoints (cycles)
    2. The time to restore a checkpoint (restore)
  Annotations: expected # failures before success; expected # cycles executed upon failure
Overall Inputs
  1. Path delay distribution
       Application: H.264 decoding
       Hardware: OpenRISC processor
  2. Effect of process variations on path delay as N(μ,σ) using ITRS data
       High Performance: σ = 0.046μ
       Low Power: σ = 0.029μ
       Ultra-low Power: σ = 0.196μ
  3. The time between recovery checkpoints & 4. The time to restore a checkpoint
       Razor: Latch-level detection + pipeline rollback (1 & 5 cycles)
       Reunion: DMR detection + checkpoint (100 & 100 cycles)
       Paceline: DMR detection + checkpoint + flush (100 & 1000 cycles)