UW-Madison Computer Sciences Vertical Research Group© 2010 A Unified Model for Timing Speculation: Evaluating the Impact of Technology Scaling, CMOS Design.

Presentation transcript:

A Unified Model for Timing Speculation: Evaluating the Impact of Technology Scaling, CMOS Design Style, and Fault Recovery Mechanism
Marc de Kruijf, Shuou Nomura, Karu Sankaralingam
UW-Madison Computer Sciences, Vertical Research Group © 2010

DSN From Hard to Harder
[Figure: timeline of process technology nodes, running from "Hard" to "Harder": 10000nm, 4000nm, 1500nm, 720nm, 360nm, 180nm, 90nm, 45nm & beyond]

What is the Problem?
• Non-ideal transistor scaling
• Transistor wear-out
• Process, voltage, and temperature (PVT) variations
• Errors due to particle interference
• Noise coupling & crosstalk

What is the Problem?
Performance Toolbox: Multi-core, Coherence & consistency, On-chip network, Out-of-order, Branch prediction
Reliability Toolbox: DMR, Timing speculation, RMT, HW checkpoints, TMR, ECC, Watchdog, Dynamic verification
NEED HIGH-LEVEL ANALYSIS TOOLS

Our Contribution
A model for timing speculation
• Unifies hardware + system
• Small set of high-level inputs
[Figure label: processor designer]
Also...
Q. What is the impact of technology scaling?
A. Further benefits are small to none.
Q. What is the impact of CMOS design style?
A. Very low power designs benefit most.
Q. What is the impact of the fault recovery mechanism?
A. Fine-grained recovery is key to high efficiencies.

Outline
• Timing Speculation
• Model Overview
• Hardware Efficiency Model
• System Recovery Model
• Results
• Conclusion

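Timing Speculation
[Figure: clock signal versus circuit delay, with clock period (= 1/frequency). Variations can push the circuit delay past the clock period ("Timing failure!"). Two responses are shown: a slower clock ("OK!") or detect & recover.]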

Model Overview
[Figure: Hardware Efficiency (energy vs. error rate) combined with System Recovery (time vs. error rate) gives Overall Efficiency]
Model Inputs
1. A hardware path delay distribution
2. Effect of variations on path delay as N(μ,σ)
3. The time between recovery checkpoints
4. The time to restore a checkpoint

Hardware Efficiency Model
Input 1: Path delay distribution (# paths vs. path delay)
Input 2: Path delay variation (σ)
[Figure: the two inputs yield error probability vs. clock period; sweeping the operating point (e.g. frequency scaling) then yields energy vs. error rate]
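The chain on this slide (path delay distribution plus variation, giving error probability as a function of clock period) can be illustrated with a minimal Python sketch. It assumes each path's delay is normally distributed around its nominal value with σ proportional to μ, and that the error rate is the path-count-weighted average of per-path error probabilities. The function names, the toy histogram, and the weighting scheme are illustrative assumptions, not the authors' implementation.

```python
from statistics import NormalDist


def path_error_probability(nominal_delay, clock_period, sigma_ratio):
    """Probability that a path misses the clock edge.

    Assumes the actual delay is N(mu, sigma) with mu = nominal_delay and
    sigma = sigma_ratio * mu (e.g. sigma_ratio = 0.046 for sigma = 0.046 mu).
    """
    dist = NormalDist(mu=nominal_delay, sigma=sigma_ratio * nominal_delay)
    return 1.0 - dist.cdf(clock_period)


def error_rate(delay_histogram, clock_period, sigma_ratio):
    """Weighted average of per-path error probabilities.

    delay_histogram maps nominal delay -> number of paths (a hypothetical
    input format); weighting by path count is an assumption of this sketch.
    """
    total = sum(delay_histogram.values())
    return sum(
        count / total * path_error_probability(delay, clock_period, sigma_ratio)
        for delay, count in delay_histogram.items()
    )


# Toy histogram (made-up numbers): nominal delay in ns -> number of paths.
toy_histogram = {0.6: 500, 0.8: 300, 0.95: 50}

# Shortening the clock period raises the error rate.
for period in (1.10, 1.00, 0.95, 0.90):
    rate = error_rate(toy_histogram, period, sigma_ratio=0.046)
    print(f"clock period {period:.2f} ns -> error rate {rate:.3e}")
```

Sweeping the clock period this way traces an error-probability curve of the kind the slide shows; mapping it to energy would additionally need a power model, which is not sketched here.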

System Recovery Model
System Recovery Model Inputs
1. The time between recovery checkpoints (cycles)
2. The time to restore a checkpoint (restore)

overhead(rate) = failures(rate) x ( waste(rate) + restore )

[Figure: time vs. error rate]
(applies to all backward error recovery systems)
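A minimal Python sketch of the overhead relation on this slide, under stated assumptions: errors are independent from cycle to cycle, an error is detected in the cycle where it occurs, "failures" is read as the expected number of failed attempts before a checkpoint interval completes, and "waste" as the expected number of cycles executed in a failed attempt. The closed forms below follow from those assumptions and are not taken from the slides; all function and parameter names are hypothetical.

```python
import math


def recovery_overhead(error_rate, interval_cycles, restore_cycles):
    """Expected extra cycles per checkpoint interval.

    Implements overhead = failures * (waste + restore), where
      failures = expected # failed attempts before an interval succeeds,
      waste    = expected # cycles executed in a failed attempt.
    Assumes independent per-cycle errors detected in the cycle they occur.
    """
    if error_rate <= 0.0:
        return 0.0
    if error_rate >= 1.0:
        return math.inf
    p, n = error_rate, interval_cycles
    log_success = n * math.log1p(-p)
    success = math.exp(log_success)        # P(interval completes error-free)
    fail_prob = -math.expm1(log_success)   # 1 - success, computed accurately
    if success == 0.0:
        return math.inf                    # the interval essentially never completes
    failures = fail_prob / success         # mean of a geometric distribution
    # Mean position of the first error, given that it falls inside the interval.
    waste = 1.0 / p - n * success / fail_prob
    return failures * (waste + restore_cycles)


def normalized_time(error_rate, interval_cycles, restore_cycles):
    """Execution time relative to error-free execution of one interval."""
    overhead = recovery_overhead(error_rate, interval_cycles, restore_cycles)
    return (interval_cycles + overhead) / interval_cycles


for rate in (1e-6, 1e-4, 1e-2):
    print(rate, normalized_time(rate, interval_cycles=100, restore_cycles=100))
```

Because only the checkpoint interval and the restore cost enter the calculation, the same function can be reused for any backward error recovery scheme described by those two numbers.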

Results
Is the model useful? What can we learn?
• Technology Node: 45nm, 11nm
• CMOS Design Style: High Performance CMOS, Low Power CMOS, Ultra-low Power CMOS
• Recovery System: Razor, Reunion, Paceline

Results
[Figure: Hardware Efficiency (energy vs. error rate) combined with System Recovery (time vs. error rate) gives Overall Efficiency]

Hardware Model Inputs
1. Path delay distribution
   • Application: H.264 decoding
   • Hardware: OpenRISC processor
2. Effect of process variations as N(μ,σ) using ITRS data
   • High Performance CMOS
     - 45nm: σ = 0.046μ
     - 11nm: σ = 0.051μ
   • Low Power CMOS
     - 45nm: σ = 0.029μ
     - 11nm: σ = 0.042μ
   • Ultra-low Power CMOS
     - 45nm: σ = 0.196μ

Hardware Efficiency
Results for High Performance CMOS
Energy = Power x Time
EDP = Power x Time^2
[Figures: energy vs. error rate; normalized EDP vs. error rate]
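The two definitions on this slide translate directly into code. The sketch below is a small Python illustration; the baseline of 1.0 for power and time and the example operating point are made-up values chosen only to show how a normalized EDP is formed.

```python
def energy(power, time):
    """Energy = Power x Time."""
    return power * time


def edp(power, time):
    """Energy-delay product: EDP = Energy x Time = Power x Time^2."""
    return energy(power, time) * time


def normalized_edp(power, time, baseline_power=1.0, baseline_time=1.0):
    """EDP relative to a baseline operating point (assumed here to be 1.0, 1.0)."""
    return edp(power, time) / edp(baseline_power, baseline_time)


# Hypothetical operating point: 25% less power at the cost of 5% more time.
print(normalized_edp(power=0.75, time=1.05))
```

Because time enters squared, a recovery-induced slowdown is penalized more heavily in EDP than the same relative change in power.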

Recovery Model Inputs
1. The time between recovery checkpoints &
2. The time to restore a checkpoint
   • Razor
     - Latch-level detection + pipeline rollback
     - 1 cycle checkpoint size & 5 cycle recovery cost
   • Reunion
     - DMR detection + checkpoint
     - 100 cycle checkpoint size & 100 cycle recovery cost
   • Paceline
     - DMR detection + checkpoint + flush
     - 100 cycle checkpoint size & 1000 cycle recovery cost
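To get a feel for how far apart these three parameter sets are, the following Python sketch applies a first-order, low-error-rate estimate to each. It assumes the "checkpoint size" is the interval between checkpoints in cycles, that an error is equally likely in any cycle of the interval, and that the error rate is small enough for at most one error per interval to matter. The approximation and all names are illustrative assumptions, not the model used in the talk.

```python
# (checkpoint interval, restore cost) in cycles, as listed on the slide.
RECOVERY_SYSTEMS = {
    "Razor": (1, 5),
    "Reunion": (100, 100),
    "Paceline": (100, 1000),
}


def approx_overhead_fraction(error_rate, interval_cycles, restore_cycles):
    """First-order (low error rate) estimate of extra time per useful cycle.

    Assumes an error is equally likely in any cycle of the interval, so a
    failed attempt wastes (interval + 1) / 2 cycles on average before the
    restore cost is paid. Only meaningful when error_rate * interval << 1.
    """
    expected_waste = (interval_cycles + 1) / 2
    return error_rate * (expected_waste + restore_cycles)


for name, (interval, restore) in RECOVERY_SYSTEMS.items():
    slowdown = 1.0 + approx_overhead_fraction(1e-4, interval, restore)
    print(f"{name:9s} normalized time at 1e-4 errors/cycle: {slowdown:.4f}")
```

Under this estimate the cost per error grows with both the interval and the restore time, so the coarser-grained schemes pay far more per error than latch-level rollback does.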

System Recovery
[Figures: time vs. error rate; normalized time vs. error rate]

Overall Efficiency
[Figure: EDP vs. error rate]
1. High Performance CMOS
2. Low Power CMOS
3. Ultra-low Power CMOS

Overall Efficiency: High Performance CMOS
23% PEAK, [...]% TYPICAL
[Figure: normalized EDP vs. error rate]

Overall Efficiency: Low Power CMOS
18% PEAK, [...]% TYPICAL
[Figure: normalized EDP vs. error rate]

Overall Efficiency: Ultra-low Power CMOS
47% PEAK, [...]% TYPICAL
[Figure: normalized EDP vs. error rate]

Conclusions
• A High-level Model
• Results
  - Efficiency gains improve only minimally with scaling
  - Ultra-low power (sub-threshold) CMOS benefits most
  - Fine-grained recovery is key
• Future Work
  - Incorporate more sources of variation
  - A tool for processor designers?
  - Under development at

Questions?
Timing speculation, Multi-core, Coherence & consistency, On-chip network, Out-of-order, Branch prediction

Timing Speculation
[Figure: Source of Timing Variation (Manufacturing Process, Runtime, Application) set against the techniques Speed Binning, Online Timing Analysis, and Timing Speculation]
Figure adapted from Greskamp et al., Paceline: [...]. In PACT ’07.

System Recovery Model
System Recovery Model Inputs
1. The time between recovery checkpoints (cycles)
2. The time to restore a checkpoint (restore)
[Equation annotations: failures = expected # failures before success; waste = expected # cycles executed upon failure]

Overall Inputs
1. Path delay distribution
   • Application: H.264 decoding
   • Hardware: OpenRISC processor
2. Effect of process variations on path delay as N(μ,σ) using ITRS data
   • High Performance: σ = 0.046μ
   • Low Power: σ = 0.029μ
   • Ultra-low Power: σ = 0.196μ
3. The time between recovery checkpoints &
4. The time to restore a checkpoint
   • Razor: Latch-level detection + pipeline rollback (1 & 5 cycles)
   • Reunion: DMR detection + checkpoint (100 & 100 cycles)
   • Paceline: DMR detection + checkpoint + flush (100 & 1000 cycles)