Online Timing Variation Tolerance for Digital Integrated Circuits Guihai Yan & Xiaowei Li State Key Laboratory of Computer Architecture, Institute of Computing.


Online Timing Variation Tolerance for Digital Integrated Circuits Guihai Yan & Xiaowei Li State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS)

Sources of timing variation: PVT variation, both dynamic (voltage and temperature fluctuations) and static (process variation); aging degradation (NBTI, PBTI, TDDB); and soft errors in non-regular logic (SEU and SET).

Process variation: Sub-wavelength lithography means that what you get is not what you want, which produces systematic variation. Random dopant fluctuations produce random Vth variation. Maximum frequency can differ by 20% [Teodorescu, ISCA08]. P variation is time-independent: it is the DC component.

Temperature variation: Application-specific and slow-varying, on the order of milliseconds; a typical thermal time constant is 2 ms [Donald, ISCA06]. T variation is slow-varying: it contributes the low-frequency components.
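To see why a 2 ms thermal time constant makes temperature a slow-varying term relative to a nanosecond clock, here is a minimal sketch. It assumes a first-order (single RC) thermal model, which is our simplification rather than something the slide specifies; the helper name `thermal_step_fraction` is hypothetical.

```python
import math

# Thermal time constant quoted on the slide (~2 ms) [Donald, ISCA06].
TAU_S = 2e-3


def thermal_step_fraction(t_s: float, tau_s: float = TAU_S) -> float:
    """Fraction of a temperature step reached after t_s seconds.

    Assumes a first-order (single RC) thermal model:
    T(t) = T_final + (T_0 - T_final) * exp(-t / tau).
    """
    return 1.0 - math.exp(-t_s / tau_s)


if __name__ == "__main__":
    # One 2 GHz cycle, 1 us, one time constant, and five time constants.
    for t in (0.5e-9, 1e-6, 2e-3, 10e-3):
        print(f"t = {t:.1e} s -> {thermal_step_fraction(t):.6f} of the step")
```

Under this model a single 0.5 ns cycle moves temperature by well under a millionth of a step, while one time constant reaches about 63% of it.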

Voltage variation: Fast-changing, caused by inductive noise (a.k.a. the L(di/dt) problem) and IR drop. Why is it hard to keep a constant voltage level? Example: with a 100 W power budget and a 1 V working voltage, the chip draws 100 A, so keeping the voltage fluctuation within ±5% (50 mV) requires R_PDN < 0.5 mOhm. [Figure: PDN hierarchy model] V variation is fast-changing: it contributes the high-frequency components.
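The R_PDN bound in the example follows from I = P / V and R = ΔV / I. A minimal sketch of that arithmetic, assuming a purely resistive (DC) view that ignores the inductive L(di/dt) part; the function name is hypothetical.

```python
def max_pdn_resistance(power_w: float, vdd_v: float, tolerance: float) -> float:
    """Largest PDN resistance (ohms) that keeps the IR drop within tolerance * Vdd.

    Purely resistive (DC) view: ignores the inductive L(di/dt) component.
    """
    current_a = power_w / vdd_v          # I = P / V
    allowed_drop_v = vdd_v * tolerance   # e.g. 5% of 1 V = 50 mV
    return allowed_drop_v / current_a    # R = delta-V / I


if __name__ == "__main__":
    r = max_pdn_resistance(power_w=100.0, vdd_v=1.0, tolerance=0.05)
    print(f"R_PDN < {r * 1e3:.2f} mOhm")
```

Since the bound equals tolerance * Vdd^2 / P, halving the supply at the same power cuts the allowed resistance by a factor of four, which is one reason the problem worsens as voltages scale down.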

Aging degradation: Aging mechanisms include NBTI (PMOS), PBTI (NMOS), and TDDB, causing 20% degradation in 10 years. [Figure: failure rate over lifetime (bathtub curve): infant mortality, useful time, aging]
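The slide gives only an end point (20% in 10 years). As an illustration of how such degradation might be interpolated over the lifetime, here is a minimal sketch assuming a power-law time dependence with an exponent of 1/6, a value commonly used in reaction-diffusion NBTI models; the exponent, the function name, and the calibration are assumptions, not taken from the slide.

```python
# Assumed power-law exponent; reaction-diffusion NBTI models commonly use
# values around 1/6 to 1/4. The slide does not state one.
AGING_EXPONENT = 1.0 / 6.0


def aging_degradation(years: float,
                      ref_years: float = 10.0,
                      ref_degradation: float = 0.20,
                      n: float = AGING_EXPONENT) -> float:
    """Fractional degradation after `years`, calibrated so that
    aging_degradation(ref_years) == ref_degradation (20% at 10 years)."""
    return ref_degradation * (years / ref_years) ** n


if __name__ == "__main__":
    for y in (0.1, 1.0, 5.0, 10.0):
        print(f"{y:>5.1f} years -> {aging_degradation(y):.1%}")
```

With a sub-linear exponent, much of the degradation appears early and then grows slowly, which leaves time to detect and adapt to it.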

Soft errors: An SEU (Single Event Upset) is an unintentional bit flip in a storage cell. An SET (Single Event Transient) is a transient voltage pulse propagating through combinational logic. [Figure: SEU and SET]

Outline
TEA-TM: timing emergency-aware thread migration; co-optimization of PVT variations
SVFD: stability-violation-based fault detection; online fault detection via timing sensing, covering delay faults, aging delay, and soft errors
MicroFix: margin reduction with timing sensing, applied to DVFS
ReviveNet: aging-delay tolerance

TEA-TM: Timing Emergency-Aware Thread Migration. Focus on the essential: the timing issue. Process, voltage, and temperature variations all contribute to timing variation, but their effects are not necessarily cumulative; in some cases they cancel each other out. Hence, they are complementary.

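Some terms: A timing emergency (TE) occurs when the delay exceeds the timing emergency threshold. The emergency level (EL) is the density of TEs, defined as EL = number of TEs per 100 million cycles. [Figure: delay over time against the timing emergency threshold, ranging from mild to violent; process spans the fast to slow corner, temperature spans cool to hot, voltage spans small to large fluctuation]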
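The EL definition is simple normalization. A minimal sketch, assuming a per-cycle stream of sensed delays compared against a single threshold; the function names and the sample numbers are hypothetical.

```python
from typing import Iterable

# EL is normalized to 100 million cycles, as defined on the slide.
CYCLES_PER_EL_UNIT = 100_000_000


def count_timing_emergencies(delays: Iterable[float], threshold: float) -> int:
    """Count cycles whose sensed delay exceeds the timing emergency threshold."""
    return sum(1 for d in delays if d > threshold)


def emergency_level(num_te: int, num_cycles: int) -> float:
    """Emergency level: number of TEs per 100 million cycles."""
    if num_cycles <= 0:
        raise ValueError("num_cycles must be positive")
    return num_te * CYCLES_PER_EL_UNIT / num_cycles


if __name__ == "__main__":
    # Hypothetical: 250 TEs observed over 1 billion cycles.
    print(emergency_level(250, 1_000_000_000))  # 25.0
```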

How do PVT variations complement each other? Observation in the time domain: Core1 sees mild T and mild V variation, giving a large margin and a low EL (excessive headroom). Core2 sees violent T and violent V variation, leaving little margin and a high EL (emergencies). What if we exchange the threads on Core1 and Core2? Each core then combines one mild and one violent component (T mild with V violent, T violent with V mild). [Figure: delay over time against the threshold for the four T/V combinations]
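The exchange argument can be checked with a toy model. This sketch assumes that, right after migration, the slow T offset stays with the core while the fast V swing follows the thread; all magnitudes, the sinusoidal V swing, and the threshold are hypothetical values chosen for illustration.

```python
import math

THRESHOLD = 0.5  # hypothetical normalized timing emergency threshold

# Hypothetical components: the slow T offset stays with the core right after
# migration, while the fast V swing follows the thread.
CORE_T = {"Core1": 0.1, "Core2": 0.3}    # T mild, T violent
THREAD_V = {"A": 0.1, "B": 0.3}          # V mild, V violent


def delay_trace(t_offset: float, v_amplitude: float, cycles: int = 1000) -> list[float]:
    """Normalized delay per cycle: constant T offset plus a periodic V swing."""
    return [t_offset + v_amplitude * math.sin(2 * math.pi * k / 50) for k in range(cycles)]


def total_emergencies(mapping: dict[str, str]) -> int:
    """Sum of cycles above THRESHOLD across cores for a core -> thread mapping."""
    return sum(
        sum(1 for d in delay_trace(CORE_T[core], THREAD_V[thread]) if d > THRESHOLD)
        for core, thread in mapping.items()
    )


if __name__ == "__main__":
    before = total_emergencies({"Core1": "A", "Core2": "B"})
    after = total_emergencies({"Core1": "B", "Core2": "A"})
    print(f"TEs before exchange: {before}, after exchange: {after}")
```

Because a TE depends on whether the combined delay crosses a threshold, redistributing the violent components so no core gets both can remove emergencies entirely, even though the total amount of variation is unchanged.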

Frequency domain analysis: Migrating threads grafts the V component onto another core.

Frequency domain analysis (cont.): Relative frequency spectrum deviations on a 2 GHz quad-core processor, with bands P: 0-100 Hz, T: 100 Hz-1 MHz, V: 1 MHz-250 MHz. Potential: Core3 and Core4 are mild. Strategy: exchange the threads on Core1 and Core4, and on Core2 and Core3.
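Splitting a trace into the P, T, and V bands amounts to summing spectral energy over frequency ranges. The slide does not say how its spectra are computed, so this is only a minimal sketch: it uses a naive DFT for self-containment (a real trace would use an FFT such as `numpy.fft.rfft`), and the function name and toy signal are hypothetical.

```python
import cmath
import math

# Frequency bands from the slide (2 GHz quad-core processor).
PVT_BANDS_HZ = {"P": (0.0, 100.0), "T": (100.0, 1e6), "V": (1e6, 250e6)}


def band_energies(samples: list[float], fs_hz: float,
                  bands: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Spectral energy of `samples` (sampled at fs_hz) within each [lo, hi) band.

    Naive O(N^2) DFT over the non-negative frequency bins, for clarity only.
    """
    n = len(samples)
    energies = {name: 0.0 for name in bands}
    for k in range(n // 2 + 1):
        xk = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        freq = k * fs_hz / n
        for name, (lo, hi) in bands.items():
            if lo <= freq < hi:
                energies[name] += abs(xk) ** 2
    return energies


if __name__ == "__main__":
    # Toy trace: a strong 50 Hz component plus a weaker 300 Hz component.
    fs, n = 1000.0, 200
    trace = [math.sin(2 * math.pi * 50 * i / fs) + 0.5 * math.sin(2 * math.pi * 300 * i / fs)
             for i in range(n)]
    print(band_energies(trace, fs, {"low": (0.0, 100.0), "high": (100.0, 500.0)}))
```

Comparing such per-band energies across cores is one way to see which core is violent in which band; applying `PVT_BANDS_HZ` to a real delay trace would require sampling fast enough to cover 250 MHz.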

TEA-TM summary: Analyzes the complementary effect in both the time and frequency domains. Presents a delay-sensor-based scheme (TEA-TM) that exploits the complementary effect with a simple, cost-efficient, FFT-like heuristic. Results: throughput 30%, fairness 80%. Reference: Guihai Yan, Xiaoyao Liang, Yinhe Han, Xiaowei Li, "Leveraging the Core-Level Complementary Effects of PVT Variations to Reduce Timing Emergencies in Multi-Core Processors," Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA'10), Saint-Malo, France, Jun.
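The Core1/Core4 and Core2/Core3 strategy above pairs the most emergency-prone cores with the mildest ones. A minimal sketch of such a butterfly-style pairing, assuming cores are ranked by EL alone; this is a hypothetical illustration, not the paper's actual heuristic, and the EL values are invented.

```python
def pair_for_migration(emergency_levels: dict[str, float]) -> list[tuple[str, str]]:
    """Pair the i-th most emergency-prone core with the i-th mildest core.

    Hypothetical butterfly-style pairing: rank cores by EL and fold the ranking
    in half, so the most violent core swaps threads with the mildest one.
    With an odd number of cores, the middle core stays unpaired.
    """
    ranked = sorted(emergency_levels, key=emergency_levels.get, reverse=True)
    n = len(ranked)
    return [(ranked[i], ranked[n - 1 - i]) for i in range(n // 2)]


if __name__ == "__main__":
    # Hypothetical EL values for the quad-core example above.
    els = {"Core1": 9.0, "Core2": 7.0, "Core3": 3.0, "Core4": 1.0}
    print(pair_for_migration(els))  # [('Core1', 'Core4'), ('Core2', 'Core3')]
```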

Stability violation: Each cycle consists of a stable period and a variable period. A stability violation (SV) occurs when a signal transition happens during the stable period.
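The SV definition can be stated as a check on transition timestamps. This minimal sketch assumes a timing model the slide does not spell out: capturing clock edges at multiples of the period, with the stable period spanning a fixed interval before and after each edge. The function name and all numbers are hypothetical.

```python
def stability_violations(transition_times_ns: list[float], period_ns: float,
                         stable_before_ns: float, stable_after_ns: float) -> list[float]:
    """Return the transitions that fall inside a stable period.

    Hypothetical timing model: capturing clock edges sit at multiples of
    period_ns, and the stable period spans [edge - stable_before_ns,
    edge + stable_after_ns). Everything else is the variable period.
    """
    violations = []
    for t in transition_times_ns:
        phase = t % period_ns  # position of the transition within its cycle
        just_after_edge = phase < stable_after_ns
        just_before_edge = phase >= period_ns - stable_before_ns
        if just_after_edge or just_before_edge:
            violations.append(t)
    return violations


if __name__ == "__main__":
    # 1 ns cycle, 0.2 ns stable before each edge, 0.05 ns stable after it.
    print(stability_violations([0.30, 0.85, 1.02, 1.50], 1.0, 0.2, 0.05))  # [0.85, 1.02]
```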

In what situations do SVs occur? Delay faults, resulting from delay defects (introduced in manufacturing processes) or from aging (wearout)-induced performance degradation. A delay fault pushes a transition past the setup time, causing a setup-time violation; thus, stability violations caused by delay faults differ little from setup-time violations. But can soft errors also be modeled as SVs? Yes!

How do soft errors cause SVs? With an SEU, Si violates the stability requirement. With an SET, So violates the stability requirement. Note: ONLY the SVs that occur in the vulnerable window, within which the flip-flops are updated, can cause failures.
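The vulnerable-window note reduces to an interval-overlap test: an SV matters only if it overlaps the window in which flip-flops are updated. A minimal sketch, assuming both are half-open time intervals; the function name and the window position are hypothetical.

```python
def sv_causes_failure(sv_start_ns: float, sv_end_ns: float,
                      window_start_ns: float, window_end_ns: float) -> bool:
    """True if the SV interval [sv_start, sv_end) overlaps the vulnerable
    window [window_start, window_end) in which flip-flops are updated."""
    return sv_start_ns < window_end_ns and window_start_ns < sv_end_ns


if __name__ == "__main__":
    window = (0.80, 0.85)  # hypothetical update window before a 1 ns clock edge
    print(sv_causes_failure(0.70, 0.78, *window))  # False: pulse ends before the window
    print(sv_causes_failure(0.76, 0.84, *window))  # True: pulse overlaps the window
```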

The next problem: how do we detect stability violations? With a low-cost stability checker.

Some results: Implementation: an SVFD-protected FPU, using 65 nm PTM models and HSPICE simulation. References: Guihai Yan, Yinhe Han, Xiaowei Li, "A Unified Online Fault Detection Scheme via Checking of Stability Violation," IEEE/ACM Design, Automation and Test in Europe (DATE'09). Guihai Yan, Yinhe Han, Xiaowei Li, "SVFD: A Versatile Online Fault Detection Scheme via Checking of Stability Violation," IEEE Transactions on Very Large Scale Integration Systems (T-VLSI), 19(9), Sep.

Beyond fault detection, what else can we do with SVFD? Dynamic margin reduction (MicroFix: an application to DVFS) and aging tolerance (ReviveNet: fine-grained aging-delay tolerance).

Dynamic margin reduction: timing sensor setup

Operational Principles

Fine-grained margin exploitation: flip-flops are classified as generous (GFF), forward adaptable (FAFF), backward adaptable (BAFF), or unadaptable (UAFF), exploiting localized timing imbalance.

Case study results: Applied to an FPU with 32 nm PTM models. TH = 0.2-0.3 is the optimal choice. Efficiency improvement: 35% in EDP. References: Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li, "MicroFix: Using Timing Interpolation and Delay Sensors for Power Reduction," ACM Transactions on Design Automation of Electronic Systems (TODAES), 16(2), pp. 1-21. Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li, "MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency," ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED'09), 2009.
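EDP is the product of energy and delay, so an improvement can come from lower energy, lower delay, or both. A minimal sketch of the metric, reading "35% in EDP" as a relative EDP reduction versus the baseline (the slide does not spell this out); the function names and the numbers below are hypothetical and are not the slide's results.

```python
def edp(energy_j: float, delay_s: float) -> float:
    """Energy-delay product."""
    return energy_j * delay_s


def edp_improvement(baseline_edp: float, new_edp: float) -> float:
    """Relative EDP reduction versus the baseline (0.25 means 25% lower EDP)."""
    return 1.0 - new_edp / baseline_edp


if __name__ == "__main__":
    # Hypothetical normalized numbers, not results from the slide.
    base = edp(energy_j=1.0, delay_s=1.0)
    tuned = edp(energy_j=0.8, delay_s=0.9)
    print(f"EDP improvement: {edp_improvement(base, tuned):.1%}")  # 28.0%
```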

Localized aging tolerance: the chance for aging adaptation. We have a chance to act before it is too late.

A nudge for timing margin: dynamic time borrowing, path-grained rather than stage-grained.
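Time borrowing can be illustrated by delaying the clock of a capture flip-flop: the path ending there gains slack and the path starting there loses the same amount. The slide does not describe ReviveNet's actual mechanism, so this is a minimal sketch under an assumed toy model (a single chain of register-to-register paths, a bounded nudge range, greedy left-to-right borrowing); the function name and values are hypothetical.

```python
def borrow_time(slacks_ps: list[float], max_nudge_ps: float) -> tuple[list[float], list[float]]:
    """Greedy time borrowing along a chain of register-to-register paths.

    slacks_ps[i] is the slack of path i, captured by flip-flop i + 1, which
    launches path i + 1. Delaying that flip-flop's clock by d adds d to path i's
    slack and removes d from path i + 1's. The last path has no downstream path
    in this toy chain, so it cannot borrow.
    Returns (nudge applied to each path's capture clock, adjusted slacks).
    """
    slacks = list(slacks_ps)
    nudges = [0.0] * len(slacks)
    for i in range(len(slacks) - 1):
        if slacks[i] < 0:
            # Borrow only what is needed, what the nudge range allows,
            # and what the downstream path can spare.
            give = min(-slacks[i], max_nudge_ps, max(slacks[i + 1], 0.0))
            nudges[i] = give
            slacks[i] += give
            slacks[i + 1] -= give
    return nudges, slacks


if __name__ == "__main__":
    print(borrow_time([-10.0, 30.0, 5.0], max_nudge_ps=20.0))
    # ([10.0, 0.0, 0.0], [0.0, 20.0, 5.0])
```

Because each nudge targets one capture flip-flop, only the aged path and its immediate successor are affected, which is the sense in which borrowing can be path-grained rather than applied to a whole pipeline stage.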

Aging sensor setup: coarse-grained detection.

Trial-based adaptation: adaptation latency is non-critical, so adaptation is retried until it succeeds; fine-grained adaptation.
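Because adaptation latency is non-critical, the "try until success" loop can be a plain sequential search over adaptation settings. A minimal sketch, assuming an ordered list of settings and a sensor check exposed as a callback; the names, the tap values, and the 12 ps requirement are hypothetical.

```python
from typing import Callable, Optional, Sequence


def trial_adapt(settings: Sequence[int],
                still_failing: Callable[[int], bool]) -> Optional[int]:
    """Try adaptation settings in order until the aging sensor stops flagging.

    Because adaptation latency is non-critical, a simple sequential search is
    acceptable. Returns the first passing setting, or None if all fail.
    """
    for setting in settings:
        if not still_failing(setting):
            return setting
    return None


if __name__ == "__main__":
    # Hypothetical: clock-delay taps in ps; the aged path needs at least 12 ps.
    taps = [0, 5, 10, 15, 20]
    print(trial_adapt(taps, lambda tap: tap < 12))  # 15
```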

Implementation: a false-alarm filter, with filters shared to reduce overhead. Reference: Guihai Yan, Yinhe Han, Xiaowei Li, "ReviveNet: A Self-adaptive Architecture for Improving Lifetime Reliability via Localized Timing Adaptation," IEEE Transactions on Computers (TC), 60(9), Sep.
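The slide names a false-alarm filter and filter sharing but not their design. As one possible illustration, here is a minimal sketch of a hypothetical filter that confirms an alarm only after k raw alarms within a sliding window of cycles, shared by a group of sensors by feeding it the OR of their outputs; the class name, k, and window size are assumptions.

```python
from collections import deque


class FalseAlarmFilter:
    """Accept an aging alarm only after k raw alarms within a sliding window.

    Hypothetical design: isolated alarms (e.g. from noise) are ignored, and one
    filter can be shared by a group of sensors by feeding it the OR of their outputs.
    """

    def __init__(self, k: int, window_cycles: int) -> None:
        self.k = k
        self.window_cycles = window_cycles
        self.hits = deque()  # cycles at which raw alarms were seen

    def observe(self, cycle: int, alarm: bool) -> bool:
        """Record one sample; return True if the alarm is confirmed."""
        if alarm:
            self.hits.append(cycle)
        # Drop alarms that have fallen out of the sliding window.
        while self.hits and self.hits[0] <= cycle - self.window_cycles:
            self.hits.popleft()
        return len(self.hits) >= self.k


if __name__ == "__main__":
    shared = FalseAlarmFilter(k=3, window_cycles=10)
    # Each entry: (cycle, outputs of four sensors sharing this filter).
    samples = [(0, [False, True, False, False]),
               (20, [True, False, False, False]),
               (22, [False, False, True, False]),
               (25, [True, True, False, False])]
    for cycle, sensors in samples:
        print(cycle, shared.observe(cycle, any(sensors)))
```

Sharing one filter across a sensor group trades some localization (which sensor fired) for fewer counters, which matches the overhead motivation on the slide.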

Conclusion
Dynamic timing variation is increasingly critical.
Online timing variation detection and tolerance is a promising approach to dynamic variation.
Application-specific timing variation tolerance: MicroFix for DVFS and ReviveNet for aging tolerance.
A holistic solution can be more cost-effective: TEA-TM, an architectural optimization for a circuit-level symptom.

Publications (reverse chronological order)
1. Guihai Yan, Yinhe Han, Xiaowei Li, "ReviveNet: A Self-adaptive Architecture for Improving Lifetime Reliability via Localized Timing Adaptation," IEEE Transactions on Computers (TC), Vol. 60, No. 9, Sep.
2. Guihai Yan, Yinhe Han, Xiaowei Li, "SVFD: A Versatile Online Fault Detection Scheme via Checking of Stability Violation," IEEE Transactions on Very Large Scale Integration Systems (T-VLSI), Vol. 19, No. 9, Sep.
3. Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li, "MicroFix: Using Timing Interpolation and Delay Sensors for Power Reduction," ACM Transactions on Design Automation of Electronic Systems (TODAES), Vol. 16, No. 2, pp. 1-21, Mar.
4. Jianbo Dong, Lei Zhang, Yinhe Han, Guihai Yan, Xiaowei Li, "Performance-asymmetry-aware Scheduling for Chip Multiprocessors with Static Core Coupling," Journal of Systems Architecture, Vol. 56, 2010.
5. Guihai Yan, Xiaoyao Liang, Yinhe Han, Xiaowei Li, "Leveraging the Core-Level Complementary Effects of PVT Variations to Reduce Timing Emergencies in Multi-Core Processors," Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA'10), Saint-Malo, France, Jun.
6. Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei Li, "MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency," ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED'09), 2009.
7. Song Jin, Yinhe Han, Lei Zhang, Huawei Li, Xiaowei Li, Guihai Yan, "M-IVC: Using Multiple Input Vectors to Minimize Aging-induced Delay," Proceedings of the IEEE Asian Test Symposium (ATS'09), 2009.
8. Guihai Yan, Yinhe Han, Xiaowei Li, "A Unified Online Fault Detection Scheme via Checking of Stability Violation," IEEE/ACM Design, Automation and Test in Europe (DATE'09), 2009.
9. Guihai Yan, Yinhe Han, Xiaowei Li, Hui Liu, "BAT: Performance-Driven Crosstalk Mitigation Based on Bus-grouping Asynchronous Transmission," IEICE Transactions on Electronics, Vol. E91-C, No. 10, Oct. 2008.

Book chapters: Fault Tolerance Designs for Digital Integrated Circuits: Tolerating defects/faults, parameter variations, and soft errors (in Chinese), Beijing: Science Press.

When I've done a program…