Guihai Yan, Yinhe Han, and Xiaowei Li


A Unified Online Fault Detection Scheme via Checking of Stability Violation
Guihai Yan, Yinhe Han, and Xiaowei Li
Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences
Apr. 22, 2009

Outline
- Introduction
- What’s Stability Violation
- Fault Detection via Checking Stability Violation
- Design Considerations
- Hspice Simulation Results
- Conclusion

Introduction
Two in-field reliability challenges:
- Soft errors (SET, SEU). Detection scheme: redundancy (temporal, spatial, or both).
- Aging failures, induced by NBTI, TDDB, etc. Detection scheme: aging sensors.
Can one fault model handle all of the above in-field faults? This matters because a unified detection scheme is possible only under a unified fault model!

What’s “Stability Violation”
- Stable Period vs. Variable Period
- Stability Violation (SV): a signal transition occurs in the Stable Period.
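The definition above can be sketched as a simple check on transition timestamps. This is only an illustration: the function name, the 1000 ps clock cycle, and the placement of the stable period in the last 200 ps of the cycle are assumptions, not values from the slides.

```python
from typing import List


def stability_violations(transitions_ps: List[int],
                         stable_start_ps: int,
                         stable_end_ps: int) -> List[int]:
    """Return the transitions that fall inside the stable period.

    Any transition with stable_start_ps <= t < stable_end_ps is a
    stability violation; transitions in the variable period are normal.
    """
    return [t for t in transitions_ps
            if stable_start_ps <= t < stable_end_ps]


# Hypothetical numbers: 1000 ps cycle, stable period = last 200 ps.
observed = [150, 400, 850]
violations = stability_violations(observed, 800, 1000)
print(violations)
```

With these assumed numbers, only the transition at 850 ps lands in the stable period, so it is the only one reported.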

In what situations would an SV occur?
When the circuit encounters delay faults resulting from:
- Delay defects (introduced in manufacturing processes)
- Aging (wearout) induced performance degradation
[Figure: a delay fault causing a setup time violation within clock period T]
Thus, a stability violation caused by a delay fault does not differ much from a “setup time violation”.
But can soft errors be modeled by SV? YES!
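To make the link between delay faults and setup violations concrete, here is a minimal sketch based on the standard setup constraint t_cq + t_pd + t_setup <= T. The function name and all numeric values are hypothetical; `extra_delay_ps` stands in for the added delay from a defect or aging.

```python
def setup_slack_ps(period_ps: int, t_cq_ps: int, t_pd_ps: int,
                   t_setup_ps: int, extra_delay_ps: int = 0) -> int:
    """Slack of the data arrival against the setup deadline.

    Data arrives at t_cq + t_pd (+ any fault-induced extra delay) and
    must be stable by period - t_setup. Negative slack means the signal
    is still changing inside the setup window, i.e. a violation.
    """
    arrival = t_cq_ps + t_pd_ps + extra_delay_ps
    deadline = period_ps - t_setup_ps
    return deadline - arrival


# Hypothetical timing: T = 1000 ps, t_cq = 100 ps, t_pd = 600 ps, t_setup = 100 ps.
healthy = setup_slack_ps(1000, 100, 600, 100)
aged = setup_slack_ps(1000, 100, 600, 100, extra_delay_ps=300)
print(healthy, aged)
```

Under these assumed numbers the healthy path has 200 ps of slack, while 300 ps of extra delay pushes the slack to -100 ps, so the late transition lands in the period where the signal is required to be stable.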

Soft Errors can also cause SVs
- SEU (Single Event Upset): unintentional bit-flip in storage cells
- SET (Single Event Transient): transient voltage pulse propagating in combinational logic

How Soft Errors cause SV
- SEU: Si violates the stability requirement!
- SET: So violates the stability requirement!
Notice: ONLY the SVs occurring in the “vulnerable window”, within which the flip-flops are updated, could cause failures.

Now we conclude that…
Delay faults and soft errors can be modeled as Stability Violations.
The next problem is: how to detect stability violations? Using a Stability Checker.

Stability Checker
Basic operating principle:
- Step 1: Precharge S1 and S2 to “HIGH”
- Step 2: Monitor state (evaluation)
  - No stability violation: S1 OR S2 = 1
  - Otherwise: S1 OR S2 = 0
Because the checker cannot monitor any signal during precharge, when to precharge is an essential design consideration!
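A behavioral sketch of the two-step principle above. The slide does not say how S1 and S2 are discharged, so this model assumes that a logic 1 on the monitored signal discharges S1 and a logic 0 discharges S2. That mapping, and the function name, are assumptions made for illustration only.

```python
from typing import Iterable, Tuple


def stability_checker(samples: Iterable[int]) -> Tuple[int, int, bool]:
    """Model one precharge/evaluation cycle of a stability checker.

    samples: logic values (0 or 1) of the monitored signal observed
             during the evaluation phase.
    Returns (s1, s2, violation).
    """
    # Step 1: precharge both nodes to HIGH.
    s1, s2 = 1, 1
    # Step 2: evaluation. ASSUMPTION: a 1 discharges S1, a 0 discharges S2.
    for value in samples:
        if value == 1:
            s1 = 0
        else:
            s2 = 0
    # A stable signal discharges only one node, so S1 OR S2 stays 1.
    # A transition discharges both nodes, so S1 OR S2 becomes 0.
    violation = (s1 | s2) == 0
    return s1, s2, violation


print(stability_checker([1, 1, 1]))  # stable high
print(stability_checker([0, 1, 1]))  # late transition
print(stability_checker([1, 0, 1]))  # SET-like pulse
```

In this model a signal that holds one value throughout evaluation leaves one node high, while any transition, including a short pulse that returns to the original value, discharges both nodes and raises the violation flag.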

Objective of Manipulating Precharge (or Evaluation)
(1) Distinguish faulty transitions that cause SVs from normal signal transitions
(2) Keep the vulnerable window under monitoring (evaluation)
(3) The evaluation period should be longer than the width of an SET
So should precharge be scheduled at the end of the cycle, or at the beginning?
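Objectives (2) and (3) can be checked mechanically for a candidate evaluation interval. The sketch below is not the schedule derived in the talk. The function name and all timing values are hypothetical, and times are in picoseconds measured from the start of the cycle.

```python
from typing import Tuple


def check_evaluation_window(eval_start_ps: int, eval_end_ps: int,
                            vuln_start_ps: int, vuln_end_ps: int,
                            set_width_ps: int) -> Tuple[bool, bool]:
    """Check a candidate evaluation interval against objectives (2) and (3).

    Returns (covers_vulnerable_window, longer_than_set).
    """
    # Objective (2): the vulnerable window lies entirely inside evaluation.
    covers = eval_start_ps <= vuln_start_ps and vuln_end_ps <= eval_end_ps
    # Objective (3): the evaluation period is longer than the SET width.
    longer = (eval_end_ps - eval_start_ps) > set_width_ps
    return covers, longer


# Hypothetical numbers: vulnerable window 950..1020 ps, SET width 150 ps.
good = check_evaluation_window(700, 1050, 950, 1020, 150)
bad = check_evaluation_window(700, 900, 950, 1020, 150)
print(good, bad)
```

With these assumed values the first candidate satisfies both objectives, while the second is long enough for an SET but ends before the vulnerable window starts. Objective (1), separating faulty from normal transitions, depends on circuit timing and is not modeled here.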

Precharge at the end?
[Figure: evaluation followed by precharge at the end of the cycle]
NOT GOOD! Even a setup violation would escape detection!

At the beginning?
[Figure: Si, combinational logic (Comb.), and So, with precharge followed by evaluation]
- What if a normal transition happens here, far from the setup requirement? The checker is likely to catch normal transitions: False Alarm!
- What if an SEU occurs here and the corresponding “So”
  1) is masked (logic or latch window),
  2) causes an SV (propagation detectable), or
  3) is stabilized before the start of evaluation?
Still NOT good!

At the Beginning (2)
[Figure: timeline showing Precharge, PDP, and Eval.]
- PDP: Propagation Detectable Period
- Benign Period: precharge can be scheduled here
- Still open (XOR protection)

A Comprehensive Solution
- tpd: propagation delay of the combinational logic
- tcd: contamination delay (a.k.a. short-path delay)
- tcq: flip-flop’s clock-to-q time
- TGB: “conservative” setup time requirement
- TDS: expected maximum width of SET

Experiments
- Hspice simulation using 65nm PTM
- Overhead analysis: area, power, performance, design complexity

Simulation
[Figure: simulated waveforms, voltage vs. time. Signals: CLK, CLKS, XOR, So, S1, S2, A1, B1. Annotations: Signal States, Guard Band, Detection Slack, normal transitions, faulty transitions (aging delay, SEU fault), and “Fault detected” markers.]

Conclusion
A Unified Fault Model (Stability Violation) can facilitate implementing A Unified Fault Detection Scheme.
Thank You!