CALTECH CS137 Winter2004 -- DeHon CS137: Electronic Design Automation Day 8: February 4, 2004 Fault Detection.

Presentation transcript:


Today
Faults in Logic
Error Detection Schemes
Optimization Problem

Problem
Gates, wires, memories:
– built out of physical media
– may fail

Device Physics
Represent a 1 or 0 with charge
– On a gate, in a memory
Charge may be disrupted
– α-particle
– Ground bounce
– Noise coupling
– Tunneling
– Thermal noise
– Behavior of individual electrons is statistical

DRAMs
Small cells
Store charge dynamically on capacitor
Store about 50,000 electrons
Must be refreshed
– Data leaks away through parasitic resistance
α-particle can be 1,000,000 carriers?

System Reliability
Each device fails with probability $P_{fail}$
Have $N$ components in system
All must work for the system to work
$P_{sys} = (1 - P_{fail})^N$

System Reliability
$(1 - P_{fail})^N = 1 - N \times P_{fail} + \binom{N}{2} P_{fail}^2 - \dots$
If $N \times P_{fail} \ll 1$, the $N \times P_{fail}$ term dominates the higher-order terms…

System Reliability
$P_{sysfail} \approx N \times P_{fail}$
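The approximation on these slides can be checked numerically. The following is a minimal sketch; the function names and the values of `n` and `p_fail` are hypothetical, chosen only so that $N \times P_{fail} \ll 1$ holds.

```python
import math

def p_sys_fail_exact(n: int, p_fail: float) -> float:
    """Exact system failure probability: 1 - (1 - p_fail)^n.

    log1p/expm1 keep precision when p_fail is tiny.
    """
    return -math.expm1(n * math.log1p(-p_fail))

def p_sys_fail_approx(n: int, p_fail: float) -> float:
    """First-order approximation from the slides: n * p_fail."""
    return n * p_fail

# Hypothetical illustrative values (not from the lecture).
n = 10**6
p_fail = 1e-9

exact = p_sys_fail_exact(n, p_fail)
approx = p_sys_fail_approx(n, p_fail)
rel_error = abs(exact - approx) / approx
print(f"exact={exact:.6e} approx={approx:.6e} rel_error={rel_error:.2e}")
```

With these values the relative error is roughly $N \times P_{fail} / 2$, which is the size of the first dropped term in the expansion.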

Modern System
100 million to 1 billion transistors
– Not to mention wiring…
> GHz = > 1 billion transitions / sec.
N = per second…

As we scale?
N increases
Charge/gate decreases
– Fewer electrons
– Higher probability they wander
– Greater variability in behavior
Voltage levels decrease
– Smaller barriers
Greater variability in device parameters
→ $P_{fail}$ increases

Exacerbated at Nanoscale
Small numbers of dopants (10s)
– High variability
Small numbers of electrons ( s?)
– High variability
– Highly susceptible to noise
Small number of molecules
– May break, decay…

What do we do about it?
Tolerate faulty components
Detect faults
– Not do anything bad
– Try it again
If statistically unlikely error,
– high likelihood won’t recur.
…Focus on detection…

Detect Faults
Key Idea: redundancy
Include enough redundancy in computation
– Can tell that an error occurred

What kind of redundancy can we use?
Multiple copies of logic
Compute something about result
– Parity on number of outputs
– Count of number of 1’s in output

Error Detection

What do we protect against?
Any n errors
– Worst-case selection of errors

Single Error Detection
If $P_{fail}$ small:
– No error: $(1 - P_{fail})^N \approx 1 - N \times P_{fail}$
– One error: $N \times P_{fail} \times (1 - P_{fail})^{N-1} \approx N \times P_{fail}$
– Two errors: $[N \times (N-1)/2] \times (P_{fail})^2 \times (1 - P_{fail})^{N-2}$
Probability of an error going undetected
– Goes from $\approx N \times P_{fail}$
– to $\approx (N \times P_{fail})^2$
– For: $N \times P_{fail} \ll 1$
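The binomial terms above can be evaluated directly. This is a sketch under stated assumptions: `n` and `p_fail` are hypothetical values, and every multiple error is pessimistically counted as undetected, in line with the worst-case view taken earlier.

```python
from math import comb

def p_k_errors(n: int, p_fail: float, k: int) -> float:
    """Probability that exactly k of n independent components fail."""
    return comb(n, k) * p_fail**k * (1 - p_fail) ** (n - k)

# Hypothetical illustrative values with n * p_fail = 1e-3.
n, p_fail = 1000, 1e-6

p0 = p_k_errors(n, p_fail, 0)
p1 = p_k_errors(n, p_fail, 1)
p2 = p_k_errors(n, p_fail, 2)

# Without detection, any error at all goes unnoticed.
undetected_without = 1 - p0
# With single-error detection, only two or more errors can slip through.
undetected_with = 1 - p0 - p1

print(f"no detection:     {undetected_without:.3e}  (N*Pfail     = {n * p_fail:.3e})")
print(f"single detection: {undetected_with:.3e}  ((N*Pfail)^2 = {(n * p_fail) ** 2:.3e})")
```

The double-error term is about $(N \times P_{fail})^2 / 2$, so the slide's $\approx (N \times P_{fail})^2$ is an order-of-magnitude statement.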

Detection Overhead
Correction and detection circuitry increase circuit size.
$N_{detect} > N_{logic}$
$N_{detect} = c \times N_{logic}$
Probability of an error going undetected
– Goes from $\approx N \times P_{fail}$
– to $\approx (c \times N \times P_{fail})^2$
Want: $c^2 \ll 1/(N \times P_{fail})$

Reliability Tuning
Want $N \times P_{fail}$ small
– Want: $(c \times N \times P_{fail})^2$ very small
Idea:
– Guard subsystems independently
– Make $N_{sub}$ suitably small
– Smaller probability there is a double error localized in this small subsystem

Guarding Subsystems

Composing Subsystems
$P_{sysundetect} = (N_{sys}/N_s) \times P_{subundetect}$
$P_{subundetect} = (c \times N_s \times P_{fail})^2$
$P_{sysundetect} = (N_{sys}/N_s) \times (c \times N_s \times P_{fail})^2$
$P_{sysundetect} = N_{sys} \times N_s \times (c \times P_{fail})^2$
Extremes:
– $N_s = N_{sys}$
– $N_s = 1$
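To see how the guarded-subsystem size affects the result, the formula can be evaluated at the two extremes and one point in between. This is a sketch only: the function name and the values of `n_sys`, `c`, and `p_fail` are hypothetical, and `c` is held fixed even though the overhead of a real design would likely depend on the subsystem size.

```python
import math

def p_sys_undetect(n_sys: int, n_sub: int, c: float, p_fail: float) -> float:
    """Undetected-error probability when subsystems of n_sub gates are guarded independently."""
    p_sub_undetect = (c * n_sub * p_fail) ** 2   # double error inside one subsystem
    return (n_sys / n_sub) * p_sub_undetect      # summed over all subsystems

# Hypothetical illustrative values (not from the lecture).
n_sys, c, p_fail = 10**6, 3, 1e-9

results = {n_sub: p_sys_undetect(n_sys, n_sub, c, p_fail)
           for n_sub in (1, 10**3, n_sys)}
for n_sub, p in results.items():
    print(f"N_s = {n_sub:>7}: P_sysundetect = {p:.3e}")
```

Under the fixed-`c` assumption the probability grows linearly with $N_s$, so smaller guarded blocks always look better; the design question is how quickly `c` rises as blocks shrink.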

Problem
Generate logic capable of detecting any single error

Terminology
Fault-secure: system never produces incorrect code word
– Either produces correct result
– Or detects the error
Self-testing: for every fault, there is some input that produces an incorrect code word
– That detects the error

Terminology
Totally Self Checking: system is both fault-secure and self-testing.

Duplication

Duplication
$N$ original gates
Duplicate: $+N$
$O$ outputs
– $O$ xors
– $O/2 \times 2 \times 2$ ors
$O < N$
$2 < c < 5$
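A duplicate-and-compare checker can be sketched in software as follows. The two-output function `logic` and its injected fault are hypothetical stand-ins for real gates; the checker mirrors the structure on the slide, with one xor per output pair and an or-reduction of the xor results.

```python
from itertools import product

def logic(x: int, y: int, z: int, fault: bool = False) -> tuple:
    """Hypothetical two-output block; `fault` flips the internal node t."""
    t = x & y
    if fault:
        t ^= 1  # model a transient upset on an internal node
    return (t | z, t ^ z)

def error_flag(copy_a: tuple, copy_b: tuple) -> bool:
    """One xor per output pair, then an or-tree reduces them to a single flag."""
    return any(a ^ b for a, b in zip(copy_a, copy_b))

def duplication_is_fault_secure() -> bool:
    """Check every input: a fault in one copy is either harmless or flagged."""
    for x, y, z in product((0, 1), repeat=3):
        golden = logic(x, y, z)
        faulty = logic(x, y, z, fault=True)  # the fault hits only one copy
        if faulty != golden and not error_flag(faulty, golden):
            return False  # wrong output escaped detection
        if error_flag(golden, golden):
            return False  # false alarm on fault-free copies
    return True

print(duplication_is_fault_secure())
```

The check passes trivially because the copies share no logic, which is exactly why duplication works and also why it costs at least a factor of two.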

Duplication with PLA
[Figure: PLA logic and its duplicate]

PLA Duplication
$N$ product terms in original
$N$ in duplicate
$2O$ product terms for matching
$O \le N$
$2 < c < 4$

Can we do better?
Seems like overkill to compute twice?

Idea
Encode so outputs have some checkable property
– E.g. parity

Will this work?
[Figure: original logic with extra cubes computing a parity output]

Problem
Single fault may produce multiple output errors

How Fix?
How do we fix?

No Logic Sharing
No sharing
Single fault affects single output
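The effect of sharing on a single parity check can be demonstrated by exhaustive enumeration. The following is a sketch with a hypothetical two-output circuit: both outputs use the term `x & y`, either as one shared node or as two private copies, and the fault flips one physical copy of that term. The predicted parity is assumed to come from separate, fault-free logic.

```python
from itertools import product

def outputs(x, y, a, b, shared: bool, fault: bool) -> tuple:
    """Two outputs built on the term x & y; the fault flips one physical copy of it."""
    t0 = x & y
    if fault:
        t0 ^= 1
    # With sharing, output 1 reuses the (possibly faulty) node; otherwise it has its own copy.
    t1 = t0 if shared else (x & y)
    return (t0 | a, t1 | b)

def predicted_parity(x, y, a, b) -> int:
    """Parity of the correct outputs, computed by separate (assumed fault-free) logic."""
    t = x & y
    return (t | a) ^ (t | b)

def undetected_errors(shared: bool) -> list:
    """Inputs where the fault corrupts the outputs but the parity check still passes."""
    missed = []
    for x, y, a, b in product((0, 1), repeat=4):
        good = outputs(x, y, a, b, shared, fault=False)
        bad = outputs(x, y, a, b, shared, fault=True)
        parity_ok = (bad[0] ^ bad[1]) == predicted_parity(x, y, a, b)
        if bad != good and parity_ok:
            missed.append((x, y, a, b))
    return missed

print("shared term:  ", undetected_errors(shared=True))
print("private terms:", undetected_errors(shared=False))
```

With the shared term, every input with `a = b = 0` flips both outputs at once, so the parity is unchanged and the error escapes. With private copies the list is empty.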

Parity Checking
To check parity
– Need xor tree on outputs/parity
– $[(O+1)/2] \times 2 \times 2 = 2(O+1)$ xors
For PLA
– xor would blow up
– Wrap multiple times
– 2 product terms per xor
– $4 \times O$ product terms

nanoPLA Wrapped xor
Note: two planes here just for buffering/inversion

Better or Worse than Dual?
Depends on sharing in logic
Typical results from Mitra [ITC2002]

Can we allow sharing?
When?

Multiple Parity Groups
Can share with different parity groups
Common error flagged in both groups
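One simple way to form such groups is a greedy pass that never places two outputs sharing a product term in the same parity group. This is a sketch of that heuristic only: it is not the method from Mitra [ITC2002], and it ignores the cost of the parity-prediction logic and makes no attempt to minimize pterms. The function name, the input format, and the example circuit are hypothetical.

```python
def assign_parity_groups(output_terms: dict) -> list:
    """Greedily place each output in the first group whose outputs share no product term with it.

    output_terms maps an output name to the set of product-term ids it uses.
    Returns a list of groups, each a list of output names.
    """
    groups = []  # each entry: (member outputs, union of product terms they use)
    for out, terms in output_terms.items():
        for members, used in groups:
            if used.isdisjoint(terms):
                members.append(out)
                used |= terms  # in-place update of the group's term set
                break
        else:
            groups.append(([out], set(terms)))
    return [members for members, _ in groups]

# Hypothetical example: f0/f1 share p1, f0/f3 share p0, f2/f3 share p3.
example = {
    "f0": {"p0", "p1"},
    "f1": {"p1", "p2"},
    "f2": {"p3"},
    "f3": {"p0", "p3"},
}
groups = assign_parity_groups(example)
print(groups)
```

Because no product term feeds two outputs of the same group, a single faulty term can flip at most one output per group, so any output error it causes changes the parity of at least one group.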

Better or Worse than Dual?
Typical results from Mitra [ITC2002]
(parity here includes sharing)

Project Assignment
Assignments #3 & #4
– Out on Monday
Provide an algorithm for identifying parity groups
– Keep single error detection property
– Minimize pterms

Admin
Assignment #2 due Friday

Big Ideas
Low-level physics imperfect
– Statistical, noisy
Larger devices → greater likelihood of faults
Redundancy
Self-checking circuits