Oct. 2006 Error Correction Slide 1 Fault-Tolerant Computing Dealing with Mid-Level Impairments.

Slides:



Advertisements
Similar presentations
Cyclic Code.
Advertisements

Error Control Code.
10.1 Chapter 10 Error Detection and Correction Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
15-853:Algorithms in the Real World
Information and Coding Theory
NETWORKING CONCEPTS. ERROR DETECTION Error occures when a bit is altered between transmission& reception ie. Binary 1 is transmitted but received is binary.
Oct Combinational Modeling Slide 1 Fault-Tolerant Computing Motivation, Background, and Tools.
Fundamentals of Computer Networks ECE 478/578 Lecture #4: Error Detection and Correction Instructor: Loukas Lazos Dept of Electrical and Computer Engineering.
Lecture 9-10: Error Detection and Correction Anders Västberg Slides are a selection from the slides from chapter 8 from:
DIGITAL COMMUNICATION Coding
Chapter 11 Algebraic Coding Theory. Single Error Detection M = (1, 1, …, 1) is the m  1 parity check matrix for single error detection. If c = (0, 1,
Feb. 2015Part IV – Errors: Informational DistortionsSlide 1.
S. Mandayam/ ECOMMS/ECE Dept./Rowan University Electrical Communications Systems ECE Spring 2007 Shreekanth Mandayam ECE Department Rowan University.
Oct Defect Avoidance and Circumvention Slide 1 Fault-Tolerant Computing Dealing with Low-Level Impairments.
7/2/2015Errors1 Transmission errors are a way of life. In the digital world an error means that a bit value is flipped. An error can be isolated to a single.
Department of Electrical and Computer Engineering
VIT UNIVERSITY1 ECE 103 DIGITAL LOGIC DESIGN CHAPTER I NUMBER SYSTEMS AND CODES Reference: M. Morris Mano & Michael D. Ciletti, "Digital Design", Fourth.
15-853Page :Algorithms in the Real World Error Correcting Codes I – Overview – Hamming Codes – Linear Codes.
Error Detection and Correction
Error Detection and Reliable Transmission EECS 122: Lecture 24 Department of Electrical Engineering and Computer Sciences University of California Berkeley.
Hamming Code Rachel Ah Chuen. Basic concepts Networks must be able to transfer data from one device to another with complete accuracy. Data can be corrupted.
McGraw-Hill©The McGraw-Hill Companies, Inc., 2000 PART III: DATA LINK LAYER ERROR DETECTION AND CORRECTION 7.1 Chapter 10.
1 S Advanced Digital Communication (4 cr) Cyclic Codes.
Part.7.1 Copyright 2007 Koren & Krishna, Morgan-Kaufman FAULT TOLERANT SYSTEMS Part 7 - Coding.
Oct. 2007Error DetectionSlide 1 Fault-Tolerant Computing Dealing with Mid-Level Impairments.
1 Forward Error Correction in Sensor Networks Jaein Jeong, Cheng-Tien Ee University of California, Berkeley.
British Computer Society
Cyclic Codes for Error Detection W. W. Peterson and D. T. Brown by Maheshwar R Geereddy.
Error Coding Transmission process may introduce errors into a message.  Single bit errors versus burst errors Detection:  Requires a convention that.
Error Control Code. Widely used in many areas, like communications, DVD, data storage… In communications, because of noise, you can never be sure that.
1 SNS COLLEGE OF ENGINEERING Department of Electronics and Communication Engineering Subject: Digital communication Sem: V Cyclic Codes.
10.1 Chapter 10 Error Detection and Correction Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Data Link Layer: Error Detection and Correction
MIMO continued and Error Correction Code. 2 by 2 MIMO Now consider we have two transmitting antennas and two receiving antennas. A simple scheme called.
Data and Computer Communications by William Stallings Eighth Edition Digital Data Communications Techniques Digital Data Communications Techniques Click.
Lecture 3-2: Coding and Error Control (Cont.) ECE
Riyadh Philanthropic Society For Science Prince Sultan College For Woman Dept. of Computer & Information Sciences CS 251 Introduction to Computer Organization.
Error Control Code. Widely used in many areas, like communications, DVD, data storage… In communications, because of noise, you can never be sure that.
Error Detection and Correction
CS717 Algorithm-Based Fault Tolerance Matrix Multiplication Greg Bronevetsky.
§6 Linear Codes § 6.1 Classification of error control system § 6.2 Channel coding conception § 6.3 The generator and parity-check matrices § 6.5 Hamming.
DIGITAL COMMUNICATIONS Linear Block Codes
David Wetherall Professor of Computer Science & Engineering Introduction to Computer Networks Error Detection (§3.2.2)
Computer Science Division
Oct. 2015Part IV – Errors: Informational DistortionsSlide 1.
Lecture Focus: Data Communications and Networking  Data Link Layer  Error Control Lecture 19 CSCS 311.
10.1 Chapter 10 Error Detection and Correction Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
The parity bits of linear block codes are linear combination of the message. Therefore, we can represent the encoder by a linear system described by matrices.
Error Detection. Data can be corrupted during transmission. Some applications require that errors be detected and corrected. An error-detecting code can.
Some Computation Problems in Coding Theory
Data Link Layer. Data Link Layer Topics to Cover Error Detection and Correction Data Link Control and Protocols Multiple Access Local Area Networks Wireless.
Error Detection and Correction
Digital Communications I: Modulation and Coding Course Term Catharina Logothetis Lecture 9.
McGraw-Hill©The McGraw-Hill Companies, Inc., 2000 PART III: DATA LINK LAYER ERROR DETECTION AND CORRECTION 7.1 Chapter 10.
Data Communications and Networking
Hamming Distance & Hamming Code
Transmission Errors Error Detection and Correction.
10.1 Chapter 10 Error Detection and Correction Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
10.1 Chapter 10 Error Detection and Correction Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
1 Product Codes An extension of the concept of parity to a large number of words of data 0110… … … … … … …101.
ECE 442 COMMUNICATION SYSTEM DESIGN LECTURE 10. LINEAR BLOCK CODES Husheng Li Dept. of EECS The University of Tennessee.
Channel Coding: Part I Presentation II Irvanda Kurniadi V. ( ) Digital Communication 1.
1 Wireless Networks Lecture 5 Error Detecting and Correcting Techniques (Part II) Dr. Ghalib A. Shah.
Error Detection and Correction
Subject Name: COMPUTER NETWORKS-1
Coding Theory Dan Siewiorek June 2012.
Fault-Tolerant Computing
Information Redundancy Fault Tolerant Computing
Cyclic Code.
Types of Errors Data transmission suffers unpredictable changes because of interference The interference can change the shape of the signal Single-bit.
Presentation transcript:

Oct Error Correction Slide 1 Fault-Tolerant Computing Dealing with Mid-Level Impairments

Oct Error Correction Slide 2 About This Presentation EditionReleasedRevised FirstOct This presentation has been prepared for the graduate course ECE 257A (Fault-Tolerant Computing) by Behrooz Parhami, Professor of Electrical and Computer Engineering at University of California, Santa Barbara. The material contained herein can be used freely in classroom teaching or any other educational setting. Unauthorized uses are prohibited. © Behrooz Parhami

Oct Error Correction Slide 3 Error Correction

Oct Error Correction Slide 4

Oct Error Correction Slide 5 Multilevel Model Component Logic Service Result Information System Legend: Tolerance Entry Last lecture Today

Oct Error Correction Slide 6 High-Redundancy Codes Triplication is a form of error coding: x represented as xxx (200% redundancy) Corrects any error in one version Detects two nonsimultaneous errors With a larger replication factor, more errors can be tolerated EncodingDecoding f(x)f(x) xy f(x)f(x) f(x)f(x) If we triplicate the voter to obtain 3 results, we are essentially performing the operation f(x) on coded inputs, getting coded outputs V Our challenge today is to come up with strong correction capabilities, using much lower redundancy (perhaps an order of magnitude less) To correct all single-bit errors in an n-bit code, we must have 2 r > n, or 2 r > k + r, which leads to about log 2 k check bits, at least

Oct Error Correction Slide 7 Error-Correcting Codes: Idea A conceptually simple error-correcting code: Arrange the k data bits into a k 1/2  k 1/2 square array Attach an even parity bit to each row and column of the array Row/Column check bit = XOR of all row/column data bits Data space: All 2 k possible k-bit words Redundancy: 2k 1/2 + 1 checkbits for k data bits Corrects all single-bit errors (lead to distinct noncodewords) Detects all double-bit errors (some triples go undetected) Encoding Decoding Data words Codewords Noncodewords Errors Data space Code space Error space To be avoided at all cost

Oct Error Correction Slide 8 Error-Correcting Codes: Evaluation Redundancy: k data bits encoded in n = k + r bits (r redundant bits) Encoding: Complexity (cost / time) to form codeword from data word Decoding: Complexity (cost / time) to obtain data word from codeword Capability: Classes of error that can be corrected Greater correction capability generally involves more redundancy To correct c bit-errors, a minimum code distance of 2c + 1 is required Combined error correction/detection capability: To correct c errors and additionally detect d errors (d > c), a minimum code distance of c + d + 1 is required Example: Hamming SEC/DED code has a code distance of 4 Examples of code correction capabilities: Single, double, byte, b-bit burst, unidirectional,... errors

Oct Error Correction Slide 9 Hamming Distance for Error Correction Red dots represent codewords Yellow dots, noncodewords within distance 1 of codewords, represent correctable errors Blue dot, within distance 2 of three different codewords represents a detectable error Not all “double errors” are correctable, however, because there are points within distance 2 of some codewords that are also within distance 1 of another The following visualization, though not completely accurate, is still useful

Oct Error Correction Slide 10 A Hamming SEC Code d 3 d 2 d 1 d 0 p 2 p 1 p 0 Data bits Parity bits Uses multiple parity bits, each applied to a different subset of data bits Encoding: 3 XOR networks to form parity bits Checking: 3 XOR networks to verify parities Decoding: Trivial (separable code) Redundancy: 3 check bits for 4 data bits Unimpressive, but gets better with more data bits (7, 4); (15, 11); (31, 26); (63, 57); (127, 120) Capability: Corrects any single-bit error s 2 = d 3  d 2  d 1  p 2 s 1 = d 3  d 1  d 0  p 1 s 0 = d 2  d 1  d 0  p 0 s 2 s 1 s 0 Error 0 0 0None p0p p1p d0d p2p d2d d3d d1d1 s 2 s 1 s 0 Syndrome

Oct Error Correction Slide 11 Matrix Formulation of Hamming SEC Code d 3 d 2 d 1 d 0 p 2 p 1 p 0 Data bits Parity bits d 3 d 2 d 1 d 0 p 2 p 1 p s 2 s 1 s 0 Error 0 0 0None p0p p1p d0d p2p d2d d3d d1d1 Parity check matrix d3d2d1d0p2p1p0d3d2d1d0p2p1p0 s2s1s0s2s1s0  = Data bits Parity bits SyndromeReceived word Syndrome matches the p 2 column in the parity check matrix Matrix-vector multiplication is done with AND/XOR, instead of  /+

Oct Error Correction Slide 12 Matrix Rearrangement for Simpler Correction p 0 p 1 d 0 p 2 d 2 d 3 d 1 Data and parity bits p 0 p 1 d 0 p 2 d 2 d 3 d s 2 s 1 s 0 Error 0 0 0None p0p p1p d0d p2p d2d d3d d1d1 p0p1d0p2d2d3d1p0p1d0p2d2d3d1 s2s1s0s2s1s0  = Data and parity bits Syndrome indicates error in position Position number s2s1s0s2s1s0 Decoder 01-7 Data and parity bits Corrected version 1-7

Oct Error Correction Slide 13 Hamming Generator Matrix d 3 d 2 d 1 d 0 p 2 p 1 p 0 Data bits Parity bits d 3 d 2 d 1 d Generator matrix d3d2d1d0d3d2d1d0  = CodewordData word d3d2d1d0p2p1p0d3d2d1d0p2p1p0 Recall that matrix-vector multiplication is done with AND/XOR, instead of  /+ Data bits

Oct Error Correction Slide 14 Generalization to Wider Hamming SEC Codes p 0 p 1 d 0 p : : :... : : : p 0 p 1 d 0 p 2. sr1 :s1s0sr1 :s1s0  = Data and parity bits r –1 Position number s r  1 : s 1 s 0 Decoder 2r12r1 Data and parity bits Corrected version 2r12r1 2r12r1 nk = n – r Condition for general Hamming SEC code: n = k + r = 2 r – 1

Oct Error Correction Slide : 0 prpr srsr A Hamming SEC/DED Code p 0 p 1 d 0 p : : :... : : : p 0 p 1 d 0 p 2. sr1 :s1s0sr1 :s1s0  = Data and parity bits r –1 Position number Add an extra row of all 1s and a column with only one 1 to the parity check matrix Parity check matrixSyndrome Received word s r  1 : s 1 s 0 Decoder 2r12r1 Data and parity bits Corrected version 2r12r1 2r12r1 srsr

Oct Error Correction Slide 16 Some Other Useful Codes BCH codes: Named in honor of Bose, Chaudhuri, Hocquenghem Reed-Solomon codes: Special case of BCH code Example: A popular variant is RS(255, 223) with 8-bit symbols 223 bytes of data, 32 check bytes, redundancy  14% Can correct errors in up to 16 bytes anywhere in the 255-byte codeword Used in CD players, digital audio tape, digital television Hamming codes are examples of linear codes Linear codes may be defined in many other ways Reed-Muller codes: Have a recursive construction, with smaller codes used to build larger ones

Oct Error Correction Slide 17 Arithmetic Error-Correcting Codes –––––––––––––––––––––––––––––––––––––––– Positive Syndrome Negative Syndrome error mod 7 mod 15 –––––––––––––––––––––––––––––––––––––––– 111– – – – – – – – – – – – –––––––––––––––––––––––––––––––––––––––– – – ,38444–16, ,76818–32,76867 –––––––––––––––––––––––––––––––––––––––– Error syndromes for weight-1 arithmetic errors in the (7, 15) biresidue code Because all the syndromes in this table are different, any weight-1 arithmetic error is correctable by the (mod 7, mod 15) biresidue code

Oct Error Correction Slide 18 Properties of Biresidue Codes Biresidue code with relatively prime low-cost check moduli A = 2 a – 1 and B = 2 b – 1 supports a  b bits of data for weight-1 error correction Representational redundancy = (a + b)/(ab) = 1/a + 1/b nk Compare with Hamming SEC code abn=k+a+bk=ab

Oct Error Correction Slide 19 Arithmetic on Biresidue-Coded Operands Similar to residue-checked arithmetic for addition and multiplication, except that two residues are involved Divide/square-root: remains difficult Arithmetic processor with biresidue checking. D(z) D(x) D(y) s B s

Oct Error Correction Slide 20 Higher-Level Error Coding Methods We have applied coding to data at the bit-string or word level It is also possible to apply coding at higher levels Data structure level – Robust data structures Application level – Algorithm-based error tolerance

Oct Error Correction Slide 21 Preview of Algorithm-Based Error Tolerance M r = M f = M = M c = Matrix MRow checksum matrix Column checksum matrixFull checksum matrix Error coding applied to data structures, rather than at the level of atomic data elements Example: mod-8 checksums used for matrices If Z = X  Y then Z f = X c  Y r In M f, any single error is correctable and any 3 errors are detectable Four errors may go undetected