Error Correcting Memory


Error Correcting Memory. EECS 373. Jon Beaumont, Ben Mason

What is ECC? Error Correcting Code (ECC) is a mechanism that adds redundant bits to data so that a system can detect, and in many cases correct, corrupted bits

Why ECC? ECC protects against both soft errors and transmission errors. This is particularly necessary in systems that must run continuously with very low tolerance for error

What happens after a Soft Error? Incorrect values appear in the instruction or data streams. Best case: execution of an illegal instruction or access to an illegal memory address, followed by an automatic reboot. Worst case: the error goes undetected and multiplies as the data is used to calculate new data. http://www.eetimes.com/design/programmable-logic/4390101/Enabling-error-resilience-throughout-the-embedded-system

ECC vs No-ECC

ECC Considerations: What range of errors? How much overhead? Detection versus correction?

Different Methods of Memory Correction. Detection only: parity bit. Detection and correction: triple redundancy, Hamming code

Parity Bit (even parity) For every chunk of data, add a single parity bit, set so that the total number of binary 1's (data plus parity) is even. An odd number of binary 1's on read means an error has occurred

Parity Bit (even parity) Raw data: 1001011 (four 1's), 0101111 (five 1's). Prepend a parity bit: 01001011, 10101111
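
A minimal Python sketch of the even-parity scheme, reproducing the two example words. The function names `add_even_parity` and `parity_ok` are illustrative choices, not part of the original slides, and bits are modelled as strings of '0' and '1' for readability.

```python
def add_even_parity(data_bits: str) -> str:
    """Prepend a parity bit so the total number of 1s is even."""
    # The parity bit is 1 only when the data holds an odd number of 1s.
    parity = data_bits.count("1") % 2
    return str(parity) + data_bits


def parity_ok(word: str) -> bool:
    """Return True if the word (parity bit included) has an even number of 1s."""
    return word.count("1") % 2 == 0


print(add_even_parity("1001011"))  # 01001011
print(add_even_parity("0101111"))  # 10101111
print(parity_ok("01001010"))       # False: one flipped bit is detected
print(parity_ok("01001000"))       # True: two flipped bits go unnoticed
```

The last line shows the limitation of a single parity bit: an even number of flipped bits leaves the parity unchanged.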

Parity Bit Cons: Can detect only an odd number of errors. No way to tell which bit caused the error, so the data can only be discarded. Pros: Simple to implement (XOR). Low overhead. Good for applications in which the original data can be easily resent or recalculated (e.g. SCSI, PCI, UART)

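Triple Redundancy Data is calculated and stored 3 times. Majority wins. Pros: Simple to execute. Can correct errors (potentially multiple bits, as long as no bit position is wrong in two copies). Cons: Very inefficient (1:2 data-to-overhead ratio, so only one third of the stored bits are data)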
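
The "majority wins" rule can be sketched as a bitwise vote over the three stored copies. This is an illustrative Python sketch; `majority_vote` is a hypothetical helper name, and the copies are modelled as plain integers.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit takes the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


stored = 0b1001011

# Two different copies are each hit by a single bit flip in different positions.
copy_a = stored ^ 0b0000100
copy_b = stored
copy_c = stored ^ 0b1000000

recovered = majority_vote(copy_a, copy_b, copy_c)
print(recovered == stored)  # True: every bit still has two correct copies
```

If the same bit position is corrupted in two of the three copies, the vote picks the wrong value, which is the limit of this scheme.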

Hamming Code Objective: A concise method of locating the exact bit in error so that it can be corrected without drastic action. Intuition: include multiple parity bits, so that each data bit is uniquely identified by the set of parity bits that cover it

Hamming Code Algorithm: Assign each bit position in a chunk of data a binary number, starting from 1. Positions that are a power of 2 (i.e. have exactly one 1 bit) hold parity bits; the remaining positions hold data bits

Hamming Code Algorithm: Each parity bit covers every data bit whose position number has a 1 in the same binary place as the parity bit's own position. P1 (position 001) covers [D7, D5, D3]

Hamming Code Algorithm: P2 (position 010) covers [D7, D6, D3]

Hamming Code Algorithm: P4 (position 100) covers [D7, D6, D5]
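
The coverage rule can be checked with a bitwise AND on the position numbers. The sketch below assumes the 7-bit layout used in these slides (positions 7 down to 1); `covered_positions` is a hypothetical helper name.

```python
def covered_positions(parity_pos: int, n: int = 7) -> list:
    """Data positions (highest first) that share a 1 bit with the parity position."""
    return [p for p in range(n, 0, -1) if p & parity_pos and p != parity_pos]


for parity_pos in (1, 2, 4):
    print(f"P{parity_pos} covers", [f"D{p}" for p in covered_positions(parity_pos)])
# P1 covers ['D7', 'D5', 'D3']
# P2 covers ['D7', 'D6', 'D3']
# P4 covers ['D7', 'D6', 'D5']
```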

Hamming Code Example: Encoding the nibble 1101 (D7 D6 D5 D3) using even parity. Allocate space for parity bits: b110_1__

Hamming Code Example: P1 covers [D7,D5,D3] = 1, 0, 1: b110_1_?

Hamming Code Example: P1 covers [D7,D5,D3], which already hold an even number of 1's, so P1 = 0: b110_1_0

Hamming Code Example: P2 covers [D7,D6,D3] = 1, 1, 1: b110_1?0

Hamming Code Example: P2 covers [D7,D6,D3], which hold an odd number of 1's, so P2 = 1: b110_110

Hamming Code Example: P4 covers [D7,D6,D5] = 1, 1, 0: b110?110

Hamming Code Example: P4 covers [D7,D6,D5], which already hold an even number of 1's, so P4 = 0: b1100110

Hamming Code Example: Encoded data: b1100110
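
The encoding walk-through can be reproduced with three XORs. This is a minimal sketch assuming the bit layout D7 D6 D5 P4 D3 P2 P1 used in the slides; the function name `hamming74_encode` is illustrative.

```python
def hamming74_encode(d7: int, d6: int, d5: int, d3: int) -> str:
    """Encode four data bits into a 7-bit even-parity Hamming word."""
    # XOR of the covered data bits gives the value that makes each group even.
    p1 = d7 ^ d5 ^ d3
    p2 = d7 ^ d6 ^ d3
    p4 = d7 ^ d6 ^ d5
    # Positions 7 down to 1, left to right.
    bits = [d7, d6, d5, p4, d3, p2, p1]
    return "".join(str(b) for b in bits)


print(hamming74_encode(1, 1, 0, 1))  # 1100110
```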

Hamming Code D6 gets flipped between write and read b1100110 -> b1000110

Hamming Code D6 gets flipped between write and read b1100110 -> b1000110. Parity bit 1 checks positions 7, 5, 3, 1 of b1000110: 1, 0, 1, 0. Even number of 1 bits -> no error

Hamming Code D6 gets flipped between write and read b1100110 -> b1000110. Parity bit 2 checks positions 7, 6, 3, 2 of b1000110: 1, 0, 1, 1. Odd number of 1 bits -> ERROR. Parity bits generating error: [P2]

Hamming Code D6 gets flipped between write and read b1100110 -> b1000110. Parity bit 4 checks positions 7, 6, 5, 4 of b1000110: 1, 0, 0, 0. Odd number of 1 bits -> ERROR. Parity bits generating error: [P2, P4]

Hamming Code Parity bits generating error: [P2, P4]. X = ERROR, O = NO ERROR, - = bit not covered by that parity bit

     D3  D5  D6  D7
P1   O   O   -   O
P2   X   -   X   X
P4   -   X   X   X

The only column with just X's is D6, the incorrect bit
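
The same lookup can be done arithmetically: adding up the positions of the failing parity bits (2 + 4 = 6) gives the position of the flipped bit. The sketch below is one way to express that in Python, assuming the 7-bit layout from the slides; `hamming74_correct` is a hypothetical name, and it handles single-bit errors only.

```python
def hamming74_correct(word: str):
    """Return (corrected_word, error_position); position 0 means no error found."""
    # Map position number -> bit value; word[0] is position 7.
    bits = {7 - i: int(ch) for i, ch in enumerate(word)}
    syndrome = 0
    for parity_pos in (1, 2, 4):
        # Each check includes the parity bit itself plus the data bits it covers.
        group = [p for p in bits if p & parity_pos]
        if sum(bits[p] for p in group) % 2 == 1:
            syndrome += parity_pos  # this parity check failed
    if syndrome:
        bits[syndrome] ^= 1  # flip the bit the failing checks point at
    corrected = "".join(str(bits[p]) for p in range(7, 0, -1))
    return corrected, syndrome


print(hamming74_correct("1000110"))  # ('1100110', 6)
print(hamming74_correct("1100110"))  # ('1100110', 0)
```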

Hamming Code Pros: Overhead of only O(log(n)) bits. 4 data bits -> 3 parity bits (57% of stored bits are data). 247 data bits -> 8 parity bits (97% of stored bits are data). Good for large chunks of memory (DRAM). Cons: Detection logic is more complicated to implement than a simple parity bit
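
The logarithmic overhead follows from the standard single-error-correcting condition 2^r >= m + r + 1, where m is the number of data bits and r the number of parity bits. The sketch below computes r for a given m; the function name `parity_bits_needed` is an illustrative choice.

```python
def parity_bits_needed(data_bits: int) -> int:
    """Smallest r such that 2**r >= data_bits + r + 1."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


for m in (4, 247):
    r = parity_bits_needed(m)
    print(f"{m} data bits -> {r} parity bits, {m / (m + r):.0%} of stored bits are data")
# 4 data bits -> 3 parity bits, 57% of stored bits are data
# 247 data bits -> 8 parity bits, 97% of stored bits are data
```

Under this condition, 8 parity bits cover at most 247 data bits; one more data bit would require a ninth parity bit.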

Drawbacks of ECC: More expensive. When the error-correcting algorithm acts on a shorter correction code, performance drops abruptly. This loss of performance is known as the "error floor phenomenon"

Recent Developments in ECC: Moving away from the Hamming code scheme towards BCH codes, which are more efficient. For more information visit http://www.princeton.edu/~achaney/tmve/wiki100k/docs/BCH_code.html

Questions?