29/05/2018 Error Detecting Codes for Serial links: an alternative to error correction. Sergio Cavaliere, Department of Physics, University of Napoli “Federico II”.


Error Detecting Codes for Serial links: an alternative to error correction
Sergio Cavaliere
Department of Physics, University of Napoli “Federico II”, Italy, and INFN Sezione di Napoli, Italy
e-mail: sergio.cavaliere@na.infn.it
In this talk I will present some results from a preliminary study of Forward Correcting Codes for the SuperB serial links, together with simulation results and the tools built for the purpose. It is ongoing work, which will soon require closer integration with the architectures being studied for the actual link.
XVII SuperB Workshop – La Biodola - May 2011

Cavaliere - SuperB Workshop - may 2011
Abstract
In this talk we discuss algorithms and structures for error detection as a possible alternative to full error-correcting codes. This solution, suited to the present case where the expected error rate is very low, gives good results at much lower hardware complexity and latency.

Serial link failures and errors
Two main problems regarding errors due to the radiation-hard environment:
- Loss of Lock (LoL), due to failures on fixed bits in the SERDES. Conclusion: a direct, fast link between transmitter and receiver is needed in order to signal the occurrence of LoL promptly.
- Bit errors due to the radiation-hard environment, which affect data integrity and data quality.
Solutions:
- Error Correcting Code (ECC): computationally intensive; suitable for high noise levels; may preclude future technological link upgrades.
- Error Detecting Code (EDC): less computationally intensive; requires retransmission of data, which needs a feedback loop, or alternatively allows the data to be discarded offline; suitable for a low Bit Error Rate (BER).

Error Correction vs Error Detection
- When retransmission is feasible, error correction may be obtained simply by means of error detection followed by ARQ (Automatic Repeat reQuest). Due to the low error rate in our case, neither data rate nor latency is affected.
- When the loss of short data frames does not compromise the overall information, the data may be discarded later in the communication stream, even at an offline stage. A specified level of data quality must be guaranteed; this is attained because of the low error rate.
An important parameter for the choice is the noise level:
- A high noise level requires real-time error correction in order to avoid lowering the data rate (as would happen with frequent retransmission). Error correction does not require a feedback loop.
- A low noise level would make little use of a complex correction mechanism: error detection may suffice. A repeat mechanism, with a consequent doubling of the transmission time of the affected packet, may be adopted. ARQ requires a feedback loop to signal errors and request retransmission.
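The trade-off between noise level and retransmission cost can be made concrete with a small sketch. It assumes independent bit errors, a detector that catches every corrupted frame, and retransmission until success; the BER values and the 72-bit frame length are illustrative choices, not figures from the talk.

```python
def frame_error_probability(ber: float, frame_bits: int) -> float:
    """Probability that at least one bit of the frame is flipped
    (assumes independent bit errors)."""
    return 1.0 - (1.0 - ber) ** frame_bits


def expected_transmissions(ber: float, frame_bits: int) -> float:
    """Mean number of transmissions per frame under ARQ, assuming every
    corrupted frame is detected and resent until it arrives intact."""
    p_bad = frame_error_probability(ber, frame_bits)
    return 1.0 / (1.0 - p_bad)


if __name__ == "__main__":
    # Hypothetical BER values, chosen only to contrast low and high noise.
    for ber in (1e-12, 1e-6, 1e-3):
        print(ber, expected_transmissions(ber, 72))
```

At low BER the expected number of transmissions stays essentially at 1, which is the regime where detection plus ARQ costs almost nothing in data rate.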

Error Detecting Codes: a review
A large number of error detection techniques and codes have been introduced since the 1960s:
- CRC (Cyclic Redundancy Check)
- Fletcher checksum
- Internet checksum
- XTP
- CXOR
- WSC (Weighted Sum Codes)
- ...
Parameters for the choice are:
- overhead
- probability of undetected error
- computational complexity

Error Detecting codes: CRC code
CRC coding is based on a polynomial representation of a binary message, for example
[0 0 1 0 1 0 1 0 1 0 1 0 0 0 1 1 1 0 0 1 0 0 1 0 1 1 0 1 0 0 0 1 1 0 0 0 1 1 1 0]
In this representation polynomials are defined over the Galois field GF(2), with the usual multiplication and addition operations:
  Msg = [1 0 1 1 1 0 0 1]
  Msg polynomial = x^7 + x^5 + x^4 + x^3 + 1
Operations on the coefficients:
  + : bitwise XOR
  x : bitwise AND
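As a minimal sketch of this representation, the helpers below map a bit vector (highest power first) to its polynomial and implement GF(2) addition and multiplication. The function names are illustrative and not taken from the talk.

```python
def bits_to_poly(bits):
    """Render a bit list (highest power first) as a polynomial string."""
    n = len(bits)
    terms = []
    for i, b in enumerate(bits):
        if b:
            power = n - 1 - i
            terms.append("1" if power == 0 else "x" if power == 1 else f"x^{power}")
    return " + ".join(terms) if terms else "0"


def gf2_add(a, b):
    """Add two equal-length GF(2) polynomials: coefficient-wise XOR."""
    return [x ^ y for x, y in zip(a, b)]


def gf2_mul(a, b):
    """Multiply two GF(2) polynomials: AND of coefficients, XOR accumulation."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] ^= x & y
    return out


if __name__ == "__main__":
    print(bits_to_poly([1, 0, 1, 1, 1, 0, 0, 1]))  # x^7 + x^5 + x^4 + x^3 + 1
```

Note that adding a polynomial to itself gives zero, which is the property the CRC construction relies on.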

Error Detecting codes: CRC code
Given a generator CRC polynomial g with g check bits (degree g, here 3) and a message m:
  g = x^3 + x + 1                 g = [1 0 1 1]
  m = x^7 + x^5 + x^4 + x^3       m = [1 0 1 1 1 0 0 0]
we may multiply the message by x^g:
  m*x^g = [1 0 1 1 1 0 0 0 0 0 0]
If we divide by the polynomial g we obtain a quotient q and a remainder r:
  m*x^g = q*g + r
  [1 0 1 1 1 0 0 0 0 0 0] = [1 0 0 0 1 0 1 1]*[1 0 1 1] + [1 0 1]
Adding r to both sides:
  m*x^g + r = q*g + r + r
  [1 0 1 1 1 0 0 0 0 0 0] + [1 0 1] = [1 0 0 0 1 0 1 1]*[1 0 1 1] + [1 0 1] + [1 0 1]
Since r + r = 0 in GF(2):
  m*x^g + r = q*g
  [1 0 1 1 1 0 0 0 0 0 0] + [1 0 1] = [1 0 0 0 1 0 1 1]*[1 0 1 1] = [1 0 1 1 1 0 0 0 1 0 1] = [msg remainder]
This polynomial is therefore an exact multiple of the CRC polynomial g. If we transmit m*x^g + r = [msg remainder], we may verify on arrival whether it is still an exact multiple of g. If it is, we may infer that probably no error was added by noise. If it is not, we infer that one or more errors were added by noise.
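The encoding steps above can be sketched with integers used as bit vectors (most significant bit = highest power). The function names are illustrative; the numbers are the ones from the slide.

```python
def gf2_mod(dividend: int, gen: int) -> int:
    """Remainder of GF(2) polynomial division, with polynomials stored as ints."""
    while dividend.bit_length() >= gen.bit_length():
        # Cancel the current leading term by XOR-ing in a shifted generator.
        dividend ^= gen << (dividend.bit_length() - gen.bit_length())
    return dividend


def crc_remainder(message: int, gen: int) -> int:
    """Remainder r of message * x^deg(gen) divided by gen."""
    degree = gen.bit_length() - 1
    return gf2_mod(message << degree, gen)


def crc_encode(message: int, gen: int) -> int:
    """Code word [msg remainder]: the message followed by its check bits."""
    degree = gen.bit_length() - 1
    return (message << degree) | crc_remainder(message, gen)


if __name__ == "__main__":
    g = 0b1011        # x^3 + x + 1
    m = 0b10111000    # x^7 + x^5 + x^4 + x^3
    print(format(crc_remainder(m, g), "03b"))   # 101
    print(format(crc_encode(m, g), "011b"))     # 10111000101
```

The resulting code word leaves a zero remainder when divided by g, which is the property checked at the receiver.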

Error Detecting codes: CRC detection
What we do is append to the message the remainder of a proper division by the generator polynomial, before transmitting the whole. If g is the degree of the generator polynomial, we have to add just g check bits.
At the receiver the division is performed again:
  code/g = [1 0 1 1 1 0 0 0 1 0 1]/[1 0 1 1] gives r = [0 0 0]: no error
With 1 error:
  noisy code = [1 0 1 0 1 0 0 0 1 0 1]
  noisy code/g = [1 0 1 0 1 0 0 0 1 0 1]/[1 0 1 1] gives r = [0 0 1]: error
The quotient [1 0 0 1 1 1 0 0] is discarded.
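The receiver-side check can be sketched as a GF(2) division that returns both quotient and remainder; only the remainder is used for the decision. The helper name is illustrative, and the bit vectors are those of the slide.

```python
def gf2_divmod(dividend: int, gen: int):
    """Quotient and remainder of GF(2) polynomial division (ints as bit vectors)."""
    quotient = 0
    while dividend.bit_length() >= gen.bit_length():
        shift = dividend.bit_length() - gen.bit_length()
        quotient |= 1 << shift
        dividend ^= gen << shift
    return quotient, dividend


def has_error(received: int, gen: int) -> bool:
    """A nonzero remainder means the word is not a multiple of the generator."""
    _, remainder = gf2_divmod(received, gen)
    return remainder != 0


if __name__ == "__main__":
    g = 0b1011
    clean = 0b10111000101
    noisy = clean ^ (1 << 7)          # flip the x^7 bit, as on the slide
    print(has_error(clean, g))        # False
    print(has_error(noisy, g))        # True
    print([format(v, "b") for v in gf2_divmod(noisy, g)])
```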

CRC realization
[Diagram labels: message polynomial, generator polynomial, polynomial division, remainder, quotient, code, feedback shift register]
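A feedback shift register performs the polynomial division one bit per clock. The sketch below simulates one common arrangement, in which the incoming bit is XOR-ed with the register output to form the feedback; it is an assumed textbook structure, not necessarily the circuit drawn on the slide.

```python
def crc_lfsr(message_bits, gen: int) -> int:
    """Bit-serial CRC: returns the remainder of message * x^deg divided by gen.

    message_bits: iterable of 0/1, highest power first.
    gen: generator polynomial as an int, e.g. 0b1011 for x^3 + x + 1.
    """
    degree = gen.bit_length() - 1
    mask = (1 << degree) - 1
    taps = gen & mask                 # generator without its leading term
    register = 0
    for bit in message_bits:
        # Feedback is the bit leaving the register XOR the incoming bit.
        feedback = ((register >> (degree - 1)) & 1) ^ bit
        register = (register << 1) & mask
        if feedback:
            register ^= taps
    return register


if __name__ == "__main__":
    print(format(crc_lfsr([1, 0, 1, 1, 1, 0, 0, 0], 0b1011), "03b"))  # 101
```

After the last message bit has been clocked in, the register holds the check bits directly, so no extra zero-padding cycles are needed in this arrangement.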

Features of CRC coding
A large variety of polynomials may be used: the longer the polynomial, the larger the overhead and the better the detecting ability. The simplest polynomial, x+1, delivers a 1-bit remainder and reduces to the usual parity bit.
Main features of CRC coding. A proper CRC is able to detect:
- all single-bit errors;
- any odd number of errors, provided x+1 is a factor of g(x);
- burst errors of length not exceeding g, where g is the number of check bits (the order of the CRC polynomial);
- double errors, if g(x) contains at least three 1s (and as long as the code length does not exceed the period of g(x)).
The burst feature is invaluable, since we expect that SEU events may affect more than a single bit at a time.
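Two of these properties can be checked by brute force on a small code. The sketch below assumes the generator x^3 + x + 1 and an 11-bit code word from the earlier example; the helper names are illustrative.

```python
def gf2_mod(dividend: int, gen: int) -> int:
    """Remainder of GF(2) polynomial division (ints as bit vectors)."""
    while dividend.bit_length() >= gen.bit_length():
        dividend ^= gen << (dividend.bit_length() - gen.bit_length())
    return dividend


def all_bursts_detected(code_len: int, gen: int) -> bool:
    """True if every burst no longer than the number of check bits is detected."""
    degree = gen.bit_length() - 1
    for body in range(1, 1 << degree, 2):      # odd values: burst starts and ends on a 1
        for shift in range(code_len - body.bit_length() + 1):
            if gf2_mod(body << shift, gen) == 0:
                return False                    # this burst would go unnoticed
    return True


def parity_bit(word: int) -> int:
    """CRC with generator x + 1: the single check bit is the parity of the word."""
    return gf2_mod(word << 1, 0b11)


if __name__ == "__main__":
    print(all_bursts_detected(11, 0b1011))   # True
    print(parity_bit(0b1011001))             # 0 (four ones)
```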

CRC coding in existing standards
Name      | r  | Generator polynomial | Hex | Factor x+1 | Standard
CRC-12    | 12 | x^12+x^11+x^3+x^2+x+1 | 80F | y | transmission of 6-bit character streams
CRC-16    | 16 | x^16+x^15+x^2+1 | 8005 | y | IBM's BISYNCH
CRC-CCITT | 16 | x^16+x^12+x^5+1 | 1021 | y | disk storage, XMODEM, X.25, IBM's SDLC, ISO's HDLC
CRC-32    | 32 | x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1 | 04C11DB7 | n | PKZip, Ethernet, AAL5 (ATM Adaptation Layer 5), FDDI (Fiber Distributed Data Interface), IEEE-802 LAN/MAN standard
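The "Factor x+1" column can be checked mechanically: over GF(2), x+1 divides a polynomial exactly when the polynomial has an even number of terms. The sketch below rebuilds each full polynomial from its degree and hexadecimal form (which omits the leading term); the dictionary and function names are illustrative.

```python
# Degree and hex form (leading term omitted) of the generators in the table.
CRC_STANDARDS = {
    "CRC-12": (12, 0x80F),
    "CRC-16": (16, 0x8005),
    "CRC-CCITT": (16, 0x1021),
    "CRC-32": (32, 0x04C11DB7),
}


def full_polynomial(degree: int, hex_form: int) -> int:
    """Restore the leading x^degree term that the hex notation leaves out."""
    return (1 << degree) | hex_form


def has_factor_x_plus_1(poly: int) -> bool:
    """x + 1 divides poly over GF(2) iff poly(1) = 0, i.e. an even number of terms."""
    return bin(poly).count("1") % 2 == 0


if __name__ == "__main__":
    for name, (degree, hex_form) in CRC_STANDARDS.items():
        poly = full_polynomial(degree, hex_form)
        print(name, "y" if has_factor_x_plus_1(poly) else "n")
```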

CRC coding in the standard Ethernet protocol
The frame check sequence (FCS) field follows the data block in the data frame of the protocol.
g(X) = X^32 + X^26 + X^23 + X^22 + X^16 + X^12 + X^11 + X^10 + X^8 + X^7 + X^5 + X^4 + X^2 + X + 1
32 redundancy bits are added regardless of the message length, which ranges from 512 to 12,144 bits.
Code length n (bits) | Minimum Hamming distance dmin | No. of detected errors
3007 to 12,144 | 4 | 3
301 to 3006    | 5 | 4
204 to 300     | 6 | 5
124 to 203     | 7 | 6
90 to 123      | 8 | 7
Many longer error patterns are detected.
Many burst error patterns are detected.
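For experimentation, Python's standard `zlib.crc32` uses this same 32-bit generator (with the usual bit-reflection, initial-value and final-XOR conventions). The sketch below appends and verifies a 4-byte check field; the framing helpers and the little-endian byte order are illustrative assumptions, not the exact Ethernet frame layout.

```python
import zlib


def append_fcs(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 check field to the payload (illustrative framing)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare with the received field."""
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == fcs


if __name__ == "__main__":
    frame = append_fcs(b"SuperB serial link test frame")
    corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip one payload bit
    print(fcs_ok(frame))       # True
    print(fcs_ok(corrupted))   # False
```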

CRC coding on an 18-bit block
Polynomials compared:
- x^3+x+1: overhead 20%
- CRC-4, x^4+x+1 and x^4+x^3+x^2+x+1: overhead 26.7%
- CRC-5, x^5+x^3+x+1 and x^5+1: overhead 33.3%
Efficiency of detection = no. of detected errors / total no. of errors
[Plot: efficiency of detection vs number of errors in a word]
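The efficiency defined here can be computed exactly for short words by enumerating every error pattern with a given number of flipped bits. The sketch below assumes that the 18 bits include the check bits and uses x^3+x+1 as an example; it is not the simulation tool used for the talk.

```python
from itertools import combinations
from math import comb


def gf2_mod(dividend: int, gen: int) -> int:
    """Remainder of GF(2) polynomial division (ints as bit vectors)."""
    while dividend.bit_length() >= gen.bit_length():
        dividend ^= gen << (dividend.bit_length() - gen.bit_length())
    return dividend


def detection_efficiency(code_bits: int, gen: int, n_errors: int) -> float:
    """Fraction of error patterns with n_errors flipped bits that the CRC detects."""
    undetected = 0
    for positions in combinations(range(code_bits), n_errors):
        pattern = sum(1 << p for p in positions)
        if gf2_mod(pattern, gen) == 0:       # pattern is a multiple of the generator
            undetected += 1
    total = comb(code_bits, n_errors)
    return (total - undetected) / total


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, detection_efficiency(18, 0b1011, k))
```

The enumeration grows combinatorially with the number of errors, so it is only practical for short blocks such as the 18-bit word considered here.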

Why do some errors remain undetected?
The limitations of CRC codes depend on some erratic features. CRC detects all single errors and all burst errors up to a certain burst length. The code also has some ability to detect larger numbers of errors in the frame: depending on the message length and the number of errors, it detects them with high probability, but detection is not guaranteed.
This happens because the code word is a multiple of the generator g: if the noise pattern is also an integer multiple of g, the received word divided by the CRC polynomial g gives no remainder, and the check therefore signals absence of noise.
This happens with a low but nonzero probability, which also depends on the length of the transmitted word. This may be analyzed further...

Undetected error probability for CRC
We may analyze all possible error patterns and find out which ones actually go undetected. We may plot the number of undetectable error patterns containing a fixed number of errors as a function of the message length.
  msg_len = 8; undetected: [0 4 26 44 50 58 46 19 6 2 0]
  msg_len = 9; undetected: [0 5 34 66 88 114 108 61 24 9 2 0]
The trend shows a fast increase in this number.
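Since an error pattern is undetectable exactly when it is itself a nonzero code word, the counts can be obtained by enumerating all code words and grouping them by number of ones. The sketch below assumes the generator x^3+x+1 from the earlier example and that entry k of each list counts patterns with k errors (k starting at 1); under those assumptions it reproduces the two lists quoted above.

```python
def gf2_mod(dividend: int, gen: int) -> int:
    """Remainder of GF(2) polynomial division (ints as bit vectors)."""
    while dividend.bit_length() >= gen.bit_length():
        dividend ^= gen << (dividend.bit_length() - gen.bit_length())
    return dividend


def undetected_counts(msg_len: int, gen: int):
    """counts[k-1] = number of undetectable error patterns with exactly k errors."""
    degree = gen.bit_length() - 1
    code_len = msg_len + degree
    counts = [0] * code_len
    for msg in range(1, 1 << msg_len):
        shifted = msg << degree
        code_word = shifted | gf2_mod(shifted, gen)   # nonzero multiple of gen
        counts[bin(code_word).count("1") - 1] += 1
    return counts


if __name__ == "__main__":
    print(undetected_counts(8, 0b1011))
    print(undetected_counts(9, 0b1011))
```

The total number of undetectable patterns is 2^msg_len - 1, so it doubles with every added message bit, which matches the fast growth noted above.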

Error detection codes for SuperB
The starting point for serial transmission and parallel-to-serial conversion is the basic block length of 18 bits. Information for error control (generalized parity bits) may be:
- appended to each 18-bit block; or,
- since a group of 5 to 10 18-bit SERDES words is foreseen as a unit transmission block, to be treated as a whole and discarded entirely in case of error, appended to a number N = 5..10 of 18-bit blocks.

Error detection codes for SuperB
Considering blocks of N*18-bit SERDES stream.
[Block diagram, n = 4. Labels: data to transmit (65 bit); CRC generator (7); buffer & scrambler; 4*18 = 72 bit; serdes (18 bit); serial link; serdes; buffer & descrambler (72); CRC check; data to distribute; ERROR flag / ARQ request. Annotations: Ecc = 12 %, Overhead = 44 %, 11, 3]

Detection efficiency for CRC
Two main parameters are:
  overhead = crc_bits / message_bits
  efficiency = no. of detected errors / total no. of errors
Efficiency is almost constant against the overhead and relatively high, just below the 100% value.
CRC-7 (7 parity bits), polynomial x^7+x^3+1
Block length in the range 4*18 bits to 10*18 bits, overhead 4 to 11 %

Undetected error probability for CRC
This high value of the efficiency depends, of course, mainly on the CRC length, but also on the choice of polynomial. We may verify in the literature that even some of the polynomials chosen for some standards are not at all optimal.

Simulating CRC check
Overhead (%) for each CRC length and number N of 18-bit blocks:
CRC len | polynomial              | N=5  | N=6 | N=7 | N=8 | N=9
CRC-5   | x^5+x^3+1               | 5.9  | 4.9 | 4.1 | 3.6 | 3.2
CRC-6   | x^6+x+1                 | 7.1  | 5.9 | 5   | 4.3 | 3.8
CRC-7   | x^7+x^3+1               | 8.4  | 6.9 | 5.9 | 5.1 | 4.5
CRC-8   | x^8+x^2+x+1             | 9.8  | 8   | 6.8 | 5.9 | 5.2
CRC-9   | x^9+x^7+x^6+x^3+x^2+x+1 | 11   | 9.1 | 7.7 | 6.7 | 5.9
Polynomials and N multiplicity chosen to obtain a range of 5% to 12% overhead.
[Plot: efficiency of the detection vs overhead]
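The overhead figures in this table are consistent with overhead = crc_bits / message_bits, taking message_bits = 18*N - crc_bits, that is, with the check bits carried inside the N*18-bit block. That reading is an inference from the numbers rather than something stated on the slide; the sketch below recomputes the table under it.

```python
WORD_BITS = 18  # basic SERDES block length used throughout the talk


def overhead_percent(crc_bits: int, n_blocks: int) -> float:
    """Overhead in percent, assuming the CRC bits are part of the N*18-bit block."""
    message_bits = WORD_BITS * n_blocks - crc_bits
    return 100.0 * crc_bits / message_bits


if __name__ == "__main__":
    for crc_bits in range(5, 10):
        row = [round(overhead_percent(crc_bits, n), 1) for n in range(5, 10)]
        print(f"CRC-{crc_bits}", row)
```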

Error Detecting codes: CHECKSUMs
Checksums were introduced in order to allow very simple hardware and software implementations. CRCs are easily implemented by serial processing, via a shift register with a number of feedback paths. When implemented in software, as for example in the Internet case, this serial arrangement is slower than a parallel implementation, which in turn is relatively intensive. In our case too, the serial bit stream is embedded in the SERDES chip, which exposes only the parallel path externally. Checksums offer much simpler algorithms at the cost of lower performance.

Error Detecting codes: checksum
A number of different solutions have been devised. The message is divided into words, which are used to obtain one or more extra words to be transmitted, allowing a check on arrival.
- Parity byte or parity word
- Modular sum
- Position-dependent checksums
- Fletcher checksum (used in ISO)
- Weighted sum code (WSC)
- One's-complement checksum (used in the Internet)
- Circular-shift exclusive-OR checksum (CXOR)
- Block-parity code
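Three of the listed schemes are simple enough to sketch in a few lines: the parity word, the one's-complement sum and the Fletcher checksum. The word sizes (16-bit words, Fletcher-16 over bytes) are common textbook choices assumed here for illustration, not parameters from the talk.

```python
def parity_word(words):
    """Parity word: bitwise XOR of all message words."""
    result = 0
    for w in words:
        result ^= w
    return result


def ones_complement_checksum(words, bits: int = 16) -> int:
    """Internet-style checksum: one's-complement sum of the words, then inverted."""
    mask = (1 << bits) - 1
    total = 0
    for w in words:
        total += w & mask
        total = (total & mask) + (total >> bits)   # end-around carry
    return ~total & mask


def fletcher16(data: bytes) -> int:
    """Fletcher-16: two running sums modulo 255; the second is position dependent."""
    sum1 = sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


if __name__ == "__main__":
    words = [0x0001, 0xF203, 0xF4F5, 0xF6F7]
    print(hex(parity_word(words)))
    print(hex(ones_complement_checksum(words)))   # 0x220d
    print(hex(fletcher16(b"abcde")))              # 0xc8f0
```

The plain XOR and modular sums ignore word order, whereas the second Fletcher sum changes when words are swapped, which is what "position-dependent" refers to.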

CHECKSUMs: comparison
Parameters for a comparison are:
- d: minimum distance between codewords
- b: burst error detecting capacity
- h: number of check bits
- Lmax: maximum code length allowed

CHECKSUMs: simulations
Protecting short words, e.g. a single 18-bit stream, gives both:
- high overhead
- low efficiency
Protecting multiple 18-bit streams (N*18-bit blocks) gives better results as regards:
- overhead
- efficiency

CHECKSUMs: multiple 18-bit streams
The numbers in brackets [N S n] are:
- N: number of 18-bit blocks protected at the same time
- S: number of words making up the protected block
- n: length of the single word and also of the «parity» word

CHECKSUMs: multiple 18-bit streams, range of interest
The numbers in brackets [N S n] are:
- N: number of 18-bit blocks protected at the same time
- S: number of words making up the protected block
- n: length of the single word and also of the «parity» word

CRC vs CHECKSUMs
CRC shows a large gap in performance with respect to checksums, as shown in the figures for 2 errors and 4 errors.

To be done next
- Obtain precise figures on the bit error rate in our radiation-hard environment.
- Complete the analysis/simulation of the large set of possible algorithms for obtaining checksums.
- Take into consideration the specific statistics of the data/commands to be transmitted, in order to optimize some parameters.
- Evaluate different hardware implementations, in order to present practical alternatives to be evaluated for a final choice.
- Define, to that purpose, which choices may be allowed by a comprehensive implementation (programmable hardware).
- Analyze thoroughly the impact of error rates on the performance of the overall apparatus: trigger rate, latency and data quality.

Conclusions
- We surveyed current techniques for the detection of errors in the SuperB DAQ, with the aim of minimizing the required computational power/hardware/latency/robustness in comparison with full error-correcting coding.
- We have developed some statistical analysis to obtain figures useful from our specific viewpoint, mainly the required overhead and the undetected error probability.
- We have developed simulations in order to assess practical figures.
- We have set up some software useful for further analysis and for evaluating different alternatives and algorithms for a final choice.