29/05/2018
Error Detecting Codes for Serial Links: an alternative to error correction
Sergio Cavaliere
Department of Physics, University of Napoli “Federico II”, Italy, and INFN Sezione di Napoli, Italy
e-mail: sergio.cavaliere@na.infn.it
In this talk I will present some results from a preliminary study of Forward Error Correcting codes for the SuperB serial links, together with simulation results and the tools built for the purpose. This is ongoing work, which will soon require closer integration with the architectures being studied for the actual link.
XVII SuperB Workshop, La Biodola, May 2011

Cavaliere - SuperB Workshop - May 2011
Abstract
In this talk we discuss algorithms and structures for error detection as a possible alternative to full error correcting codes. This solution, suited to the present case where the expected error rate is very low, gives good results at much lower hardware complexity and latency.

Serial link failures and errors
Two main problems arise from errors in the radiation environment:
- Loss of Lock (LoL), due to failures on fixed bits in the SERDES. Conclusion: a direct, fast link between transmitter and receiver is needed in order to signal the occurrence of LoL promptly.
- Bit errors due to radiation, which affect data integrity and data quality.
Solutions:
- Error Correcting Code (ECC): computationally intensive; suitable for high noise levels; may preclude future technological upgrades of the link.
- Error Detecting Code (EDC): computationally less intensive; requires retransmission of data, and therefore a feedback loop, or alternatively allows corrupted data to be discarded off line; suitable for a low Bit Error Rate (BER).

Error Correction vs Error Detection
- When retransmission is feasible, error correction may be obtained simply by Error Detection followed by ARQ (Automatic Repeat reQuest). Given the low error rate in our case, neither data rate nor latency is significantly affected.
- When the loss of a short data frame does not compromise the overall information, corrupted data may be discarded later in the communication stream, even at an off-line stage. A specified level of data quality must be guaranteed; this is attainable because of the low error rate.
An important parameter for the choice is the noise level:
- High noise requires real-time Error Correction, to avoid lowering the data rate through frequent retransmissions. Error correction does not require a feedback loop.
- With low noise a complex correction mechanism would be of little use: Error Detection may suffice. A repeat mechanism may be adopted, which doubles the transmission time of the affected packet. ARQ requires a feedback loop to signal errors and request retransmission.

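The statement that ARQ costs little at low error rates can be illustrated with a small calculation. The sketch below assumes independent bit errors, perfect detection, and a simple repeat-until-accepted scheme; the frame length and the BER values are hypothetical placeholders, not measured SuperB figures.

```python
def frame_error_probability(ber: float, frame_bits: int) -> float:
    """Probability that a frame holds at least one bit error (independent errors assumed)."""
    return 1.0 - (1.0 - ber) ** frame_bits


def expected_transmissions(ber: float, frame_bits: int) -> float:
    """Mean number of transmissions per frame when every corrupted frame is repeated."""
    return 1.0 / (1.0 - frame_error_probability(ber, frame_bits))


if __name__ == "__main__":
    frame_bits = 72  # hypothetical frame: 4 words of 18 bits
    for ber in (1e-3, 1e-6, 1e-12):  # illustrative values only
        mean_tx = expected_transmissions(ber, frame_bits)
        print(f"BER={ber:g}: mean transmissions per frame = {mean_tx:.12f}")
```

Under these assumptions the mean number of transmissions stays very close to 1 as soon as the BER is small compared with 1/frame_bits, which is the regime in which detection plus ARQ is attractive.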
Error Detecting Codes: a review
A large number of error detection techniques and codes have been introduced since the '60s:
- CRC (Cyclic Redundancy Check)
- Fletcher checksum
- Internet checksum
- XTP
- CXOR
- WSC (Weighted Sum Codes)
- ...
Parameters for the choice are:
- overhead
- probability of undetected error
- computational complexity

Error Detecting codes: CRC code
CRC coding is based on a polynomial representation of a binary message:
[0 0 1 0 1 0 1 0 1 0 1 0 0 0 1 1 1 0 0 1 0 0 1 0 1 1 0 1 0 0 0 1 1 0 0 0 1 1 1 0]
In this representation polynomials are defined over the Galois field GF(2), with the usual × and + operations:
- + is bitwise XOR
- × is bitwise AND
Example:
Msg = [1 0 1 1 1 0 0 1]
Msg polynomial = x^7 + x^5 + x^4 + x^3 + 1

Error Detecting codes: CRC code
Given a generator (CRC) polynomial g of degree g, that is with g check bits, and a message m:
g = x^3 + x + 1, g = [1 0 1 1]
m = x^7 + x^5 + x^4 + x^3, m = [1 0 1 1 1 0 0 0]
We multiply the message by x^g:
m*x^g = [1 0 1 1 1 0 0 0 0 0 0]
Dividing by the polynomial g gives a quotient q and a remainder r:
m*x^g = q*g + r
[1 0 1 1 1 0 0 0 0 0 0] = [1 0 0 0 1 0 1 1]*[1 0 1 1] + [1 0 1]
Adding r to both sides:
m*x^g + r = q*g + r + r
[1 0 1 1 1 0 0 0 0 0 0] + [1 0 1] = [1 0 0 0 1 0 1 1]*[1 0 1 1] + [1 0 1] + [1 0 1]
Since r + r = 0 in GF(2):
m*x^g + r = q*g
[1 0 1 1 1 0 0 0 0 0 0] + [1 0 1] = [1 0 0 0 1 0 1 1]*[1 0 1 1] = [1 0 1 1 1 0 0 0 1 0 1] = [msg remainder]
This polynomial is therefore an exact multiple of the CRC polynomial g.
If we transmit the polynomial m*x^g + r = [msg remainder], we can verify on arrival whether it is still an exact multiple of the CRC polynomial g.
- If it is, we may infer that probably no error was added by noise.
- If it is not, we may infer that one or more errors were added by noise.

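The encoding steps on this slide can be reproduced with a few lines of code. This is a minimal sketch, not the implementation used for the study: the function names `poly_mod` and `crc_encode` are chosen here for illustration, and bit vectors are plain Python lists with the most significant bit first, as on the slide.

```python
def poly_mod(bits, gen):
    """Remainder of bits(x) / gen(x) over GF(2); lists of 0/1, most significant bit first."""
    work = list(bits)
    for i in range(len(work) - len(gen) + 1):
        if work[i]:  # leading coefficient set: subtract (XOR) the generator at this offset
            for j, gbit in enumerate(gen):
                work[i + j] ^= gbit
    return work[-(len(gen) - 1):]  # the last `degree` bits hold the remainder


def crc_encode(msg, gen):
    """Return [msg remainder]: the message followed by its check bits."""
    degree = len(gen) - 1
    remainder = poly_mod(msg + [0] * degree, gen)  # m * x^g, then divide by g
    return msg + remainder


if __name__ == "__main__":
    g = [1, 0, 1, 1]                 # x^3 + x + 1
    m = [1, 0, 1, 1, 1, 0, 0, 0]     # x^7 + x^5 + x^4 + x^3
    print(crc_encode(m, g))          # message followed by the remainder [1 0 1]
```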
Error Detecting codes: CRC detection
Before transmitting, we append to the message the remainder of a suitable division by the generator polynomial. If g is the degree of the generator polynomial, just g check bits have to be added.
At the receiver the division is performed again:
code/g = [1 0 1 1 1 0 0 0 1 0 1]/[1 0 1 1] gives r = [0 0 0]: no error
noisy code = [1 0 1 0 1 0 0 0 1 0 1] (1 error)
noisy code/g = [1 0 1 0 1 0 0 0 1 0 1]/[1 0 1 1] gives r = [0 0 1]: error
The quotient [1 0 0 1 1 1 0 0] is discarded.

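The receiver-side check can be sketched in the same way. The names `crc_remainder` and `crc_ok` are illustrative; the two test words are the ones shown on the slide.

```python
def crc_remainder(word, gen):
    """Remainder of word(x) / gen(x) over GF(2); lists of 0/1, most significant bit first."""
    work = list(word)
    for i in range(len(work) - len(gen) + 1):
        if work[i]:
            for j, gbit in enumerate(gen):
                work[i + j] ^= gbit
    return work[-(len(gen) - 1):]


def crc_ok(word, gen):
    """True when the received word is an exact multiple of the generator."""
    return not any(crc_remainder(word, gen))


if __name__ == "__main__":
    g = [1, 0, 1, 1]
    code = [1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1]
    noisy = [1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1]  # fourth bit flipped
    print(crc_remainder(code, g), crc_ok(code, g))
    print(crc_remainder(noisy, g), crc_ok(noisy, g))
```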
CRC realization
[Figure: polynomial division realized with a feedback shift register. Labels: message polynomial, generator polynomial, polynomial division, quotient, remainder, code.]

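A software model of such a feedback shift register helps to see how the division proceeds one bit per clock. The sketch below is one common arrangement (input bit XORed with the register output, feedback taps taken from the generator); it is an assumption about the structure, since the slide figure is not reproduced here, and `lfsr_crc` is a name chosen for illustration.

```python
def lfsr_crc(msg_bits, gen):
    """Bit-serial CRC: remainder of msg(x) * x^degree / gen(x), one message bit per step."""
    degree = len(gen) - 1
    taps = gen[1:]             # feedback taps: the generator without its leading 1
    reg = [0] * degree         # shift register, reg[0] is the stage shifted out first
    for bit in msg_bits:
        feedback = bit ^ reg[0]
        reg = reg[1:] + [0]    # shift by one position
        if feedback:
            reg = [r ^ t for r, t in zip(reg, taps)]
    return reg


if __name__ == "__main__":
    g = [1, 0, 1, 1]                    # x^3 + x + 1
    m = [1, 0, 1, 1, 1, 0, 0, 0]
    check = lfsr_crc(m, g)
    print(check)                        # check bits to append to the message
    print(lfsr_crc(m + check, g))       # all zeros for an error-free code word
```

Feeding the whole received word through the same register gives an all-zero state when the word is a multiple of the generator, so the same hardware can serve both for generation and for checking.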
Features of CRC coding
A large variety of polynomials may be used: the longer the polynomial, the larger the overhead and the better the detection capability. The simplest polynomial, x + 1, yields a 1-bit remainder and reduces to the usual parity bit.
Main features of CRC coding. A suitable CRC is able to detect:
- all single-bit errors;
- any odd number of errors, provided x + 1 is a factor of g(x);
- burst errors of length not exceeding g, where g is the number of check bits (the degree of the CRC polynomial);
- double errors, if g(x) contains at least three 1s and the code word is no longer than the period of g(x).
The burst property is invaluable, since we expect that SEU (Single Event Upset) events may affect more than a single bit at a time.

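These detection properties can be checked by brute force on a small code. The sketch below stores polynomials as integers and relies on the linearity of the code: an error pattern escapes detection exactly when the generator divides it. The helper names are illustrative, and the generator x^4 + x^3 + x^2 + 1 = (x + 1)(x^3 + x + 1) is introduced here only as an example containing the factor x + 1.

```python
from itertools import combinations


def remainder(value: int, gen: int) -> int:
    """Remainder of value(x) / gen(x) over GF(2); polynomials stored as integers."""
    glen = gen.bit_length()
    while value.bit_length() >= glen:
        value ^= gen << (value.bit_length() - glen)
    return value


def undetected(pattern: int, gen: int) -> bool:
    """A nonzero error pattern escapes detection iff the generator divides it."""
    return pattern != 0 and remainder(pattern, gen) == 0


def bursts(n_bits: int, max_len: int):
    """Yield every burst error of length 1..max_len inside an n_bits word."""
    for length in range(1, max_len + 1):
        for inner in range(1 << max(length - 2, 0)):
            core = 1 if length == 1 else (1 << (length - 1)) | (inner << 1) | 1
            for shift in range(n_bits - length + 1):
                yield core << shift


def undetected_doubles(n_bits: int, gen: int) -> int:
    """Number of double-error patterns in an n_bits word that escape detection."""
    return sum(undetected((1 << i) | (1 << j), gen)
               for i, j in combinations(range(n_bits), 2))


if __name__ == "__main__":
    g = 0b1011  # x^3 + x + 1, period 7
    print(any(undetected(e, g) for e in bursts(11, 3)))   # bursts up to 3 bits
    print(undetected_doubles(7, g), undetected_doubles(11, g))
```

For x^3 + x + 1 the double-error guarantee holds up to a code length of 7 bits (the period of the polynomial); on the 11-bit code word of the earlier example some double errors already escape, which is why the generator must be matched to the block length.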
CRC coding in existing standards
Each entry gives: name; number of check bits r; generator polynomial; hexadecimal code; whether x + 1 is a factor (where stated); standards and applications.
- CRC-12: r = 12; x^12 + x^11 + x^3 + x^2 + x + 1; 80F; factor x + 1: yes; transmission of 6-bit character streams
- CRC-16: r = 16; x^16 + x^15 + x^2 + 1; 8005; IBM's BISYNCH
- CRC-CCITT: r = 16; x^16 + x^12 + x^5 + 1; 1021; disk storage, XMODEM, X.25, IBM's SDLC, ISO's HDLC
- CRC-32: r = 32; x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1; 04C11DB7; factor x + 1: no; PKZip, Ethernet, AAL5 (ATM Adaptation Layer 5), FDDI (Fiber Distributed Data Interface), IEEE 802 LAN/MAN standards

CRC coding in the standard Ethernet protocol
The frame check sequence (FCS) field follows the data block in the data frame of the protocol.
g(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
32 redundancy bits are added regardless of the message length, which ranges from 512 to 12144 bits.
Minimum Hamming distance dmin as a function of the code length n:
- n from 3007 to 12144: dmin = 4 (number of detected errors: 3)
- n from 301 to 3006: dmin = 5
- n from 204 to 300: dmin = 6
- n from 124 to 203: dmin = 7
- n from 90 to 123: dmin = 8
In addition:
- many longer error patterns are detected
- many burst error patterns are detected

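For experiments in software, Python's standard `zlib.crc32` computes a CRC-32 with this same generator polynomial, in the bit-reflected convention with initial value and final inversion shared by Ethernet and PKZip. The sketch below appends and verifies a 4-byte check field; the helper names and the little-endian byte order of the appended field are illustrative choices, not a description of the Ethernet frame layout.

```python
import zlib


def frame_with_fcs(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 check field to the payload."""
    fcs = zlib.crc32(payload)
    return payload + fcs.to_bytes(4, "little")


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare with the received check field."""
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload) == int.from_bytes(fcs, "little")


if __name__ == "__main__":
    frame = frame_with_fcs(b"123456789")
    print(hex(zlib.crc32(b"123456789")), fcs_ok(frame))
    corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit of the payload
    print(fcs_ok(corrupted))
```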
CRC coding on an 18-bit block
Efficiency of detection = number of detected errors / total number of errors
[Figure: efficiency of detection vs number of errors in a word, for the following polynomials]
- x^3 + x + 1: overhead 20%
- CRC-4: x^4 + x + 1 and x^4 + x^3 + x^2 + x + 1, overhead 26.7%
- CRC-5: x^5 + x^3 + x + 1 and x^5 + 1, overhead 33.3%

Why do some errors remain undetected?
The limitations of CRC codes depend on some specific features of the code. A CRC detects all single errors and all burst errors up to a certain burst length. The code is also able to detect larger numbers of errors in the frame but, depending on message length and number of errors, detection is then highly probable rather than guaranteed.
The reason is the following. The code word is a multiple of the generator g. If the noise pattern is also a multiple of g, the received word divided by the CRC polynomial g gives no remainder and therefore signals absence of noise.
This happens with a low but nonzero probability, which also depends on the length of the transmitted word. This may be analyzed further...

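A short numerical check makes the mechanism concrete. The sketch below reuses the code word and generator from the earlier CRC example, with polynomials stored as integers; the particular error patterns are chosen here for illustration.

```python
def remainder(value: int, gen: int) -> int:
    """Remainder of value(x) / gen(x) over GF(2); polynomials stored as integers."""
    glen = gen.bit_length()
    while value.bit_length() >= glen:
        value ^= gen << (value.bit_length() - glen)
    return value


GEN = 0b1011          # g = x^3 + x + 1
CODE = 0b10111000101  # [msg remainder] from the encoding example

if __name__ == "__main__":
    single = 1 << 7          # one flipped bit: never a multiple of g
    multiple = GEN << 4      # three flipped bits forming g * x^4
    print(remainder(CODE ^ single, GEN))    # nonzero: error detected
    print(remainder(CODE ^ multiple, GEN))  # zero: error goes undetected
```

The second pattern flips three bits, yet the receiver sees a valid code word, because the sum of two multiples of g is again a multiple of g.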
Undetected error probability for CRC
We may analyze all possible error patterns and find out which ones actually go undetected. We may then plot, as a function of the message length, the number of undetectable error patterns containing a fixed number of errors. In the lists below the k-th entry is the number of undetected patterns with k errors.
msg_len = 8; undetected: [0 4 26 44 50 58 46 19 6 2 0]
msg_len = 9; undetected: [0 5 34 66 88 114 108 61 24 9 2 0]
The trend shows a fast increase in this number.

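Counts of this kind can be obtained by exhaustive enumeration. The sketch below assumes the generator x^3 + x + 1 of the earlier example (the slide does not name the polynomial) and uses the fact that the undetectable error patterns are exactly the nonzero code words, so it is enough to count code words by number of 1s. The function name is illustrative.

```python
def remainder(value: int, gen: int) -> int:
    """Remainder of value(x) / gen(x) over GF(2); polynomials stored as integers."""
    glen = gen.bit_length()
    while value.bit_length() >= glen:
        value ^= gen << (value.bit_length() - glen)
    return value


def undetected_counts(msg_len: int, gen: int):
    """counts[k-1] = number of undetectable error patterns containing exactly k errors."""
    degree = gen.bit_length() - 1
    counts = [0] * (msg_len + degree)
    for msg in range(1, 1 << msg_len):
        shifted = msg << degree
        code = shifted | remainder(shifted, gen)   # [msg remainder]
        counts[bin(code).count("1") - 1] += 1
    return counts


if __name__ == "__main__":
    for msg_len in (8, 9):
        print(msg_len, undetected_counts(msg_len, 0b1011))
```

With this assumed generator the enumeration gives the same lists as quoted on the slide. The cost grows as 2^msg_len, so for the block lengths of interest a sampling approach is more practical.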
Error detection codes for SuperB
The starting point for serial transmission and parallel-to-serial conversion is the basic block length of 18 bits.
Error control information (generalized parity bits) may be:
- appended to each 18-bit block; or
- appended to a group of N = 5..10 18-bit blocks, since a group of 5 to 10 18-bit SERDES words is foreseen as a unit transmission block, to be treated as a whole and, in case of error, discarded entirely.

Error detection codes for SuperB
Considering blocks of N 18-bit SERDES words.
[Block diagram. Labels: data to transmit; CRC generator; buffer & scrambler; SERDES; serial link; buffer & descrambler; CRC check; data to distribute; ERROR flag / ARQ request. Numeric labels: 18 bit, n = 4, 7, 65 bit, 4*18 = 72 bit. Annotations: Ecc = 12 %, Overhead = 44 %.]

Detection efficiency for CRC
Two main parameters are:
overhead = crc_bits / message_bits
efficiency = number of detected errors / total number of errors
Efficiency is almost constant against the overhead and relatively high, though below 100%.
[Figure: CRC-7 (7 parity bits), polynomial x^7 + x^3 + 1; block length in the range 4*18 bits to 10*18 bits; overhead 4 to 11 %]

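The efficiency defined above can be estimated by random sampling. This is a minimal sketch rather than the simulation tool used for the study: it draws random error patterns with a fixed number of flipped bits and counts how many leave a nonzero remainder. The function name, the number of trials and the seed are arbitrary choices.

```python
import random


def remainder(value: int, gen: int) -> int:
    """Remainder of value(x) / gen(x) over GF(2); polynomials stored as integers."""
    glen = gen.bit_length()
    while value.bit_length() >= glen:
        value ^= gen << (value.bit_length() - glen)
    return value


def detection_efficiency(gen, n_bits, n_errors, trials=2000, seed=0):
    """Fraction of random n_errors-bit error patterns in an n_bits word that are detected."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        pattern = 0
        for pos in rng.sample(range(n_bits), n_errors):
            pattern |= 1 << pos
        if remainder(pattern, gen) != 0:  # linear code: only the error pattern matters
            detected += 1
    return detected / trials


if __name__ == "__main__":
    crc7 = 0b10001001  # x^7 + x^3 + 1
    for n_errors in (1, 2, 3, 4):
        print(n_errors, detection_efficiency(crc7, 4 * 18, n_errors))
```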
Undetected error probability for CRC
This high value of the efficiency depends of course mainly on the CRC length, but also on the choice of polynomial. The literature shows that even some of the polynomials chosen for some standards are far from optimal.

Simulating CRC check
Polynomials and multiplicity N chosen to obtain an overhead in the range 5% to 12%. Overhead (%) for each CRC length and N:

CRC len   polynomial                  N=5   N=6   N=7   N=8   N=9
CRC-5     x^5+x^3+1                   5.9   4.9   4.1   3.6   3.2
CRC-6     x^6+x+1                     7.1   5.9   5     4.3   3.8
CRC-7     x^7+x^3+1                   8.4   6.9   5.9   5.1   4.5
CRC-8     x^8+x^2+x+1                 9.8   8     6.8   5.9   5.2
CRC-9     x^9+x^7+x^6+x^3+x^2+x+1     11    9.1   7.7   6.7   5.9

[Figure: efficiency of detection vs overhead]

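The overhead figures in the table can be recomputed from the definition overhead = crc_bits / message_bits. The sketch below assumes that the CRC bits are carried inside the N*18-bit block, so that message_bits = N*18 - crc_bits; this assumption is inferred from the tabulated values and is not stated on the slide.

```python
def overhead_percent(crc_bits: int, n_words: int, word_bits: int = 18) -> float:
    """Overhead in percent, assuming the check bits occupy part of the N*18-bit block."""
    message_bits = n_words * word_bits - crc_bits
    return 100.0 * crc_bits / message_bits


if __name__ == "__main__":
    print("CRC len  " + "  ".join(f"N={n}" for n in range(5, 10)))
    for crc_bits in range(5, 10):
        row = "  ".join(f"{overhead_percent(crc_bits, n):4.1f}" for n in range(5, 10))
        print(f"CRC-{crc_bits}    {row}")
```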
Error Detecting codes: CHECKSUMs
Checksums were introduced to allow really simple hardware and software implementations. CRCs are easily implemented in hardware by serial processing, using a shift register with a number of feedback paths. When implemented in software, as for example in the Internet case, this serial arrangement is slower than a parallel implementation, which in turn is relatively resource-intensive. In our case too, the serial bit stream is confined inside the SERDES chip, which exposes only the parallel path to the outside.
Checksums have much simpler algorithms, at the cost of lower performance.

Error Detecting codes: checksum
The message is divided into words, which are used to compute one or more extra words; these are transmitted to allow a check on arrival. A number of different solutions have been devised:
- parity byte or parity word
- modular sum
- position-dependent checksums
- Fletcher checksum (used in ISO)
- weighted sum code (WSC)
- one's complement checksum (used in the Internet)
- circular-shift exclusive-OR checksum (CXOR)
- block-parity code

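Three of the schemes listed above are simple enough to sketch in a few lines. The versions below use the word sizes of their common published forms (8-bit bytes for Fletcher-16, 16-bit words for the Internet checksum), not the 18-bit SuperB words; adapting the word length would be part of the study and is not attempted here. Function names are illustrative.

```python
from functools import reduce


def parity_word(words):
    """Parity word: bitwise XOR of all message words."""
    return reduce(lambda a, b: a ^ b, words, 0)


def fletcher16(data: bytes) -> int:
    """Fletcher-16: two running sums modulo 255; the second makes the result position dependent."""
    sum1 = sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


def internet_checksum(words):
    """One's complement of the one's complement sum of 16-bit words."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


if __name__ == "__main__":
    print(bin(parity_word([0b101, 0b011])))
    print(hex(fletcher16(b"abcde")))
    print(hex(internet_checksum([0x0001, 0xF203, 0xF4F5, 0xF6F7])))
```

All three need only additions or XORs on parallel words, which is the source of their low hardware cost compared with a CRC.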
CHECKSUMs: comparison
Parameters for a comparison are:
- d: minimum distance between codewords
- b: burst error detecting capacity
- h: number of check bits
- Lmax: maximum code length allowed

CHECKSUMs: simulations
Protecting short words, e.g. a single 18-bit word, gives both:
- high overhead
- low efficiency
Protecting multiple 18-bit words (blocks of N*18 bits) gives better results as regards both:
- overhead
- efficiency

CHECKSUMs: multiple 18-bit words
The numbers in brackets [N S n] are:
- N: number of 18-bit blocks protected at the same time
- S: number of words making up the protected block
- n: length of a single word, and also of the «parity» word

CHECKSUMs: multiple 18-bit words, range of interest
[Figure: detail of the range of interest. The labels [N S n] have the same meaning as on the previous slide.]

CRC vs CHECKSUMs
CRC shows a large performance advantage over checksums, as shown in the figures for 2 errors and 4 errors.

To be done next
- Obtain precise figures for the bit error rate in our radiation environment.
- Complete the analysis and simulation of the large set of possible checksum algorithms.
- Take into account the specific statistics of the data and commands to be transmitted, in order to optimize some parameters.
- Evaluate different hardware implementations, in order to present practical alternatives for a final choice.
- To that purpose, define which choices may be supported by a comprehensive implementation (programmable hardware).
- Analyze thoroughly the impact of error rates on the performance of the overall apparatus: trigger rate, latency and data quality.

Conclusions
- We surveyed current techniques for error detection for use in the SuperB DAQ, with the aim of minimizing the required computational power, hardware and latency, while preserving robustness, in comparison with full error correcting coding.
- We carried out some statistical analysis to obtain figures useful from our specific viewpoint, mainly the required overhead and the undetected error probability.
- We developed simulations in order to assess practical figures.
- We set up software useful for further analysis and for evaluating different alternatives and algorithms for a final choice.