Evaluation of Error Detection Strategies for an FPGA-Based Self-Checking Arithmetic and Logic Unit
Varadarajan Srinivasan, Julian W. Farquharson, William H. Robinson, and Bharat L. Bhuva
Department of Electrical Engineering and Computer Science
Vanderbilt University, Nashville, TN

Soft Errors in FPGAs
Error location:
Data/logic errors corrupt the data processed by the circuit.
Configuration bit errors can alter the functionality of the circuit.
Error classification [1]:
Persistent errors cannot be flushed out of the system by an SEU correction scheme. Examples: counters, flip-flops.
Non-persistent errors can be corrected by scrubbing or partial reconfiguration. Examples: adders, feed-forward logic.
Figure: Error space of an FPGA unit (data errors; configuration bit errors; state machine: persistent errors; throughput logic: non-persistent errors).

FPGA-based circuits are often used in harsh radiation environments, where they are vulnerable to single-event upsets (SEUs). SEUs in FPGA circuits can modify the functionality or the behavior of the circuit. Errors in data streams can produce incorrect results, whereas errors in configuration bits can change the circuit behavior. Data errors may occur due to transients in the latches and logic elements; whether a data error occurs depends on the nature and characteristics of the transient. Configuration bit errors, on the other hand, can be either persistent or non-persistent. Persistent errors cannot be corrected by an SEU correction scheme. For example, an SEU in the program counter can change the state of the processor. A correction scheme applied to the counter would fix the circuit, but the processor cannot recover from the incorrect state, so a hard reset is required to recover from persistent errors. Non-persistent errors, which occur in the throughput/feed-forward logic, do not affect the state, so errors in these circuits can be corrected by an SEU correction scheme. Several techniques have been used to recover from SEUs that cause non-persistent bit errors.

[1] D. E. Johnson et al., "Persistent Errors in SRAM based FPGAs," MAPLD 2004, Washington DC
Srinivasan MAPLD 2005/216

Error Correction Techniques
Error correction for SEUs in configuration bits:
Scrubbing
Partial reconfiguration
Disadvantages of these error correction schemes:
Frequent scrubbing is required to ensure proper functionality.
The configuration logic is in write mode for a greater percentage of the time.
They cover bitstream errors only, which account for 45% of the total errors observed in an accelerator test [1].
Error correction for SEUs in data bits:
Triple mode redundancy in the base module
Huge area and power penalties from replicating the circuit

The simplest and most reliable technique is to triplicate the configuration bits. However, there are disadvantages to these error correction schemes. A simpler method to correct SEUs is to omit configuration memory readback and SEU detection and simply reload the configuration bits at a chosen interval. This is called scrubbing. Scrubbing requires less overhead in the system, but the configuration logic is likely to be in write mode for a greater percentage of the time. Partial reconfiguration requires detection of SEUs. In partial reconfiguration, the configuration array supports post-configuration readback and post-configuration write operations, which allow the system to detect and repair SEUs without affecting circuit operation. Readback and comparison require a readback file and a mask file, which triples the size of the configuration memory. Even though CRC frame checks can be used, error detection and correction still require a larger configuration memory. Further, the circuit remains sensitive between readback operations. Triple mode redundancy can correct errors in both data bits and configuration bits, but it results in a large increase in area and power. However, if logic blocks are designed with built-in error detection schemes, then an error signal from the error detection logic can be used to scrub or reconfigure the configuration memory on demand.

[1] Earl Fuller et al., "Radiation Testing Update, SEU Mitigation, and Availability Analysis of the Virtex FPGA for Space Reconfigurable Computing," MAPLD 2000, Washington DC

Error Correction in FPGA
Figure: bar chart of the area-delay product of the ALU designs, with values normalized to the TMR processor. TMR: triple mode redundancy. BC: Berger check. ED: remainder/parity check.
In the accelerator test reported previously (Fuller, MAPLD 2000), only 45% of the errors were due to configuration bitstream errors.
Built-in error detection can detect the majority of errors in an FPGA (including single-event transients) at a reduced area and performance penalty.
Errors can be removed by instruction re-execution or bitstream reconfiguration.

Built-in error detection can detect SEUs in both data and configuration bits. The bar chart shows the area-delay product of a 16-bit ALU designed with two built-in error detection schemes. The area-delay product of the Berger error detection scheme is 71% of that of the TMR processor. Penalties can be minimized by using systematic codes to detect errors.

Built-In-Error-Detection (BIED) Scheme for SEUs in Data and Configuration Bits
Figure: block diagram of the BIED logic. Inputs A and B feed the logic block; a check symbol from the inputs and a check symbol from the output are compared, and the comparison drives the Re-Generate/Re-Configure signal.
Systematic codes can be used to design BIED logic that detects errors in both configuration bits and data.
Errors in data can be eliminated by re-executing the instruction with the corresponding data.
Errors in the configuration bits can be corrected at the instant of error detection rather than by frequent scrubbing.

The block diagram shows the logic using the built-in error detection scheme. A check symbol is calculated from the input information/data symbols. This input check symbol is then compared with the check symbol calculated from the output. In case of an SEU, the check symbols do not match, which generates a Re-Generate/Re-Configure signal. The Re-Generate signal is used to re-execute the opcode with its data to recover from data errors. The Re-Configure signal is used to reconfigure the configuration memory array to correct functionality errors. The configuration memory can therefore be corrected at the instant of error detection rather than by frequent scrubbing.

Error Correction Codes
Three techniques are studied:
Global coding technique for all operations: Berger check prediction [1][2]
Error codes based on instruction groupings: remainder check and parity check
Triple mode redundancy
A single-instruction-issue processor has been designed and implemented with each of these error detection techniques.
Errors are corrected by re-executing the instruction and data.

Three different error correction techniques were studied. The logic circuit used for the study is a 16-bit ALU with arithmetic, logical, shift and rotate instructions. ALUs form the core of arithmetic processing units, and errors in ALUs can propagate through multiple stages/units in the processor. Further, the ALU is representative of a system-level feed-forward logic block. However, error detection in an ALU is difficult because of its diversity of operations: linear arithmetic codes may work for arithmetic instructions but not for logical and shift instructions. We used systematic separate codes to detect errors in the ALU. Berger check codes can detect errors in all arithmetic, logical, shift and rotate instructions and can be used as a global coding strategy for the ALU. The remainder check and parity check scheme divides the instructions into groups based on the nature of the operation, and a different check symbol is calculated for each type of operation. A single-instruction-issue processor is designed with each of the error detection techniques and with the triple mode redundant ALU.

[1] J. C. Lo et al., "An SFS Berger check prediction ALU and its application to self checking processor designs," IEEE Trans. on Computer-Aided Design, vol. 11, pp , Apr
[2] J. C. Lo et al., "Concurrent Error Detection in Arithmetic and Logical Operations Using Berger Codes," Proceedings of 9th Symposium on Computer Arithmetic, pp , Sep 1989.

Target Device – Altera’s FLEX chip EPF10k70RC240
EPF10k70RC240 device features:
Typical gates (logic and RAM)
Logic elements: 3744
Supply voltage: 5.0 V
0.42 μm CMOS process
Simulation software: Quartus II
Figure: FLEX 10k device block diagram.

The target device for the circuit implementation is the FLEX chip EPF10k70RC240 from Altera. The chip is fabricated in a 0.42 μm CMOS process. The device consists of 3744 logic elements and [count missing] equivalent gates. We used the Quartus II tool for analysis, synthesis and simulation.

Fault Injection
Implemented the three ALU designs (BCALU, EDALU and TMR ALU).
Realized the ALU designs in the target chip.
Used a bit-flip model to inject faults while running an assembly language program.
Achieved error correction by re-transmission of the instruction and data.

Berger Check Prediction Design of BCALU

Background Information – Berger Codes
Berger codes are systematic separate codes [1]: the information symbol and the check symbol are separated from each other.
Berger codes are capable of detecting multiple-bit unidirectional errors.
Two Berger encoding schemes are possible:
The check symbol is the binary representation of the number of 0's in the information symbol.
The check symbol is the 1's complement of the number of 1's in the information symbol.
The length of the Berger check symbol is minimal among all systematic codes [2].

Berger codes are systematic separate codes that can be used to detect errors in arithmetic and logic units. The check symbol calculated from the Berger check code can detect errors in all the common ALU arithmetic, logical, shift and rotate operations. Separate codes are used because linear arithmetic codes are preserved under arithmetic operations but do not work for logical and shift operations. With the information and check symbols separated, the information and check bits can be subjected to different operations to detect the error. Berger check codes can detect all possible unidirectional/feed-forward errors. There are two possible ways of implementing Berger check codes. (Explanation of the two encodings.) Further, the length of the Berger check symbol is minimal among systematic codes, which means less additional logic for implementation, and Berger check codes scale well to larger data widths.

[1] C. Metra et al., "Novel Berger code checker," IEEE Proceedings on Defect and Fault Tolerance in VLSI Systems, Nov. 1995, pp
[2] C. V. Freiman, "Optimal error detection codes for completely asymmetric binary channels," Inform. Contr., vol. 5, pp , March 1962.
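The two encodings can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names, the 4-bit example word, and the choice of check-symbol width (just enough bits to hold a count from 0 to the word width) are assumptions made for the example.

```python
def berger_check_zeros(info: int, width: int) -> int:
    """Encoding 1: the check symbol is the number of 0 bits in the word."""
    mask = (1 << width) - 1
    return width - bin(info & mask).count("1")


def berger_check_ones_complement(info: int, width: int) -> int:
    """Encoding 2: the check symbol is the 1's complement of the count of 1 bits."""
    mask = (1 << width) - 1
    ones = bin(info & mask).count("1")
    check_bits = width.bit_length()  # assumed width: enough bits for 0..width
    return ~ones & ((1 << check_bits) - 1)


def is_valid_codeword(info: int, check: int, width: int) -> bool:
    """A (word, check) pair is valid when the recomputed zero count matches."""
    return berger_check_zeros(info, width) == check


# Hypothetical 4-bit word 1011 has one zero, so its check symbol is 1.
word = 0b1011
check = berger_check_zeros(word, 4)

# A unidirectional error (only 1 -> 0 flips) raises the zero count,
# so the stored check symbol no longer matches.
corrupted = word & 0b0001
print(check, is_valid_codeword(word, check, 4), is_valid_codeword(corrupted, check, 4))
```

Because 1-to-0 flips in the word can only increase the zero count, while 1-to-0 flips in the check symbol can only decrease its value, the two can never drift in the same direction, which is why unidirectional errors are always caught.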

Example: Berger Check Prediction for ADD Instruction
Figure: ALU operation (ADD) alongside the Berger check symbol calculation. The rows are the internal carries C (Cc = 1), X (Xc = 2), Y (Yc = 2) and the sum S, with Cin = 0 and Cout = 1; the operand bit patterns did not survive in this transcript.

Sc = Xc + Yc - Cc - Cin + Cout = 2 + 2 - 1 - 0 + 1 = 4

Xc, Yc = number of 0's in data X and Y
Cin = carry in, Cout = carry out
Cc = number of 0's in the internal carries
Sc = Berger check symbol
S = ALU output

This slide shows an example of Berger check prediction for an ADD instruction. X and Y are the input data to the ALU and S is the ALU output. C represents the internal carries generated during the ADD operation. Every term with subscript c is the number of zeroes in the corresponding term: Cc is the number of zeroes in the internal carries, and Xc and Yc are the numbers of zeroes in X and Y. Sc is the Berger check symbol calculated from the carries and the input data symbols. Sc correctly predicts the number of zeroes in the output S, so any error in the output results in a mismatch of the check symbols.
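The prediction formula can be checked with a short Python sketch. The operand bit patterns are not recoverable from the slide, so the 4-bit values below are hypothetical ones chosen to reproduce the same counts (Xc = 2, Yc = 2, Cc = 1, Cin = 0, Cout = 1). The sketch also assumes that "internal carries" means the carry out of each bit position, including the final carry out; the function names are illustrative.

```python
def count_zeros(value: int, width: int) -> int:
    """Number of 0 bits in the low `width` bits of value."""
    return width - bin(value & ((1 << width) - 1)).count("1")


def ripple_add(x: int, y: int, cin: int, width: int):
    """Bit-serial add. Returns (sum, carry vector, carry out).

    Bit i of the carry vector holds the carry out of bit position i,
    so the top bit of the vector equals the carry out (an assumption
    about how the slide defines the internal carries).
    """
    total, carries, c = 0, 0, cin
    for i in range(width):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        total |= (xi ^ yi ^ c) << i
        c = (xi & yi) | (xi & c) | (yi & c)
        carries |= c << i
    return total, carries, c


def predict_add_check(x: int, y: int, cin: int, width: int) -> int:
    """Sc = Xc + Yc - Cc - Cin + Cout, computed without looking at S."""
    _, carries, cout = ripple_add(x, y, cin, width)
    xc, yc = count_zeros(x, width), count_zeros(y, width)
    cc = count_zeros(carries, width)
    return xc + yc - cc - cin + cout


# Hypothetical operands reproducing the counts on the slide.
x, y, cin, width = 0b1010, 0b0110, 0, 4
s, carries, cout = ripple_add(x, y, cin, width)
print(predict_add_check(x, y, cin, width), count_zeros(s, width))  # both are 4
```

A single flipped bit in S changes its zero count by one, so the comparison against the predicted Sc fails and the error is flagged.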

Example: Berger Check Prediction for AND Instruction
Figure: ALU operation (AND) alongside the Berger check symbol calculation. The rows are X (Xc = 2), Y (Yc = 2), the output S, and X or Y with (X or Y)c = 1; the operand bit patterns did not survive in this transcript.

Sc = Xc + Yc - (X or Y)c = 2 + 2 - 1 = 3

Xc, Yc = number of 0's in data X and Y
Cin = carry in, Cout = carry out
Cc = number of 0's in the internal carries
Sc = Berger check symbol
S = ALU output

Berger check codes work not only for arithmetic operations; they can also detect errors in logical, shift and rotate operations. This slide shows error detection in an AND operation: the Berger check code correctly predicts 3 zeroes in the output of the AND operation. The Berger check code wins where most arithmetic codes fail, because it applies to a diverse set of operations. Further, the check symbol calculation essentially involves counting zeroes, so increasing the ALU data width does not require a significant increase in the error detection logic. For example, the zero count of a 16-bit word ranges from 0 to 16 and fits in 5 bits, whereas a 32-bit word needs only 6 bits. The area efficiency of the Berger check code therefore increases with larger data sizes.
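The AND prediction follows from the identity that the number of 1's in (X and Y) equals the 1's in X plus the 1's in Y minus the 1's in (X or Y). A minimal Python sketch follows; the 4-bit operands are hypothetical values chosen to match the counts on the slide (Xc = 2, Yc = 2, (X or Y)c = 1), and the function names are illustrative.

```python
def count_zeros(value: int, width: int) -> int:
    """Number of 0 bits in the low `width` bits of value."""
    return width - bin(value & ((1 << width) - 1)).count("1")


def predict_and_check(x: int, y: int, width: int) -> int:
    """Sc = Xc + Yc - (X or Y)c, computed from the inputs only."""
    return (count_zeros(x, width) + count_zeros(y, width)
            - count_zeros(x | y, width))


def and_output_ok(x: int, y: int, observed: int, width: int) -> bool:
    """Compare the predicted check symbol with the one from the ALU output."""
    return predict_and_check(x, y, width) == count_zeros(observed, width)


# Hypothetical operands reproducing the counts on the slide.
x, y, width = 0b1010, 0b0110, 4
good = x & y                 # 0010, three zeroes
faulty = good ^ 0b0100       # single bit flipped by an upset
print(predict_and_check(x, y, width),
      and_output_ok(x, y, good, width),
      and_output_ok(x, y, faulty, width))
```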

Berger Check ALU Process Flow
Figure: block diagram of the BCALU processor. Blocks: state machine, memory, PC, ALU, 0's counters, Berger check calculator, comparator, DFF. Signals: X, Y, Opcode, Cin, Cout, S, internal carries, Xc, Yc, Cc, Sc from ALU output, Sc from inputs and carries, Reset, Clock, Re-Generate/Re-Configure signal, Error_flag_out. Bus widths shown: 16, 8, 32.

The figure shows the block diagram of a 16-bit single-instruction-issue Berger check processor. The blocks colored in orange represent the error detection circuitry. It consists of the 0's counters, a Berger check calculator (essentially a multiple-operand carry-save adder), and a comparator that generates the Re-Generate/Re-Configure signal, which sets the error flag. The error flag is read during the instruction fetch stage of the state machine.

Remainder Check and Parity Check Design of EDALU

Instruction Groupings
Arithmetic: Addition (ADD), Subtraction (SUB), Multiplication (MUL)
Shift: Shift Left Logical (SLL), Shift Right Logical (SRL)
Logical: Bitwise AND, Bitwise OR, Bitwise XOR
Rotate: Rotate Left (ROL), Rotate Right (ROR)

Although the Berger check scheme has the advantage of covering all ALU operations, its error detection circuitry is always sensitive to SEUs. To reduce the effective sensitive area, the instructions are grouped based on the nature of the operations. Four instruction groups were used: arithmetic, logical, shift and rotate. The remainder check is used to detect errors in arithmetic operations. The parity check is used for logical, shift and rotate instructions.

Remainder Check for Arithmetic Instructions
Figure: flowchart of the remainder check. Data X and data Y each generate a remainder (RX, RY). The remainder check at the ALU inputs is calculated from the two remainders, RC = f(RX, RY). The remainder check at the ALU output is calculated from the ALU output. A comparator compares the two and drives the Re-Generate/Re-Configure signal.

The flowchart shows the remainder check algorithm for detecting single-bit errors during arithmetic operations. The data symbols X and Y are each divided by a check divisor, which generates a remainder. These remainders are subjected to the same ALU operation as the data symbols X and Y. The remainder check is also calculated from the ALU output, and the two are compared. The division algorithm used to calculate remainders can be quite complex, but a proper choice of the check divisor can simplify it considerably.

Example: Remainder Check for ADD Instruction
Figure: ALU operation (ADD) alongside the remainder check symbol calculation, using hexadecimal operands and a check divisor of 15. The digits of Y and S did not survive in this transcript.

X = 1EF2 (hex): (1 + E + F + 2) mod 15 = 2
Y: Y mod 15 = 9
Predicted check: 2 + 9 = B
S = X + Y: S mod 15 = B

This slide shows an example of the remainder check for an ADD instruction. X and Y are the input data to the ALU and S is the ALU output. With a check divisor of 15, the remainder of a hexadecimal number is obtained by summing its hex digits modulo 15. The remainders of X and Y are added in the same way as the data, and the result matches the remainder calculated from the output S.
A single-bit error changes the ALU output, which results in a different remainder check at the output.
The remainder check can detect single-bit errors.
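The mod-15 example can be reproduced with a short Python sketch. X = 1EF2 comes from the slide; Y = 2340 is a hypothetical operand chosen only because its remainder is 9, as on the slide. The carry-out correction is an assumption added here for 16-bit wrap-around (2^16 leaves remainder 1 when divided by 15), and the function names are illustrative.

```python
CHECK_DIVISOR = 15
WIDTH = 16
MASK = (1 << WIDTH) - 1


def remainder_by_digit_sum(value: int) -> int:
    """value mod 15, computed by summing hex digits (16 mod 15 is 1)."""
    total = 0
    while value:
        total += value & 0xF
        value >>= 4
    return total % CHECK_DIVISOR


def alu_add(x: int, y: int):
    """16-bit add returning (sum, carry out)."""
    full = x + y
    return full & MASK, full >> WIDTH


def predict_add_remainder(x: int, y: int, cout: int) -> int:
    """Apply the ADD to the remainders; subtract the carry out on wrap-around."""
    rx, ry = remainder_by_digit_sum(x), remainder_by_digit_sum(y)
    return (rx + ry - cout) % CHECK_DIVISOR


x, y = 0x1EF2, 0x2340          # y is a hypothetical operand
s, cout = alu_add(x, y)
predicted = predict_add_remainder(x, y, cout)
print(hex(s), predicted, remainder_by_digit_sum(s))   # remainders are both 11 (hex B)

# Any single flipped output bit changes S by a power of two, which is never
# a multiple of 15, so the output remainder no longer matches the prediction.
detected = all(remainder_by_digit_sum(s ^ (1 << k)) != predicted
               for k in range(WIDTH))
print(detected)
```

Choosing 15 as the check divisor is what keeps the "division" cheap here: it reduces to adding 4-bit digits.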

Parity Check for Logical and Shift-Rotate Instructions
Figure: flowchart of the parity check. The parity of the output is determined from the inputs (data X and data Y) and compared with the parity of the ALU output; the comparator drives the Re-Generate/Re-Configure signal.
Logical instruction group: the parity bit is chosen to make the total parity even.
Shift instruction group: truncated bits do not affect the parity.
Rotate instruction group: the parity does not change through the rotate operation.

For error detection in logical, shift and rotate instructions, a parity bit is calculated from the information symbols in X and Y. The parity bit is chosen so that the total parity of the information and the parity symbol is even. In shift instructions, the bits replaced by the shift operation do not have any effect on the output parity. In rotate instructions, the parity does not change throughout the ALU operation. Comparing the parity bits from the inputs and the output sets the Re-Generate/Re-Configure signal high or low.
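A rough Python sketch of parity prediction for one instruction from each group is given below. The slide does not spell out the prediction logic, so several things here are assumptions: only XOR is shown for the logical group, the shift prediction removes the parity of the bits that fall off the end of the word, and all function names are illustrative.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def parity(value: int) -> int:
    """1 if the low 16 bits contain an odd number of 1's, else 0."""
    return bin(value & MASK).count("1") & 1


def rotate_left(value: int, k: int) -> int:
    k %= WIDTH
    return ((value << k) | (value >> (WIDTH - k))) & MASK


def shift_left(value: int, k: int) -> int:
    return (value << k) & MASK


def predict_xor_parity(x: int, y: int) -> int:
    """XOR: the output parity is the XOR of the input parities."""
    return parity(x) ^ parity(y)


def predict_rotate_parity(x: int) -> int:
    """Rotate: bits only move, so the parity is unchanged."""
    return parity(x)


def predict_shift_left_parity(x: int, k: int) -> int:
    """Shift (0 <= k <= 16): zeros shifted in add nothing; the bits that
    are shifted out are removed from the parity (an assumed scheme)."""
    dropped = (x & MASK) >> (WIDTH - k)
    return parity(x) ^ parity(dropped)


x, y = 0xB3C5, 0x0F0F          # hypothetical operands
print(predict_xor_parity(x, y) == parity(x ^ y),
      predict_rotate_parity(x) == parity(rotate_left(x, 5)),
      predict_shift_left_parity(x, 3) == parity(shift_left(x, 3)))

# A single-bit upset in the output always flips its parity.
print(parity((x ^ y) ^ 0x0100) != predict_xor_parity(x, y))
```

Parity flags any odd number of flipped bits but misses an even number, which is consistent with the single-bit coverage the deck attributes to this scheme.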

Error Detection ALU with Error Correction by Instruction Reissue
Figure: block diagram of the EDALU processor. Blocks: state machine, memory, PC, ALU, check-sum/parity calculator, decoder, comparator, DFF. Signals: Data, Opcode, ALU output, remainder/parity check from inputs, remainder/parity check from ALU output, comparator enable, Re-Generate/Re-Configure signal, Error_flag, Reset, Clock. Bus widths shown: 16, 8, 32.
The state machine monitors the error signal.
The state machine checks for the error signal in the instruction fetch stage.
If an error is detected, the PC is not updated and the same instruction is fetched again.

The block diagram shows the error detection processor with the remainder/parity check scheme. The green block represents the error detection circuitry. As in the Berger check prediction processor, the Re-Generate/Re-Configure signal sets the error flag, which is read during the instruction fetch stage of the processor. If the error flag is high, the state machine goes into the re-execution stage: the program counter is not updated, and the instruction is re-executed with the data symbols X and Y. If the error flag is set again after the opcode is re-executed, the error detection circuitry infers a configuration bit error, and the error signal can then be used to reload/scrub the configuration memory. The order of re-executing the data and opcode and of scrubbing could be changed based on the relative frequency of data stream errors and configuration bit errors.

The performance loss from re-executing the data and opcode depends on the bit error rate of the circuit. If the bit error rate is small, the processor runs at its normal speed most of the time, and the re-transmit state is executed only when an error is detected. In the Berger check processor, the error detection circuitry is always sensitive. In the ED processor, the error detection circuitry is separated by instruction group: a strike on the shift instruction parity logic has no effect on the output while arithmetic, logical or rotate instructions are executing. The processor performance penalty and sensitivity depend on the application run on the processor. The performance penalty can be tested by running benchmark applications.
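The re-issue policy described above (hold the PC on a detected error, re-execute once, and treat a repeated error as a configuration fault) can be modeled in software. The following Python sketch is a behavioral toy under stated assumptions, not the authors' state machine: the `FaultyAdder` class, the `golden_check` stand-in for the check-symbol comparator, and the one-retry threshold are all hypothetical.

```python
class FaultyAdder:
    """Toy ALU: one transient upset on a chosen call, or a stuck
    configuration fault that persists until reconfigure() is called."""

    def __init__(self, transient_at=None, stuck=False):
        self.calls = 0
        self.transient_at = transient_at
        self.stuck = stuck

    def __call__(self, instr):
        x, y = instr
        self.calls += 1
        result = (x + y) & 0xFFFF
        if self.stuck or self.calls == self.transient_at:
            result ^= 1          # inject a single-bit error
        return result

    def reconfigure(self):
        self.stuck = False       # models scrubbing the configuration memory


def golden_check(instr, result):
    """Stand-in for the check-symbol comparator (assumption)."""
    x, y = instr
    return result == (x + y) & 0xFFFF


def run_with_reissue(program, alu, checker, reconfigure):
    """Execute the program; on error, hold the PC and re-issue the instruction.
    A second consecutive error is treated as a configuration bit error.
    Note: a fault that survives reconfiguration would loop forever here."""
    pc, failures = 0, 0
    results, events = [], []
    while pc < len(program):
        result = alu(program[pc])
        if checker(program[pc], result):
            results.append(result)
            pc += 1              # PC advances only on a clean result
            failures = 0
            continue
        failures += 1
        if failures == 1:
            events.append(("reissue", pc))
        else:
            events.append(("reconfigure", pc))
            reconfigure()
            failures = 0
    return results, events


program = [(1, 2), (3, 4), (5, 6)]
alu = FaultyAdder(transient_at=2)
print(run_with_reissue(program, alu, golden_check, alu.reconfigure))
```

With a low error rate, the loop takes the clean path almost every time, which matches the observation that the re-transmit state costs performance only when an error is actually detected.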

Error Correction By Re-Transmit
Figure: simulation waveform of error correction by re-transmit.
A mismatch between the encoder and decoder outputs due to an SEU sets the regenerate flag high.
The subtract instruction is re-executed on detection of the error.

Results and Discussion

FPGA Resource Utilization for ALU
Figure: chart of FPGA resource utilization for the ALU designs. TMR: triple mode redundancy. BC: Berger check. ED: remainder/parity check.
EDALU requires 46% of the area of an ALU with TMR.
BCALU requires 58% of the area of a TMR ALU.
BCALU can detect multiple feed-forward errors, whereas EDALU can detect only single-bit feed-forward errors.

The chart shows the FPGA resource utilization for the various ALU designs. The EDALU (ALU with remainder/parity check) uses 46% of the logic elements of the triple mode redundant ALU, and the Berger check prediction ALU uses 58% of the logic elements of the TMR ALU. The area of the Berger check ALU is 1.82 times the area of the regular ALU, and the area of the EDALU is 1.45 times the area of the regular ALU, so the EDALU is the most area-efficient design. However, the EDALU can detect only single-bit feed-forward errors, whereas the Berger check can detect, and TMR can correct, multiple-bit feed-forward errors. Further, as data widths scale up to 64 and 128 bits, the Berger check processor would scale much better than TMR and EDALU.

Processor Implementation
Figure: charts of area and performance for the processor implementations. TMR: triple mode redundancy. BC: Berger check. ED: remainder/parity check.
The single-instruction-issue BCALU processor runs at 85% of the clock frequency of the TMR ALU processor.
The EDALU processor operates at 60% of the clock frequency of the TMR ALU processor.
EDALU performance is affected by the sequential decoding element.

These charts show the area and performance comparison for a single-instruction-issue processor implemented with each of the three ALUs. The Berger check processor uses 60% of the area of the TMR processor but runs at 85% of its clock frequency. The performance of the EDALU processor is limited by the sequential decoding element in its error detection circuitry. A combination of the Berger check and EDALU schemes could be developed to optimize the area/performance metrics.

Area and Delay Results
Figure: constant area-delay contours with points for the TMR processor, ED processor and BC processor; values normalized to the TMR processor. TMR: triple mode redundancy. BC: Berger check. ED: remainder/parity check.
The area-delay product of the Berger check processor is 71% of that of the TMR processor.
The Berger check processor achieves error correction at a reduced area penalty.
A Berger check ALU with a large data size would not require a significant increase in error detection logic.

The figure shows the constant area-delay contours for the implementation of the three ALUs in a single-instruction-issue processor. Moving in the direction of the arrow, the area and delay penalties decrease. The user can choose between the techniques based on the resource allocation restriction. For example, with 3500 logic elements the designer can choose the TMR ALU for its ease of implementation and minimum performance penalty. With fewer logic elements, the user may choose between the Berger check and EDALU based on the bit error rate and the number of soft errors to be eliminated.
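The 71% figure can be cross-checked against the processor numbers quoted in this deck (60% of the TMR area, 85% of the TMR clock frequency). The short Python sketch below assumes that delay is the reciprocal of clock frequency and that the area-delay product is simply their product; the function name is illustrative.

```python
def area_delay_product(relative_area: float, relative_clock: float) -> float:
    """Area times delay, both normalized to the TMR processor.
    Delay is taken as 1 / clock frequency (an assumption)."""
    return relative_area * (1.0 / relative_clock)


# Berger check processor: 60% of the TMR area at 85% of the TMR clock.
bc = area_delay_product(0.60, 0.85)
print(round(bc, 2))   # 0.71
```

The same calculation is not repeated for the ED processor because its processor-level area is not stated in the slides.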

Discussion
Re-issuing the instruction and data to recover from an SEU affects processor performance.
The performance penalty for re-generation depends on the bit error rate.
Errors in configuration bits affect the functionality of the circuit, which sets the re-generate/re-configure flag.
Scrubbing can be done on demand to correct SEUs in configuration bits.

Re-issuing the instruction and data affects system performance, but the impact depends on how often errors occur. For error rates of 1-2 errors bit/day, the resource utilization of a TMR-based processor would be inefficient. With a slight compromise in performance from re-executing the opcode and data, errors can be eliminated at the instant of detection. Further, for configuration bit errors, scrubbing can be done on demand when the error is detected. However, the cost benefit of scrubbing on demand over periodic scrubbing is yet to be determined; future work is aimed at determining it.

Summary
Triple mode redundancy and scrubbing to correct errors in FPGAs involve huge penalties.
Built-in error detection can be used to detect errors in data as well as configuration bits.
Berger check error detection minimizes penalties and scales better to larger data widths.
The area-delay product of the Berger check single-instruction-issue processor is 71% of the area-delay product of the TMR processor.

To summarize, TMR and scrubbing to correct errors involve penalties. BIED techniques can be used to detect errors in data bits as well as configuration bits. Among the three schemes studied, Berger check error detection minimizes the penalties and also scales well to larger bit sizes. The area-delay product of the Berger check processor is 71% of that of the TMR processor.

Future Work
Run benchmark applications to estimate the penalty of re-generating the instruction/data.
Design an ALU that combines the Berger check and the remainder/parity check to optimize area and performance penalties.

Future work aims at determining the cost benefit of error detection and reconfiguration on demand, running benchmark applications to determine the penalties for a given bit error rate, and analyzing other design possibilities, such as a combination of EDALU and BCALU. Thank you.
