1/14 Merging BIST and Configurable Computing Technology to Improve Availability in Space Applications
Eduardo Bezerra 1, Fabian Vargas 2, Michael Paul Gough 3
1, 3 Space Science Centre, School of Engineering, University of Sussex, Brighton, BN1 9QT, England. E.A.Bezerra@sussex.ac.uk, M.P.Gough@sussex.ac.uk
1, 2 Electrical Engineering Dept., Catholic University - PUCRS, 90619-900 Porto Alegre, Brazil. vargas@computer.org
1st IEEE Latin American Test Workshop - LATW’00. Marina Palace Hotel, Rio de Janeiro, Brazil, March 13-15, 2000.
2/14 Agenda
1. Motivation: important concerns about the design of reconfigurable systems for space applications
2. System Description Overview
3. SEU Prevention Strategies
   3.1. Refresh Operation in a TMR-FPGA System
   3.2. Periodic Refresh Without FPGA Replication
   3.3. Signature Analysis-Driven Refresh Without FPGA Replication
   3.4. Signature Analysis With Continuous Readback Execution
4. Masking Connectivity Faults
5. Numerical Analysis of the CCM Node in Two Modes of Operation
6. Expected Performance
7. Conclusions & Future Work
3/14 1. Motivation
Important concerns of computer designers for space applications: power consumption, area usage, weight, and dependability (availability, reliability, and testability).
Main characteristics and drawbacks: these are application-specific systems (requirements change frequently from application to application), which makes them very expensive.
Possible solution: use configurable devices, which give designers a different HW configuration suited to each new application without changes to the whole board layout (an application-dependent solution).
Drawback: SW development for this kind of HW is in most cases very difficult (e.g., complex data structures).
In the past few years, many approaches have been devoted to improving the dependability of reconfigurable computer systems, mainly based on traditional strategies (i.e., those of microprocessor-based systems).
4/14 1. Motivation
Radiation causes Single-Event Upsets (SEUs) in memory elements:
- Processor latches and cache memory cells are sensitive to SEUs.
- FPGAs store their logic and routing configuration in latches.
Fig. 1. Illustration of the charge collection mechanism that causes a single-event upset: (a) particle strike and charge generation; (b) current pulse shape generated in the n+p junction during the collection of the charge.
5/14 2. System Description Overview
Fig. 2. Block diagram of the proposed system: (a) network architecture; (b) basic CCM node.
6/14 3. SEU Prevention Strategies
3.1. Refresh Operation in a Triple Modular Redundancy (TMR) FPGA System
Fig. 3. A TMR FPGA system.
- Three FPGAs are configured with the same bitstream (TMR) and operate in synchronism.
- A controller reads the three FPGA bitstreams, bit by bit; if there are no differences, correct functioning with no SEU occurrence is assumed.
- The check is executed continuously, using the FPGAs' readback feature during normal FPGA operation.
Drawbacks:
- HW overhead (TMR).
- Total loss of data measurement.
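The bit-by-bit comparison of the three readback streams can be sketched as follows. This is a minimal illustration, not the controller from the slides: the function name `compare_readbacks`, the use of `bytes` objects for the readback data, and the LSB-first bit order are all assumptions. The slide only says that differences are detected; reporting which copy disagrees with the other two is an extra step added here for illustration.

```python
from typing import Optional, Tuple


def compare_readbacks(a: bytes, b: bytes, c: bytes) -> Optional[Tuple[int, int]]:
    """Compare three readback streams bit by bit.

    Returns None when all three agree (no SEU assumed), otherwise
    (bit_index, suspect), where suspect is 0, 1 or 2 for the copy that
    disagrees with the other two at the first differing bit.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("readback streams must have the same length")
    for byte_index, (x, y, z) in enumerate(zip(a, b, c)):
        if x == y == z:
            continue  # fast path: whole byte agrees
        for bit in range(8):  # LSB-first order is an arbitrary choice
            bits = [(v >> bit) & 1 for v in (x, y, z)]
            if bits[0] == bits[1] == bits[2]:
                continue
            majority = 1 if sum(bits) >= 2 else 0
            suspect = bits.index(1 - majority)  # the single outvoted copy
            return byte_index * 8 + bit, suspect
    return None
```

For example, flipping one bit in the second copy of an otherwise identical stream is reported with `suspect == 1`, which would be the FPGA to refresh under these assumptions.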
7/14 3. SEU Prevention Strategies
3.2. Periodic Refresh Without FPGA Replication
Fig. 4. Using a counter to start the refresh operation.
- A 15 Hz clock increments the 19-bit counter.
- Every 20 hours, the counter resets, which triggers FPGA reconfiguration.
Drawback: the refresh is performed periodically even if no SEU has occurred, so system availability may be seriously affected.
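As a rough check on the refresh timing, the wrap-around period of a free-running binary counter is 2^n clock ticks. The helper below is a hypothetical sketch (the name `wrap_period_hours` is not from the slides) and assumes a plain counter incremented once per clock cycle with no prescaler.

```python
def wrap_period_hours(width_bits: int, clock_hz: float) -> float:
    """Hours until an n-bit counter, incremented once per clock tick, wraps."""
    ticks = 2 ** width_bits
    return ticks / clock_hz / 3600.0


if __name__ == "__main__":
    for width in (19, 20):
        print(width, "bits at 15 Hz:", round(wrap_period_hours(width, 15.0), 2), "h")
```

Under this assumption a 19-bit counter at 15 Hz wraps after roughly 9.7 hours and a 20-bit counter after roughly 19.4 hours. The slide does not say how the quoted 20-hour interval is derived, so the exact counter arrangement should be checked against the original design.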
8/14 3. SEU Prevention Strategies
3.3. Signature Analysis-Driven Refresh Without FPGA Replication
A signature analysis (LFSR/PSG) method is used to identify when an FPGA refresh is necessary.
Fig. 5. The LFSR/PSG approach.
- The LFSR/PSG process is created in VHDL and has two operating modes:
  (a) LFSR mode: driven by the 15 Hz clock signal, a 19-bit LFSR (primitive polynomial) counts up to 20 h.
  (b) PSG mode: when the LFSR output matches a given seed, the LFSR/PSG process runs at speed as a parallel signature generator.
Drawback: the HW required is slightly higher than in the previous clock/counter approach.
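The two roles of the shift register can be illustrated in software. The sketch below is not the VHDL process from the slides: it uses one Fibonacci-style LFSR step both as a free-running counter (input held at 0) and as a serial-input signature register, which is a simplified stand-in for the parallel signature generator. The tap set `(19, 18, 17, 14)` is taken from commonly published tables of maximal-length LFSRs and is an assumption; the slides do not state which primitive polynomial was used.

```python
WIDTH = 19
TAPS = (19, 18, 17, 14)  # assumed tap positions, not given in the slides


def lfsr_step(state: int, in_bit: int = 0, width: int = WIDTH, taps=TAPS) -> int:
    """Shift left by one; the new LSB is the XOR of the tapped bits and in_bit."""
    feedback = in_bit & 1
    for tap in taps:
        feedback ^= (state >> (tap - 1)) & 1
    return ((state << 1) | feedback) & ((1 << width) - 1)


def lfsr_period(seed: int, width: int, taps) -> int:
    """Number of steps until the free-running LFSR returns to its seed."""
    state = lfsr_step(seed, 0, width, taps)
    steps = 1
    while state != seed:
        state = lfsr_step(state, 0, width, taps)
        steps += 1
    return steps


def signature(bits, width: int = WIDTH, taps=TAPS) -> int:
    """Compact a serial bitstream into a width-bit signature (initial state 0)."""
    state = 0
    for bit in bits:
        state = lfsr_step(state, bit, width, taps)
    return state
```

In use, the signature of the readback stream would be compared with a reference signature computed from a known-good configuration; a mismatch would request a refresh. Because the register is linear and its highest tap equals its width, a single flipped bit in the stream always changes the signature.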
9/14 3. SEU Prevention Strategies
3.4. Signature Analysis With Continuous Readback Execution
In the previous strategy, the test for SEU occurrences is executed periodically: the LFSR is used to start the readback operation and to compact the configuration bitstream from time to time. Another option is to execute the readback continuously, as it does not affect normal FPGA operation.
Advantage: reduced HW overhead. Part of the LFSR/PSG process becomes unnecessary: the internal 15 Hz clock used to start the readback process on FPGA A, and the circuit used for the clock signal switching, are eliminated. Alternatively, the 15 Hz clock could be used in a different process to control the FPGA B self-refreshing activity. This strategy saves space on FPGA B and allows the integrity of FPGA A to be verified more frequently.
Drawback: power consumption is slightly higher than in the LFSR/PSG approach, due to the continuous readback operation of FPGA A.
10/14 4. Masking Connectivity Faults
Reliability improvements in the processing elements are worthless if the correctness of the input data is not guaranteed.
Goal: mask faults in the external FPGA pins and in the internal FPGA routing resources.
Fig. 6. Using replicated inputs/voter to mask connectivity faults (inputs labelled Sensor 1, Sensor 2, Sensor 3).
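A bitwise two-out-of-three majority voter is the usual way to combine replicated inputs of this kind. The sketch below assumes the three sensor paths deliver the same word as integers; the function name `vote` is hypothetical and the actual voter in Fig. 6 may differ.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    # The third path has two corrupted bits; the voter still returns 0b1010.
    print(bin(vote(0b1010, 0b1010, 0b0110)))
```

A fault on one input pin or one internal route corrupts at most one of the three copies, so the voted word is unaffected as long as the other two paths agree.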
11/14 5. Numerical Analysis of the CCM Node in Two Modes of Operation
First situation: the 3 flash memories hold 3 different configuration bitstreams (CBs).
- This scenario represents a real reconfigurable computing system, because the FPGA functionality can be altered on the fly according to the application requirements.
- From the fault-tolerance point of view it is not a good approach: if an SEU occurs in one of the flash memories, the respective application has to stop and wait for a good CB to be uploaded from the ground station.
Second situation: the 3 flash memories hold the same CB, which characterises a TMR system. The vote is executed, implicitly, by FPGA B.
- This test strategy is not capable of fault location, so it is not possible to identify whether the problem was in the flash memory or in the FPGA.
- In any case, FPGA A is reconfigured with a CB from another flash memory. If the error persists, the diagnosis is a permanent fault in FPGA A, and the module has to be bypassed. If, on the other hand, no error is detected with the new CB, the respective flash memory is considered faulty and needs to be refreshed in order to try to clear any SEUs.
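The recovery procedure for the second situation can be written as a small decision function. This is a sketch under stated assumptions: `passes_test` is a hypothetical callback that reconfigures FPGA A from the given flash memory and reports whether the subsequent signature test passes, and the returned action names are invented for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple


def diagnose(
    passes_test: Callable[[int], bool],
    flash_ids: Sequence[int] = (0, 1, 2),
    current: int = 0,
) -> Tuple[str, Optional[int]]:
    """Decide what to do after an error was detected with the CB from `current`.

    Reconfigure FPGA A from another flash memory and repeat the test:
    - error gone      -> the original flash is considered faulty, refresh it
    - error persists  -> permanent fault in FPGA A, bypass the module
    """
    alternate = next(f for f in flash_ids if f != current)
    if passes_test(alternate):
        return "refresh_flash", current
    return "bypass_fpga_a", None
```

For instance, `diagnose(lambda flash: flash != 0)` models a corrupted CB in flash 0 and returns `("refresh_flash", 0)`, while a callback that always fails models a permanent FPGA fault.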
12/14 5. Numerical Analysis of the CCM Node in Two Modes of Operation
Fig. 7. The reliability responses for the two situations.
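For orientation, the textbook comparison between a single module and a TMR arrangement can be computed as below. This is not necessarily the model behind Fig. 7: it assumes a constant failure rate, independent failures, and a perfect voter, and the function names and the rate parameter `failure_rate` are illustrative.

```python
import math


def simplex_reliability(failure_rate: float, hours: float) -> float:
    """R(t) = exp(-lambda * t) for one module with a constant failure rate."""
    return math.exp(-failure_rate * hours)


def tmr_reliability(r: float) -> float:
    """Probability that at least two of three independent modules work."""
    return 3 * r ** 2 - 2 * r ** 3


if __name__ == "__main__":
    r = simplex_reliability(1e-5, 1000.0)
    print(round(r, 4), round(tmr_reliability(r), 4))
```

Under these assumptions TMR improves on a single module only while the module reliability stays above 0.5, which is one reason periodic refresh matters for long missions.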
13/14 6. Expected Performance
Application program: auto-correlation (ACF) processing of particle count pulses as a means of studying processes occurring in near-Earth plasmas.
Table 1. Performance comparison for the case study (clock cycles): DS87C520 [8051 family] (Assembly) versus FPGA (VHDL).
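To make the case-study workload concrete, an auto-correlation of a count sequence can be sketched as a sum of lagged products. The slide does not give the exact ACF definition, normalisation, or window length used on the DS87C520 and the FPGA, so the unnormalised form below and the name `acf` are assumptions.

```python
from typing import List, Sequence


def acf(counts: Sequence[int], max_lag: int) -> List[int]:
    """Unnormalised auto-correlation: sum of counts[i] * counts[i + lag]."""
    n = len(counts)
    return [
        sum(counts[i] * counts[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


if __name__ == "__main__":
    print(acf([1, 2, 3], 2))
```

Each lag is an independent multiply-accumulate chain, which is the kind of structure that maps naturally onto parallel FPGA logic and must be serialised on a microcontroller.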
14/14 7. Conclusions & Future Work
This paper introduced the use of a BIST technique and traditional fault-tolerance strategies together with configurable computing technology to improve the availability of on-board computers used in space applications:
- a network architecture for spacecraft instruments was presented;
- test and fault-tolerance strategies to detect and fix/tolerate SEU occurrences were analysed;
- a technique to mask connectivity faults was also proposed;
- the expected performance of the strategies was estimated.
The strategies described here deserve deeper investigation before being used in the design of a fault-tolerant on-board instrument processing system entirely based on configurable computing. The next step will be the implementation of a prototype to determine the feasibility of the test and fault-tolerance strategies proposed here.