Merging BIST and Configurable Computing Technology to Improve Availability in Space Applications. Eduardo Bezerra, Fabian Vargas, Michael Paul Gough.

Presentation transcript:

1/14 Merging BIST and Configurable Computing Technology to Improve Availability in Space Applications
Eduardo Bezerra (1), Fabian Vargas (2), Michael Paul Gough (3)
(1, 3) Space Science Centre, University of Sussex, Brighton, BN1 9QT, England
(1, 2) Catholic University - PUCRS, Porto Alegre, Brazil
1st IEEE Latin American Test Workshop (LATW'00), Marina Palace Hotel, March 13-15, Rio de Janeiro, Brazil
Electrical Engineering Dept., Catholic University - PUCRS, Porto Alegre, Brazil; Space Science Centre, School of Engineering, University of Sussex, England

2/14 Agenda
1. Motivation: Important concerns about the design of reconfigurable systems for space applications
2. System Description Overview
3. SEU Prevention Strategies
   3.1. Refresh Operation in a TMR-FPGA System
   3.2. Periodic Refresh Without FPGA Replication
   3.3. Signature Analysis-Driven Refresh Without FPGA Replication
   3.4. Signature Analysis With Continuous Readback Execution
4. Masking Connectivity Faults
5. Numerical Analysis of the CCM Node in Two Modes of Operation
6. Expected Performance
7. Conclusions & Future Work

3/14 1. Motivation
Important concerns of computer designers for space applications: power consumption, area usage, weight, and dependability (availability, reliability, and testability).
Main characteristics and drawbacks: application-specific systems (requirements change frequently from application to application) → very expensive systems.
Possible solution: use of configurable devices → designers can load a different HW configuration suited to each new application, without changing the whole board layout (application-dependent solution).
Drawback: SW development for this kind of HW is in most cases very difficult (e.g., complex data structures).
In the past few years: many approaches have aimed to improve the dependability of reconfigurable computer systems, mainly based on traditional strategies (i.e., those used in microprocessor-based systems).

4/14 1. Motivation
Fig. 1. Illustration of the charge collection mechanism that causes a single-event upset: (a) particle strike and charge generation; (b) current pulse shape generated in the n+p junction during the collection of the charge.
- Radiation causes Single-Event Upsets (SEUs) in memory elements:
  - Processor latches and cache memory cells are sensitive to SEUs.
  - FPGAs store logic/routing in latches.

5/14 2. System Description Overview
Fig. 2. Block diagram of the proposed system: (a) network architecture; (b) basic CCM node.

6/14 3. SEU Prevention Strategies
3.1. Refresh Operation in a Triple Modular Redundancy (TMR) FPGA System
Fig. 3. A TMR FPGA system.
- Three FPGAs are configured with the same bitstream (TMR) and operate in synchronism.
- A controller reads the three FPGA bitstreams bit by bit; if there are no differences, correct operation with no SEU occurrence is assumed.
- The check is executed continuously (using the FPGA readback feature during normal FPGA operation).
Drawbacks:
- HW overhead (TMR),
- Total loss of data measurement.
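The bit-by-bit comparison performed by the controller can be illustrated in software. This is only a sketch: it assumes the three readback streams are available as equal-length byte strings, and the function name `compare_readbacks` and the LSB-first bit numbering are illustrative choices, not part of the presented design.

```python
def compare_readbacks(a: bytes, b: bytes, c: bytes) -> list:
    """Return the bit positions at which the three readback streams are not all equal.

    Bit positions are counted LSB-first within each byte (an arbitrary convention).
    An empty list corresponds to "no SEU occurrence is assumed".
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("readback streams must have equal length")
    mismatches = []
    for i, (x, y, z) in enumerate(zip(a, b, c)):
        # A bit of diff is 1 wherever the three copies disagree.
        diff = (x ^ y) | (x ^ z)
        for bit in range(8):
            if (diff >> bit) & 1:
                mismatches.append(i * 8 + bit)
    return mismatches


if __name__ == "__main__":
    golden = bytes([0xA5, 0x3C])
    upset = bytes([0xA5, 0x3C ^ 0x10])  # one flipped bit in the second byte
    print(compare_readbacks(golden, golden, golden))
    print(compare_readbacks(golden, upset, golden))
```

A non-empty result would be the trigger for a refresh of the FPGAs.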

7/14 3. SEU Prevention Strategies
3.2. Periodic Refresh Without FPGA Replication
Fig. 4. Using a counter to start the refresh operation.
- A 15 Hz clock increments the 19-bit counter.
- Every 20 hours, the counter resets, which triggers FPGA reconfiguration.
Drawback: the refresh is performed periodically, even if no SEU has occurred (system availability may be seriously affected).
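As a quick arithmetic check of the refresh period, the sketch below computes how long a free-running binary counter takes to wrap. It assumes the counter starts at zero, counts every clock cycle and resets on overflow; the function name `rollover_hours` is made up for illustration. Under that assumption a 19-bit counter at 15 Hz wraps after roughly 9.7 hours and a 20-bit counter after roughly 19.4 hours, so the 20-hour figure presumably relies on a detail the transcript does not show.

```python
def rollover_hours(clock_hz: float, width_bits: int) -> float:
    """Hours until a binary up-counter of the given width wraps around."""
    if clock_hz <= 0 or width_bits <= 0:
        raise ValueError("clock_hz and width_bits must be positive")
    seconds = (2 ** width_bits) / clock_hz
    return seconds / 3600.0


if __name__ == "__main__":
    for width in (19, 20):
        print(f"{width}-bit counter at 15 Hz: {rollover_hours(15.0, width):.2f} h")
```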

8/14 3. SEU Prevention Strategies
3.3. Signature Analysis-Driven Refresh Without FPGA Replication
Fig. 5. The LFSR/PSG approach.
A signature analysis (LFSR/PSG) method is used to identify when an FPGA refresh is necessary.
- LFSR/PSG process created in VHDL → 2 operating modes:
  (a) LFSR mode → 15 Hz clock signal (19-bit LFSR with a primitive polynomial, counts up to 20 h). When the LFSR output matches a given seed:
  (b) PSG mode → the LFSR/PSG process runs at speed (parallel signature generator).
Drawback: the HW required is slightly higher than in the previous clock/counter approach.
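The two operating modes can be modelled in a few lines of software. This is a sketch under stated assumptions, not the authors' VHDL process: the polynomial x^19 + x^5 + x^2 + x + 1 is one primitive polynomial of degree 19 taken from standard tables (the slides do not say which one is used), the Galois-style shift and the multiple-input signature register used for PSG mode are illustrative choices, and all names are hypothetical.

```python
WIDTH = 19
MASK = (1 << WIDTH) - 1
# Low-order terms x^5 + x^2 + x + 1 of the assumed polynomial x^19 + x^5 + x^2 + x + 1.
POLY_LOW = 0b100111


def lfsr_step(state: int) -> int:
    """LFSR mode: one autonomous Galois step (multiply by x modulo the polynomial)."""
    msb = (state >> (WIDTH - 1)) & 1
    state = (state << 1) & MASK
    if msb:
        state ^= POLY_LOW
    return state


def psg_step(state: int, word: int) -> int:
    """PSG mode: shift as in LFSR mode, then fold one parallel readback word into the state."""
    return lfsr_step(state) ^ (word & MASK)


def signature(words, seed: int = 0) -> int:
    """Compact a sequence of readback words into a 19-bit signature."""
    state = seed
    for word in words:
        state = psg_step(state, word)
    return state


if __name__ == "__main__":
    golden = [0x12345, 0x0F0F0, 0x7FFFF]
    upset = [0x12345, 0x0F0F0 ^ 0x00100, 0x7FFFF]  # single flipped bit
    print(hex(signature(golden)), hex(signature(upset)))
```

In LFSR mode a non-zero state cycles through 2^19 - 1 values before repeating, which is what lets the register double as a timer. In PSG mode a refresh would be requested when the computed signature differs from the one stored for the fault-free configuration.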

9/14 3. SEU Prevention Strategies
3.4. Signature Analysis With Continuous Readback Execution
- In the previous strategy, the test for SEU occurrences is executed periodically. The LFSR is used to start the readback operation and to compact the configuration bitstream each time.
- Another option is to execute the readback continuously, as it does not affect normal FPGA operation.
- Advantage: reduced HW overhead (part of the LFSR/PSG process becomes unnecessary: the internal 15 Hz clock used to start the readback process on FPGA A, and the circuit used for clock signal switching, are eliminated).
- Alternatively, the 15 Hz clock could be used in a different process to control the FPGA B self-refreshing activity.
- This strategy saves space on FPGA B and allows the integrity of FPGA A to be verified more frequently.
- Drawback: power consumption is slightly higher than in the LFSR/PSG approach, due to the continuous readback operation of FPGA A.

10/14 4. Masking Connectivity Faults
Fig. 6. Using replicated inputs/voter to mask connectivity faults (inputs: Sensor 1, Sensor 2, Sensor 3).
Reliability improvements in the processing elements are worthless if the correctness of the input data is not guaranteed.
Goal: mask faults in the external FPGA pins and in the internal FPGA routing resources.
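A bitwise two-out-of-three majority function is the usual way to build such a voter. The sketch below assumes each replicated input arrives as an integer word of the same width; the name `vote` is illustrative, and the slides do not detail the voter actually used.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority: each output bit follows at least two of the inputs."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    sensor_1 = 0b1011
    sensor_2 = 0b1011
    sensor_3 = 0b0011  # one bit corrupted on its way into the FPGA
    print(bin(vote(sensor_1, sensor_2, sensor_3)))
```

A fault on a single input path is masked, because the other two copies still agree on every bit.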

11/14 5. Numerical Analysis of the CCM Node in Two Modes of Operation
First situation: the 3 flash memories hold 3 different configuration bitstreams (CBs).
- This scenario represents a real reconfigurable computing system, because the FPGA functionality can be altered on the fly according to the application requirements.
- From the fault-tolerance point of view it is not a good approach: if an SEU occurs in one of the flash memories, the respective application has to stop and wait for a good CB to be uploaded from the ground station.
Second situation: the 3 flash memories hold the same CB, which characterises a TMR system. The vote is executed, implicitly, by FPGA B.
- This test strategy is not capable of fault location, so it is not possible to identify whether the problem was in the flash memory or in the FPGA.
- In any case, FPGA A is reconfigured with a CB from another flash memory. If the error persists, the diagnosis is a permanent fault in FPGA A, and the module has to be bypassed. If, on the other hand, no error is detected with the new CB, the respective flash memory is considered faulty and needs to be refreshed in order to try to clear any SEUs.
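The recovery rule for the second situation can be written as a small decision procedure. This is a sketch only: `reconfigure_and_check` stands in for whatever mechanism loads a CB into FPGA A and reports whether the test passes, and every name here is hypothetical.

```python
from enum import Enum
from typing import Callable, Sequence


class Diagnosis(Enum):
    FLASH_FAULT = "flash memory considered faulty: refresh it"
    FPGA_PERMANENT_FAULT = "permanent fault in FPGA A: bypass the module"


def diagnose_after_error(
    flashes: Sequence[bytes],
    failed_index: int,
    reconfigure_and_check: Callable[[bytes], bool],
) -> Diagnosis:
    """Apply the rule after an error was detected with the CB from flashes[failed_index].

    reconfigure_and_check(cb) is assumed to load cb into FPGA A and return True
    when no error is detected afterwards.
    """
    alternate = (failed_index + 1) % len(flashes)
    if reconfigure_and_check(flashes[alternate]):
        # The new CB works, so the CB used before is blamed on its flash memory.
        return Diagnosis.FLASH_FAULT
    # The error persists with a different CB source.
    return Diagnosis.FPGA_PERMANENT_FAULT


if __name__ == "__main__":
    good_cb = b"\x01\x02"
    flashes = [b"\x01\x03", good_cb, good_cb]  # first copy corrupted
    print(diagnose_after_error(flashes, 0, lambda cb: cb == good_cb))
    print(diagnose_after_error(flashes, 0, lambda cb: False))
```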

12/14 5. Numerical Analysis of the CCM Node in Two Modes of Operation
Fig. 7. The reliability responses for the two situations.

13/14 6. Expected Performance
Table 1. Performance comparison for the case study (clock cycles): DS87C520 [8051 family] (Assembly) vs. FPGA (VHDL).
Application program: auto-correlation function (ACF) processing of particle count pulses as a means of studying processes occurring in near-Earth plasmas.
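For readers unfamiliar with the case-study workload, the sketch below shows one common, unnormalised definition of the auto-correlation of a sequence of counts, R[k] = sum over n of x[n] * x[n + k]. It is not the authors' assembly or VHDL implementation, and the function name and the sample data are illustrative.

```python
def autocorrelation(counts, max_lag):
    """Unnormalised auto-correlation of a count sequence for lags 0..max_lag."""
    n = len(counts)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(counts)")
    return [
        sum(counts[i] * counts[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


if __name__ == "__main__":
    particle_counts = [1, 2, 3, 4]  # made-up counts per sampling interval
    print(autocorrelation(particle_counts, 2))
```

Each lag is a multiply-accumulate loop, which is the kind of regular computation that maps naturally onto parallel FPGA logic.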

14/14 7. Conclusions & Future Work
- This paper introduced the use of a BIST technique and traditional fault-tolerance strategies together with configurable computing technology to improve the availability of on-board computers used in space applications:
  - a network architecture for spacecraft instruments was presented;
  - test and fault-tolerance strategies to detect and fix/tolerate SEU occurrences were analysed;
  - a technique to mask connectivity faults was also proposed;
  - expected strategy performance was estimated.
The strategies described here deserve deeper investigation in order to be used in the design of a fault-tolerant on-board instrument processing system based entirely on configurable computing. The next step will be the implementation of a prototype to determine the feasibility of the test and fault-tolerance strategies proposed here.