MAPLD 2009 August 31 - September 3, 2009 SPFFI: Simple, Portable FPGA Fault Injector Grzegorz Cieslewski Ph.D. Student NSF CHREC Center, University of.

Slides:



Advertisements
Similar presentations
Survey of Detection, Diagnosis, and Fault Tolerance Methods in FPGAs
Advertisements

Defect Tolerance for Yield Enhancement of FPGA Interconnect Using Fine-grain and Coarse-grain Redundancy Anthony J. YuGuy G.F. Lemieux September 15, 2005.
Run-Time FPGA Partial Reconfiguration for Image Processing Applications Shaon Yousuf Ph.D. Student NSF CHREC Center, University of Florida Dr. Ann Gordon-Ross.
Scrubbing Approaches for Kintex-7 FPGAs
Fault-Tolerant Systems Design Part 1.
Implementation Approaches with FPGAs Compile-time reconfiguration (CTR) CTR is a static implementation strategy where each application consists of one.
1 SECURE-PARTIAL RECONFIGURATION OF FPGAs MSc.Fisnik KRAJA Computer Engineering Department, Faculty Of Information Technology, Polytechnic University of.
EECE499 Computers and Nuclear Energy Electrical and Computer Eng Howard University Dr. Charles Kim Fall 2013 Webpage:
ICAP CONTROLLER FOR HIGH-RELIABLE INTERNAL SCRUBBING Quinn Martin Steven Fingulin.
Alternate Software Development Methodologies
Case Tools Trisha Cummings. Our Definition of CASE  CASE is the use of computer-based support in the software development process.  A CASE tool is a.
Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection Joel Seely Technical Marketing Manager Military &
Fault-Tolerant Softcore Processors Part I: Fault-Tolerant Instruction Memory Nathaniel Rollins Brigham Young University.
Defect Tolerance for Yield Enhancement of FPGA Interconnect Using Fine-grain and Coarse-grain Redundancy Anthony J. Yu August 15, 2005.
Defect Tolerance for Yield Enhancement of FPGA Interconnect Using Fine-grain and Coarse-grain Redundancy Anthony J. Yu August 15, 2005.
Week 1- Fall 2009 Dr. Kimberly E. Newman University of Colorado.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 23 Slide 1 Software testing.
1 Performed By: Khaskin Luba Einhorn Raziel Einhorn Raziel Instructor: Rivkin Ina Spring 2004 Spring 2004 Virtex II-Pro Dynamical Test Application Part.
1 ITC242 – Introduction to Data Communications Week 12 Topic 18 Chapter 19 Network Management.
Evaluating Hypotheses
SIMULATION. Simulation Definition of Simulation Simulation Methodology Proposing a New Experiment Considerations When Using Computer Models Types of Simulations.
Principle of Functional Verification Chapter 1~3 Presenter : Fu-Ching Yang.
Using A Defined and Measured Personal Software Process Watts S. Humphrey CS 5391 Article 8.
Nadpis 1 Nadpis 2 Nadpis 3 Jméno Příjmení Vysoké učení technické v Brně, Fakulta informačních technologií v Brně Božetěchova 2, Brno
Reduced Cost Reliability via Statistical Model Detection Jon-Paul Anderson- PhD Student Dr. Brent Nelson- Faculty Dr. Mike Wirthlin- Faculty Brigham Young.
Software Reliability SEG3202 N. El Kadri.
A comprehensive method for the evaluation of the sensitivity to SEUs of FPGA-based applications A comprehensive method for the evaluation of the sensitivity.
RELATIONAL FAULT TOLERANT INTERFACE TO HETEROGENEOUS DISTRIBUTED DATABASES Prof. Osama Abulnaja Afraa Khalifah
J. Christiansen, CERN - EP/MIC
FT-UNSHADES Analysis of SEU effects in Digital Designs for Space Gioacchino Giovanni Lucia TEC-EDM, MPD - 8 th March Phone: +31.
Brian Macpherson Ph.D, Professor of Statistics, University of Manitoba Tom Bingham Statistician, The Boeing Company.
Fault-Tolerant Systems Design Part 1.
MAPLD 2005/202 Pratt1 Improving FPGA Design Robustness with Partial TMR Brian Pratt 1,2 Michael Caffrey, Paul Graham 2 Eric Johnson, Keith Morgan, Michael.
CE Operating Systems Lecture 3 Overview of OS functions and structure.
Design Framework for Partial Run-Time FPGA Reconfiguration Chris Conger, Ann Gordon-Ross, and Alan D. George Presented by: Abelardo Jara-Berrocal HCS Research.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 20 Slide 1 Critical systems development 3.
Center for Radiative Shock Hydrodynamics Fall 2011 Review Assessment of predictive capability Derek Bingham 1.
Software Testing and Quality Assurance Software Quality Assurance 1.
Experimental Evaluation of System-Level Supervisory Approach for SEFIs Mitigation Mrs. Shazia Maqbool and Dr. Craig I Underwood Maqbool 1 MAPLD 2005/P181.
Functional Verification of Dynamically Reconfigurable Systems Mr. Lingkan (George) Gong, Dr. Oliver Diessel The University of New South Wales, Australia.
Investigating Adaptive Compilation using the MIPSpro Compiler Keith D. Cooper Todd Waterman Department of Computer Science Rice University Houston, TX.
McGraw-Hill/Irwin © 2006 The McGraw-Hill Companies, Inc., All Rights Reserved. 1.
CprE 458/558: Real-Time Systems
4/19/20021 TCPSplitter: A Reconfigurable Hardware Based TCP Flow Monitor David V. Schuehler.
Lecture 12: Reconfigurable Systems II October 20, 2004 ECE 697F Reconfigurable Computing Lecture 12 Reconfigurable Systems II: Exploring Programmable Systems.
Fault-Tolerant Systems Design Part 1.
1 OUTPUT ANALYSIS FOR SIMULATIONS. 2 Introduction Analysis of One System Terminating vs. Steady-State Simulations Analysis of Terminating Simulations.
McGraw-Hill/Irwin Copyright © 2009 by The McGraw-Hill Companies, Inc. All rights reserved.
This material exempt per Department of Commerce license exception TSU Xilinx On-Chip Debug.
Evaluating Logic Resources Utilization in an FPGA-Based TMR CPU
Big traffic data processing framework for intelligent monitoring and recording systems 學生 : 賴弘偉 教授 : 許毅然 作者 : Yingjie Xia a, JinlongChen a,b,n, XindaiLu.
Aerospace Conference ‘12 A Framework to Analyze, Compare, and Optimize High-Performance, On-Board Processing Systems Nicholas Wulf Alan D. George Ann Gordon-Ross.
Software Quality Assurance and Testing Fazal Rehman Shamil.
Mapping of Regular Nested Loop Programs to Coarse-grained Reconfigurable Arrays – Constraints and Methodology Presented by: Luis Ortiz Department of Computer.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 23 Slide 1 Software testing.
LECTURE 5 Nangwonvuma M/ Byansi D. Components, interfaces and integration Infrastructure, Middleware and Platforms Techniques – Data warehouses, extending.
Chandrasekhar 1 MAPLD 2005/204 Reduced Triple Modular Redundancy for Tolerating SEUs in SRAM based FPGAs Vikram Chandrasekhar, Sk. Noor Mahammad, V. Muralidharan.
Week#3 Software Quality Engineering.
Kandemir224/MAPLD Reliability-Aware OS Support for FPGA-Based Systems M. Kandemir, G. Chen, and F. Li Department of Computer Science & Engineering.
Presenter: Darshika G. Perera Assistant Professor
MAPLD 2005 Reduced Triple Modular Redundancy for Tolerating SEUs in SRAM based FPGAs Vikram Chandrasekhar, Sk. Noor Mahammad, V. Muralidharan Dr. V. Kamakoti.
Ming Liu, Wolfgang Kuehn, Zhonghai Lu, Axel Jantsch
Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection Joel Seely Technical Marketing Manager Military &
Anne Pratoomtong ECE734, Spring2002
M. Aguirre1, J. N. Tombs1, F. Muñoz1, V. Baena1, A. Torralba1, A
Abelardo Jara-Berrocal Joseph Antoon Ph.D. Students
Software life cycle models
Shaon Yousuf Ph.D. Student NSF CHREC Center, University of Florida
Xilinx Kintex7 SRAM-based FPGA
Presentation transcript:

MAPLD 2009 August 31 - September 3, 2009 SPFFI: Simple, Portable FPGA Fault Injector Grzegorz Cieslewski Ph.D. Student NSF CHREC Center, University of Florida Dr. Alan D. George Professor of ECE NSF CHREC Center, University of Florida

2 Outline Goals and Motivations Existing FPGA Fault-Injection Technology SPFFI Architecture and Capabilities Methodology of Fault Injection Fault Injection and Statistics Case Studies  MicroBlaze – System-Level Testing  LIDAR – Module-Level Testing Conclusions and Future Work

3 Goals and Motivations Goals – Explore methodologies and procedures for effectively combining fault-testing methods and concepts  Verification of fault-tolerant FPGA architectures and designs for space and terrestrial applications  Portable solution that can be used on COTS as well as specialized FPGA platforms Motivations – FT testing of large FPGA-based designs can be very difficult  Why wait for fault testing until completion of system?  Beam testing is very expensive and lacks coverage  Developers need to be able to perform testing without specialized hardware to estimate reliability  Enabling technology for further research Challenges – Tool limitations and shortcomings of FPGA architecture  Restricted access to dynamic components (BRAM)  Slow programming interconnect

4 Existing FPGA Fault-Injection Tech. Custom hardware solutions  XRTC fault-injection board  High injection speed and excellent coverage  Limited portability and difficult in-system testing Beam testing  Most accurately resembles space conditions  Very costly High time and financial overhead  Few facilities which provide the service  Limited experiment repeatability Manual injection  Requires designer to manually insert faults in HDL Can interfere with design at hand Difficult to implement  Cannot accurately emulate types of faults expected

5 SPFFI – Simple Portable FPGA Fault Injector Introduction  New fault-injection utility designed for easy and portable use  Supports multiple Virtex-4 FPGA platforms with Virtex-5 and -6 support coming in near future System Model  PC connected to FPGA-based sys. Prog. Interface: JTAG (most popular) Test interface: USB, PCI, Serial, … SPFFI consists of 3 major components  SPFFI Engine Responsible for fault injection and result collection  Campaign Generator Crafts injection campaigns based on parameters specified by user  Test Generator Plug-in component for customizable FPGA testing

6 SPFFI Components SPFFI Engine  Campaign Manager Orchestrates execution of campaign as specified in input file  Bitstream parser Analyses input bitstreams to extract relevant architectural and design information  Bitstream generator Crafts customized full and partial bitstreams to allow for fault injection and removal  JTAG interface engine Provides high-level programming interface for variety of JTAG connections  Logging Engine Stores events and injection results in a database Test Generator  Plug-in application that verifies correct operation of design  User-defined for maximum flexibility  Can be partially hosted on FPGA to speed up testing

7 SPFFI Components Campaign Generator  Generates locations at which faults will be injected by SPFFI Engine Parameters: injection count, resource type, transition type, occupied vs. unoccupied frames, campaign type  Campaign Types Uniform campaign  Selects random injection locations anywhere on chip Automatically targeted campaign  Based on coarse bitstream analysis; chip is divided into two mutually exclusive occupied and unoccupied regions  Two campaigns are performed, one for each region  Vast majority of observable errors are due to faults injected into occupied region Manually targeted campaign  Region of interest specified by user

8 Fault Injection Methodology Fault-injection considerations  Correctness Minimize false positives Representative of how SEUs and SEFIs occur  Performance Higher performance will provide better estimate of error rate Test Generator upshots  No observable error (benign fault) Triggers fault removal via partial reconfiguration  Injected fault produces error FPGA produces invalid or no output Data error (comparison with golden standard fails) Triggers return to original state via full reconfiguration Site is re-tested to eliminate bias introduced by partial reconfigurations

9 Testing Methodology System Types  Module-Level Testing Designed to test standalone modules of a system Data is provided and collected from module by a wrapper design residing on same chip Test data can be provided by Test Generator or can reside on FPGA to increase testing speed.  System-Level Testing Used for testing of systems which interact with external hardware  System-on-chip type scenarios  System with FPGA co-processor Test Generator is used to issue commands for starting and stopping testing  Hybrid Testing Combination of above approaches Used for integration & incremental testing

10 SPFFI Performance JTAG performance  Strongly dependent upon JTAG Engine backend Up to 2 injections/s when using iMPACT Up to 12 injections/s when using UrJTAG Based on Virtex-4 LX25  Different backends possible  Limited by poor performance of JTAG cables and software overhead Test Generator performance  Generating representative set of test vectors is difficult Representative set might be very large and require long testing times.  Varies depending on implementation Physical location of test vectors Error rate  Less than 10% for most of designs  Injection speed generally increases for designs with lower error rates Fewer full reconfigurations to restore known state Programming TypeiMPACTUrJTAG Full Configuration~ Partial Reconfiguration

11 Fault Injection and Statistics Designing fault injection experiments with statistic in mind  Coverage of sensitivity testing Full testing vs. partial testing  Select a representative bits in region of interest Random sampling Stratified sampling  Interpretation of results Are results obtained representative of whole design? Confidence intervals  Interval which contains true value of parameter with certain probability  Point estimate of parameter Does not quantify the quality or range of data collected Fault-injection testing can be viewed as series of independent Bernoulli trials  Two possible outcomes Success – no observable error Failure – injection causes observable error

12 Calculation of Confidence Interval Chebyshev’s inequality  One of simplest ways to obtain a confidence interval  Used when variance of estimator (theta) is known or can be easily estimated  Prior knowledge of distribution type is not required  Quality of bounds can be improved if underlying distribution is known Binominal estimation of confidence interval for error rate  In case of FI we know underlying distribution Sum of multiple Bernoulli trials is binomially distributed  More difficult to calculate as no closed form solution exists  Yields bounds which are much tighter than using Chebyshev’s inequality Using similar approach, it is possible to calculate number of trials required to obtain certain confidence interval

13 MicroBlaze Case Study Soft-core processors are often used as computational resources on FPGA systems  What error rate can we expect without FT mitigation? Experiment Setup  SoC consists of one MicroBlaze with FPU  Employs system-level testing methodology  10,000 experiments performed for each campaign Test Generator  Two popular linear-algebra benchmarks Matrix Multiply LU Decomposition Results  Occupied Frames LU error rate: 4.19% with 99% CI of [3.67%, 4.72%] MM error rate: 4.02% with 99% CI of [3.51%, 4.54%]  Unoccupied Frames LU error rate: 0.14% with 99% CI of [0.04%, 0.25%] MM error rate: 0.09% with 99% CI of [0.01%, 0.19%]

14 LIDAR Case Study LIDAR (Light Detection and Ranging)  Widely used in remote sensing for terrain mapping  Susceptible to upsets due to hazardous operating environment SCP used as error mitigation technique  Coordinate Calculation phase constructs 3D information of targets based on set of LIDAR parameters Each laser return is processed independently Experimental Setup  Similar hardware setup to previous case study  Employs module-level testing methodology  test vectors per injection trail Results (undetected errors)  No FT error rate: 5.18% with 99% CI of [4.61%, 5.77%]  SCP error rate: 0.39% with 99% CI of [0.23%,0.57%] LIDAR Parameters ρ, θ, φ r, φ p, φ y, X ac, Y ac, Z ac Coordinate Calculation (CC) Sinusoidal Evaluation Vector Rotation & Translation Target Coordinate X, Y, Z

15 Conclusions and Future Work Conclusions  SPFFI – new and innovative fault-injection system Simplifies fault testing Does not require specialized hardware Fills niche where other methods fall short  Proposed a methodology for fault-injection testing Module- and system-level testing Use of combination of partial and full reconfiguration  Complete testing is not always needed Confidence intervals can provide answers which are close enough for development testing  Demonstrated multiple case studies exemplifying use of SPFFI Future Work  Support for Virtex-5 and Virtex-6 devices  Augment BRAM injection capabilities  Increase fault-injection speed with improved JTAG interface

16 QUESTIONS? This work was supported in part by the I/UCRC Program of the National Science Foundation under Grant No. EEC We also gratefully acknowledge tools provided by Xilinx.

MAPLD 2009 August 31 - September 3, 2009 Backup Slides: RFT Case Study

18 PRR3PRR2PRR1 RFT Controller Motivations for RFT  Dynamically vary fault-tolerance levels depending on external stimuli  Obtain high reliability during critical moments while maximizing performance (or minimizing power) at other times Partial Reconfiguration enables system flexibility  Ability to move Partial Reconfiguration Modules (PRM) around to different Partial Reconfiguration Regions (PRR)  Ability to modify level of fault tolerance in a single PRM  Ability to add multiple PRMs to increase fault tolerance through replication Possible FT approaches for RFT components  Coarse-Grained Replication (SCP, TMR)  Algorithm-Based Fault Tolerance (ABFT)  Error Correcting Codes (ECC)  FT-HLL through source-to-source translation APP1APP2APP3 Reconfigurable Fault Tolerance PLB Bus MicroBlaze ICAP Flash controller UART USB APP1 Controller + TMR Voter

19 RFT Case Study ABFT designs detected >70% of errors from original design  Hybrid design had even higher reliability ABFT and Hybrid designs approach reliability of traditional TMR  However, TMR automatically provides correction  ABFT requires additional hardware and cycles to correct erroneous data Two ABFT approaches explored  Matrix Multiplication (MM)  Linear Transform (LT)  Algorithm details in appendix Baseline non-FT designs compared to multiple FT approaches  ABFT - Original VHDL design with additional ABFT components  TMR – Triplicated version of original  Hybrid – ABFT design with key components triplicated Design reliability tested with SPFFI  Single-bit errors injected into configuration memory Focus on LUT and routing resources  Errors recorded and categorized for later analysis Overhead (Slice / BRAM / Cycles) Undetected Errors (%) MM – Original % MM – ABFT 184% / 0% / 40%0.76% MM – Hybrid 196% / 0% / 40%0.47% MM – TMR 242% / 200% / 0%0.66% LT – Original % LT – ABFT 149% / 100% / 5%0.88% LT – Hybrid 215% / 100% / 5%0.53% LT – TMR 284% / 200% / 0%0.35% LT – TMR created using BYU’s EDIF-based TMR tool MM – TMR created using VHDL-level replication