Ronald F. DeMara, Carthik A. Sharma, University of Central Florida: A Combinatorial Group Testing Method for FPGA Fault Location

Presentation transcript:

A Combinatorial Group Testing Method for FPGA Fault Location
Ronald F. DeMara, Carthik A. Sharma
University of Central Florida

Introduction: Field Programmable Gate Arrays
- Gate-array-based reconfigurable architecture
  - Matrix of logic cells (look-up tables) surrounded by peripheral I/O cells
- Capabilities:
  - Runtime reconfiguration
  - On-chip processor core and millions of gate-equivalent logic elements
- Millions of FPGA devices produced annually; most are SRAM-based
- Used in mission-critical applications
  - Remote systems and hazardous environments
  - Space applications: satellites, probes, and shuttles

Group Testing Algorithms
- Origin: World War II blood testing
  - Problem: test samples from millions of new recruits
  - Solution: test blocks of samples before testing individual samples
- Problem definition
  - Identify subset Q of defectives from set P
  - Minimize the number of tests
  - Test v-subsets of P
  - Form suitable blocks
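The idea of testing blocks before individuals can be illustrated with a minimal sketch of adaptive halving. It is not taken from the slides: the function name `find_defective` and the `is_group_defective` callback are hypothetical, and the sketch assumes exactly one defective item and a test that reliably reports whether a block contains it.

```python
def find_defective(items, is_group_defective):
    """Locate a single defective item by testing half of the suspects at a time.

    Assumption (not from the slides): exactly one item is defective and
    is_group_defective(block) returns True iff the block contains it.
    Returns the defective item and the number of block tests performed.
    """
    suspects = list(items)
    tests = 0
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        tests += 1
        if is_group_defective(half):
            suspects = half                  # defective lies in the tested block
        else:
            suspects = suspects[len(half):]  # tested block is cleared
    return suspects[0], tests


# Usage: 16 samples, sample 11 is defective.
defective, tests = find_defective(range(16), lambda block: 11 in block)
print(defective, tests)
```

With 16 samples the search needs 4 block tests instead of up to 16 individual tests, which is the saving group testing aims for.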

Previous Work
- Pre-compiled Column-Based Dual FPGA architecture [Mitra04]
  - Autonomous detection, repair by shifting pre-compiled columns
  - Isolation using distributed CED-checkers and "blind" reconfiguration attempts
- Overview of Combinatorial Group Testing and Applications [Du00]
  - Provides taxonomy and general algorithms for applying CGT
  - Examples of CGT applications: DNA clone library filtering, vaccine screening, computer fault diagnosis, etc.
- CGT Enhanced Circuit Diagnosis [Kahng04]
  - Presents doubling, halving, etc. for circuit fault diagnosis using BIST and CGT
  - Requires the ability to test resources individually
- Chinese Remainder Sieve technique [Eppstein05]
  - Efficient non-adaptive and two-stage CGT based on prime-number-driven test formation
  - Improved algorithms for practical problem sizes (n < ) with a small number of defectives (d < 4)

Fault-Handling Techniques (figure: comparison chart of approaches)
- Device failure duration: transient (SEU); permanent (SEL, oxide breakdown, electron migration, LPD)
- Characteristics compared: target, detection, isolation, diagnosis, recovery
- Approaches and methods: TMR, BIST, STARS, CED, Dueling (CGT-based)
- Chart entries: repetitive readback; device configuration; processing datapath; bitwise comparison; invert bit value; ignore discrepancy; majority vote; supplementary testbench; Cartesian intersection; worst-case clock period dilation; replicate in spare resource; duplex output comparison; fast run-time location; select spare resource; unnecessary; repetitive intersections; evolutionary algorithm using intrinsic fitness evaluation

Isolation Problem Outline
- Objectives
  - Locate faulty logic and/or interconnect resource: a single stuck-at fault model is assumed
  - Online fault isolation: device not entirely removed from service
- Features
  - Runtime reconfiguration: FPGA resources configured dynamically
  - Utilize runtime inputs: avoid special test vectors, improve availability
- Constraints
  - Use pre-designed configurations: defined by target application
  - Subsets under test have a constant resource utilization range for a given isolation problem
  - Resource grouping influences fault articulation: resource mapping and input vector might mask hardware faults
  - Do not use specialized "block designs"
  - Runtime reconfiguration limited to column-swapping
  - "Non-reasonable" algorithm: "tests" may be repeated without gaining new isolation information

Fault Location Using Dueling
- The set of all competing configurations is represented by S.
- Set C_k represents the resources utilized by configuration k.
- Usage Matrix: each competing configuration k, 1 ≤ k ≤ |S|, has a unique binary Usage Matrix U_k, 1 ≤ k ≤ p, with elements U_k[i,j], 1 ≤ i ≤ m, 1 ≤ j ≤ n, where m and n are the numbers of rows and columns in the device layout. U_k[i,j] = 1 denotes the usage of resource (i,j) by C_k.
- History Matrix: H, with elements H[i,j], 1 ≤ i ≤ m, 1 ≤ j ≤ n, is an integer matrix used to represent the relative fitness of individual resources. H[i,j] provides instantaneous relative fitness values of resources.
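The Usage and History Matrices can be sketched as plain nested lists. The update rule below is an assumption made for illustration, since the slides do not state how H is updated: every resource used by a configuration that took part in a discrepant duel has its H entry incremented, and the entries with the largest count are treated as the most suspect. The helper names `update_history` and `most_suspect` are hypothetical.

```python
def update_history(history, usage_matrices):
    """Increment H[i][j] for each resource used by a discrepant configuration.

    Assumed rule (not from the slides): each usage matrix passed in belongs
    to a configuration that was involved in a discrepant duel.
    """
    for usage in usage_matrices:
        for i, row in enumerate(usage):
            for j, used in enumerate(row):
                history[i][j] += used
    return history


def most_suspect(history):
    """Return the 1-indexed (row, column) positions holding the largest H value."""
    peak = max(max(row) for row in history)
    return [(i + 1, j + 1)
            for i, row in enumerate(history)
            for j, value in enumerate(row)
            if value == peak]


# Usage: a 3x3 device and two configurations that both showed discrepancies.
H = [[0] * 3 for _ in range(3)]
U1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
U2 = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
update_history(H, [U1, U2])
print(most_suspect(H))
```

In this made-up example the only resource shared by both configurations is (3,3), so it accumulates the highest count.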

Dueling Example (figure: History Matrix H at t = 0 and t = 2; Usage Matrices U_1 and U_2)
- H[i,j] changes after C_1 and C_2 are loaded
- U_1 and U_2 are the corresponding Usage Matrices
- (3,3) is identified as the faulty resource

Modified Halving
- Initially all H[i,j] = 0
- Selection process can be adaptive
- Fitness augmentation can be non-linear
- Columns can be swapped with any other columns

FPGA Arrangement for Dueling
- Configurations in population: C = C_L ∪ C_R
- C_L = subset of left-half configurations
- C_R = subset of right-half configurations
- |C_L| = |C_R| = |C|/2

Isolation Progress without Halving
- Initially |S| = 20,000
- Resource utilization = 40%
- Number of suspected faulty elements constant at 36 after 23 iterations
- No subsequent improvement due to lack of differentiating information between competing configurations
- Temporary stasis in isolation due to insufficient design diversity

Dueling with Modified Halving
- Halving works by swapping half the used columns with unused ones
- Halving progressively reduces the size of the set of suspected faulty elements
- Isolation proceeds until a single faulty element is isolated
- Fault isolated after 19 iterations
- Symptoms of stasis invoke the halving procedure for fast isolation

Effect of Total Number of Elements (increased problem size)
- Number of elements = (number of rows) x (number of columns)
- As the size of the array containing the fault increases, the increase in the required number of iterations is minimal
- For 1 million elements, only 27.4 iterations are required
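For context, an ideal halving search that removes exactly half of the suspects per test needs ceil(log2 N) tests to single out one element among N. The sketch below computes that bound so it can be set next to the 27.4 iterations reported for 1 million elements. The function name is hypothetical, and the bound describes an idealized search, not the dueling procedure itself.

```python
def ideal_halving_tests(num_elements):
    """Smallest t with 2**t >= num_elements, i.e. ceil(log2(num_elements)).

    Integer arithmetic is used to avoid floating-point rounding at exact
    powers of two.
    """
    if num_elements < 1:
        raise ValueError("num_elements must be positive")
    return (num_elements - 1).bit_length()


for n in (1_000, 10_000, 100_000, 1_000_000):
    print(n, ideal_halving_tests(n))
```

For 1,000,000 elements the idealized bound is 20 tests, so 27.4 iterations is within a factor of about 1.4 of it.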

Effect of Population Size
- A single fault in S is assumed
- As population size increases, isolation is expected to be faster
- Increased population size implies more initial designs
- A population size of 30 seems to be an ideal tradeoff between ease of isolation and the difficulty of generating an increased number of individuals
- Increased population size provides minimal added benefit

Effect of Resource Utilization
- Moderate resource utilization is ideal for isolation
- Rate of isolation progress is low with extreme utilization characteristics
- Isolation takes longer when less than 20% or greater than 80% of the available resources are utilized

Future Work
- Conducting tests using benchmark circuits
  - ISCAS89 s38584 with gates: sequential logic
  - ISCAS85 circuits with max 3513 gates: combinational logic
  - Compression / signal processing algorithms, such as the Lempel-Ziv (LZ) compression scheme [Mitra04]
- Development of an architecture to enable column-swapping
  - Multi-layer Runtime Reconfigurable Architecture (MRRA) being prototyped

Backup Slides On following pages …

Online Dueling Evaluation
- Objective
  - Isolate faults by successive intersection between sets of FPGA resources used by configurations
  - Analyze complexity of the isolation process
- Variables
  - Total resources available: measured in number of LUTs
  - Number of competing configurations: number of initial "seed" designs in the CRR process
  - Degree of articulation: some inputs may not manifest faults, even if the faulty resource is used by the individual
  - Resource utilization factor: percentage of FPGA resources required by the target application/design
  - Number of iterations for isolation: measure of complexity and time involved in isolating the fault
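Isolation by successive intersection can be sketched as a small simulation. This is a simplified model under stated assumptions, not the authors' experiment: each iteration loads one random configuration that uses every resource independently with probability `utilization`, the fault is perpetually articulating (it shows a discrepancy whenever the faulty resource is used), and a discrepancy-free run exonerates all resources it used. The function name and parameters are hypothetical.

```python
import random


def isolate_by_intersection(num_resources, utilization, faulty,
                            seed=0, max_iterations=10_000):
    """Shrink the suspect set until only one resource remains.

    Returns (isolated_resource, iterations), or (None, max_iterations)
    if the suspect set did not shrink to one element in time.
    """
    rng = random.Random(seed)  # seeded for repeatable runs
    suspects = set(range(num_resources))
    for iteration in range(1, max_iterations + 1):
        used = {r for r in range(num_resources) if rng.random() < utilization}
        if faulty in used:
            suspects &= used   # discrepancy: fault is among the used resources
        else:
            suspects -= used   # no discrepancy: used resources are exonerated
        if len(suspects) == 1:
            return suspects.pop(), iteration
    return None, max_iterations


resource, iterations = isolate_by_intersection(1_000, 0.5, faulty=123)
print(resource, iterations)
```

In this model a non-faulty resource survives one iteration with probability u^2 + (1 - u)^2, which is smallest at u = 0.5. That is one way to see why moderate utilization narrows the suspect set fastest.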

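Discrepancy Mirror Circuit Fault Coverage

Component         | Fault Scenario 1 | Fault Scenario 2 | Fault Scenario 3    | Fault Scenario 4    | Fault-Free
Function Output A | Fault            | Correct          | Correct             | Correct             | Correct
Function Output B | Correct          | Fault            | Correct             | Correct             | Correct
XNOR A            | Disagree (0)     | Disagree (0)     | Fault: Disagree (0) | Agree (1)           | Agree (1)
XNOR B            | Disagree (0)     | Disagree (0)     | Agree (1)           | Fault: Disagree (0) | Agree (1)
Buffer A          | 0                | 0                | High-Z              | 0                   | 1
Buffer B          | 0                | 0                | 0                   | High-Z              | 1
Match Output      | 0                | 0                | 0                   | 0                   | 1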

Influence of LUT Utilization (figure panels: Perpetually Articulating Inputs with Equiprobable Distribution; Intermittently Articulating Inputs with Equiprobable Distribution)
- Expected number of pairings grows sub-linearly in the number of resources
- Utilization below 20% or above 80% implicates (or exonerates) a smaller subset of resources
- At 50% utilization, the expected numbers of pairings for 1,000, 10,000, and 100,000 resources are 11.1, 14.9, and 17.6
- At 90% utilization, a mean of 258 pairings is required to isolate the faulty resource

Accommodating Multi-bit Word Widths
- Proof of concept
  - The present circuit works efficiently
  - Demonstrates important Dueling-enabled isolation method
- Strategies
  - Use an array of detectors
    - Attempt to minimize points of failure as word width increases
    - Number of logic resources used is acceptable for smaller circuits
  - Create a new circuit or scheme, combining fault-tolerant coding-based methods with a single-fault-secure circuit
  - Current research is focused on improving the detector by investigating codes and fault-secure circuits

Pull-down Resistor Considerations
- Proof of concept
  - The present circuit works in a verifiably correct manner
  - Can utilize synthesized (digital) pull-down resistors, which simulate the behavior of analog resistors
  - Demonstrates Dueling-enabled isolation method
  - Can be utilized without implementation problems for custom VLSI designs
- Alternative approach
  - Alternate detector circuits for FPGA implementation are under investigation
  - Avoid using tri-state buffers and pull-down resistors; use native digital components available on FPGAs

Competitive Runtime Reconfiguration (CRR): Conceptual Innovation
- Evolutionary Computation strategies are effective for more than just the repair phase: continually detect, rank, and isolate faults entirely within the underlying data throughput flow
- Novel fitness assessment via pairwise discrepancy, without any preconceived oracle for correctness (emergent behavior)
- No test vectors
- Diverse alternatives working a priori
- Graceful degradation via ranking of alternatives
- Fault detection by robust consensus over time
- Device remains online during repair
- No reconfiguration when fault-free
- Fault isolation is model-free and self-calibrating
- Completely-repaired criteria can be ignored
- Performance readily adjustable
- Checking logic is part of the individual, hence also competes for correctness
- Failures in population memory covered

State Transitions during Lifetime of the i-th Half-Configuration (figure: Configuration Health States)
- Discrepancy Operator
  - The baseline discrepancy operator is a dyadic operator with binary output
  - Z(C_i) is the FPGA data throughput output of configuration C_i
  - RS variant: Hamming distance
  - WTA variant: equivalence
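The two discrepancy measures named on this slide can be sketched on integer-encoded output words. This is a minimal illustration only: the function names are hypothetical, and treating Z(C_L) and Z(C_R) as Python integers is an assumption about how the outputs are represented.

```python
def hamming_discrepancy(z_left: int, z_right: int) -> int:
    """Number of bit positions in which the two output words differ."""
    return bin(z_left ^ z_right).count("1")


def equivalence_discrepancy(z_left: int, z_right: int) -> int:
    """Binary discrepancy: 1 if the output words differ at all, else 0."""
    return int(z_left != z_right)


# Usage: outputs of a left-half and a right-half configuration for one input.
print(hamming_discrepancy(0b1010, 0b0110))      # two bit positions differ
print(equivalence_discrepancy(0b1010, 0b0110))  # the words are not equal
```

The equivalence form only records that a disagreement occurred, while the Hamming form also records how many output bits disagreed.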

Procedural Flow under Consensus-Based Evaluation
- Initialization
  - Partition P into sub-populations of size |P|/2 to designate physical FPGA left-half or right-half resource utilization
- Consensus-Based Evaluation
  - Discrepancy operator applied to C_L and C_R
  - Four fitness states: Pristine, Suspect, Under Repair, Refurbished
- Regeneration
  - Genetic operators recover based on Reintroduction Rate
  - Operators are applied only once; offspring are then returned to "service" without concern about increasing fitness

GA Parameters & Experiments
- Speciation
  - Two-point crossover between individuals from the same sub-group
  - Crossover points chosen to prevent intra-CLB crossover
  - Breeding occurs exclusively among members of sub-populations
  - Maintains non-interfering resource use among L, R
- GA operators: External-Module-Crossover, Internal-Module-Crossover, Internal-Module-Mutation
- GA parameters
  - Population size: 20 individuals
  - Crossover rate: 5%
  - Mutation rate: up to 80% per bit
- Experiments
  - Fault isolation characteristics
  - Regenerative experiments
- Experiments demonstrate
  - Objective fitness function replaced by the Consensus-Based Evaluation approach and relative fitness
  - Elimination of additional test vectors

Impact of Fault on Viable Individuals
- Existence of a positive test vector
  - Input I_p is a positive test vector iff the discrepancy operator applied to C_v(I_p) and C_f(I_p) yields 1, where C_v denotes a viable configuration and C_f denotes a faulty configuration
  - So if a discrepancy is visible, then some I_p exists which manifests the fault
- Minimal case when I_p is unique
  - I_p is unique if the fault is observable under exactly one test vector
- Probability mass function for encountering I_p in the minimal case
  - Consider E_w = 600, yielding 99.5% coverage for a module with input space W = 64
  - The number of input occurrences i, 0 ≤ i ≤ 600, that randomly encounter I_p to identify the fault is governed by the probability mass function p.m.f.(i), where D is the length of E_w
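The formula for p.m.f.(i) did not survive in the transcript. The sketch below shows one natural model, stated as an assumption and not as the authors' formula: if each of the D = 600 inputs in the window is drawn independently and uniformly from W = 64 possible inputs, the number of encounters with the unique I_p follows a binomial distribution with success probability 1/W. The function names are hypothetical.

```python
from math import comb


def encounter_pmf(i, window=600, input_space=64):
    """P(exactly i of `window` random inputs equal the unique vector I_p).

    Assumption: inputs are independent and uniform over `input_space` values.
    """
    p = 1 / input_space
    return comb(window, i) * p**i * (1 - p) ** (window - i)


def expected_encounters(window=600, input_space=64):
    """Mean of the binomial model: window / input_space."""
    return window / input_space


print(expected_encounters())   # mean number of encounters per window
print(encounter_pmf(0))        # chance the window never hits I_p
```

Under this model the expected number of encounters per window is 600/64 = 9.375, and the chance that a window of 600 inputs never contains I_p is roughly 7.9e-5.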

Isolation of a single faulty individual with 1-out-of-64 impact
- Outliers are identified after E_w iterations have elapsed
- Expected DV = (1/64)*600 = 9.375 from the individual impacted by the fault
- The isolated individual's DV differs from the average DV by 3σ after 1 or more observation intervals of length E_w

Isolation of a single faulty L individual with 10-out-of-64 impact
- Compare with 1-out-of-64 fault impact
  - Expected DV of (10/64)*600 = 93.75 for the faulty configuration
  - One isolation will be complete approximately once in every 93.75/5 = 19 sliding windows
  - Fault isolation achieved is 100%

Isolation of 8 faulty individuals L4 & R4 with 1-out-of-64 impact
- Expected isolations do not occur approximately 40% of the time
  - Average discrepancy value of the population is higher
  - Outlier isolation is difficult
  - Multiple faulty individuals, discrepancies scattered

Regeneration Performance
- Difference (vs. Hamming distance)
- Parameters:
  - Evaluation window: E_w = 600
  - Suspect threshold: DV_S = 1 - 6/600 = 99%
  - Repair threshold: DV_R = 1 - 4/600 = 99.3%
  - Re-introduction rate: r = 0.1
- Repairs evolved in situ, in real time, without additional test vectors, while allowing the device to remain partially online

Multilayer Runtime Reconfiguration Architecture (MRRA)
- Develop MRRA fast reconfiguration paradigm for the CRR approach
- Validate with a real hardware platform along with detailed performance analysis
- First general-purpose framework for a wide variety of applications requiring dynamic reconfiguration
- Extend existing theories on reconfiguration

Loosely Coupled Solution
- The entire system operates on a 32-bit basis
- The Virtex-II Pro is mounted on a development board, which can then be interfaced with a workstation running Xilinx EDK and ISE

For further info … EH Website