Autonomous FPGA Fault Handling through Competitive Runtime Reconfiguration (1 July 2005)

Presentation transcript:

Autonomous FPGA Fault Handling through Competitive Runtime Reconfiguration. Ronald F. DeMara and Kening Zhang, University of Central Florida. 1 July 2005.

Fault-Handling Techniques for SRAM-based FPGAs (taxonomy chart for reprogrammable device failure; labels listed in extracted order)
Duration: Transient: SEU; Permanent: SEL, Oxide Breakdown, Electron Migration, LPD
Target: Device Configuration; Processing Datapath
Characteristics: Detection, Isolation, Diagnosis, Recovery
Approaches: Repetitive Readback [Wells00]; TMR (conventional spatial redundancy); BIST; STARS [Abramovici01]; CED [McCluskey04]; Evolutionary; Sussex [Vigander01]; CRR [DeMara05]
Methods: Bitwise Comparison; Invert Bit Value; Ignore Discrepancy; Majority Vote; Supplementary Testbench; Cartesian Intersection; Worst-case Clock Period Dilation; Replicate in Spare Resource; Duplex Output Comparison; Fast Run-time Location; Select Spare Resource; Duplex Output Comparison; (not addressed); unnecessary; Population-based GA using Extrinsic Fitness Evaluation; Evolutionary Algorithm using Intrinsic Fitness Evaluation

Previous Work
Detection Characteristics of FPGA Fault-Handling Schemes
Strategies:
1) Evolve redundancy into design before anticipated failure
2) Redesign after detection of failure
3) Combine desirable aspects of both strategies 1) + 2) …

CRR Arrangement in SRAM FPGA
Configurations in Population: C = C_L ∪ C_R
- C_L = subset of left-half configurations
- C_R = subset of right-half configurations
- |C_L| = |C_R| = |C|/2
Discrepancy Operator
- The baseline discrepancy operator ⊗ is a dyadic operator with binary output
- Z(C_i) is the FPGA data throughput output of configuration C_i
- Each half-configuration evaluates ⊗ using an embedded checker (XNOR gate) within each individual
- Any fault in the checker lowers that individual's fitness, so that individual is no longer preferred and eventually undergoes repair
- ⊗ = RS: (Hamming Distance)
- ⊗ = WTA: (Equivalence)
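The two discrepancy variants named on this slide (Hamming distance and equivalence) can be sketched as follows. This is only an illustration: the function names are hypothetical, and the outputs Z(C_i) are assumed to be available as non-negative integers holding the output bits of each half-configuration.

```python
def hamming_discrepancy(z_left: int, z_right: int) -> int:
    """Number of output bits on which the two half-configurations disagree."""
    # XOR leaves a 1 in every bit position where the outputs differ.
    return bin(z_left ^ z_right).count("1")


def equivalence_discrepancy(z_left: int, z_right: int) -> int:
    """Binary discrepancy: 1 if the outputs differ in any bit, else 0."""
    return int(z_left != z_right)


# Example with 6-bit outputs (as produced by a 3-bit x 3-bit multiplier).
d_hamming = hamming_discrepancy(0b101101, 0b100111)
d_equiv = equivalence_discrepancy(0b101101, 0b100111)
```

The equivalence form matches the "binary output" description of the baseline operator, while the Hamming form grades how severe a disagreement is.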

Terminology and Characteristics
- Pristine Pool C_P: any C_i ∈ C is a member of C_P at generation G if and only if [condition missing in transcript]
- Suspect Pool C_S: any C_i ∈ C is a member of C_S at generation G if and only if at least one of [condition missing in transcript]
- Under Repair Pool C_U: any C_i ∈ C is a member of C_U at generation G if and only if [condition missing in transcript]
- Refurbished Pool C_R: after a Genetic Operator is applied, the newly generated individual is a member of C_R at generation G if and only if [condition missing in transcript]
- Discrepancy Count and Correctness Count: E_D is the Discrepancy Count of C_i and E_C is the Correctness Count of C_i
- Length of Evaluation Fitness Window: W = E_D + E_C
- Fitness Metric: f(C_i) = E_C / E_W, where E_W denotes the window length W
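A minimal sketch of the fitness metric f(C_i) = E_C / E_W, assuming each evaluation simply records whether a discrepancy was observed. The class and method names are hypothetical, and the example discrepancy pattern is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class FitnessWindow:
    correct: int = 0      # E_C, correctness count
    discrepant: int = 0   # E_D, discrepancy count

    def record(self, discrepancy: bool) -> None:
        """Account for one evaluation of this individual."""
        if discrepancy:
            self.discrepant += 1
        else:
            self.correct += 1

    @property
    def length(self) -> int:
        """Window length W = E_D + E_C."""
        return self.correct + self.discrepant

    def fitness(self) -> float:
        """f(C_i) = E_C / E_W."""
        if self.length == 0:
            raise ValueError("no evaluations recorded yet")
        return self.correct / self.length


# Illustrative run: 600 evaluations with a discrepancy on every 100th one.
window = FitnessWindow()
for k in range(600):
    window.record(discrepancy=(k % 100 == 0))
example_fitness = window.fitness()
```

With 6 discrepancies in a window of 600 the fitness is 594/600 = 0.99.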

Sketch of CRR Approach
Premise: Recovery Complexity << Design Complexity
1. Initialization
- Population P of functionally-identical yet physically-distinct configurations
- Partition P into sub-populations that use supersets of physically-distinct resources, e.g. size |P|/2 to designate physical FPGA left-half or right-half resource utilization
2. Fitness Assessment
- Discrepancy Operator ⊗ is some function of bitwise agreement between each half's output
- Four Fitness States defined for Configurations as {C_P, C_S, C_U, C_R} with transitions, respectively: Pristine, Suspect, Under Repair, Refurbished
- Fitness Evaluation Window W determines comparison interval
3. Regeneration
- Genetic Operators used to recover from fault based on Reintroduction Rate
- Operators only applied once, then offspring returned to "service" without concern about increasing fitness
Fitness assessment via pairwise discrepancy (temporal voting vs. spatial voting)

Configuration Health States: state transitions during the lifetime of the i-th half-configuration

Procedural Flow under Competitive Runtime Reconfiguration
Integrates all fault handling stages using EC strategy
- Detects faults by the occurrence of discrepancy
- Isolates faults by accumulation of discrepancies
- Failure-specific refurbishment using Genetic Operators: Intra-Module-Crossover, Inter-Module-Crossover, Intra-Module-Mutation
Realizes online device refurbishment
- Refurbished online without additional function or resource test vectors
- Repair during the normal data throughput process

Selection Process

Fitness Adjustment Procedure

Fitness Evaluation Window
Fitness Evaluation Window: W
- W denotes the number of iterations used to evaluate fitness before the state of an individual is determined
Determination of W for 3x3 multiplier
- 6 input pins articulating 2^6 = 64 possible inputs
- W should be selected so that all possible inputs appear
- More formally, let rand(X) return some x_i ∈ X at random; seek W such that ⋃_{i=1}^{W} rand(X) = X with high probability
- x_K = number of distinct orderings of K inputs showing in D trials
- If D is constant, P_K for K > 1 can be calculated successively
- The probability P_K of K inputs showing after D trials is the ratio x_K / K^D
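The ratio x_K / K^D can be checked numerically. The sketch below does not follow the successive calculation mentioned on the slide; it swaps in the equivalent inclusion-exclusion form, assuming equiprobable and independent inputs. The function name is hypothetical.

```python
from math import comb


def coverage_probability(k: int, d: int) -> float:
    """P(all k equiprobable inputs appear at least once in d random trials)."""
    # Inclusion-exclusion over the number j of inputs that never appear:
    # sum_j (-1)^j * C(k, j) * ((k - j) / k)^d, which equals x_K / K^D.
    return sum((-1) ** j * comb(k, j) * ((k - j) / k) ** d for j in range(k + 1))


# 3x3 multiplier: K = 64 possible inputs, window of D = W = 600 iterations.
p_600 = coverage_probability(64, 600)
```

For K = 64 and W = 600 this evaluates to roughly 0.995, in line with the 99.5% coverage figure quoted for W = 600 elsewhere in the deck.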

When K=64: W Determination

Impact of Fault on Viable Individuals
Existence of Positive Test Vector
- Input I_p comprises an articulating test iff C_i(I_p) ⊗ C_{j≠i}(I_p) = 1
- So if a discrepancy is detected, then some I_p exists which manifests the fault
Minimal Case when I_p is Unique
- I_p is unique if the fault is observable under exactly one input pattern
Probability Mass Function for Encountering Minimal Case I_p
- Consider W = 600, yielding 99.5% coverage for a module with an input space of |X| = 64
- The number of input occurrences i, 0 ≤ i ≤ 600, that randomly encounter I_p to identify the fault is governed by the probability mass function: p.m.f. = [equation missing in transcript]
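The slide's own formula is not preserved here. As an assumption, the sketch below models the minimal case with a binomial distribution: each of the W = 600 iterations draws one of the 64 inputs uniformly and independently, so it hits the unique I_p with probability 1/64. The function name is hypothetical.

```python
from math import comb


def encounter_pmf(i: int, w: int = 600, inputs: int = 64) -> float:
    """P(exactly i of the w random inputs equal the unique vector I_p)."""
    p = 1 / inputs  # chance that a single random input is I_p
    return comb(w, i) * p ** i * (1 - p) ** (w - i)


pmf = [encounter_pmf(i) for i in range(601)]
expected_hits = sum(i * prob for i, prob in enumerate(pmf))
p_never_seen = pmf[0]  # probability the fault is never articulated in the window
```

Under this model the expected number of encounters is 600/64 = 9.375, and the probability of never seeing I_p within one window is below 0.01%.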

Integer Multiplier Case Study
3-bit x 3-bit unsigned multiplier automated design:
– Building blocks
  - Half-Adder: 18 templates created
  - Full-Adder: 24 templates
  - Parallel-And: 1 template created
– Randomly select templates for instantiation in modules
GA operators
- External-Module-Crossover
- Internal-Module-Crossover
- Internal-Module-Mutation
GA parameters
- Population size: 20 individuals
- Crossover rate: 5%
- Mutation rate: up to 80% per bit
Experimental Evaluation
- Xilinx Virtex II Pro on Avnet PCI board
Experiments demonstrate:
- Objective fitness function replaced by the Consensus-based Evaluation Approach and Relative Fitness
- Elimination of additional test vectors
- Temporal Assessment process

Template Fault Coverage
Half-Adder Template A
– Gate3 is an AND gate
– Will lose correctness if a Stuck-At-Zero fault occurs in the second input line of Gate3
Half-Adder Template B
– Gate3 is a NOT gate and only uses the first input line
– Will work correctly even if the second input line is stuck at Zero or One

Regeneration Performance
Parameters:
- Difference (vs. Hamming Distance)
- Evaluation Window: E_W = 600
- Suspect Threshold: 1 - 6/600 = 99%
- Repair Threshold: 1 - 4/600 = 99.3%
- Re-introduction rate: r = 0.1
Repairs evolved in-situ, in real-time, without additional test vectors, while allowing device to remain partially online.

Discrepancy Mirror Fault Coverage Mechanism for Checking-the-Checker (“golden element” problem) Makes checker part of configuration that competes for correctness [DeMara PDPTA-05]

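Discrepancy Mirror Circuit Fault Coverage
Component | Fault Scenario 1 | Fault Scenario 2 | Fault Scenario 3 | Fault Scenario 4 | Fault-Free
Function Output A | Fault | Correct | Correct | Correct | Correct
Function Output B | Correct | Fault | Correct | Correct | Correct
XNOR A | Disagree (0) | Disagree (0) | Fault: Disagree (0) | Agree (1) | Agree (1)
XNOR B | Disagree (0) | Disagree (0) | Agree (1) | Fault: Disagree (0) | Agree (1)
Buffer A | 0 | 0 | High-Z | 0 | 1
Buffer B | 0 | 0 | 0 | High-Z | 1
Match Output | 0 | 0 | 0 | 0 | 1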

Influence of LUT utilization
Panels: Perpetually Articulating Inputs with Equiprobable Distribution; Intermittently Articulating Inputs with Equiprobable Distribution
- The expected number of pairings grows sub-linearly in the number of resources
- Utilization below 20% or above 80% implicates (or exonerates) a smaller subset of resources
- At 50% utilization, the expected numbers of pairings for 1,000, 10,000, and 100,000 resources are 11.1, 14.9, and 17.6
- At 90% utilization, a mean value of 258 pairings is required to isolate the faulty resource

Future Work: Development Board to Self-Contained FPGA
Qualitative Analysis of CRR model
- Number of iterations and completeness of regeneration repair
- Percentage of time the device remains online despite physical resource fault (availability)
Hardware Resource Management
- Optimization of hardware profile for Xilinx Virtex II Pro
Field Testing on SRAM-based FPGA in a Cubesat mission

Backup Slides On following pages …

Isolation: Block Dueling
- Algorithm based on group testing methods
- Successive intersection to assess health of resources
- Each configuration k has a binary Usage Matrix U_k[i,j], 1 ≤ i ≤ m and 1 ≤ j ≤ n
  - m, n are the number of rows and columns of resources in the device
  - Elements U_k[i,j] = 1 are resources used in k
- A History Matrix H[i,j], 1 ≤ i ≤ m and 1 ≤ j ≤ n, initially all zero, exists in which:
  - entries represent the fitness of resources (i, j)
  - information regarding the fitness of resources over time is stored
- A discrepant output will lead to an increase in the value of H[i,j], ∀ U_k[i,j] = 1, k ∈ S
  - All elements of H corresponding to resources used by a discrepant configuration will be incremented by one
  - At any point in time, H[i,j] will be a record of the outcomes of competitions
  - m successive intersections are performed until |S| = 1
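The bookkeeping described above can be sketched as follows. This is a simplified illustration rather than the authors' implementation: function names are hypothetical, indices are 0-based, and each configuration is assumed to be flagged discrepant exactly when it uses the faulty resource (perpetual articulation). The usage matrices are made-up example data.

```python
from typing import List, Set, Tuple

Resource = Tuple[int, int]


def used_resources(usage: List[List[int]]) -> Set[Resource]:
    """Coordinates (i, j) where the usage matrix U_k[i][j] is 1."""
    return {(i, j) for i, row in enumerate(usage) for j, bit in enumerate(row) if bit}


def duel(usages, discrepant, rows, cols):
    """Update the history matrix H and intersect the suspect resource sets."""
    history = [[0] * cols for _ in range(rows)]
    suspects = None
    for usage, is_discrepant in zip(usages, discrepant):
        if not is_discrepant:
            continue
        used = used_resources(usage)
        for i, j in used:
            history[i][j] += 1  # every resource of a discrepant configuration
        suspects = used if suspects is None else suspects & used
        if len(suspects) == 1:  # isolation complete
            break
    return history, (suspects or set())


fault = (2, 2)  # hidden faulty resource; (3,3) in the slides' 1-based indexing
usages = [
    [[1, 1, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 0, 0], [0, 0, 1], [0, 1, 1]],
    [[0, 1, 1], [1, 0, 0], [0, 0, 1]],
    [[1, 1, 0], [1, 0, 0], [0, 1, 0]],
]
discrepant = [fault in used_resources(u) for u in usages]
history, suspects = duel(usages, discrepant, rows=3, cols=3)
```

In this example three discrepant configurations are enough to narrow the suspect set to the single resource they all share.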

Dueling Example
- H[i,j] changes after C_1 and C_2 are loaded (H shown at t = 0 and t = 2)
- U_1 and U_2 are the corresponding Usage Matrices
- (3,3) is identified as the faulty resource
- Fitness of configuration k

Isolation of a single faulty individual with 1-out-of-64 impact
- Outliers are identified after W iterations have elapsed
- E.V. = (1/64)*600 = 9.375 discrepancies from the minimum-impact faulty individual
- Isolated individual's f differs from the average DV by 3σ after 1 or more observation intervals of length W
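A hedged sketch of the outlier test: an individual is flagged when its discrepancy count exceeds the population average by more than three standard deviations. The use of the population standard deviation, the function name, and the per-individual counts are all illustrative assumptions, not measured data.

```python
from statistics import mean, pstdev


def find_outliers(discrepancy_counts, n_sigma=3.0):
    """Indices whose count exceeds the population mean by more than n_sigma deviations."""
    mu = mean(discrepancy_counts)
    sigma = pstdev(discrepancy_counts)
    threshold = mu + n_sigma * sigma
    return [i for i, c in enumerate(discrepancy_counts) if c > threshold]


# Expected discrepancies per window for a 1-out-of-64 fault with W = 600.
expected_discrepancies = (1 / 64) * 600

# Made-up counts for a population of 20; the last individual is the faulty one.
counts = [1, 0, 2, 1, 0, 1, 1, 0, 2, 1, 0, 1, 1, 0, 1, 2, 0, 1, 1, 9]
outliers = find_outliers(counts)
```

Only the last individual crosses the threshold here; with several faulty individuals the population average rises and the same test becomes less decisive.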

Isolation of a single faulty L individual with 10-out-of-64 impact
Compare with 1-out-of-64 fault impact
- E.V. of (10/64)*600 = 93.75 discrepancies for the faulty configuration
- One isolation will be complete approx. once in every 93.75/5 = 19 Observation Intervals
- Fault isolation demonstrated in 100% of cases

Isolation of 8 faulty individuals L4&R4 with 1-out-of-64 impact
Expected isolations do not occur approximately 40% of the time
- Average discrepancy value of the population is higher
- Outlier isolation is difficult
- Multiple faulty individuals; discrepancies scattered

Online Dueling Evaluation
Objective
- Isolate faults by successive intersection between sets of FPGA resources used by configurations
- Analyze complexity of isolation process
Variables
- Total resources available: measured in number of LUTs
- Number of Competing Configurations: number of initial "Seed" designs in CRR process
- Degree of Articulation: some inputs may not manifest faults, even if the faulty resource is used by the individual
- Resource Utilization Factor: percentage of FPGA resources required by target application/design
- Number of Iterations for Isolation: measure of complexity and time involved in isolating fault

Isolation of Faulty Resource at the FPGA resource (LUT) granularity
50625 LUTs, comparable to the number of LUTs on a Xilinx Virtex II Pro FPGA
- Xilinx Virtex II Pro has approximately 67 columns, 78 rows
- 4 slices per CLB
- 2 LUTs per slice

Isolation of Faulty Resource: Effect of Articulation
No direct, uniform relation between % Articulation and Number of Isolations!
Performance best when Articulation (%) = 50% ± 10%
- Each successive intersection provides maximal information
- Greatest number of resources are intersected out of "suspect" pool

For further info … EH Website

Fast Reconfiguration for Autonomously Reprogrammable Logic
Motivation
– Dynamic reconfiguration required by application
– Exploit architectural & performance improvements fully
– Reconfiguration delay: a major performance barrier
Previous Work
Methodology
– Multilayer Runtime Reconfiguration Architecture (MRRA)
– Spatial Management
Prototype Development
– Loosely-Coupled solution
– Timing Analysis
– System-On-Chip solution

Reconfiguration Demand during CRR
For a complete repair
– Approximately 2,000 generations may be required
– For each generation, up to 100 evaluations may be needed
– Yielding a Cumulative Number of Reconfigurations (CNR) of up to 2,000 × 100 = 200,000
For each reconfiguration task
– Even if reconfiguration delay alone is assumed to be on the order of tens or hundreds of milliseconds
– Therefore, the total delay L_tot >= 5.5 hours
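The delay estimate can be reproduced with a few lines of arithmetic. The per-reconfiguration delay of 100 ms is an assumption chosen from the "tens or hundreds of milliseconds" range because it reproduces the roughly 5.5 hour figure; the variable names are hypothetical.

```python
generations = 2_000               # approximate generations for a complete repair
evaluations_per_generation = 100  # upper bound on evaluations per generation
reconfig_delay_s = 0.1            # assumed delay per reconfiguration (100 ms)

# Cumulative Number of Reconfigurations (CNR)
cnr = generations * evaluations_per_generation

# Total delay spent on reconfiguration alone, in hours
total_delay_hours = cnr * reconfig_delay_s / 3600
```

Under these assumptions the reconfiguration delay alone is about 5.56 hours, which is why reconfiguration speed is treated as the main performance barrier.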

Previous Work - Tool Level
Approach | FPGA Supported | On-chip System | Bit Stream Reuse | System Coupling Degree | Potential Limitations
Moraes, Mesquita, Palma, Moller | Virtex XCV300 devices | No | N | Loose | Lack of Area Relocation Capability
Raghavan, Sutton | Xilinx Virtex devices | No | N | Loose | Cumbersome CAD flow
Blodget, McMillan | Virtex II devices | Partial | Y | Medium | Limited hardware speed and capacity. Lack of information for bit stream reuse

Previous Work - Algorithm Level
Columns: Approach; Method; Partial Reconfig; Spatial Relocation; Temporal Parallelism; Area Shape; Run-Time; Potential Limitations (cell values listed in extracted order)
- Hauck, Li, Schwabe; Bit file compression; N/A, No, N/A, No; Full reconfiguration required
- Shirazi, Luk, Cheung; Identifying common components; Yes, No, Yes, N/A, No; Design time work required
- Mak, Young; Dynamic Partitioning; Yes, No, Yes, N/A, Yes; Only desirable for large designs
- Ganesan, Vemuri; Pipelining; Yes, No, Yes, N/A, Yes; Limited pipeline depth
- Compton, Li, Knol, Hauck; Relocation and Defragmentation with new FPGA architecture; Yes, No, Row-based, Yes; Special FPGA architecture required
- Diessel, Middendorf, Schmeck, Schmidt; Task Remapped and Relocated; Yes, No, Rectangle, Yes; Overhead for remapping calculations
- Herbert, Christoph, Macro; Partitioning and 2D Hashing; Yes, Rectangle, Yes; Rigid task modeling assumptions
Legend: compression method, temporal method, spatial method

Multilayer Runtime Reconfiguration Architecture (MRRA)
- Develop MRRA fast reconfiguration paradigm for the CRR approach
- Validate with real hardware platform along with detailed performance analysis
- First general-purpose framework for a wide variety of applications requiring dynamic reconfiguration
- Extend existing theories on reconfiguration

Loosely Coupled Solution
- The entire system operates on a 32-bit basis
- The Virtex-II Pro is mounted on a development board, which can then be interfaced with a workstation running Xilinx EDK and ISE

Result Assessment
Establish fully functional framework of both prototypes
Communication overhead, throughput and overall speed-up analysis
- Communication overhead for the SOC solution is decreased to microsecond or sub-microsecond order vs. millisecond order for the Loosely Coupled solution
- Up to 5-fold speedup is expected compared to the Loosely Coupled solution
Translation Complexity Analysis
- The quantity of information that needs to be translated to generate the reconfiguration bitstream
- Simplification from file level to bit level is expected
Storage Complexity Analysis
– The memory space required for the run-time algorithms
– Decreased memory requirement is expected due to the translation complexity improvement

Project Milestones: HW Schedule; SW Schedule

Publications
Accepted Manuscripts
1. R. F. DeMara and K. Zhang, "Autonomous FPGA Fault Handling through Competitive Runtime Reconfiguration," to appear in NASA/DoD Conference on Evolvable Hardware (EH'05), Washington D.C., U.S.A., June 29 – July 1, 2005.
2. H. Tan and R. F. DeMara, "A Device-Controlled Dynamic Configuration Framework Supporting Heterogeneous Resource Management," to appear in International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA'05), Las Vegas, Nevada, U.S.A., June 27 – 30, 2005.
3. R. F. DeMara and C. A. Sharma, "Self-Checking Fault Detection using Discrepancy Mirrors," to appear in International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'05), Las Vegas, Nevada, U.S.A., June 27 – 30, 2005.
Submitted Manuscripts
1. R. F. DeMara and K. Zhang, "Populational Fault Tolerance Analysis Under CRR Approach," submitted to International Conference on Evolvable Systems (ICES'05), Barcelona, Sept. 12 – 14, 2005.
2. R. F. DeMara and C. A. Sharma, "FPGA Fault Isolation and Refurbishment using Iterative Pairing," submitted to IFIP VLSI-SOC Conference, Perth, W. Australia, October 17 – 19, 2005.
Manuscripts In-preparation
1. R. F. DeMara and K. Zhang, "Autonomous Fault Occlusion through Competitive Runtime Reconfiguration," submission planned to IEEE Transactions on Evolutionary Computation.
2. R. F. DeMara and C. A. Sharma, "Multilayer Dynamic Reconfiguration Supporting Heterogeneous FPGA Resource Management," submission planned to IEEE Design and Test of Computers.
Field Testing
Implementation of CRR on-board SRAM-based FPGA in a Cubesat mission

EHW Environments
Evolvable Hardware (EHW) Environments enable experimental methods to research soft computing intelligent search techniques.
EHW operates by repetitive reprogramming of real-world physical devices using an iterative refinement process.
Two modes of Evolvable Hardware
- Extrinsic Evolution: Genetic Algorithm with software model, simulation in the loop; device "design-time" refinement (Done? Build it)
- Intrinsic Evolution: Genetic Algorithm with hardware in the loop; device "run-time" refinement; new approach to Autonomous Repair of failed devices
Application
- Stardust Satellite: >100 FPGAs onboard
- Hostile environment: radiation, thermal stress
- How to achieve reliability to avoid mission failure?

Genetic Algorithms (GAs)
Mechanism coarsely modeled after neo-Darwinism (natural selection + genetics)
Flowchart: start; population of candidate solutions; evaluate fitness of individuals (fitness function); selection of parents; crossover and mutation of parents into offspring; replacement; repeat until goal reached

Genetic Mechanisms
Guided trial-and-error search techniques using principles of Darwinian evolution
- iterative selection, "survival of the fittest"
- genetic operators: mutation, crossover, …
- implementor must define fitness function
GAs frequently use strings of 1s and 0s to represent candidate solutions
- if one string is better than another, it will have more chance to breed and influence the future population
GAs "cast a net" over entire solution space to find regions of high fitness
Can invoke Elitism Operator (E=1, E=2 …)
- guarantees monotonically non-decreasing fitness of the best individual over all generations
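A minimal GA sketch over bit strings with an elitism operator, illustrating why the best fitness never decreases. Everything here is an illustrative assumption: the function name, the tournament selection, the parameter values, and the toy fitness function (count of 1 bits) are not taken from the presentation.

```python
import random


def evolve(fitness, length=16, pop_size=20, generations=30,
           elite=1, p_mut=0.05, seed=0):
    """Evolve bit strings; return the best individual and best-fitness history."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        best_history.append(fitness(pop[0]))
        # Elitism: the top `elite` individuals are copied unchanged.
        next_pop = [ind[:] for ind in pop[:elite]]
        while len(next_pop) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, length)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per bit.
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            next_pop.append(child)
        pop = next_pop
    pop.sort(key=fitness, reverse=True)
    best_history.append(fitness(pop[0]))
    return pop[0], best_history


# Toy fitness: number of 1 bits in the string.
best, history = evolve(fitness=sum)
```

Because the elite individuals survive untouched, each entry in `history` is at least as large as the previous one, whatever the random seed.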

GA Success Stories
Commercial Applications:
- Nextel: frequency allocation for cellular phone networks; $15M predicted savings in NY market
- Pratt & Whitney: turbine engine design; engineer: 8 weeks; GA: 2 days w/3x improvement
- International Truck: production scheduling improved by 90% in 5 plants
- NASA: superior Jupiter trajectory optimization, antennas, FPGAs
- Koza: 25 instances showing human-competitive performance such as analog circuit design, amplifiers, filters

Representing Candidate Solutions
Individual (Chromosome), made up of GENEs
- Representation of an individual can use discrete values (binary, integer, or any other system with a discrete set of values)
- Example of Binary DNA Encoding:

Genetic Operators: selection, reproduction, recombination (crossover), and mutation take the population at generation t to the population at generation t + 1

Crossover Operator: parents drawn from the population are cut and recombined into offspring
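One-point crossover on bit strings can be sketched as below. The function name and the list-of-bits representation are assumptions for illustration; the presentation's own module-level crossover operators are not reproduced here.

```python
from typing import List, Tuple


def one_point_crossover(parent_a: List[int], parent_b: List[int],
                        cut: int) -> Tuple[List[int], List[int]]:
    """Swap the tails of two equal-length parents after position `cut`."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 0 < cut < len(parent_a):
        raise ValueError("cut must fall strictly inside the chromosome")
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


offspring = one_point_crossover([0, 0, 0, 0], [1, 1, 1, 1], cut=2)
```

Each offspring keeps the head of one parent and the tail of the other, so no genetic material is created or lost by the operator itself.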