Expediting GA-Based Evolution Using Group Testing Techniques for Reconfigurable Hardware (1)
Rashad S. Oreifej, Carthik A. Sharma, and Ronald F. DeMara
University of Central Florida
ReConFig’06, San Luis Potosí, Mexico
(1) Research supported in part by NSF grant CRCD:
Evolvable Hardware
- Evolutionary Design:
  - Start with available CLBs and IOBs
  - Implement a design using genetic operators, etc. [Fogarty97]
  - Limited or no ability to re-design to account for suspected faulty resources
- Evolutionary Regeneration:
  - Start with an existing pool of designs
  - Some existing configurations may use faulty resources
  - Eliminate use of suspected faulty resources
  - Genetic operators can be applied to refurbish designs [Vigander01]
Previous Work
- Pre-compiled Column-Based Dual FPGA Architecture [Mitra04]
  - Autonomous detection and repair by shifting pre-compiled columns
  - Isolation using distributed CED (concurrent error detection) checkers and “blind” reconfiguration attempts
- Overview of Combinatorial Group Testing and Applications [Du00]
  - Provides a taxonomy and general algorithms for applying CGT
  - Example CGT applications: DNA clone library filtering, vaccine screening, computer fault diagnosis, etc.
- CGT-Enhanced Circuit Diagnosis [Kahng04]
  - Presents doubling, halving, etc. for circuit fault diagnosis using BIST and CGT
  - Requires the ability to test resources individually
- Chinese Remainder Sieve Technique [Eppstein05]
  - Efficient non-adaptive and two-stage CGT based on prime-number-driven test formation
  - Improved algorithms for practical problem sizes (n < ) with a small number of defectives (d < 4)
Genetic Algorithms & Evolvable Hardware
- GAs are strong candidates for implementing system refurbishment:
  - They implement guided trial-and-error search using principles of Darwinian evolution
  - Iterative selection enforces “survival of the fittest”
  - Genetic operators (mutation, crossover, …) can be used to refurbish designs
- Hypothesis: information regarding resource performance can expedite GA-based refurbishment
[Figure: an individual (chromosome) composed of genes]
- GAs frequently use strings of 1s and 0s to represent candidate solutions
- An FPGA configuration file is a string of 1s and 0s
Conventional vs. CGT-Pruned GA
- Conventional GA:
  - Searches the whole space to evolve a working design or repair
  - Information about resource suitability may accelerate the search
- CGT-Pruned GA:
  - Prefers resources of higher fitness to evolve a working design or repair
- Q. How can resource fitness information be obtained?
- A. Using group testing techniques:
  - Combinatorial Group Testing identifies a shrinking group of “defectives” by iterative refinement
  - Tests are performed on subsets of suspects
  - Expected to take less time
- “Faster Design and Faster Repair”
CGT-Pruned GA Simulator
Experimental Setup
- Target circuit: 3-bit x 2-bit multiplier
- No. of experiments: 120 (60 per experiment type: repair and design)
- FPGA architecture: feed-forward design
- No. of resources: 60 LUTs (15 CLBs, 4 LUTs/CLB)
- Fault model: single logic fault model
- Fault type: stuck-at-one
CGT-Pruned Refurbishment
- Isolate suspect resources and avoid using them
- Hypothesis: CGT-Pruned GA Repair evolves a full-fitness circuit faster than Conventional GA Repair
- Results show a performance improvement with CGT-Pruned Repair
Results: Conventional vs. CGT-Pruned Repair
- CGT-Pruned GA outperforms Conventional GA
Experiment type | Conventional Repair | CGT-Pruned Repair
Circuit | 3-bit x 2-bit multiplier | 3-bit x 2-bit multiplier
Number of experiments | 30 | 30
Arithmetic mean (generations) | |
Standard deviation | |
Standard error of the mean | |
% confidence interval | [14300 → 20000] | [8400 → 13000]
Achieving Refurbishment with Cell Swapping
- Isolate and swap suspect resources
- Cell Swapping Operator:
  - Copy the suspect resource's “cell” configuration to another, unused cell
  - The GA searches for a routing strategy to re-route interconnect to the previously unused cell
- Refurbishment with Cell Swapping:
  - Swap suspect cells one by one and evaluate fitness until full fitness is evolved
  - If swapping all suspect cells does not achieve complete refurbishment, employ other GA operators
Repair Progress
CGT-Pruned GA Design
- Evolve the entire circuit design from scratch
- Avoid suspect resources and take advantage of resource redundancy within the FPGA
- CGT pruning outperforms conventional GA-based techniques
Results: Conventional vs. CGT-Pruned Design
- Design of a circuit in the presence of a single stuck-at fault
Experiment type | Conventional Design | CGT-Pruned Design
Circuit | 3-bit x 2-bit multiplier | 3-bit x 2-bit multiplier
Number of experiments | 30 | 30
Arithmetic mean (generations) | |
Standard deviation | |
Standard error of the mean | |
% confidence interval | [57300 → 71700] | [46450 → 61350]
Comparison of Performance: Number of Generations for Repair
- More than 70% of the experiments benefited substantially from resource information generated using CGT
Results Summary
- As opposed to Conventional GAs, CGT-Pruned GAs:
  - Completely refurbish configurations in 38% fewer generations
  - Design fully functional configurations in 16% fewer generations
- Faulty resources are eliminated from the pool of unused resources in the case of repair, as opposed to the pool of all resources in the case of design
- Repair complexity vs. design complexity: repair complexity << design complexity
  - Repairs were realized in one-fifth of the time required for design
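Assuming each reported confidence interval is symmetric about its arithmetic mean (mean ± critical value × standard error), the means can be recovered as interval midpoints, and the summary figures follow:

```latex
\begin{aligned}
\text{Repair:}\quad & \bar{x}_{\text{conv}} \approx \tfrac{14300 + 20000}{2} = 17150, \quad
\bar{x}_{\text{CGT}} \approx \tfrac{8400 + 13000}{2} = 10700, \quad
1 - \tfrac{10700}{17150} \approx 0.376 \approx 38\% \\
\text{Design:}\quad & \bar{x}_{\text{conv}} \approx \tfrac{57300 + 71700}{2} = 64500, \quad
\bar{x}_{\text{CGT}} \approx \tfrac{46450 + 61350}{2} = 53900, \quad
1 - \tfrac{53900}{64500} \approx 0.164 \approx 16\% \\
\text{Repair vs. design (CGT-pruned):}\quad & \tfrac{10700}{53900} \approx 0.199 \approx \tfrac{1}{5}
\end{aligned}
```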
Backup Slides On following pages …
Motivation
- Mission-critical embedded systems require high reliability and availability
- Characteristics of the operating environment may induce hardware failures:
  - Aging, manufacturing defects, etc.
- System reliability:
  - Fault avoidance: always possible? No
  - Design margin: always adequate? No
  - Modular redundancy: always recoverable? No
  - Fault refurbishment: highly flexible? Yes, but technically challenging to achieve
Group Testing Techniques
- Competitive Group Testing:
  - Algorithm based on group testing methods
  - Uses competition between configurations
  - Temporal information stored in the H matrix
  - Successive intersection
  - Monitors the health history of resources, which indicates resource fitness
  - Simulated using the C programming language and GSL functions [Sharma06]
- Relative fitness of resource (i,j) ∝ 1/H[i,j]
Three Fast Runs of the CGT-Pruned GA Repair
- The GA evolves to a high fitness within the first few hundred generations, but takes significantly more generations to reach maximum fitness
References
[1] T. C. Fogarty, J. F. Miller, and P. Thomson, “Evolving Digital Logic Circuits on Xilinx 6000 Family FPGAs,” in Proceedings of the 2nd Online Conference on Soft Computing, June 1997.
[2] S. Vigander, “Evolutionary Fault Repair in Space Applications,” Master’s Thesis, Dept. of Computer & Information Science, Norwegian University of Science and Technology (NTNU), Trondheim, 2001.
[3] C. A. Sharma and R. F. DeMara, “A Combinatorial Group Testing Method for FPGA Fault Location,” accepted to the International Conference on Advances in Computer Science and Technology (ACST 2006), Puerto Vallarta, Mexico, January 2006.
[4] S. Mitra and E. J. McCluskey, “Which Concurrent Error Detection Scheme to Choose?,” in Proceedings of the International Test Conference 2000, p. 985, October 2000.
[5] D. Du and F. K. Hwang, Combinatorial Group Testing and its Applications, vol. 12 of Series on Applied Mathematics, World Scientific, 2000.
[6] A. B. Kahng and S. Reda, “Combinatorial Group Testing Methods for the BIST Diagnosis Problem,” in Proceedings of the Asia and South Pacific Design Automation Conference, January 2004.
[7] D. Keymeulen, R. S. Zebulum, Y. Jin, and A. Stoica, “Fault-Tolerant Evolvable Hardware Using Field-Programmable Transistor Arrays,” IEEE Transactions on Reliability, vol. 49, no. 3, September 2000.
[8] J. Lohn, G. Larchev, and R. F. DeMara, “Evolutionary Fault Recovery in a Virtex FPGA Using a Representation that Incorporates Routing,” in Proceedings of the International Parallel and Distributed Processing Symposium, April 2003.
[9] J. Lach, W. H. Mangione-Smith, and M. Potkonjak, “Low Overhead Fault-Tolerant FPGA Systems,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 6, no. 2, June 1998.
[10] M. Abramovici, J. M. Emmert, and C. E. Stroud, “Roving STARs: An Integrated Approach to On-Line Testing, Diagnosis, and Fault Tolerance for FPGAs in Adaptive Computing Systems,” in The Third NASA/DoD Workshop on Evolvable Hardware, Long Beach, California, 2001.
[11] R. F. DeMara and K. Zhang, “Autonomous FPGA Fault Handling through Competitive Runtime Reconfiguration,” in Proceedings of the NASA/DoD Conference on Evolvable Hardware, June 2005.
[12] D. Eppstein, M. T. Goodrich, and D. S. Hirschberg, “Improved Combinatorial Group Testing for Real-World Problem Sizes,” in Workshop on Algorithms and Data Structures (WADS), Lecture Notes in Computer Science, Springer, 2005.
[13] J. F. Miller, P. Thomson, and T. Fogarty, “Designing Electronic Circuits Using Evolutionary Algorithms. Arithmetic Circuits: A Case Study,” in D. Quagliarella, J. Periaux, C. Poloni, and G. Winter, editors, Genetic Algorithms and Evolution Strategy in Engineering and Computer Science, Morgan Kaufmann, Chichester, England, 1998.
Previous Work: Fault-Tolerant Design and Detection Characteristics
*** Incorporates resource performance information
Previous Work: Fault Recovery Characteristics
Our Goal: Autonomous FPGA Refurbishment
Characteristic | Redundancy | Refurbishment
Overhead from unutilized spares (weight, size, power) | increases with amount of spare capacity | weakly related to number
Granularity of fault coverage (resolution where fault is handled) | restricted at design time | recovery capacity variable at recovery time
Fault-resolution latency (availability via downtime required to handle fault) | based on time required to select spare resource | based on time required to find suitable recovery
Quality of repair (likelihood and completeness) | determined by adequacy of spares available (?) | affected by multiple characteristics (+ or -)
Autonomous operation (fix without outside intervention) | yes | yes
- Goal: increase availability without carrying pre-configured spares
- Everyday example: spare tires (redundancy) vs. a can of fix-a-flat (refurbishment)
GA Success Stories
Commercial Applications:
- Nextel: frequency allocation for cellular phone networks; $15M predicted savings in the NY market
- Pratt & Whitney: turbine engine design; engineer: 8 weeks, GA: 2 days with 3x improvement
- International Truck: production scheduling improved by 90% in 5 plants
- NASA: superior Jupiter trajectory optimization, antennas, FPGAs
- Koza: 25 instances of human-competitive performance, such as analog circuit design, amplifiers, and filters
Adaptive GA Design
- Circuit: 2-to-4 decoder
- CLBs: 2
- LUTs/CLB: 4
- Faults: stuck-at-1 and stuck-at-0
- Traditional GA: 220 generations*, std dev 240**
- Adaptive GA: 152 generations*, std dev 120**
* Arithmetic mean over twenty experiments
** Standard deviation over twenty experiments
Analysis Metrics
- Mean:
- Standard Deviation:
- Standard Error of the Mean:
- Confidence Level:
CGT-Pruned GA Simulator
- C++-based console application
- Consists of:
  - Combinatorial Group Testing component (uses the GNU Scientific Library, GSL)
  - Genetic Algorithm component
  - Object-oriented architecture that models FPGA resources
- Modes of operation:
  - CGT-Pruned GA Repair: use CGT to isolate suspect resources; avoid use of suspect-faulty resources in the design refurbishment process
  - CGT-Pruned GA Repair with Cell Swapping: swap suspect-faulty resources with previously unused resources to evolve a recovery
  - CGT-Pruned GA Design: evolve a new working design while avoiding suspect resources