1
12-14 September 2005
Consensus-based Evaluation for Fault Isolation and On-line Evolutionary Regeneration
K. Zhang, R. F. DeMara, and C. A. Sharma
University of Central Florida
2
Technical Objective: Autonomous FPGA Regeneration
Redundancy vs. Regeneration:
– Overhead from Unutilized Spares (weight, size, power): Redundancy increases with the amount of spare capacity; Regeneration is weakly related to the number of spares
– Granularity of Fault Coverage (resolution at which a fault is handled): Redundancy is restricted at design time; Regeneration's recovery capacity is variable at recovery time
– Fault-Resolution Latency (availability, via the downtime required to handle a fault): Redundancy depends on the time required to select a spare resource; Regeneration depends on the time required to find a suitable recovery
– Quality of Repair (likelihood and completeness): Redundancy is determined by the adequacy of the spares available (?); Regeneration is affected by multiple characteristics (+ or -)
– Autonomous Operation (recover without outside intervention): yes for both
Increased availability without pre-configured spares … everyday example: spare tire vs. can of fix-a-flat
NASA Moon, Mars, and Beyond: realize service lives of tens of years? Stardust: 110 FPGAs …
3
Fault Recovery Characteristics of Selected Approaches
Previous Work on Fault Recovery
Normalized Power Consumption (Energy per Operation):
– n-plex solution using n redundant devices
– Reconfiguration cost r
– Gate-level redundancy g
– Updated with scan rate s on c CLBs
4
Exploiting Population Information
– The population contains more robust information than individuals: utilize this information for robust fault detection, faster regeneration, and increased diversity for adaptation
– Detect Failure and Isolate Faulty Resources: detect by inconsistencies among the population; isolate faults using outlier identification and aging
– Realize Regeneration: Recovery Complexity << Design Complexity; utilize diverse raw material during regeneration vs. isolated re-design; temporal consensus directs the search
– Adaptable Performance based on Online Inputs: the population adapts to the changing physical environment, input vectors, and target application while increasing availability
5
Procedural Flow under Consensus-Based Evaluation
– Initialization: partition P into two sub-populations of size |P|/2, designating physical FPGA left-half or right-half resource utilization
– Consensus-Based Evaluation: Discrepancy Operator applied between C_L and C_R; four Fitness States: Pristine, Suspect, Under Repair, Refurbished
– Regeneration: Genetic Operators recover Under-Repair individuals based on the Re-introduction Rate; operators are applied only once, then offspring are returned to “service” without concern about increasing fitness
6
Consensus-Based Evaluation (CBE) Overview
– Uses a Relative Fitness Measure: pairwise discrepancy checking yields a relative fitness measure; broad temporal consensus in the population is used to determine the fitness metric; transitions between Fitness States occur within the population
– Provides graceful degradation in the presence of changing environments, applications, and inputs, since this is a moving measure
– Test Inputs = Normal Inputs for Data Throughput: CBE uses neither additional functional nor resource test vectors
– Potential for higher availability, as regeneration is integrated with normal operation
7
State Transitions during the Lifetime of the i-th Half-Configuration
[Figure: Configuration Health States]
Discrepancy Operator
– Baseline Discrepancy Operator: a dyadic operator with binary output, where Z(C_i) is the FPGA data-throughput output of configuration C_i
– RS: Hamming distance between Z(C_i) and Z(C_j)
– WTA: equivalence of Z(C_i) and Z(C_j)
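A minimal Python sketch of the two discrepancy variants, assuming each half-configuration's output Z(C_i) is available as an unsigned integer (for example, the 6-bit product of the 3-bit x 3-bit multiplier). The function names are hypothetical.

```python
def hamming_discrepancy(z_i: int, z_j: int) -> int:
    """RS variant: number of output bits on which the two configurations disagree."""
    return bin(z_i ^ z_j).count("1")


def equivalence_discrepancy(z_i: int, z_j: int) -> int:
    """WTA variant: binary output, 1 if the outputs differ at all, else 0."""
    return 0 if z_i == z_j else 1


if __name__ == "__main__":
    # Two 6-bit outputs that differ in bit 0 and bit 3.
    print(hamming_discrepancy(0b101101, 0b100100))      # 2
    print(equivalence_discrepancy(0b101101, 0b100100))  # 1
```

The WTA form discards how many bits differ; the RS form keeps that count, so it can separate a slightly wrong output from a badly wrong one.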
8
Selection and Repair Process
– Maintain Availability: choose Pristine, Suspect, and Refurbished individuals, in that order of preference
– Enable Regeneration: choose Under-Repair individuals subject to the Re-introduction rate (R)
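The selection rule can be sketched as follows, applied once to the left half and once to the right half to form each L/R pair. This sketch interprets R as the probability of picking an Under-Repair individual when one exists. That interpretation and the function name are assumptions. The default R = 0.1 is the re-introduction rate listed under the regeneration parameters.

```python
import random

# Preference order for maintaining availability, as listed on the slide.
PREFERENCE = ("Pristine", "Suspect", "Refurbished")


def select_configuration(states, reintroduction_rate=0.1, rng=random):
    """Pick one configuration id from a half-population.

    states: dict mapping a configuration id to its fitness state.
    Hypothetical rule: with probability reintroduction_rate an Under-Repair
    individual is chosen (if any exist); otherwise the first non-empty
    state in PREFERENCE supplies the choice.
    """
    under_repair = [c for c, s in states.items() if s == "Under Repair"]
    if under_repair and rng.random() < reintroduction_rate:
        return rng.choice(under_repair)
    for state in PREFERENCE:
        pool = [c for c, s in states.items() if s == state]
        if pool:
            return rng.choice(pool)
    # Only Under-Repair individuals remain (or the half is empty).
    return rng.choice(under_repair) if under_repair else None
```

Keeping R small means most throughput is served by the healthiest configurations, while Under-Repair offspring still get enough operational inputs to be re-evaluated.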
9
Fitness State Adjustment / Repair
10
Individual’s Fitness: Evaluation Window
[Figure: probability that a number of selections with replacement contains all K items]
– Each individual is subjected to enough random operational inputs for an accurate assessment
– For combinational logic, E_W is determined on the basis of the input word width (a 6-bit input word gives K = 2^6 = 64)
– Genetic operators are invoked once every E_W iterations on Under-Repair individuals to avoid unnecessary modifications
– E_W = 600: random run-time inputs provide 99.5% certainty that the test is exhaustive and conclusive
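The 99.5% figure is a coupon-collector probability. For n uniform draws with replacement from K input patterns, inclusion-exclusion gives P(n, K) = Σ_{j=0}^{K} (-1)^j C(K, j) (1 - j/K)^n. A short Python check of that formula is below. The helper names are illustrative, and `min_window` is a hypothetical way to size E_W from a target certainty.

```python
from math import comb


def prob_all_covered(n: int, k: int) -> float:
    """Probability that n uniform draws with replacement from k items hit every item.

    Inclusion-exclusion: sum over j of (-1)^j * C(k, j) * (1 - j/k)^n.
    The alternating sum is numerically well behaved for n well above k,
    which is the regime used to size an evaluation window.
    """
    return sum((-1) ** j * comb(k, j) * (1 - j / k) ** n for j in range(k + 1))


def min_window(k: int, target: float) -> int:
    """Smallest n (starting from k) whose coverage probability reaches target."""
    n = k
    while prob_all_covered(n, k) < target:
        n += 1
    return n


if __name__ == "__main__":
    k = 2 ** 6  # 6-bit input word of a 3-bit x 3-bit multiplier
    print(f"{prob_all_covered(600, k):.4f}")  # 0.9950
```

The same function shows how E_W scales with input word width: each extra input bit doubles K, and the window needed for a given certainty grows roughly as K ln K.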
11
Population Comparison: Fitness Indices
Population Consensus Sliding Window
– Population behavior is periodically sampled to determine the current oracle value for the global fitness metric
– Thresholds need to be current, but not updated more frequently than necessary
– Thresholds are updated after 25% of individuals have completed an E_W
– Ensures a fast-moving relative measure for adaptability
Case study: |C| = 20 individuals …, |C_L| = |C_R| = |C|/2
Sliding Window = 5 E_W: 5/20 = 25% of individuals evaluated is “sufficient”
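A sketch of the sliding-window oracle is below. It assumes the oracle is the mean DV over the five most recently completed evaluation windows. It also assumes that value is refreshed only after another 25% of the population (5 of 20 individuals) has completed an E_W. The class name and the use of a plain mean are hypothetical.

```python
from collections import deque


class SlidingWindowOracle:
    """Hypothetical population oracle refreshed after a fixed fraction of
    individuals complete an evaluation window (E_W).

    With |C| = 20 and fraction = 0.25, the oracle is recomputed after every
    5 completed windows, using the 5 most recent per-window DVs.
    """

    def __init__(self, population_size: int = 20, fraction: float = 0.25):
        self.size = max(1, round(population_size * fraction))
        self.recent = deque(maxlen=self.size)  # last `size` completed-window DVs
        self.since_update = 0
        self.oracle = None  # mean DV over the sliding window; None until first update

    def window_completed(self, dv: float):
        """Record one individual's DV for a finished E_W; return the current oracle."""
        self.recent.append(dv)
        self.since_update += 1
        if self.since_update >= self.size:
            self.oracle = sum(self.recent) / len(self.recent)
            self.since_update = 0
        return self.oracle
```

Between refreshes the oracle stays fixed, which matches the goal of keeping thresholds current without updating them more often than necessary.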
12
Integer Multiplier Case Study
Automated Creation of a Population of Multipliers:
– Building blocks: Half-Adder: 18 templates created; Full-Adder: 24 templates; Parallel-AND: 1 template created
– OR, AND, XOR, NOR, NAND and NOT functions can be assigned to a LUT
– Templates are randomly selected for instantiation in modules
– Strict feed-forward flow is enforced
– The XOR function is excluded from initial designs to increase the design space
– An average of 21 CLBs is utilized for a 3-bit x 3-bit multiplier
– Configurations are divided into two groups, each subset using exclusive resources
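To show how the three building blocks compose, here is one fixed, hand-written 3-bit x 3-bit array multiplier in Python. It is not one of the randomly templated designs. Following the case study, it avoids XOR in its half-adder, but the particular gate choices are only one possible template.

```python
def not_(x: int) -> int:
    return 1 - x


def nand(x: int, y: int) -> int:
    return not_(x & y)


def half_adder(a: int, b: int):
    """XOR-free half-adder: sum = (a OR b) AND NAND(a, b), carry = a AND b."""
    return (a | b) & nand(a, b), a & b


def full_adder(a: int, b: int, cin: int):
    """Full-adder built from two half-adders and an OR for the carry."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2


def ripple_add(x_bits, y_bits):
    """Add two equal-length LSB-first bit lists; returns len + 1 bits."""
    out, carry = [], 0
    for i, (x, y) in enumerate(zip(x_bits, y_bits)):
        s, carry = half_adder(x, y) if i == 0 else full_adder(x, y, carry)
        out.append(s)
    return out + [carry]


def multiply_3x3(a: int, b: int) -> int:
    """3-bit x 3-bit unsigned multiplier from Parallel-AND, half- and full-adders."""
    a_bits = [(a >> i) & 1 for i in range(3)]
    b_bits = [(b >> i) & 1 for i in range(3)]
    # Parallel-AND: one row of partial products per bit of b.
    rows = [[aj & bi for aj in a_bits] for bi in b_bits]
    acc = rows[0] + [0, 0, 0]  # 6-bit accumulator, LSB first
    for i in range(1, 3):
        shifted = [0] * i + rows[i] + [0] * (3 - i)
        acc = ripple_add(acc, shifted)[:6]  # partial sums stay below 64
    return sum(bit << k for k, bit in enumerate(acc))
```

An exhaustive check over all 64 input pairs is cheap here. It is exactly the kind of test that CBE avoids at run time by relying on the E_W window of normal inputs.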
13
GA Parameters & Experiments
Speciation
– Two-point crossover between individuals from the same sub-group
– Crossover points chosen to prevent intra-CLB crossover
– Breeding occurs exclusively among members of each sub-population
– Maintains non-interfering resource use between L and R
GA operators: External-Module-Crossover, Internal-Module-Crossover, Internal-Module-Mutation
GA parameters:
– Population size: 20 individuals
– Crossover rate: 5%
– Mutation rate: up to 80% per bit
Fault Isolation Characteristics / Regenerative Experiments
Demonstrate …
– Objective fitness function replaced by the Consensus-based Evaluation approach and Relative Fitness
– Elimination of additional test vectors
Experiments …
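The crossover constraint can be sketched in Python. The sketch assumes a chromosome is a flat gene list in which each CLB occupies a fixed number of consecutive genes. Both cut points are then drawn only from CLB boundaries. The encoding and the `clb_bits` parameter are hypothetical. The caller is expected to pass two parents from the same sub-population (L or R).

```python
import random


def clb_aligned_two_point_crossover(parent_a, parent_b, clb_bits, rng=random):
    """Two-point crossover whose cut points fall only on CLB boundaries.

    parent_a, parent_b: equal-length gene lists from the same sub-population.
    clb_bits: hypothetical number of genes encoding one CLB.
    """
    if len(parent_a) != len(parent_b) or len(parent_a) % clb_bits:
        raise ValueError("parents must have equal length, a whole number of CLBs")
    n_clbs = len(parent_a) // clb_bits
    # Choose two distinct CLB boundaries (0..n_clbs), so no CLB is split.
    i, j = sorted(rng.sample(range(n_clbs + 1), 2))
    lo, hi = i * clb_bits, j * clb_bits
    child_a = parent_a[:lo] + parent_b[lo:hi] + parent_a[hi:]
    child_b = parent_b[:lo] + parent_a[lo:hi] + parent_b[hi:]
    return child_a, child_b
```

Because whole CLBs are exchanged, each offspring still uses only the resources of its own half, so L and R configurations remain non-interfering.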
14
Isolation of a single faulty individual with 1-out-of-64 impact
– Outliers are identified after E_W iterations have elapsed
– Expected DV = (1/64) × 600 = 9.375 for the individual impacted by the fault
– The isolated faulty individual's DV differs from the average DV by 3 after one or more observation intervals of length E_W
[Figure: instantaneous DV (point values) for a sample individual in the population, and population oracles (solid lines); Sliding Window]
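The expected DV values follow from a simple binomial model. That model is an assumption: each of the E_W = 600 inputs independently hits an impacted pattern with probability k/64, and every hit yields a discrepancy. The Python sketch below computes the mean and spread and checks the mean with a small Monte Carlo run. The helper names are hypothetical.

```python
import random
from math import sqrt


def expected_dv(impacted: int, k: int = 64, window: int = 600) -> float:
    """Mean discrepancies per window if each input hits an impacted pattern w.p. impacted/k."""
    return impacted / k * window


def dv_std(impacted: int, k: int = 64, window: int = 600) -> float:
    """Binomial standard deviation of the same count."""
    p = impacted / k
    return sqrt(window * p * (1 - p))


def simulate_dv(impacted: int, k: int = 64, window: int = 600,
                trials: int = 1000, seed: int = 0) -> float:
    """Monte Carlo mean DV; the fault is assumed to corrupt input patterns 0..impacted-1."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(1 for _ in range(window) if rng.randrange(k) < impacted)
    return total / trials


if __name__ == "__main__":
    print(expected_dv(1), round(dv_std(1), 2))    # 9.375 3.04
    print(expected_dv(10), round(dv_std(10), 2))  # 93.75 8.89
```

Under this model, a 1-out-of-64 fault gives a per-window DV of about 9.4 ± 3. That spread is one plausible reason a single E_W observation may not be enough to separate the faulty individual from the population average.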
15
Isolation of a single faulty L individual with 10-out-of-64 impact
– Compare with 1-out-of-64 fault impact
– Expected DV of (10/64) × 600 = 93.75 for the faulty configuration
– One isolation is completed approximately once every 93.75/5 ≈ 19 Sliding Windows
– Fault isolation achieved: 100%
16
Isolation of 8 faulty individuals (L4 & R4) with 1-out-of-64 impact
– Expected isolations do not occur approx. 40% of the time
– The average discrepancy value of the population is higher, making outlier isolation difficult
– With multiple faulty individuals, discrepancies are scattered
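A toy Python simulation shows why scattered faults blur the outlier. In CBE a discrepancy cannot say which member of an L/R pair is wrong, so both members' DVs grow. Healthy individuals paired with faulty ones therefore also accumulate DV. The fault model is hypothetical. Each faulty individual is wrong on its own randomly chosen input patterns, and two faulty outputs on the same input agree. The population size and iteration count are also illustrative.

```python
import random

K = 64  # input patterns of the 3-bit x 3-bit multiplier


def simulate_pairwise_dv(faulty_l, faulty_r, n_half=10, iterations=60_000,
                         impact=1, seed=0):
    """Accumulate discrepancy values (DV) under random L/R pairing.

    Hypothetical fault model: each faulty individual corrupts its output on
    `impact` randomly chosen input patterns, and every corruption is the same
    bit flip, so two faulty individuals that are wrong on the same input agree.
    A discrepancy increments the DV of BOTH members of the pair, because the
    comparison alone cannot tell which one is faulty.
    """
    rng = random.Random(seed)
    impacted = {}
    for side, members in (("L", faulty_l), ("R", faulty_r)):
        for i in members:
            impacted[(side, i)] = set(rng.sample(range(K), impact))
    dv = {(side, i): 0 for side in "LR" for i in range(n_half)}
    for _ in range(iterations):
        left, right = ("L", rng.randrange(n_half)), ("R", rng.randrange(n_half))
        x = rng.randrange(K)  # a normal operational input doubles as the test input
        wrong_l = x in impacted.get(left, ())
        wrong_r = x in impacted.get(right, ())
        if wrong_l != wrong_r:
            dv[left] += 1
            dv[right] += 1
    return dv


if __name__ == "__main__":
    single = simulate_pairwise_dv(faulty_l=[3], faulty_r=[])
    print(max(single, key=single.get))  # ('L', 3)
    multi = simulate_pairwise_dv(faulty_l=[0, 1, 2, 3], faulty_r=[0, 1, 2, 3])
    print(sorted(multi.items(), key=lambda kv: -kv[1])[:8])
```

With one faulty individual, its DV is about ten times that of any healthy partner. With four faulty individuals on each side, many healthy individuals are paired with a faulty one, so the population average rises and the gap narrows.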
17
Regeneration Performance
[Figure: Difference (vs. Hamming Distance)]
Parameters:
– Evaluation Window: E_W = 600
– Suspect Threshold: DV_S = 1 - 6/600 = 99%
– Repair Threshold: DV_R = 1 - 4/600 ≈ 99.3%
– Re-introduction rate: R = 0.1
Repairs evolved in-situ, in real time, without additional test vectors, while allowing the device to remain partially online.
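The thresholds are fitness fractions over one evaluation window. For example, DV_S = 99% means at most 6 discrepancies in 600 evaluations. The Python sketch below ties them to the four health states. The transition rules are hypothetical, because the state diagram itself did not survive extraction.

```python
WINDOW = 600                        # E_W
SUSPECT_THRESHOLD = 1 - 6 / WINDOW  # DV_S = 99%
REPAIR_THRESHOLD = 1 - 4 / WINDOW   # DV_R ≈ 99.3%


def window_fitness(dv: int, window: int = WINDOW) -> float:
    """Relative fitness over one evaluation window: fraction of agreeing outputs."""
    return 1 - dv / window


def next_state(state: str, dv: int) -> str:
    """Hypothetical health-state update applied after each completed E_W."""
    f = window_fitness(dv)
    if state in ("Pristine", "Refurbished"):
        return "Suspect" if f < SUSPECT_THRESHOLD else state
    if state == "Suspect":
        return "Under Repair" if f < SUSPECT_THRESHOLD else "Suspect"
    if state == "Under Repair":
        return "Refurbished" if f >= REPAIR_THRESHOLD else "Under Repair"
    raise ValueError(f"unknown state: {state}")
```

Setting DV_R stricter than DV_S makes an individual prove itself (at most 4 discrepancies per window) before it is treated as repaired, while a healthy one is tolerated up to 6.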
18
Conclusion
– Repair Complexity: should be more tractable than Design Complexity, given diverse “spare” designs
– Population-Centric Assessment: provides adaptability and self-calibrating autonomy with a relative assessment method
– Run-time Fault Management: can be realized using consensus-driven assessment methods and the information contained in the population
– Integrate Detection, Isolation, and Repair under a single population-based technique