Optimizing Dynamic Logic Realizations for Partial Reconfiguration of Field Programmable Gate Arrays
Matthew G. Parris
University of Central Florida
Committee Members: Annie S. Wu, Jooheung Lee, and Ronald F. DeMara
Agenda
  Contributions of Thesis
  Previous Work
  Evolvable Hardware Optimization Strategies
  Partial Reconfiguration & Architectural Analysis
  Dynamic Processor Allocation Strategies
  Conclusion and Future Work
Contributions of Thesis
  Novel Taxonomy: classify current FPGA fault-handling methods
  FPGA Repair Optimization: improve the performance of a Genetic Algorithm
  Architectural Analysis: demonstrate benefits of newer FPGA devices
  Adaptive Architecture Implementation: exploit benefits of Partial Reconfiguration
Previous Work
SRAM Field Programmable Gate Arrays (FPGA)
  [Figure: Programmable Logic Block (PLB) with a LUT, a mux, and a flip-flop; signal labels: a b c d in clock q y. From: The Design Warrior’s Guide to FPGAs by Clive Maxfield]
Previous Work
Unlimited Programmability
  Quickly test prototypes on final H/W architecture
  Patch design flaws while in use
  Repair radiation faults
  Ideal target for space applications
Previous Work
  Manufacturer-provided
    Increase production yield of FPGAs
    Architectural / hardware modifications
  User-provided
    Integrate fault-handling methods into FPGA application
Previous Work
  A-priori Allocation: assign spare resources during design process
  Dynamic Processes: assign spare resources or determine repair during run-time
Previous Work
  [Figure: taxonomy comparing fault-handling methods; group labels: Critical Requirements, Metrics, Methods]
  Methods: Sub-PLB Spares, PLB Spares, Incremental Rerouting, GA Repair, Augmented GA Repair, TMR w/ Single Module Repair, Online BIST, Competing Configurations
  Comparison criteria: Resources, Operational Delay, Fault Latency, Unavailability, Fault Occlusion, Repair Granularity, Fault Tolerance, Fault Coverage
  Repair granularity levels: Fine-grained, Medium-grained, Coarse-grained
Previous Work
Genetic Algorithm Fault-Handling
  Some other method detects a fault
  Create a population of candidate solutions
  Test each candidate to evaluate performance
  Apply genetic operators to create new individuals
    Crossover
    Mutation
  Repeat process until complete repair is found
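The steps above can be sketched as a small loop. This is only an illustration under stated assumptions, not the thesis implementation: the "circuit" is a hypothetical 16-entry truth table, fitness is the number of matching rows, and the names fitness, crossover, mutate, and evolve_repair, along with all parameter values, are made up for this example.

```python
import random


def fitness(candidate, target):
    """Count truth-table rows where the candidate matches the target."""
    return sum(c == t for c, t in zip(candidate, target))


def crossover(parent1, parent2, rng):
    """Single-point crossover of two equal-length bit lists."""
    point = rng.randrange(1, len(parent1))
    return parent1[:point] + parent2[point:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def evolve_repair(target, pop_size=20, max_generations=200, rate=0.05, rng=None):
    """Evolve candidates until one matches `target` or the budget runs out.

    Returns (best_candidate, generations_used).
    """
    rng = rng or random.Random()
    n = len(target)
    # Create a population of random candidate solutions.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for generation in range(max_generations):
        # Test each candidate and rank by fitness.
        population.sort(key=lambda ind: fitness(ind, target), reverse=True)
        if fitness(population[0], target) == n:
            return population[0], generation  # complete repair found
        # Breed from the better half (assumed truncation selection).
        parents = population[: pop_size // 2]
        children = [population[0]]  # keep the current best unchanged
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            children.append(mutate(crossover(p1, p2, rng), rate, rng))
        population = children
    population.sort(key=lambda ind: fitness(ind, target), reverse=True)
    return population[0], max_generations


if __name__ == "__main__":
    # Hypothetical target behavior: 4-input parity as a 16-row truth table.
    target = [bin(i).count("1") % 2 for i in range(16)]
    best, generations = evolve_repair(target, rng=random.Random(1))
    print(fitness(best, target), "of", len(target), "rows after", generations, "generations")
```

In a real repair setting the fitness test would exercise candidate configurations on the faulty device rather than compare bit lists.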
Evolvable Hardware Optimization Strategies
Optimize GA fault-handling method
  Some partition methods are based on similarity between individuals
    Require a similarity function that may not be possible to define, and incur undesired computation
  Age-layered Population Structure (ALPS)
    Used to evolve higher-fit antenna designs
    Partitions population of candidate solutions based on age of individual
    Negligible additional computation
    Contains best individual within one sub-population to prevent convergence of the population
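Partitioning by age can be sketched in a few lines, which is why the added computation is negligible. This is a minimal sketch under assumptions: ten age-levels, a linear age-limit scheme with a gap of 20 generations, and individuals represented as dictionaries with an "age" key. The names AGE_GAP, NUM_LEVELS, age_level, and partition_by_age are hypothetical, and the age limits actually used in the thesis may differ.

```python
AGE_GAP = 20      # assumed gap between age limits, in generations
NUM_LEVELS = 10   # assumed number of age-levels (0 through 9)

# Assumed linear scheme: level i admits ages below AGE_GAP * (i + 1).
# The top level has no limit, so it is not listed here.
AGE_LIMITS = [AGE_GAP * (i + 1) for i in range(NUM_LEVELS - 1)]


def age_level(age):
    """Return the lowest age-level whose limit still admits `age`."""
    for level, limit in enumerate(AGE_LIMITS):
        if age < limit:
            return level
    return NUM_LEVELS - 1  # the oldest individuals collect in the top level


def partition_by_age(population):
    """Split a population into sub-populations, one list per age-level."""
    levels = [[] for _ in range(NUM_LEVELS)]
    for individual in population:
        levels[age_level(individual["age"])].append(individual)
    return levels


if __name__ == "__main__":
    population = [{"age": a} for a in (0, 5, 19, 20, 45, 179, 180, 900)]
    for level, members in enumerate(partition_by_age(population)):
        print(level, [m["age"] for m in members])
```

No similarity function is needed: placing an individual costs one comparison per level.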
Evolvable Hardware Optimization Strategies
Optimize GA fault-handling method
  [Figure labels: Standard GA population; age-level 9 down to age-level 0; Repair]
Evolvable Hardware Optimization Strategies Individuals increasing in age
Evolvable Hardware Optimization Strategies Evolution of competitive individuals
Evolvable Hardware Optimization Strategies Best Individuals at each Generation (averaged over 100 runs)
Evolvable Hardware Optimization Strategies
Reasons for sluggish performance
  Partitioning the population into sub-populations (restricts the rate at which genetic information is communicated)
  Replacing the bottom age-level every 20 generations (causes ALPS to be less deterministic)
  Beginning population size of ALPS is 1/10 of standard (700 generations are needed to saturate capacity)
Evolvable Hardware Optimization Strategies
Propose new selection strategy for crossover genetic operator
  [Figure: Old Selection Strategy (combined) versus New Selection Strategy (separate). Labels: Parent 1, Parent 2; Choice 1, Choice 2; Pop 0, Pop 1, Pops 0&1; "Choose with probability p"]
Evolvable Hardware Optimization Strategies Best Individuals at each Generation (averaged over 100 runs)
Partial Reconfiguration and Architectural Analysis
Overview
  Partial reconfiguration modifies a portion of the FPGA
  Multiple modules may reside within reconfigurable area
Previous Work Spare Configs: Fine-grained
Previous Work Online Recovery: Competitive Configurations
Partial Reconfiguration and Architectural Analysis
Benefits of Partial Reconfiguration
  Reconfiguration: time-multiplex between functions (extend the number of available resources with time)
  Partial: module granularity reduced
    Unchanged portion of FPGA is not affected by configuration
    Smaller bitstream filesize
    Shorter reconfiguration time
    Lower storage requirements
  Result: significantly more combinations of hardware arrangements with similar storage requirements
Partial Reconfiguration and Architectural Analysis
xc2vp30-7ff896, 80-CLB configuration frame

               Bitstream Filesize (bytes)   Area Allocated (slices)   Area Used (slices)   Time to Configure (seconds)
  Full Device  1,448,817                    13,696                    -                    7
  MD5          320,597 (22.1%)              1,280 (9.3%)              389 (2.8%)           2 (28.6%)
  SHA-1        356,702 (24.6%)              1,280 (9.3%)              457 (3.3%)           2 (28.6%)

2.8–3.3% resource usage versus 22.1–24.6% bitstream filesize
Partial Reconfiguration and Architectural Analysis Overview of partial reconfiguration design
Partial Reconfiguration and Architectural Analysis FPGA Implementation and Resource Utilization
Partial Reconfiguration and Architectural Analysis
xc4vfx60-11ff672, 16-CLB configuration frame

               Bitstream Filesize (bytes)   Area Allocated (slices)   Area Used (slices)
  Full Device  2,625,438                    25,280                    -
  MD5          95,962 (3.7%)                1,280 (5.1%)              405 (1.6%)
  SHA-1        97,619 (3.7%)                1,280 (5.1%)              472 (1.9%)

1.6–1.9% resource usage versus 3.7% bitstream filesize
V-II: 320,597 bytes versus V-4: 95,962 bytes (70% reduction)
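The percentages on the two device slides follow directly from the byte counts. The short check below recomputes them from the figures in the tables; the helper names percent_of and percent_reduction are hypothetical and exist only for this illustration.

```python
def percent_of(part, whole):
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


def percent_reduction(old, new):
    """Percentage by which `new` is smaller than `old`."""
    return 100.0 * (old - new) / old


# Bitstream sizes in bytes, copied from the two device tables.
V2_FULL, V2_MD5, V2_SHA1 = 1_448_817, 320_597, 356_702   # xc2vp30
V4_FULL, V4_MD5, V4_SHA1 = 2_625_438, 95_962, 97_619     # xc4vfx60

if __name__ == "__main__":
    # Share of the full-device bitstream taken by each partial bitstream.
    print("V-II MD5 share:  %.1f%%" % percent_of(V2_MD5, V2_FULL))
    print("V-II SHA-1 share: %.1f%%" % percent_of(V2_SHA1, V2_FULL))
    print("V-4 MD5 share:   %.1f%%" % percent_of(V4_MD5, V4_FULL))
    print("V-4 SHA-1 share:  %.1f%%" % percent_of(V4_SHA1, V4_FULL))
    # MD5 partial bitstream: V-II versus V-4.
    print("MD5 reduction:   %.0f%%" % percent_reduction(V2_MD5, V4_MD5))
```

The finer configuration frame is what brings the bitstream share (3.7%) close to the allocated area share (5.1%) on the newer device.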
Dynamic Processor Allocation Strategies
  Increase Reconfigurable Areas from 1 to 8
  Implement Adaptable Architecture for Video Processing Functions
    Discrete Cosine Transform (DCT)
    Motion Estimation
  Video functions are sufficiently different in resources to require reconfiguration
Dynamic Processor Allocation Strategies Location of 8 PEs on a V4SX device
Dynamic Processor Allocation Strategies

        Slices within Area (Slice Utilization)   Bitstream Filesize (bytes)
  PE0   320 (94.38%)                             22,306
  PE1   384 (95.05%)                             27,794
  PE2   384 (84.38%)                             28,306
  PE3   384 (92.97%)                             28,158
  PE4   320 (91.25%)                             22,306
  PE5   384 (88.54%)                             27,354
  PE6   384 (87.76%)                             27,618
  PE7   384 (95.57%)                             27,654
Dynamic Processor Allocation Strategies

                               Bitstream Filesize    Configuration Time
  Non-PR
    1x1 Full 2D-DCT            1,712,614 bytes       17 ms
    4x4 DCT & 4 ME PEs         1,712,614 bytes       17 ms
    8x8 Full 2D-DCT            1,712,614 bytes       17 ms
    3 H/W Arrangements         4.90 MB               17 ms / 17 ms (Best/Worst)
  PR
    Initial (8x8)              1,712,614 bytes       17 ms
    8 Full Precision PEs       8 × 28,306 bytes      8 × ms
    8 Partial Precision PEs    8 × 28,306 bytes      8 × ms
    8 Empty PEs                8 × 10,586 bytes      8 × ms
    16 H/W Arrangements        2.15 MB               0.106 / 2.265 ms (Best/Worst)
  PR
    Initial (8x8)              1,712,614 bytes       17 ms
    8 Full Precision PEs       8 × 28,306 bytes      8 × ms
    8 Partial Precision PEs    8 × 28,306 bytes      8 × ms
    8 Empty PEs                8 × 10,586 bytes      8 × ms
    8 Motion Estimation PEs    8 × 28,306 bytes      8 × ms
    80 H/W Arrangements        2.36 MB               0.106 / 2.265 ms (Best/Worst)
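The storage totals in the table can be reproduced by summing the listed bitstreams. The check below assumes that "MB" on the slide means 2^20 bytes, which is the reading under which the sums match; the names FULL, PE, EMPTY, and to_mb are hypothetical.

```python
FULL = 1_712_614   # full-device bitstream, bytes
PE = 28_306        # one PE partial bitstream (full precision, partial precision, or ME), bytes
EMPTY = 10_586     # one empty-PE partial bitstream, bytes


def to_mb(num_bytes):
    """Convert bytes to MB, assuming 1 MB = 2**20 bytes."""
    return num_bytes / 2**20


# Non-PR: every hardware arrangement is stored as a full bitstream.
non_pr_bytes = 3 * FULL

# PR without motion estimation: initial full bitstream plus three sets of 8 partials.
pr_bytes = FULL + 8 * PE + 8 * PE + 8 * EMPTY

# PR with motion estimation: one more set of 8 partial bitstreams.
pr_me_bytes = pr_bytes + 8 * PE

if __name__ == "__main__":
    print("Non-PR, 3 arrangements:  %.2f MB" % to_mb(non_pr_bytes))
    print("PR, 16 arrangements:     %.2f MB" % to_mb(pr_bytes))
    print("PR, 80 arrangements:     %.2f MB" % to_mb(pr_me_bytes))
```

Adding the eight motion estimation bitstreams raises the arrangement count from 16 to 80 while storage grows by only 226,448 bytes.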
Dynamic Processor Allocation Strategy
Benefits of Partial Reconfiguration
  Reconfiguration: time-multiplex between functions (extend the number of available resources with time)
  Partial: module granularity reduced
    Unchanged portion of FPGA is not affected by configuration
    Smaller bitstream filesize
    Shorter reconfiguration time
    Lower storage requirements
  Result: significantly more combinations of hardware arrangements with similar storage requirements
Conclusion and Future Work
Evolvable Hardware
  Non-deterministic methods can repair faulty digital circuits
  Time required justified by ability to exploit faults
  Increase complete repair occurrence rate 5-fold
  Future Improvements
    Make use of fault location
    Optimize genetic algorithm parameters
Conclusion and Future Work
Partial Reconfiguration
  Newer partial reconfiguration flow allows rectangular areas
    Allows static resources to maximize FPGA area
  Newer architecture allows:
    Multiple rectangular areas within one column of resources
    Reduced configuration granularity for modules
    30% reduction in storage and configuration time
Conclusion and Future Work
Dynamic Processors
  Utilizes newer software design flow and newer FPGA hardware architecture
  Storage reduced 55-fold
  Time reduced 8- to 160-fold
  Benefits make reconfiguration possible for fast processes such as video functions
  Time multiplexing may enable smaller FPGA devices to compete with larger devices not utilizing partial reconfiguration
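One reading that reproduces the 55-fold and 8- to 160-fold figures uses the numbers from the earlier bitstream table: 80 hardware arrangements stored as full 1,712,614-byte bitstreams versus the partial reconfiguration set, and a 17 ms full configuration versus the 2.265 ms worst and 0.106 ms best partial cases. That interpretation is an assumption made for this sketch, and the variable names are hypothetical.

```python
FULL_BYTES = 1_712_614     # full-device bitstream, bytes
PR_SET_BYTES = 2_476_646   # initial bitstream plus all partial bitstreams (80 arrangements)
ARRANGEMENTS = 80

FULL_MS = 17.0             # full-device configuration time
PR_WORST_MS = 2.265        # worst-case partial reconfiguration time
PR_BEST_MS = 0.106         # best-case partial reconfiguration time

# Assumed comparison: each of the 80 arrangements stored as its own full bitstream.
storage_fold = ARRANGEMENTS * FULL_BYTES / PR_SET_BYTES

# Speedup range of partial over full reconfiguration.
time_fold_low = FULL_MS / PR_WORST_MS
time_fold_high = FULL_MS / PR_BEST_MS

if __name__ == "__main__":
    print("storage reduction: about %.0f-fold" % storage_fold)
    print("time reduction: about %.0f- to %.0f-fold" % (time_fold_low, time_fold_high))
```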
Conclusion and Future Work
Future Work
  Develop self-contained partial reconfiguration solution
  Continue to challenge and improve reconfiguration process and hardware design
  Enable FPGAs to be standard hardware platform for evolvable/adaptable systems
Publication
  HUANG, J., PARRIS, M., LEE, J. and DEMARA, R.F. Scalable FPGA Architecture for DCT Computation using Dynamic Partial Reconfiguration. Accepted to International Conference on Engineering of Reconfigurable Systems and Algorithms.
Previous Work Spare Resources: Sub-PLB Spares
Previous Work Offline Recovery: Incremental Rerouting
Previous Work Online Recovery: Online BIST