
Optimizing Dynamic Logic Realizations for Partial Reconfiguration of Field Programmable Gate Arrays
Matthew G. Parris, University of Central Florida
Committee Members: Annie S. Wu, Jooheung Lee, and Ronald F. DeMara

Agenda
- Contributions of Thesis
- Previous Work
- Evolvable Hardware Optimization Strategies
- Partial Reconfiguration & Architectural Analysis
- Dynamic Processor Allocation Strategies
- Conclusion and Future Work

Contributions of Thesis
- Novel Taxonomy
  - Classify current FPGA fault-handling methods
- FPGA Repair Optimization
  - Improve the performance of a Genetic Algorithm
- Architectural Analysis
  - Demonstrate benefits of newer FPGA devices
- Adaptive Architecture Implementation
  - Exploit benefits of Partial Reconfiguration

Previous Work
SRAM Field Programmable Gate Arrays (FPGA)
[Figure: Programmable Logic Block (PLB) containing a LUT, a mux, and a flip-flop, with signals a, b, c, d, in, clock, q, and y. From: The Design Warrior's Guide to FPGAs by Clive Maxfield]

Previous Work
Unlimited Programmability
- Quickly test prototypes on final H/W architecture
- Patch design flaws while in use
- Repair radiation faults
Ideal target for space applications

Previous Work
Manufacturer-Provided
- Increase production yield of FPGAs
- Architectural / hardware modifications
User-Provided
- Integrate fault-handling methods into FPGA application

Previous Work
A-priori Allocation
- Assign spare resources during design process
Dynamic Processes
- Assign spare resources or determine repair during run-time

Previous Work
[Figure: comparison chart of fault-handling methods]
- Granularity: Fine-grained, Medium-grained, Coarse-grained
- Methods: Sub-PLB Spares, PLB Spares, Incremental Rerouting, GA Repair, Augmented GA Repair, TMR w/ Single Module Repair, Online BIST, Competing Configurations
- Metrics and other chart labels: Resources, Operational Delay, Fault Latency, Unavailability, Fault Occlusion, Repair Granularity, Fault Tolerance, Fault Coverage, Critical Requirements

Previous Work
Genetic Algorithm Fault-Handling
- Some other method detects a fault
- Create a population of candidate solutions
- Test each candidate to evaluate performance
- Apply genetic operators to create new individuals
  - Crossover
  - Mutation
- Repeat process until complete repair is found
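The genetic algorithm loop on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's implementation: candidates are bit strings standing in for FPGA configurations, and `TARGET`, `fitness`, the population size, and the mutation rate are all hypothetical choices made for the example.

```python
import random

# Hypothetical stand-in for a fully repaired configuration (assumption for illustration).
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

def fitness(candidate):
    """Count bits matching TARGET; a real system would test the circuit's outputs instead."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def crossover(a, b, rng):
    """Single-point crossover of two parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(candidate, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in candidate]

def ga_repair(pop_size=40, mutation_rate=0.05, max_generations=2000, seed=0):
    rng = random.Random(seed)
    # Create a population of candidate solutions.
    population = [[rng.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for generation in range(max_generations):
        # Test each candidate and rank by performance.
        population.sort(key=fitness, reverse=True)
        best = population[0]
        if fitness(best) == len(TARGET):  # complete repair found
            return best, generation
        # Keep the fitter half as parents, then apply crossover and mutation.
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                   mutation_rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness), max_generations

best, generations = ga_repair()
print(f"best fitness {fitness(best)}/{len(TARGET)} after {generations} generations")
```

Keeping the fitter half unchanged means the best fitness never decreases between generations, which is a simple design choice for the sketch rather than something the slides specify.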

Evolvable Hardware Optimization Strategies
Optimize GA fault-handling method
- Some partition methods are based on similarity between individuals
  - Requires a similarity function that may not be possible, and also incurs undesired computation
- Age-layered Population Structure (ALPS)
  - Used to evolve higher-fit antenna designs
  - Partitions population of candidate solutions based on age of individual
  - Negligible additional computation
  - Contains best individual within one sub-population to prevent convergence of the population
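A minimal sketch of partitioning by age, to show why it needs no similarity function. The `Individual` class, the linear age limits, and the values of `AGE_GAP` and `NUM_LAYERS` are assumptions for illustration; the slides do not give the exact age-limit scheme used in the thesis.

```python
from dataclasses import dataclass

AGE_GAP = 20     # assumed number of generations spanned by each age-level (hypothetical)
NUM_LAYERS = 10  # assumed number of age-levels, 0 (youngest) to 9 (oldest)

@dataclass
class Individual:
    genome: list
    fitness: float
    age: int  # generations since this individual's lineage was created

def layer_of(ind):
    """Map age to an age-level; in this sketch the top level has no upper age limit."""
    return min(ind.age // AGE_GAP, NUM_LAYERS - 1)

def partition_by_age(population):
    """Group individuals into age-levels using only an integer division per individual."""
    layers = [[] for _ in range(NUM_LAYERS)]
    for ind in population:
        layers[layer_of(ind)].append(ind)
    return layers

population = [
    Individual(genome=[0, 1], fitness=3.0, age=5),
    Individual(genome=[1, 1], fitness=7.0, age=25),
    Individual(genome=[1, 0], fitness=9.0, age=400),
]
layers = partition_by_age(population)
for level, members in enumerate(layers):
    if members:
        print(level, [m.fitness for m in members])
```

The only per-individual cost is one division and one comparison, which is the sense in which the extra computation is negligible.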

Evolvable Hardware Optimization Strategies
Optimize GA fault-handling method
[Figure: Standard GA population shown alongside age-level 0 through age-level 9; label: Repair]

Evolvable Hardware Optimization Strategies Individuals increasing in age

Evolvable Hardware Optimization Strategies Evolution of competitive individuals

Evolvable Hardware Optimization Strategies Best Individuals at each Generation (averaged over 100 runs)

Evolvable Hardware Optimization Strategies
Reasons for sluggish performance
- Partitioning the population into sub-populations (restricts the rate at which genetic information is communicated)
- Replacing the bottom age-level every 20 generations (causes ALPS to be less deterministic)
- Beginning population size of ALPS is 1/10 of standard (700 generations are needed to saturate capacity)

Evolvable Hardware Optimization Strategies
Propose new selection strategy for crossover genetic operator
- Old Selection Strategy (combined)
- New Selection Strategy (separate)
[Figure: parent-selection diagram. Labels: Parent 1: Pop 1, Pops 0&1; Parent 2: Pop 0, Pop 1; Choice 1, Choice 2; choose with probability p]

Evolvable Hardware Optimization Strategies Best Individuals at each Generation (averaged over 100 runs)

Evolvable Hardware Optimization Strategies

Partial Reconfiguration and Architectural Analysis
Overview
- Partial reconfiguration modifies a portion of the FPGA
- Multiple modules may reside within reconfigurable area

Previous Work Spare Configs: Fine-grained

Previous Work Online Recovery: Competitive Configurations

Partial Reconfiguration and Architectural Analysis
Benefits of Partial Reconfiguration
- Reconfiguration: time-multiplex between functions (extend the number of available resources with time)
- Partial: module granularity reduced
  - Unchanged portion of FPGA is not affected by configuration
  - Smaller bitstream filesize
  - Smaller reconfiguration time
  - Less storage requirements
- Result: significantly more combinations of hardware arrangements with similar storage requirements

Partial Reconfiguration and Architectural Analysis
xc2vp30-7ff896, 80 CLB configuration frame

|             | Bitstream Filesize (bytes) | Area Allocated (slices) | Area Used (slices) | Time to Configure (seconds) |
|-------------|----------------------------|-------------------------|--------------------|-----------------------------|
| Full Device | 1,448,817                  | 13,696                  |                    | 7                           |
| MD5         | 320,597 (22.1%)            | 1,280 (9.3%)            | 389 (2.8%)         | 2 (28.6%)                   |
| SHA-1       | 356,702 (24.6%)            | 1,280 (9.3%)            | 457 (3.3%)         | 2 (28.6%)                   |

2.8–3.3% resource usage versus 22.1–24.6% bitstream filesize
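The percentages on this slide can be reproduced from its raw figures. The short check below copies the numbers from the slide; it assumes that 13,696 is the total slice count of the device, which is consistent with the percentages shown. The names `percent` and `modules` are introduced only for this example.

```python
# Figures copied from the xc2vp30 slide; the variable names are illustrative.
FULL_BITSTREAM_BYTES = 1_448_817
DEVICE_SLICES = 13_696       # assumed to be the full-device slice count
FULL_CONFIG_SECONDS = 7

modules = {
    "MD5":   {"bitstream": 320_597, "allocated": 1280, "used": 389, "seconds": 2},
    "SHA-1": {"bitstream": 356_702, "allocated": 1280, "used": 457, "seconds": 2},
}

def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

shares = {
    name: {
        "bitstream": percent(m["bitstream"], FULL_BITSTREAM_BYTES),
        "allocated": percent(m["allocated"], DEVICE_SLICES),
        "used": percent(m["used"], DEVICE_SLICES),
        "time": percent(m["seconds"], FULL_CONFIG_SECONDS),
    }
    for name, m in modules.items()
}

for name, s in shares.items():
    print(name, s)
```

The gap between the "used" and "bitstream" shares is the point of the slide: on this device a module occupying about 3% of the slices still needs a bitstream about a quarter the size of the full one.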

Partial Reconfiguration and Architectural Analysis Overview of partial reconfiguration design

Partial Reconfiguration and Architectural Analysis FPGA Implementation and Resource Utilization

Partial Reconfiguration and Architectural Analysis
xc4vfx60-11ff672, 16 CLB configuration frame

|             | Bitstream Filesize (bytes) | Area Allocated (slices) | Area Used (slices) |
|-------------|----------------------------|-------------------------|--------------------|
| Full Device | 2,625,438                  | 25,280                  |                    |
| MD5         | 95,962 (3.7%)              | 1,280 (5.1%)            | 405 (1.6%)         |
| SHA-1       | 97,619 (3.7%)              | 1,280 (5.1%)            | 472 (1.9%)         |

1.6–1.9% resource usage versus 3.7% bitstream filesize
V-II: 320,597 bytes versus V-4: 95,962 bytes (70% reduction)
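The 70% figure follows directly from the two MD5 bitstream sizes. The sketch below recomputes it; the SHA-1 reduction is not stated on the slide and is derived here from the two tables purely as an extra data point. The function name `percent_reduction` is introduced for this example.

```python
def percent_reduction(old_bytes, new_bytes):
    """Percentage by which new_bytes is smaller than old_bytes."""
    return 100 * (old_bytes - new_bytes) / old_bytes

# Partial bitstream sizes copied from the xc2vp30 (V-II) and xc4vfx60 (V-4) slides.
V2_MD5_BYTES, V4_MD5_BYTES = 320_597, 95_962
V2_SHA1_BYTES, V4_SHA1_BYTES = 356_702, 97_619

md5_reduction = percent_reduction(V2_MD5_BYTES, V4_MD5_BYTES)
sha1_reduction = percent_reduction(V2_SHA1_BYTES, V4_SHA1_BYTES)  # derived, not from the slide

print(f"MD5:   {md5_reduction:.1f}% smaller")
print(f"SHA-1: {sha1_reduction:.1f}% smaller")
```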

Dynamic Processor Allocation Strategies
- Increase Reconfigurable Areas from 1 to 8
- Implement Adaptable Architecture for Video Processing Functions
  - Discrete Cosine Transform (DCT)
  - Motion Estimation
- Video functions are sufficiently different in resources to require reconfiguration

Dynamic Processor Allocation Strategies Location of 8 PEs on a V4SX device

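Dynamic Processor Allocation Strategies

| PE  | Slices within Area (Slice Utilization) | Bitstream Filesize (bytes) |
|-----|----------------------------------------|----------------------------|
| PE0 | 320 (94.38%)                           | 22,306                     |
| PE1 | 384 (95.05%)                           | 27,794                     |
| PE2 | 384 (84.38%)                           | 28,306                     |
| PE3 | 384 (92.97%)                           | 28,158                     |
| PE4 | 320 (91.25%)                           | 22,306                     |
| PE5 | 384 (88.54%)                           | 27,354                     |
| PE6 | 384 (87.76%)                           | 27,618                     |
| PE7 | 384 (95.57%)                           | 27,654                     |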

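Dynamic Processor Allocation Strategies

Non-PR
| Configuration        | Bitstream Filesize | Configuration Time         |
|----------------------|--------------------|----------------------------|
| 1x1 Full 2D-DCT      | 1,712,614 bytes    | 17 ms                      |
| 4x4 DCT & 4 ME PEs   | 1,712,614 bytes    | 17 ms                      |
| 8x8 Full 2D-DCT      | 1,712,614 bytes    | 17 ms                      |
| 3 H/W Arrangements   | 4.90 MB            | 17 ms/17 ms (Best/Worst)   |

PR (DCT only)
| Configuration          | Bitstream Filesize | Configuration Time            |
|------------------------|--------------------|-------------------------------|
| Initial (8x8)          | 1,712,614 bytes    | 17 ms                         |
| 8 Full Precision PEs   | 8 × 28,306 bytes   | 8 × [value missing] ms        |
| 8 Partial Precision PEs| 8 × 28,306 bytes   | 8 × [value missing] ms        |
| 8 Empty PEs            | 8 × 10,586 bytes   | 8 × [value missing] ms        |
| 16 H/W Arrangements    | 2.15 MB            | 0.106/2.265 ms (Best/Worst)   |

PR (DCT and Motion Estimation)
| Configuration          | Bitstream Filesize | Configuration Time            |
|------------------------|--------------------|-------------------------------|
| Initial (8x8)          | 1,712,614 bytes    | 17 ms                         |
| 8 Full Precision PEs   | 8 × 28,306 bytes   | 8 × [value missing] ms        |
| 8 Partial Precision PEs| 8 × 28,306 bytes   | 8 × [value missing] ms        |
| 8 Empty PEs            | 8 × 10,586 bytes   | 8 × [value missing] ms        |
| 8 Motion Estimation PEs| 8 × 28,306 bytes   | 8 × [value missing] ms        |
| 80 H/W Arrangements    | 2.36 MB            | 0.106/2.265 ms (Best/Worst)   |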
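The storage totals on this slide can be recomputed from the per-bitstream sizes. The sketch below sums them and converts to megabytes; it assumes that "MB" on the slide means 2^20 bytes, which is the unit that reproduces the 4.90, 2.15, and 2.36 figures. It also derives the ratio between the full 17 ms configuration time and the best and worst partial times listed. All variable names are introduced for this example.

```python
MIB = 2 ** 20  # assumption: the slide's "MB" values match 2**20 bytes

FULL_BITSTREAM = 1_712_614      # bytes, full-device bitstream
PE_BITSTREAM = 28_306           # bytes, per-PE partial bitstream used in the totals
EMPTY_PE_BITSTREAM = 10_586     # bytes, per-PE blanking bitstream
NUM_PES = 8

# Non-PR: three complete bitstreams, one per hardware arrangement.
non_pr_bytes = 3 * FULL_BITSTREAM

# PR, DCT only: initial full bitstream plus full-precision, partial-precision,
# and empty partial bitstreams for each of the 8 PEs.
pr_dct_bytes = (FULL_BITSTREAM
                + 2 * NUM_PES * PE_BITSTREAM
                + NUM_PES * EMPTY_PE_BITSTREAM)

# PR, DCT plus motion estimation: one more set of 8 partial bitstreams.
pr_dct_me_bytes = pr_dct_bytes + NUM_PES * PE_BITSTREAM

for label, total in [("Non-PR", non_pr_bytes),
                     ("PR (DCT)", pr_dct_bytes),
                     ("PR (DCT + ME)", pr_dct_me_bytes)]:
    print(f"{label}: {total / MIB:.2f} MB")

# Configuration-time ratios relative to a full 17 ms reconfiguration.
FULL_MS, BEST_MS, WORST_MS = 17, 0.106, 2.265
print(f"speedup: {FULL_MS / WORST_MS:.1f}x to {FULL_MS / BEST_MS:.1f}x")
```

The totals show that adding a whole new function (motion estimation) to the PR design costs about 0.2 MB, while each additional arrangement in the non-PR design costs a full 1.6 MB bitstream.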

Dynamic Processor Allocation Strategy
Benefits of Partial Reconfiguration
- Reconfiguration: time-multiplex between functions (extend the number of available resources with time)
- Partial: module granularity reduced
  - Unchanged portion of FPGA is not affected by configuration
  - Smaller bitstream filesize
  - Smaller reconfiguration time
  - Less storage requirements
- Result: significantly more combinations of hardware arrangements with similar storage requirements

Conclusion and Future Work
Evolvable Hardware
- Non-deterministic methods can repair faulty digital circuits
- Time required justified by ability to exploit faults
- Increase complete repair occurrence rate 5-fold
- Future improvements
  - make use of fault location
  - optimize genetic algorithm parameters

Conclusion and Future Work
Partial Reconfiguration
- Newer partial reconfiguration flow allows rectangle-sized areas
  - Allows static resources to maximize FPGA area
- Newer architecture allows:
  - multiple rectangle-sized areas within one column of resources
  - reduced configuration granularity for modules
  - 30% reduction in storage and configuration time

Conclusion and Future Work
Dynamic Processors
- Utilizes newer software design flow and newer FPGA hardware architecture
  - Storage reduced 55-fold
  - Time reduced 8–160 fold
- Benefits make reconfiguration possible for fast processes such as video functions
- Time multiplexing may enable smaller FPGA devices to compete with larger devices not utilizing partial reconfiguration

Conclusion and Future Work
Future Work
- Develop self-contained partial reconfiguration solution
- Continue to challenge and improve reconfiguration process and hardware design
  - enable FPGAs to be standard hardware platform for evolvable/adaptable systems

Publication
HUANG, J., PARRIS, M., LEE, J., and DEMARA, R.F. Scalable FPGA Architecture for DCT Computation using Dynamic Partial Reconfiguration. Accepted to International Conference on Engineering of Reconfigurable Systems and Algorithms.

Previous Work Spare Resources: Sub-PLB Spares

Previous Work Offline Recovery: Incremental Rerouting

Previous Work Online Recovery: Online BIST

Evolvable Hardware Optimization Strategies