
Design Framework for Partial Run-Time FPGA Reconfiguration
Chris Conger, Ann Gordon-Ross, and Alan D. George
Presented by: Abelardo Jara-Berrocal, HCS Research Laboratory, College of Engineering, University of Florida
ERSA 2008, Las Vegas, NV, July 14–17, 2008

2 Outline
• Introduction
• Partial Reconfiguration (PR) Overview
• Proposed Design Methodologies
• Framework Analysis
• Conclusions

3 Introduction – Fully reconfigurable systems
[Figure: a system controller, general-purpose I/O, shared memory, a battery, external I/O, and bitstream storage surround an FPGA that is programmed over configuration lines. A design station produces full configurations (Config 1, Config 2, Config 3), each holding a subset of modules A, B, and C, which are loaded on request because the required design does not fit on the device.]
Limitations:
1. The device may be too small for complex designs
2. Full bitstreams are large (long reconfiguration time)
3. Complete system operation must be halted before reconfiguration

4 Introduction – The Virtex-4 PR architecture
• Newer Xilinx FPGA families offer a partial reconfiguration feature
• A rectangular region of the FPGA can be reconfigured without affecting the rest of the device
 – The system can continue operating without interruption
[Figure: Virtex-4 device with reconfigurable region 1 and reconfigurable region 2]

5 Introduction – A sample PR architecture
[Figure: an FPGA loaded with a base system configuration over JTAG. The static area holds the controller (MicroBlaze), the ICAP, and a flash controller connected to bitstream storage; the reconfigurable area holds modules A, B, and C, each enabled or disabled on request. A battery and external I/O are also attached.]
1. The system controller does not need to be placed in an external device
2. Access to the fast Internal Configuration Access Port (ICAP: 32 bits, 100 MHz)
3. Smaller partial bitstreams
4. No need to halt the complete system when reconfiguring a module
5. Time multiplexing of FPGA resources: hardware modules are loaded and unloaded on demand
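
Point 2 can be turned into a back-of-the-envelope estimate of reconfiguration time. This is a minimal sketch that assumes the ICAP accepts one 32-bit word on every 100 MHz cycle, which is a best case: real transfers can be slower, for example when a soft processor copies the bitstream from flash. The bitstream sizes below are hypothetical, not measurements from this work.

```python
ICAP_WIDTH_BITS = 32          # ICAP data width quoted on the slide
ICAP_CLOCK_HZ = 100_000_000   # ICAP clock quoted on the slide


def icap_peak_bandwidth_bytes_per_s() -> float:
    """Peak ICAP throughput, assuming one 32-bit word is written every clock cycle."""
    return ICAP_WIDTH_BITS / 8 * ICAP_CLOCK_HZ


def reconfig_time_ms(bitstream_bytes: int) -> float:
    """Best-case time to write a bitstream through the ICAP, in milliseconds."""
    return bitstream_bytes / icap_peak_bandwidth_bytes_per_s() * 1e3


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only (not measured values).
    for label, size in [("full bitstream", 2_000_000), ("partial bitstream", 400_000)]:
        print(f"{label}: {size} bytes -> at least {reconfig_time_ms(size):.2f} ms")
```

At a 400 MB/s peak, reconfiguration time scales linearly with bitstream size, which is why smaller partial bitstreams (point 3) directly shorten reconfiguration.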

6 Introduction – Current PR Design Flow
Steps:
• Partition the system into modules
• Define static modules and reconfigurable modules
• Decide the number of PR regions (PRRs)
• Decide PRR sizes, shapes, and locations
• Map modules to PRRs
• Define PRR interfaces and instantiate slice macros for them
Optimization problems:
• Design partitioning
• Number of PRRs
• PRR sizes, shapes, and locations
• Mapping PRMs to PRRs
• Type and placement of PRR interfaces
[Figure: design partitioning splits modules A, B, and C into static modules and reconfigurable modules (PRMs); design floorplanning and budgeting then divides the FPGA into a static region (controller (MicroBlaze), ICAP, flash controller, static modules), PRR 1 (modules A and B), and PRR 2 (module C), with the number of PRRs left as an open question.]

7 Introduction – Early Access PR Design Flow
• Introduced by Xilinx at FPL 2006
• Major improvements:
 – Automatic implementation scripts
 – Rectangular regions (not full-column reconfiguration)
 – Static nets can cross reconfigurable regions
 – Slice macros replace bus macros
• Partitioning and floorplanning steps are executed manually
 – No design guidelines are provided for these steps
[Figure: reconfigurable design specifications go through design partitioning and design floorplanning/budgeting (manual), producing placement and PRR constraints for the Xilinx PR implementation flow (automatic), which outputs the full initial bitstream and the PRM bitstreams. The manual steps offer potential for automatic CAD tools.]

8 Introduction – Current PR design tool limitations
• PR design is a very specialized task
• Only physical-level support is provided
 – Architectural knowledge of the target device is a must
 – Not very flexible; many design constraints
• Partitioning and floorplanning steps are executed manually
 – No performance-sensitive design guidelines are provided
 – No automatic, heuristics-based design flow is available either
• Lack of abstraction from low-level details discourages designers from using PR
 – Difficult for many end users
In this work, we propose a taxonomy of PR system design flows and an efficient methodology for each type.

9 PR Overview – Taxonomy of PR system design flows
PR system design flows are either special-purpose or multipurpose.
Special-purpose:
• Highly specialized system design
• All PRMs that will exist on the system are known at design time
• Each PRR is independently optimized (size, shape, location, interface) based on the PRMs that will be mapped to it
• Output: 1) a floorplan defining a static region and a set of optimized PRRs; 2) the set of PRMs that can be placed in each PRR (the PRMs-to-PRRs mapping)
Multipurpose:
• Not optimized for a specific application
• The PRMs required by the application are not known when the base system is designed
• Goal: a flexible, reusable base design that can serve several different PR systems
• The base system designer defines a set of PRRs with fixed shapes, sizes, locations, and interfaces
• The generated floorplan is used as an input template for implementing the PRMs

10 Proposed Design Methodology: Special-Purpose
• Partition the system into several hardware modules
• Synthesize the hardware modules
• Use a control flow graph (CFG) and a states table to represent:
 – Application states and the transitions between them (execution path coverage)
 – The set of modules required in each application state
Let's see an example.
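
The two representations can be sketched as plain dictionaries. This minimal Python sketch reuses the states table of the example that follows (states S1 to S5); the transitions in `CFG` are hypothetical, because the graph's edges are not recoverable from the slide. For each transition, it lists which modules would have to be loaded or unloaded.

```python
from typing import Dict, FrozenSet, List, Tuple

# States table: modules required in each application state.
STATES: Dict[str, FrozenSet[str]] = {
    "S1": frozenset("ABC"),
    "S2": frozenset("ABCF"),
    "S3": frozenset("ABCG"),
    "S4": frozenset("ABD"),
    "S5": frozenset("ABE"),
}

# Hypothetical CFG edges (state -> possible next states), for illustration only.
CFG: Dict[str, List[str]] = {
    "S1": ["S2", "S3"],
    "S2": ["S4"],
    "S3": ["S4"],
    "S4": ["S5"],
    "S5": ["S1"],
}


def transition_changes() -> List[Tuple[str, str, FrozenSet[str], FrozenSet[str]]]:
    """For every CFG edge, return (source, target, modules to load, modules to unload)."""
    changes = []
    for src, successors in CFG.items():
        for dst in successors:
            load = STATES[dst] - STATES[src]
            unload = STATES[src] - STATES[dst]
            changes.append((src, dst, load, unload))
    return changes


if __name__ == "__main__":
    for src, dst, load, unload in transition_changes():
        print(f"{src}->{dst}: load {sorted(load)}, unload {sorted(unload)}")
```

Walking every edge this way is one simple reading of "execution path coverage": each reconfiguration the application can trigger is enumerated from the CFG.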

11 Proposed Design Methodology: Special-Purpose – Establishing constraints
[Figure: CFG with application states S1 to S5]
State  Modules
S1     A, B, C
S2     A, B, C, F
S3     A, B, C, G
S4     A, B, D
S5     A, B, E
Static: A, B. Reconfigurable: C, D, E, F, G.
Define region partitioning constraints:
1. A and B are present in all states (static modules)
2. C, D, E, F, and G are reconfigurable modules (PRMs)
3. F and G are each required together with C (states S2 and S3), so neither can be placed in the same PRR as C
4. F, G, D, and E can be placed in the same PRR (no state requires two of them at once)
5. C, D, and E can be placed in the same PRR
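
These constraints follow mechanically from the states table: a module present in every state is static, and two PRMs may share a PRR only if no state requires both at once. A minimal Python sketch of that rule (the function names are illustrative, not from the original work):

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

States = Dict[str, FrozenSet[str]]

STATES: States = {
    "S1": frozenset("ABC"),
    "S2": frozenset("ABCF"),
    "S3": frozenset("ABCG"),
    "S4": frozenset("ABD"),
    "S5": frozenset("ABE"),
}


def static_modules(states: States) -> FrozenSet[str]:
    """Modules required in every state stay in the static region."""
    return frozenset.intersection(*states.values())


def reconfigurable_modules(states: States) -> FrozenSet[str]:
    """All remaining modules become PRMs."""
    return frozenset.union(*states.values()) - static_modules(states)


def conflicts(states: States) -> Set[Tuple[str, str]]:
    """PRM pairs required together in some state; they need different PRRs."""
    prms = reconfigurable_modules(states)
    pairs: Set[Tuple[str, str]] = set()
    for required in states.values():
        active = sorted(required & prms)
        pairs.update(combinations(active, 2))
    return pairs


def can_share_prr(a: str, b: str, states: States) -> bool:
    """Two PRMs may share a PRR if no state requires both of them at once."""
    return tuple(sorted((a, b))) not in conflicts(states)


if __name__ == "__main__":
    print("static:", sorted(static_modules(STATES)))
    print("PRMs:", sorted(reconfigurable_modules(STATES)))
    print("conflicts:", sorted(conflicts(STATES)))
```

For this table the only conflicts are (C, F) and (C, G), which reproduces constraints 3 to 5.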

12 Proposed Design Methodology: Special-Purpose
• Define the number of PRRs to be used
 – Optimization variable
 – Computed from the CFG and states table
• Define a PRMs-to-PRRs mapping
 – Optimization problem with a combinatorial design space
 – The design space is reduced using the design constraints
Possible solution (not necessarily optimal):
Static region: A, B
PRR 1: C, D, E
PRR 2: F, G
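
One way to compute both quantities, sketched in Python under our own assumptions rather than as the authors' algorithm: the largest number of PRMs required at once in any state is a lower bound on the number of PRRs, and a greedy coloring of the PRM conflict graph (PRMs that coexist in some state get different PRRs) yields a feasible, not necessarily optimal, mapping.

```python
from typing import Dict, FrozenSet, List, Tuple

STATES: Dict[str, FrozenSet[str]] = {
    "S1": frozenset("ABC"),
    "S2": frozenset("ABCF"),
    "S3": frozenset("ABCG"),
    "S4": frozenset("ABD"),
    "S5": frozenset("ABE"),
}


def split(states: Dict[str, FrozenSet[str]]) -> Tuple[FrozenSet[str], List[str]]:
    """Return (static modules, sorted list of PRMs)."""
    static = frozenset.intersection(*states.values())
    prms = sorted(frozenset.union(*states.values()) - static)
    return static, prms


def min_prrs(states: Dict[str, FrozenSet[str]]) -> int:
    """Lower bound: the most PRMs any single state needs simultaneously."""
    static, _ = split(states)
    return max(len(mods - static) for mods in states.values())


def greedy_mapping(states: Dict[str, FrozenSet[str]]) -> List[List[str]]:
    """Greedy conflict-graph coloring: put each PRM in the first compatible PRR."""
    _, prms = split(states)

    def coresident(a: str, b: str) -> bool:
        return any(a in mods and b in mods for mods in states.values())

    prrs: List[List[str]] = []
    for prm in prms:  # alphabetical order, an arbitrary choice
        for members in prrs:
            if not any(coresident(prm, other) for other in members):
                members.append(prm)
                break
        else:
            prrs.append([prm])  # no compatible PRR: open a new one
    return prrs


if __name__ == "__main__":
    print("lower bound on #PRRs:", min_prrs(STATES))
    print("greedy mapping:", greedy_mapping(STATES))
```

For this example both agree on two PRRs, and the greedy mapping reproduces the possible solution above: PRR 1 = {C, D, E}, PRR 2 = {F, G}. Greedy coloring depends on the visiting order and can use more PRRs than necessary on other inputs.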

13 Proposed Design Methodology: Special-Purpose
And when do we size our PRRs? Don't worry, it is our next step.
[Figure: modules profile (slices, BRAMs, DSP48s) for modules A to G]
• Required static region resources: the resources of all static modules are added
• Required PRR 1 resources: the maximum of each resource type over the PRMs mapped to PRR 1
• Required PRR 2 resources: the maximum of each resource type over the PRMs mapped to PRR 2
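
The sizing rule on this slide is sketched below: static resources add up, while each PRR needs the per-resource maximum over its PRMs, because only one PRM occupies a PRR at a time. The module profiles are hypothetical numbers, since the slide's profile table is not recoverable, and the mapping is the possible solution from the previous slide.

```python
from typing import Dict, Iterable

Resources = Dict[str, int]
KINDS = ("slices", "brams", "dsps")

# Hypothetical module profiles, for illustration only.
PROFILES: Dict[str, Resources] = {
    "A": {"slices": 1200, "brams": 4, "dsps": 0},
    "B": {"slices": 800, "brams": 2, "dsps": 4},
    "C": {"slices": 1500, "brams": 8, "dsps": 2},
    "D": {"slices": 900, "brams": 12, "dsps": 0},
    "E": {"slices": 1100, "brams": 0, "dsps": 8},
    "F": {"slices": 600, "brams": 2, "dsps": 6},
    "G": {"slices": 700, "brams": 6, "dsps": 1},
}


def static_requirement(modules: Iterable[str]) -> Resources:
    """Static modules are all present at once, so their resources add up."""
    mods = list(modules)
    return {k: sum(PROFILES[m][k] for m in mods) for k in KINDS}


def prr_requirement(modules: Iterable[str]) -> Resources:
    """Only one PRM occupies a PRR at a time: take the maximum per resource type."""
    mods = list(modules)
    return {k: max(PROFILES[m][k] for m in mods) for k in KINDS}


if __name__ == "__main__":
    print("static:", static_requirement("AB"))
    print("PRR 1:", prr_requirement("CDE"))
    print("PRR 2:", prr_requirement("FG"))
```

Taking the maximum per resource type means a PRR can be larger than any single PRM: with these numbers, PRR 1 needs C's slices, D's BRAMs, and E's DSP48s.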

14 Proposed Design Methodology: Special-Purpose
• Define the PRR sizes, shapes, and locations inside the FPGA fabric
 – Floorplanning optimization problem
 – Proper metrics for PRR performance analysis are required
 – Design guidelines for efficient PRR floorplanning are also a necessity
• Define PRR interfaces
 – Place slice macros
[Figure: a reconfigurable region with enough resources for PRR 1 is placed in the FPGA next to the static region, and the same is done for PRR 2, yielding the final optimized custom base system floorplan.]
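
Finding a region "with enough resources" can be sketched as a search over rectangles of a columnar fabric. This minimal sketch uses a deliberately simplified, hypothetical device model: each character of `columns` is one column type ('C' = CLB, 'B' = BRAM, 'D' = DSP), and the per-column counts in `PER_ROW_UNIT` are illustrative values, not Virtex-4 data. Because every column is uniform from top to bottom in this model, only a region's height matters, not its starting row, and area is counted in column × row-unit cells regardless of the physical width of each column type.

```python
from typing import Dict, Optional, Tuple

Resources = Dict[str, int]

# Hypothetical resources supplied by one column over one row unit (illustrative only).
PER_ROW_UNIT: Dict[str, Resources] = {
    "C": {"slices": 16, "brams": 0, "dsps": 0},
    "B": {"slices": 0, "brams": 1, "dsps": 0},
    "D": {"slices": 0, "brams": 0, "dsps": 2},
}


def region_resources(columns: str, x0: int, width: int, height: int) -> Resources:
    """Resources inside the rectangle spanning columns[x0:x0+width], `height` row units tall."""
    totals = {"slices": 0, "brams": 0, "dsps": 0}
    for col in columns[x0:x0 + width]:
        for kind, amount in PER_ROW_UNIT[col].items():
            totals[kind] += amount * height
    return totals


def smallest_region(columns: str, rows: int, need: Resources) -> Optional[Tuple[int, int, int]]:
    """Exhaustively find the minimum-area (x0, width, height) region that meets `need`."""
    best: Optional[Tuple[int, int, int]] = None
    best_area = 0
    for width in range(1, len(columns) + 1):
        for x0 in range(len(columns) - width + 1):
            for height in range(1, rows + 1):
                have = region_resources(columns, x0, width, height)
                if all(have[k] >= v for k, v in need.items()):
                    if best is None or width * height < best_area:
                        best, best_area = (x0, width, height), width * height
                    break  # resources grow with height, so taller regions only add area
    return best


if __name__ == "__main__":
    fabric = "CCBCCDCC"  # hypothetical 8-column device
    print(smallest_region(fabric, rows=8, need={"slices": 64, "brams": 2, "dsps": 2}))
```

For this hypothetical fabric the search returns (2, 4, 2): columns 2 to 5 ("BCCD"), two row units tall, the narrowest window that contains both a BRAM and a DSP column.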

15 Proposed Design Methodology: Special-Purpose
Methodology outputs:
• Custom base system
• PRMs-to-PRRs mapping
These outputs are used as input files for the automatic Xilinx PR design flow.

16 Proposed Design Methodology: Special-Purpose
Opportunity to automate this flow through design tools
• Optimization variables:
 – Number of PRRs
 – PRR sizes, shapes, and locations
 – PRMs-to-PRRs mapping
 – Other optimization variables can be defined
• Several possible cost functions:
 – Area wastage
 – Power usage
 – Application latency
 – Throughput
 – …
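
To illustrate how such a tool could evaluate one cost function, the sketch below exhaustively searches PRMs-to-PRRs mappings for the example of slide 11 and scores each by slice wastage. We define slice wastage here (an assumption) as the unused slices of a PRR summed over the PRMs it hosts, with each PRR sized to its largest PRM; the slice counts are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Optional, Tuple

STATES: Dict[str, FrozenSet[str]] = {
    "S1": frozenset("ABC"),
    "S2": frozenset("ABCF"),
    "S3": frozenset("ABCG"),
    "S4": frozenset("ABD"),
    "S5": frozenset("ABE"),
}

# Hypothetical slice counts of the PRMs, for illustration only.
SLICES: Dict[str, int] = {"C": 1500, "D": 900, "E": 1100, "F": 600, "G": 700}


def coresident(a: str, b: str) -> bool:
    """True if some state requires both PRMs at once."""
    return any(a in mods and b in mods for mods in STATES.values())


def slice_wastage(prrs: List[List[str]]) -> int:
    """Unused slices summed over every PRM; each PRR is sized to its largest PRM."""
    total = 0
    for members in prrs:
        size = max(SLICES[m] for m in members)
        total += sum(size - SLICES[m] for m in members)
    return total


def best_mapping(num_prrs: int) -> Optional[Tuple[int, List[List[str]]]]:
    """Try every assignment of PRMs to PRRs and keep the valid one with least wastage."""
    prms = sorted(SLICES)
    best: Optional[Tuple[int, List[List[str]]]] = None
    for assignment in product(range(num_prrs), repeat=len(prms)):
        prrs = [[p for p, r in zip(prms, assignment) if r == i] for i in range(num_prrs)]
        if any(not members for members in prrs):
            continue  # every PRR must host at least one PRM
        if any(coresident(a, b) for members in prrs for a in members for b in members if a < b):
            continue  # PRMs required together cannot share a PRR
        cost = slice_wastage(prrs)
        if best is None or cost < best[0]:
            best = (cost, prrs)
    return best


if __name__ == "__main__":
    print(best_mapping(2))
```

On these made-up numbers the search prefers {C, E} and {D, F, G} (900 wasted slices) over the slide's example solution (1100), consistent with that solution being labeled "not necessarily optimal". Exhaustive search grows as k^n for n PRMs and k PRRs, so it only suits small designs; a real tool would need heuristics.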

17 Framework analysis – PRR Geometries
PR system design flows require:
• Proper metrics for PRR performance analysis
• Design guidelines for efficient PRR floorplanning
Study of the effects of varying PRR shape on:
• Maximum clock frequency
• Partial bitstream size
Five separate test cores:
• Beamforming (DSP/slice)
• CFAR (slice/memory)
• AES (register)
• ARM7 softcore (hybrid)
• Sine/Cosine LUT (memory)
Performed on the V4SX55 thus far.
Aspect ratio = PRR height / PRR width

18 Framework analysis – Beamforming (~125 MHz, 40%)
Resources: 5022 slices, 16 DSP48s, 17 RAMB16s
[Plot: clock frequency (MHz) and bitstream size (kB) vs. aspect ratio, with the baseline, non-PR performance (1614 kB) marked]

19 Framework analysis – CFAR (~100 MHz, 16%)
Resources: 2610 slices, 2 DSP48s, 34 RAMB16s
[Plot: clock frequency (MHz) and bitstream size (kB) vs. aspect ratio, with the baseline, non-PR performance (1001 kB) marked]

20 Framework analysis – AES (~80 MHz, 13.75%)
Resources: 3634 slices, 3943 registers, 4 RAMB16s
[Plot: clock frequency (MHz) and bitstream size (kB) vs. aspect ratio, with the baseline, non-PR performance (1393 kB) marked]

21 Framework analysis – ARM7 (~40 MHz, 6.8%)
Resources: 1826 slices, 16 DSP48s, 10 RAMB16s
[Plot: clock frequency (MHz) and bitstream size (kB) vs. aspect ratio, with the baseline, non-PR performance (872 kB) marked]

22 Framework analysis – Sine/Cosine LUT
Resources: 107 slices, 27 RAMB16s
[Plot: clock frequency (MHz) and bitstream size (kB) vs. aspect ratio, with the baseline, non-PR performance (571 kB) marked]

23 Framework analysis – PRR Geometries
• Slice-intensive designs show the best bitstream-size/clock-frequency performance with an aspect ratio of around 2–4
 – Roughly equivalent to the aspect ratio of the FPGA as a whole
• Non-slice-intensive designs show the smallest bitstreams with an aspect ratio >> 4
 – Due to the columnar distribution of RAMB16/DSP48 resources on the chip
 – Clock frequency is relatively insensitive to aspect ratio
 – Not shown in the graphs: resource wastage also improves
• Results are more pronounced for high-frequency designs
• However, aspect ratio is not the only design consideration
 – Placement on the chip relative to other regions, pins, or resources may restrict the choice of PRR shape
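
The guideline can be encoded as a simple filter over candidate PRR shapes of a given area. This is a minimal sketch, not the authors' tool: shapes are (height, width) pairs in abstract grid units, the aspect ratio follows the definition on the PRR Geometries slide (height / width), and ">> 4" is crudely approximated as "> 4".

```python
from typing import List, Tuple

Shape = Tuple[int, int]  # (height, width) in abstract grid units


def candidate_shapes(area: int) -> List[Shape]:
    """All (height, width) rectangles whose area is exactly `area` grid units."""
    return [(area // w, w) for w in range(1, area + 1) if area % w == 0]


def aspect_ratio(shape: Shape) -> float:
    height, width = shape
    return height / width  # aspect ratio = PRR height / PRR width


def preferred_shapes(area: int, slice_intensive: bool) -> List[Shape]:
    """Filter shapes using the guideline; the thresholds are a rough reading of it."""
    shapes = candidate_shapes(area)
    if slice_intensive:
        return [s for s in shapes if 2 <= aspect_ratio(s) <= 4]
    # ">> 4" approximated as "> 4": tall, narrow regions spanning few columns
    return [s for s in shapes if aspect_ratio(s) > 4]


if __name__ == "__main__":
    print(preferred_shapes(64, slice_intensive=True))
    print(preferred_shapes(64, slice_intensive=False))
```

For an area of 64 units this keeps (16, 4) for a slice-intensive core, and (64, 1) and (32, 2) for a memory- or DSP-intensive one; placement relative to other regions, pins, and resources may still rule a shape out.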

24 Conclusions – Contributions of this work
• A taxonomy of PR system design flows and a design methodology for efficient development of each type
• Identification of relevant optimization variables and constraints
 – Number of PRRs, optimal mapping of PRMs to PRRs, system floorplanning
 – Proposed for incorporation into a future automatic design tool
• Study of the effects of varying PRR shape on:
 – Maximum clock frequency
 – Partial bitstream size
 – Across multiple classes of cores/designs: memory-intensive, DSP-intensive, combinational-logic-intensive, register-intensive, etc.
• Definition and delivery of PRR floorplanning guidelines

25 Questions