Design Framework for Partial Run-Time FPGA Reconfiguration Chris Conger, Ann Gordon-Ross, and Alan D. George Presented by: Abelardo Jara-Berrocal HCS Research.


1 Design Framework for Partial Run-Time FPGA Reconfiguration
Chris Conger, Ann Gordon-Ross, and Alan D. George
Presented by: Abelardo Jara-Berrocal
HCS Research Laboratory, College of Engineering, University of Florida
ERSA 2008, Las Vegas, NV, July 14–17, 2008

2 Outline
- Introduction
- Partial Reconfiguration (PR) Overview
- Proposed Design Methodologies
- Framework analysis
- Conclusions

3 Introduction – Fully reconfigurable systems
1. Device too small for complex designs (the required design doesn't fit)
2. Big full bitstreams (long reconfiguration time)
3. Complete system operation is halted prior to reconfiguration
[Figure: an FPGA with general purpose I/O, shared memory and configuration lines, together with a system controller, bitstreams storage, battery, external I/O and a design station. Full configurations (Config 1, Config 2, Config 3) built from Modules A, B and C are loaded on request, with modules shown as enabled or disabled.]

4 Introduction – The Virtex 4 PR architecture
- Newer Xilinx FPGA families offer a partial reconfiguration feature
- A rectangular region of the FPGA can be reconfigured without affecting the remaining FPGA area
  - System can continue operating without interruption
[Figure: FPGA with Reconfigurable region 1 and Reconfigurable region 2]

5 Introduction – A sample PR architecture
1. System controller does not need to be placed in an external device
2. Access to fast Internal Configuration Access Port (ICAP: 32 bits, 100 MHz)
3. Smaller partial bitstreams
4. No need to halt the complete system when reconfiguring a module
5. Time multiplexing of FPGA resources: load and unload HW modules on demand
[Figure: base system configuration on the FPGA. The static area contains the controller (MicroBlaze), ICAP and flash controller; the reconfigurable area holds Modules A, B and C, which are enabled or disabled on request. Other labels: bitstreams storage, battery, JTAG, external I/O.]
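The ICAP figures on this slide (32 bits, 100 MHz) allow a rough estimate of why smaller partial bitstreams matter. The sketch below computes only the theoretical peak transfer time; it ignores flash read speed and controller overhead, assumes 1 kB = 1024 bytes, and uses hypothetical bitstream sizes.

```python
# Idealized ICAP transfer-time estimate. Real reconfiguration is slower:
# storage access and controller overhead are ignored here.
ICAP_WIDTH_BITS = 32      # from the slide
ICAP_CLOCK_HZ = 100e6     # from the slide


def icap_peak_bytes_per_s(width_bits=ICAP_WIDTH_BITS, clock_hz=ICAP_CLOCK_HZ):
    """Peak throughput when one full-width word is written every clock cycle."""
    return width_bits / 8 * clock_hz


def reconfig_time_ms(bitstream_kb, bytes_per_kb=1024):
    """Lower bound on transfer time in milliseconds (1 kB = 1024 B is an assumption)."""
    return bitstream_kb * bytes_per_kb / icap_peak_bytes_per_s() * 1e3


if __name__ == "__main__":
    # Hypothetical sizes: a small partial bitstream versus a large full one.
    for size_kb in (100, 1600):
        print(f"{size_kb} kB -> {reconfig_time_ms(size_kb):.3f} ms at peak ICAP rate")
```

At the peak rate of 400 MB/s, transfer time scales linearly with bitstream size, so a partial bitstream one sixteenth the size of a full one reconfigures in one sixteenth of the time under these assumptions.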

6 Introduction – Current PR Design Flow
Steps
- Partition the system into modules
- Define static modules and reconfigurable modules
- Decide the number of PR regions (PRRs)
- Decide PRR sizes, shapes and locations
- Map modules to PRRs
- Define PRR interfaces, instantiate slice macros for PRR interfaces
Optimization problems
- Design partitioning
- Number of PRRs
- PRR sizes, shapes and locations
- Mapping PRMs to PRRs
- Type and placement of PRR interfaces
[Figure: (1) Design partitioning separates the static modules (controller (MicroBlaze), ICAP, flash controller) from the reconfigurable modules (PRMs: Modules A, B and C). (2) Design floorplanning and budgeting lays out a static region, PRR 1 and PRR 2 on the FPGA, with region labels "Static modules", "Modules: A and B" and "Modules: C" and the open question "# of PRRs?".]

7 Introduction – Early Access PR Design Flow
- Introduced by Xilinx in FPL'06
- Major improvements:
  - Automatic implementation scripts
  - Rectangular regions (not full column reconfiguration)
  - Static nets can cross reconfigurable regions
  - Slice macros replace bus macros
- Partitioning and floorplanning steps are manually executed
  - Design guidelines for these steps are not provided
- Potential for development of automatic CAD tools
[Figure: reconfigurable design specifications feed design partitioning and design floorplanning and budgeting (manual); the resulting placement and PRR constraints feed the Xilinx PR Implementation Flow (automatic), which produces the full initial bitstream and the PRM bitstreams.]

8 Introduction – Current PR design tools limitations
- PR design is a very specialized task
- Only a physical level of support is provided
  - Architectural knowledge of the target device is a must
  - Not very flexible, many design constraints
- Partitioning and floorplanning steps are manually executed
  - No performance-sensitive design guidelines are provided
  - No automatic heuristics-based design flow is available either
- Lack of abstraction from low-level details discourages designers from using PR
  - Difficult for many end users
- In this work, we propose a taxonomy of PR system design flows and an efficient methodology for each type.

9 PR Overview – Taxonomy of PR systems design flows
PR System Design Flow: Multipurpose or Special purpose
Special purpose
- Highly specialized systems design
- All PRMs that will exist on the system are known at design time
- Each PRR is independently optimized (size, shape, location, interface) based on the PRMs that will be mapped to it
- Output is:
  1) Floorplan defining a static region and a set of optimized PRRs
  2) The set of PRMs that can be placed in each PRR (PRMs to PRRs mapping)
Multipurpose
- Not optimized for a specific application
- PRMs required by the application are not known when designing the base system
- Goal is to design a flexible and reusable base design that can be used for several different PR systems
- Base system designer defines a set of PRRs with fixed shapes, sizes, locations and interfaces
- Generated floorplan is used as input template for the PRMs implementation

10 Proposed Design Methodology: Special-Purpose
- Partition the system into several hardware modules
- Synthesize the hardware modules
- Use a control flow graph (CFG) and a states table to represent:
  - Application states and the transitions between them (execution path coverage)
  - Set of modules required in each application state
- Let's see an example

11 Proposed Design Methodology: Special-Purpose
Establishing constraints: define region partitioning constraints

STATE   MODULES
S1      A, B, C
S2      A, B, C, F
S3      A, B, C, G
S4      A, B, D
S5      A, B, E

1. A and B are present in all states (static modules)
2. C, D, E, F and G are reconfigurable modules (PRMs)
3. F and G are each needed at the same time as C (states S2 and S3), so they cannot be placed in the same PRR as C
4. F, G, D and E can be placed in the same PRR
5. C, D and E can be placed in the same PRR
[Figure: control flow graph over states S1 to S5, with modules grouped as static and reconfigurable (C, F, G, D, E)]
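The constraints listed on this slide can be derived mechanically from the states table. The following is a minimal sketch, not the authors' tool: it treats a module as static if it appears in every state, and marks two PRMs as conflicting if some state needs both at once. Function names are illustrative.

```python
from itertools import combinations

# States table from the slide: state -> modules required in that state.
STATES = {
    "S1": {"A", "B", "C"},
    "S2": {"A", "B", "C", "F"},
    "S3": {"A", "B", "C", "G"},
    "S4": {"A", "B", "D"},
    "S5": {"A", "B", "E"},
}


def static_modules(states):
    """Modules present in every state stay resident in the static region."""
    return set.intersection(*states.values())


def reconfigurable_modules(states):
    """Every other module is a candidate PRM."""
    return set.union(*states.values()) - static_modules(states)


def conflicts(states):
    """Pairs of PRMs needed in the same state; they cannot share one PRR."""
    prms = reconfigurable_modules(states)
    pairs = set()
    for modules in states.values():
        for a, b in combinations(sorted(modules & prms), 2):
            pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    print("static:", sorted(static_modules(STATES)))
    print("PRMs:", sorted(reconfigurable_modules(STATES)))
    print("conflicts:", sorted(conflicts(STATES)))
```

For this table the only conflicts are C with F and C with G, which is why D and E may share a PRR with either group.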

12 Proposed Design Methodology: Special-Purpose
- Define the number of PRRs to be used
  - Optimization variable
  - Number is computed based on CFG and states table
  - # PRRs =
- Define a PRMs to PRRs mapping
  - Optimization problem
  - Combinatorial design space
  - Design space is reduced using design constraints
Possible solution (not necessarily the optimal):
  Static region: A, B
  PRR 1: C, D, E
  PRR 2: F, G
[Figure labels: 4, ?, 2, 1, ?]
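One way to explore this mapping problem is a first-fit heuristic over the conflict constraints. This is a stand-in for illustration, not the method from the talk, and the lower bound used here (the largest number of PRMs needed in any single state, assuming each PRR holds one PRM at a time) is an assumption rather than the slide's formula.

```python
# States table and static modules as given on the previous slide.
STATES = {
    "S1": {"A", "B", "C"},
    "S2": {"A", "B", "C", "F"},
    "S3": {"A", "B", "C", "G"},
    "S4": {"A", "B", "D"},
    "S5": {"A", "B", "E"},
}
STATIC = {"A", "B"}


def lower_bound_prrs(states, static):
    """Assumed bound: a state needing k PRMs at once needs at least k PRRs."""
    return max(len(modules - static) for modules in states.values())


def first_fit_mapping(states, static, order):
    """Place each PRM in the first PRR holding no PRM it must coexist with."""
    needed_with = {m: set() for m in order}
    for modules in states.values():
        prms = modules - static
        for m in prms:
            needed_with[m] |= prms - {m}
    prrs = []
    for m in order:
        for region in prrs:
            if not needed_with[m] & set(region):
                region.append(m)
                break
        else:
            prrs.append([m])  # no compatible PRR yet, open a new one
    return prrs


if __name__ == "__main__":
    mapping = first_fit_mapping(STATES, STATIC, ["C", "D", "E", "F", "G"])
    print("lower bound on PRRs:", lower_bound_prrs(STATES, STATIC))
    for i, region in enumerate(mapping, start=1):
        print(f"PRR {i}: {', '.join(region)}")
```

With the module order C, D, E, F, G this reproduces the possible solution on the slide. First-fit results depend on the order, so a real tool would search the design space against a cost function.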

13 Proposed Design Methodology: Special-Purpose
- And when do we size our PRRs?
  - Don't worry, it is our next step
- Required static region resources (resources are added)
- Required PRR 1 resources (maximum of each resource type)
- Required PRR 2 resources (maximum of each resource type)
[Figure: modules profile giving the slices, BRAMs and DSP48s used by each of Modules A to G]
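The sizing rule on this slide is simple arithmetic: static modules coexist, so their resources add up, while the PRMs mapped to one PRR take turns, so the PRR needs the maximum of each resource type. A short sketch follows; the per-module numbers are hypothetical placeholders, since the profile values are not in the transcript.

```python
RESOURCES = ("slices", "brams", "dsp48s")

# Hypothetical module profiles (slices, BRAMs, DSP48s); not from the talk.
PROFILE = {
    "A": (1200, 4, 0),
    "B": (800, 2, 2),
    "C": (2000, 8, 4),
    "D": (1500, 10, 0),
    "E": (600, 0, 16),
    "F": (900, 6, 2),
    "G": (1100, 3, 8),
}


def static_requirement(modules, profile=PROFILE):
    """Static modules are all resident at once: sum each resource type."""
    return {r: sum(profile[m][i] for m in modules) for i, r in enumerate(RESOURCES)}


def prr_requirement(modules, profile=PROFILE):
    """PRMs in one PRR are time-multiplexed: take the max of each resource type."""
    return {r: max(profile[m][i] for m in modules) for i, r in enumerate(RESOURCES)}


if __name__ == "__main__":
    print("static region:", static_requirement(["A", "B"]))
    print("PRR 1:", prr_requirement(["C", "D", "E"]))
    print("PRR 2:", prr_requirement(["F", "G"]))
```

Note that the per-type maxima may come from different modules, so a PRR can end up larger than any single PRM it hosts.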

14 Proposed Design Methodology: Special-Purpose
- Define the PRR sizes, shapes, locations inside the FPGA fabric
  - Floorplanning optimization problem
  - Proper metrics for PRR performance analysis are required
  - Design guidelines for efficient PRR floorplanning are also a necessity
- Define PRR interfaces
  - Place slice macros
[Figure: final optimized custom base system floorplan. Starting from the PRR 1 and PRR 2 resource requirements, a reconfigurable region with enough resources for PRR 1 is placed on the FPGA beside the static region, and the same is done for PRR 2.]

15 Proposed Design Methodology: Special-Purpose
Methodology outputs
- Custom base system
- PRMs to PRRs mapping
They are used as input files for the automatic Xilinx PR Design Flow

16 Proposed Design Methodology: Special-Purpose
- Opportunity to automate this flow through design tools
- Optimization variables
  - Number of PRRs
  - PRR sizes, shapes, and locations
  - PRMs to PRRs mapping
  - Additional optimization variables can be defined
- Several possible cost functions:
  - Area wastage
  - Power usage
  - Application latency
  - Throughput
  - …

17 Framework analysis – PRR Geometries
- PR system design flows require:
  - Proper metrics for PRR performance analysis
  - Design guidelines for efficient PRR floorplanning
- Study of the effects of varying PRR shape on
  - Maximum clock frequency
  - Partial bitstream size
- Five separate test cores:
  - Beamforming (DSP/slice)
  - CFAR (slice/memory)
  - AES (register)
  - ARM7 softcore (hybrid)
  - Sine/Cosine LUT (memory)
- Performed on V4SX55 thus far
- Aspect ratio = PRR Height / PRR Width
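To make the aspect-ratio definition concrete, the sketch below enumerates rectangular PRR shapes that provide at least a required number of slices and reports height / width for each. It assumes a uniform grid of CLBs with four slices each (as in Virtex-4) and ignores BRAM and DSP48 columns; the grid limits passed in are illustrative parameters, not values from the talk.

```python
import math

SLICES_PER_CLB = 4  # a Virtex-4 CLB contains four slices


def candidate_shapes(required_slices, max_width_clbs, max_height_clbs):
    """Return (height, width, aspect_ratio) tuples, in CLBs, that fit the grid.

    For each width, the smallest height providing enough slices is chosen.
    Aspect ratio follows the slide: PRR height / PRR width.
    """
    clbs_needed = math.ceil(required_slices / SLICES_PER_CLB)
    shapes = []
    for width in range(1, max_width_clbs + 1):
        height = math.ceil(clbs_needed / width)
        if height <= max_height_clbs:
            shapes.append((height, width, height / width))
    return shapes


if __name__ == "__main__":
    # 5022 slices is the beamforming core size; the 48 x 128 grid is an assumption.
    for height, width, ratio in candidate_shapes(5022, 48, 128):
        if 2 <= ratio <= 4:
            print(f"{height} x {width} CLBs, aspect ratio {ratio:.2f}")
```

The filter in the usage example simply prints shapes whose ratio falls between 2 and 4; a real floorplanner would also have to respect resource columns and placement relative to pins and other regions.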

18 Framework analysis – Beamforming (~125 MHz, 40%)
- 5022 slices, 16 DSP48s, 17 RAMB16s
- Baseline, non-PR performance = 1614 kB, 127.845 MHz
[Figure: clock frequency (MHz) and bitstream size (kB) versus aspect ratio]

19 Framework analysis – CFAR (~100 MHz, 16%)
- 2610 slices, 2 DSP48s, 34 RAMB16s
- Baseline, non-PR performance = 1001 kB, 103.616 MHz
[Figure: clock frequency (MHz) and bitstream size (kB) versus aspect ratio]

20 Framework analysis – AES (~80 MHz, 13.75%)
- 3634 slices, 3943 registers, 4 RAMB16s
- Baseline, non-PR performance = 1393 kB, 80.483 MHz
[Figure: clock frequency (MHz) and bitstream size (kB) versus aspect ratio]

21 Framework analysis – ARM7 (~40 MHz, 6.8%)
- 1826 slices, 16 DSP48s, 10 RAMB16s
- Baseline, non-PR performance = 872 kB, 40.985 MHz
[Figure: clock frequency (MHz) and bitstream size (kB) versus aspect ratio]

22 Framework analysis – Sine/Cosine LUT
- 107 slices, 27 RAMB16s
- Baseline, non-PR performance = 571 kB, 204.918 MHz
[Figure: clock frequency (MHz) and bitstream size (kB) versus aspect ratio]

23 Framework analysis – PRR Geometries
- Slice-intensive designs show the best bitstream size/clock frequency performance with an aspect ratio around 2–4
  - Roughly equivalent to the aspect ratio of the FPGA as a whole
- Non-slice-intensive designs show the best bitstream performance with an aspect ratio >> 4
  - Due to the columnar distribution of RAMB16/DSP48 resources on chip
  - Clock frequency is relatively insensitive to aspect ratio
  - Not shown in graph: resource wastage also improved
- Results are more pronounced for high-frequency designs
- However, aspect ratio is not the only design consideration
  - Placement on the chip relative to other regions, pins, or resources may affect (restrict) the choice of PRR shape

24 Conclusions - Contributions of this work
- Taxonomy for PR systems design flows and a design methodology for efficient development of each type
- Identification of relevant optimization variables and constraints
  - Number of PRRs, optimal mapping of PRMs to PRRs, system floorplanning
  - Propose their incorporation in a future automatic design tool
- Study of the effects of varying PRR shape
  - Maximum clock frequency
  - Partial bitstream size
  - Multiple classes of cores/designs: memory-intensive, DSP-intensive, combinational logic-intensive, register-intensive, etc.
- PRR floorplanning guidelines definitions and delivery

25 Questions

