Download presentation
Presentation is loading. Please wait.
1
Center for Embedded Computer Systems University of California, Irvine and San Diego http://www.cecs.uci.edu/~spark Hardware and Interface Synthesis of FPGA Blocks using Parallelizing Code Transformations Supported by Semiconductor Research Corporation Sumit Gupta Manev Luthra, Nikil Dutt, Alex Nicolau Rajesh Gupta
2
Copyright Sumit Gupta 2003 2 Overview of a HW/SW Co-design Flow Processor Core Memory I/O Hardware Behavioral Description Software Behavioral Description Software Compiler High Level Synthesis Application Specs HW/SW Partitioning Interface Synthesis Logic Synthesis and P&R C RTL VHDL ASICFPGA
3
Copyright Sumit Gupta 2003 3 Outline Motivation and Background Motivation and Background Our Approach to Parallelizing High-Level Synthesis Our Approach to Parallelizing High-Level Synthesis Our Approach to Interface Synthesis using Memory Mapping Our Approach to Interface Synthesis using Memory Mapping Experimental Results Experimental Results HLS: Multimedia and Image Processing Applications HLS: Multimedia and Image Processing Applications Interface Synthesis: Case Study of MPEG-1 block Interface Synthesis: Case Study of MPEG-1 block Conclusions and Future Work Conclusions and Future Work
4
Copyright Sumit Gupta 2003 4 High Level Synthesis M e m o r y ALU Control Data path d = e - fg = h + i If Node TF c x = a + b c = a < b j = d x g l = e + x x = a + b; c = a < b; if (c) then d = e – f; else g = h + i; j = d x g; l = e + x; Transform behavioral descriptions to RTL/gate level From C to CDFG to Architecture Problem # 1 : Poor quality of HLS results beyond straight-line behavioral descriptions Poor/No controllability of the HLS results Problem # 2 :
5
Copyright Sumit Gupta 2003 5 High-level Synthesis Well-researched area: from early 1980’s Well-researched area: from early 1980’s Renewed interest due to new system level design methodologies Renewed interest due to new system level design methodologies Large number of synthesis optimizations have been proposed Large number of synthesis optimizations have been proposed Either operation level: algebraic transformations on DSP codes Either operation level: algebraic transformations on DSP codes or logic level: Don’t Care based control optimizations or logic level: Don’t Care based control optimizations In contrast, compiler transformations operate at both operation level (fine-grain) and source level (coarse-grain) In contrast, compiler transformations operate at both operation level (fine-grain) and source level (coarse-grain) Parallelizing Compiler Transformations Parallelizing Compiler Transformations Different optimization objectives and cost models than HLS Different optimization objectives and cost models than HLS Our aim: Develop Synthesis and Parallelizing Compiler Transformations that are “useful” for HLS Beyond scheduling results: in Circuit Area and Delay Beyond scheduling results: in Circuit Area and Delay For large designs with complex control flow (nested conditionals/loops) For large designs with complex control flow (nested conditionals/loops)
6
Copyright Sumit Gupta 2003 6 Our Approach: Parallelizing HLS (PHLS) C Input VHDL Output Original CDFG Optimized CDFG Scheduling & Binding Source-Level Compiler Transformations Scheduling Compiler & Dynamic Transformations Optimizing Compiler and Parallelizing Compiler transformations applied at Source-level (Pre-synthesis) and during Scheduling Optimizing Compiler and Parallelizing Compiler transformations applied at Source-level (Pre-synthesis) and during Scheduling Source-level code refinement using Pre-synthesis transformations Source-level code refinement using Pre-synthesis transformations Code Restructuring by Speculative Code Motions Code Restructuring by Speculative Code Motions Operation replication to improve concurrency Operation replication to improve concurrency Dynamic transformations: exploit new opportunities during scheduling Dynamic transformations: exploit new opportunities during scheduling
7
Copyright Sumit Gupta 2003 7 SPARK High Level Synthesis Framework
8
Copyright Sumit Gupta 2003 8 SPARK Parallelizing HLS Framework C input and Synthesizable RTL VHDL output C input and Synthesizable RTL VHDL output Tool-box of Transformations and Heuristics Tool-box of Transformations and Heuristics Each of these can be developed independently of the other Each of these can be developed independently of the other Script based control over transformations & heuristics Script based control over transformations & heuristics Hierarchical Intermediate Representation (HTGs) Hierarchical Intermediate Representation (HTGs) Retains structural information about design (conditional blocks, loops) Retains structural information about design (conditional blocks, loops) Enables efficient and structured application of transformations Enables efficient and structured application of transformations Complete HLS tool: Does Binding, Control Synthesis and Backend VHDL generation Complete HLS tool: Does Binding, Control Synthesis and Backend VHDL generation Interconnect Minimizing Resource Binding Interconnect Minimizing Resource Binding Enables Graphical Visualization of Design description and intermediate results Enables Graphical Visualization of Design description and intermediate results 100,000+ lines of C++ code 100,000+ lines of C++ code
9
Copyright Sumit Gupta 2003 9 Interface Synthesis “Glue” logic to make the hardware and software talk to each other “Glue” logic to make the hardware and software talk to each other Data must be transferred or shared between the hardware and software Data must be transferred or shared between the hardware and software Our approach to Interface Synthesis: Our approach to Interface Synthesis: Map common data to a shared memory Map common data to a shared memory Our focus is on optimizing the number and size of memory banks used for mapping Our focus is on optimizing the number and size of memory banks used for mapping Target architecture: Platform with FPGA and Processor core on same FPGA fabric Target architecture: Platform with FPGA and Processor core on same FPGA fabric Take advantage of the Memory Blocks available on such FPGA platforms for implementing the shared memory Take advantage of the Memory Blocks available on such FPGA platforms for implementing the shared memory
10
Copyright Sumit Gupta 2003 10 The Interface Hardware Interface Logic CPU Bus Processor Core Peripheral Peripheral Custom Logic Shared Memory Main Memory Interface Hardware System architecture illustrating the interface HW System architecture illustrating the interface HW Data exchanged between HW and SW resides in the shared memory
11
Copyright Sumit Gupta 2003 11 Target HW Interface Architecture Target HW Interface Architecture Control Logic Memory Controller Bus Interface Appl. Kernel Mapped Local Memory FPGA Fabric Processor Core CPU Bus Multiple PortsSingle Port
12
Copyright Sumit Gupta 2003 12 Local Memory An Example of Memory Mapping Inefficient use of the FPGA fabric by implementing independent registers Var N Var 2Var 1 FU 1 FU M N:2M Mux Var 1 Var 2 FU 1 FU M K:2M Mux Var 3 Mem 1 Efficient use of FPGA resources by utilizing its abundant memory blocks Var N Mem 2 Mem K
13
Copyright Sumit Gupta 2003 13 Mapping Designs with Control Flow Mapping variables to memory banks is a well- known technique Mapping variables to memory banks is a well- known technique What’s new here: Use of What’s new here: Use of Scheduling info from High Level Synthesis tool Scheduling info from High Level Synthesis tool Mutual exclusivity info in designs with control flow Mutual exclusivity info in designs with control flow Data access patterns resolved only during run time Data access patterns resolved only during run time So how do we perform memory mapping at compile time ? So how do we perform memory mapping at compile time ? If Node TF BB 2 BB 1 BB 3 a b a c b c a b C0 C1 Access Pattern: {(a, b), (a, c)} or {(a, b, c), (a, b)} {(a, b, c), (a, b)} Condition resolved at run time a
14
Copyright Sumit Gupta 2003 14 Our Approach to Memory Mapping We use We use Scheduling information from SPARK Synthesis tool Scheduling information from SPARK Synthesis tool Cycles in which variables are accessed (read/write) Cycles in which variables are accessed (read/write) Mutual exclusivity information from the control flow graph Mutual exclusivity information from the control flow graph The conditions under which variables are accessed The conditions under which variables are accessed Capture this information using Multi-color Conflict Graphs Capture this information using Multi-color Conflict Graphs Then create a Variable to Memory Bank Mapping that minimizes the number of memory instances Then create a Variable to Memory Bank Mapping that minimizes the number of memory instances Details available in ICCD 2003 paper Details available in ICCD 2003 paper
15
Copyright Sumit Gupta 2003 15 Experiments for SPARK HLS Tool Results presented here for Results presented here for Pre-synthesis transformations Pre-synthesis transformations Speculative Code Motions Speculative Code Motions Dynamic CSE Dynamic CSE We used SPARK to synthesize designs derived from several industrial designs We used SPARK to synthesize designs derived from several industrial designs MPEG-1, MPEG-2, GIMP Image Processing software MPEG-1, MPEG-2, GIMP Image Processing software Case Study of Intel Instruction Length Decoder Case Study of Intel Instruction Length Decoder Scheduling Results Scheduling Results Number of States in FSM Number of States in FSM Cycles on Longest Path through Design Cycles on Longest Path through Design VHDL: Logic Synthesis VHDL: Logic Synthesis Critical Path Length (ns) Critical Path Length (ns) Unit Area Unit Area
16
Copyright Sumit Gupta 2003 16 Target Applications Design # of Ifs # of Loops # Non-Empty Basic Blocks # of Operations MPEG-1 pred1 4217123 MPEG-1 pred2 11645287 MPEG-2 dp_frame 18461260 GIMPtiler11235150
17
Copyright Sumit Gupta 2003 17 + Speculative Code Motions + Pre-Synthesis Transforms + Dynamic CSE Scheduling & Logic Synthesis Results Non-speculative CMs: Within BBs & Across Hier Blocks 42% 10% 36% 8% 39% Overall: 63-66 % improvement in Delay Almost constant Area
18
Copyright Sumit Gupta 2003 18 Non-speculative CMs: Within BBs & Across Hier Blocks + Speculative Code Motions + Pre-Synthesis Transforms + Dynamic CSE Scheduling & Logic Synthesis Results 14% 20% 1% 33% 41% 52% Overall: 48-76 % improvement in Delay Almost constant Area
19
Copyright Sumit Gupta 2003 19 Interface Synthesis: Case Study Target Platform – Altera’s Nios development board @33.33 MHz Target Platform – Altera’s Nios development board @33.33 MHz Soft core of the Nios embedded processor (no caches) Soft core of the Nios embedded processor (no caches) Library of peripheral IP’s like timer, UART, SRAM controller Library of peripheral IP’s like timer, UART, SRAM controller Altera FPGA with 8320 Logic Cells Altera FPGA with 8320 Logic Cells Tools Tools The SPARK parallelizing high level synthesis framework for high level synthesis The SPARK parallelizing high level synthesis framework for high level synthesis Leonardo Spectrum and Quartus II for logic synthesis and P&R respectively Leonardo Spectrum and Quartus II for logic synthesis and P&R respectively The Nios specific GNU toolkit for compiling C code The Nios specific GNU toolkit for compiling C code
20
Copyright Sumit Gupta 2003 20 Example: MPEG-1 Prediction Block Design # of Ifs # of Loops # Non-Empty Basic Blocks # of Operations MPEG-1 pred1 4217123 MPEG-1 pred2 11645287 MPEG-1 pred1 used 1 out of 3 kernels mapped to HW MPEG-1 pred1 used 1 out of 3 kernels mapped to HW MPEG-2 pred2 used the remaining 2 kernels mapped to HW MPEG-2 pred2 used the remaining 2 kernels mapped to HW HW/SW interface consisted of 181 words of data HW/SW interface consisted of 181 words of data
21
Copyright Sumit Gupta 2003 21 Conclusions Parallelizing code transformations enable a new range of HLS transformations Parallelizing code transformations enable a new range of HLS transformations Provide the needed improvement in quality of HLS results Provide the needed improvement in quality of HLS results Possible to be competitive against manually designed circuits. Possible to be competitive against manually designed circuits. Can enable productivity improvements in microelectronic design Can enable productivity improvements in microelectronic design Built a C-to-VHDL synthesis system with a range of code transformations Built a C-to-VHDL synthesis system with a range of code transformations Platform for applying Coarse and Fine-grain Optimizations Platform for applying Coarse and Fine-grain Optimizations Tool-box approach where transformations and heuristics can be developed Tool-box approach where transformations and heuristics can be developed Script-based synthesis enables designer to control HLS process Script-based synthesis enables designer to control HLS process Performance improvements of 60-70 % across a number of designs Performance improvements of 60-70 % across a number of designs Presented an interface synthesis approach targeted at FPGA based platforms Presented an interface synthesis approach targeted at FPGA based platforms Based on a memory mapping algorithm that utilizes scheduling information for synthesizing designs with control flow Based on a memory mapping algorithm that utilizes scheduling information for synthesizing designs with control flow
22
Copyright Sumit Gupta 2003 22 SPARK Release Available for download Available for download http://www.cecs.uci.edu/~spark User Manual User Manual Running the tool Running the tool Customizing the synthesis scripts Customizing the synthesis scripts Tutorial Tutorial Synthesis of Portion of Motion Compensation algorithm in MPEG-1 player (along with logic synthesis scripts) Synthesis of Portion of Motion Compensation algorithm in MPEG-1 player (along with logic synthesis scripts)
23
Thank You
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.