Center for Embedded Computer Systems University of California, Irvine Coordinated Coarse Grain and Fine Grain Optimizations.


1 Coordinated Coarse Grain and Fine Grain Optimizations for High-Level Synthesis
Sumit Gupta
Center for Embedded Computer Systems, University of California, Irvine
http://www.cecs.uci.edu/~spark
Supported by Semiconductor Research Corporation

2 High Level Synthesis
Transform behavioral descriptions to RTL/gate level: from C to CDFG to architecture.
Example behavioral description:
  x = a + b;
  c = a < b;
  if (c) d = e - f; else g = h + i;
  j = d * g;
  l = e + x;
[Figure: CDFG of the example, with an If node and its true/false branches, next to a target architecture made up of memory, ALU, control, and data path]
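The data-flow half of a CDFG can be sketched by recording, for each statement, which earlier statements produced its operands. The tuple layout and the name `flow_dependencies` below are assumptions made for illustration, and the sketch deliberately ignores the control dependence on the condition `c`.

```python
# Each statement is (destination, operator, left operand, right operand),
# transcribed from the slide's example. The tuple layout is an assumption.
STATEMENTS = [
    ("x", "+", "a", "b"),
    ("c", "<", "a", "b"),
    ("d", "-", "e", "f"),   # true branch of if (c)
    ("g", "+", "h", "i"),   # false branch of if (c)
    ("j", "*", "d", "g"),
    ("l", "+", "e", "x"),
]


def flow_dependencies(statements):
    """Return {index: set of indices of statements whose results it reads}."""
    last_writer = {}   # variable name -> index of the statement that wrote it last
    deps = {}
    for idx, (dest, _op, lhs, rhs) in enumerate(statements):
        deps[idx] = {last_writer[v] for v in (lhs, rhs) if v in last_writer}
        last_writer[dest] = idx
    return deps


deps = flow_dependencies(STATEMENTS)
```

Here `j = d * g` depends on both branch operations, while `l = e + x` depends only on `x = a + b`, so it is free to be scheduled alongside the conditional.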

3 Our Approach to HLS
- Optimizing compiler and parallelizing compiler transformations applied at source level (pre-synthesis) and during scheduling
  - Source-level code refinement using pre-synthesis transformations
  - Code restructuring by speculative code motions
  - Operation replication to improve concurrency
  - Transformations applied dynamically during scheduling to exploit new opportunities due to code motions
- Extract a high degree of parallelization using extensive code transformations
- Improve resource utilization and increase code compaction
- Reduce impact of programming style and control constructs on HLS results
- Our approach is particularly suited to descriptions with nested conditionals and loops
[Figure: synthesis flow with C input, original CDFG, optimized CDFG, scheduling & binding, and VHDL output, annotated with source-level compiler transformations and scheduling compiler transformations]

4 Hierarchical Intermediate Representation
- We use Hierarchical Task Graphs (HTGs)
  - Maintain structured view of design description
  - Consist of a hierarchy of basic blocks and HTG nodes
- 3 types of HTG nodes:
  - Single: no sub-nodes
  - Compound: has sub-nodes
  - Loop: encapsulates loops
- Augmented by data dependency graphs
- Enable coarse-grain transformations
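The three node types can be sketched as a small class hierarchy. This is a minimal illustration, not SPARK's actual data structure: the class names, the `kind` field, and the `basic_blocks` helper are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BasicBlock:
    name: str
    ops: List[str] = field(default_factory=list)


@dataclass
class SingleNode:
    """No sub-nodes: wraps exactly one basic block."""
    block: BasicBlock


@dataclass
class CompoundNode:
    """Has sub-nodes, e.g. an if-then-else or a sequence (kind is hypothetical)."""
    kind: str
    children: list = field(default_factory=list)


@dataclass
class LoopNode:
    """Encapsulates a loop; the body is itself a compound node."""
    body: CompoundNode


def basic_blocks(node):
    """Yield the basic blocks under a node, depth-first and in order."""
    if isinstance(node, SingleNode):
        yield node.block
    elif isinstance(node, LoopNode):
        yield from basic_blocks(node.body)
    else:
        for child in node.children:
            yield from basic_blocks(child)


design = CompoundNode("sequence", [
    SingleNode(BasicBlock("BB0")),
    CompoundNode("if", [SingleNode(BasicBlock(n)) for n in ("BB1", "BB2", "BB3")]),
    LoopNode(CompoundNode("sequence", [SingleNode(BasicBlock("BB4"))])),
])
names = [bb.name for bb in basic_blocks(design)]
```

Keeping the if-block and the loop as single nodes is what lets a coarse-grain transformation treat them as one unit instead of walking every basic block inside.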

5

6 Trailblazing: Hierarchical Code Motion Technique
- Can move operations across large pieces of code without visiting each node in between
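One way to picture skipping a whole node: if each hierarchical node carries a summary of the variables read and written anywhere inside it, an operation can be hoisted above the node after a single dependence check against that summary. The sketch below assumes such summaries exist; `can_move_across` and its set-based arguments are hypothetical and are not taken from the presentation.

```python
def can_move_across(op_reads, op_writes, node_reads, node_writes):
    """Return True if an operation placed after a node may be hoisted above it.

    All four arguments are sets of variable names. The node's sets are
    assumed to summarise every operation nested inside it.
    """
    reads_node_result = bool(op_reads & node_writes)        # read-after-write
    overwrites_node_input = bool(op_writes & node_reads)    # write-after-read
    overwrites_node_output = bool(op_writes & node_writes)  # write-after-write
    return not (reads_node_result or overwrites_node_input or overwrites_node_output)


# Hypothetical summary of an if-node: if (c) d = e - f; else g = h + i;
node_reads, node_writes = {"c", "e", "f", "h", "i"}, {"d", "g"}

independent = can_move_across({"e", "x"}, {"l"}, node_reads, node_writes)  # l = e + x
dependent = can_move_across({"d", "g"}, {"j"}, node_reads, node_writes)    # j = d * g
```

The check costs the same whether the node contains one basic block or a deeply nested conditional, which is the point of moving operations hierarchically.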

7 Speculative Code Motions
Operation movement to reduce impact of programming style on quality of HLS results:
- Reverse speculation
- Conditional speculation
- Across hierarchical blocks
- Early condition execution: evaluates conditions as soon as possible
[Figure: If node with true/false branches and operations a, b, c, with arrows for each code motion]
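The code motions named above can be illustrated on a toy function. This is a sketch under the usual definitions (speculation hoists an operation above the test, reverse speculation pushes an operation from before the conditional into its branches, conditional speculation duplicates an operation from after the conditional into both branches); the function names and the example arithmetic are made up for illustration.

```python
def original(a, b, c, cond):
    x = a - b          # before the conditional
    if cond:
        y = a + b      # inside the true branch
    else:
        y = c
    z = y + c          # after the conditional
    return x, y, z


def speculated(a, b, c, cond):
    x = a - b
    t = a + b          # hoisted above the test; result parked in a temporary
    y = t if cond else c
    z = y + c
    return x, y, z


def reverse_speculated(a, b, c, cond):
    if cond:
        x = a - b      # moved down into the true branch
        y = a + b
    else:
        x = a - b      # and into the false branch
        y = c
    z = y + c
    return x, y, z


def conditionally_speculated(a, b, c, cond):
    x = a - b
    if cond:
        y = a + b
        z = y + c      # duplicated up into the true branch
    else:
        y = c
        z = y + c      # duplicated up into the false branch
    return x, y, z


variants = [speculated, reverse_speculated, conditionally_speculated]
inputs = [(a, b, c, cond) for a in (0, 3) for b in (1, 5) for c in (2, 7)
          for cond in (False, True)]
all_match = all(v(*args) == original(*args) for v in variants for args in inputs)
```

All variants compute the same values; what changes is where each operation sits, and therefore which scheduling steps and resources it can share.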

8 Scheduling Heuristic
- Get available ops: a, b, c, d
- Determine code motions required
- Assign cost to each operation
  - Cost is based on data dependency chain
- Schedule op with lowest cost
[Figure: HTG with basic blocks BB 0 to BB 9 and operations a, b, c, d; code motions labeled "Speculate" and "Speculate Across HTG"]
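The steps listed on this slide resemble priority-driven list scheduling, which can be sketched as follows. The slide does not give the cost formula, so this sketch assumes the cost of an operation is the negated length of the longest dependency chain that starts at it; it also assumes single-cycle operations and leaves out the code-motion step entirely. `chain_length` and `schedule` are hypothetical names.

```python
def chain_length(op, successors, memo):
    """Number of operations on the longest dependency chain starting at op."""
    if op not in memo:
        memo[op] = 1 + max(
            (chain_length(s, successors, memo) for s in successors[op]), default=0
        )
    return memo[op]


def schedule(predecessors, resources_per_step):
    """Map each operation to a scheduling step.

    predecessors: {op: set of ops whose results it needs}; must be acyclic,
    and every op named in a set must also be a key.
    """
    successors = {op: set() for op in predecessors}
    for op, preds in predecessors.items():
        for p in preds:
            successors[p].add(op)
    memo = {}
    # Assumed cost: the longer the chain behind an op, the lower its cost.
    cost = {op: -chain_length(op, successors, memo) for op in predecessors}

    steps, step = {}, 0
    while len(steps) < len(predecessors):
        # Available ops: unscheduled, with all inputs produced in earlier steps.
        available = [
            op for op in predecessors
            if op not in steps and all(p in steps for p in predecessors[op])
        ]
        available.sort(key=lambda op: (cost[op], op))   # lowest cost first
        for op in available[:resources_per_step]:
            steps[op] = step
        step += 1
    return steps


# Hypothetical dependencies: a -> b -> c form a chain, d is independent.
deps = {"a": set(), "b": {"a"}, "c": {"b"}, "d": set()}
one_unit = schedule(deps, resources_per_step=1)
two_units = schedule(deps, resources_per_step=2)
```

With one resource the chain a, b, c is scheduled first and d fills the last step; with two resources d shares step 0 with a.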

9 Scheduling Heuristic
[Figure: the same HTG (BB 0 to BB 9) shown twice with operations a, b, c, d in different positions; code motion labeled "Speculate Across HTG"]

10 Increasing the Scope of Code Motions
[Figure: original design and scheduled design of an unbalanced conditional (If node, BB 0 to BB 4, operations a, b, c, d, e, scheduling steps S0 to S3), together with the resource allocation]

11 Insert New Scheduling Step in Shorter Branch
[Figure: the same conditional (If node, BB 0 to BB 4, operations a, b, c, d, e, scheduling steps S0 to S2, same resource allocation) before and after a scheduling step is inserted in the shorter branch]

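12 Common Sub-Expression Elimination
C description (the condition is stored in cd rather than c, since overwriting c would change the value of b + c):
  a = b + c;
  cd = b < c;
  if (cd) d = b + c; else e = g + h;
HTG representation: a = b + c is computed before the If node; the true branch holds d = b + c and the false branch holds e = g + h.
After CSE: d = b + c is replaced by d = a.
[Figure: HTG with basic blocks BB 0 to BB 4 before and after CSE]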
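A minimal sketch of CSE over a straight-line sequence of operations follows. It is not the HTG-based version from the slide, which would also have to check that the first occurrence dominates the second; the tuple layout and the function name are assumptions. An expression stops being reusable as soon as one of its operands, or the variable holding its result, is overwritten.

```python
def eliminate_common_subexpressions(statements):
    """Replace recomputed expressions with copies of the earlier result.

    statements: list of (dest, op, lhs, rhs). A copy is emitted as
    (dest, "=", source, None).
    """
    available = {}   # (op, lhs, rhs) -> variable currently holding that value
    result = []
    for dest, op, lhs, rhs in statements:
        key = (op, lhs, rhs)
        holder = available.get(key)
        if holder is not None:
            result.append((dest, "=", holder, None))
        else:
            result.append((dest, op, lhs, rhs))
        # Writing dest invalidates expressions that read it or were stored in it.
        available = {
            k: v for k, v in available.items()
            if v != dest and dest not in (k[1], k[2])
        }
        if dest not in (lhs, rhs):
            available.setdefault(key, dest)
    return result


safe = eliminate_common_subexpressions([
    ("a", "+", "b", "c"),
    ("cd", "<", "b", "c"),
    ("d", "+", "b", "c"),
])
killed = eliminate_common_subexpressions([
    ("a", "+", "b", "c"),
    ("c", "<", "b", "c"),    # overwrites an operand of b + c
    ("d", "+", "b", "c"),
])
```

In `safe` the third operation becomes the copy `d = a`; in `killed` it is left alone because `c` changed between the two occurrences.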

13 New Opportunities for “Dynamic” CSE Due to Speculative Code Motions
[Figure: HTG with basic blocks BB 0 to BB 8. Before: a = b + c and d = b + c sit in separate blocks. After speculation: dcse = b + c is computed once, and the two operations become a = dcse and d = dcse.]
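The idea can be sketched as a pass that runs right after an operation has been speculated into a block: every operation computing the same expression in a block dominated by the new location is turned into a copy. This is a simplified illustration with hypothetical block names and a precomputed `dominated` map; it only guards against redefinitions inside each dominated block, not on the paths between blocks.

```python
def dynamic_cse(blocks, dominated, home, moved):
    """Rewrite duplicates of a freshly moved operation as copies of its result.

    blocks:    {block name: list of (dest, op, lhs, rhs)}
    dominated: {block name: list of block names it dominates} (assumed given)
    home:      block the operation was just scheduled into
    moved:     the (dest, op, lhs, rhs) tuple that was moved
    Returns the number of operations replaced.
    """
    dest, op, lhs, rhs = moved
    replaced = 0
    for name in dominated.get(home, []):
        still_valid = True
        rewritten = []
        for d, o, l, r in blocks[name]:
            if still_valid and (o, l, r) == (op, lhs, rhs):
                rewritten.append((d, "=", dest, None))
                replaced += 1
            else:
                rewritten.append((d, o, l, r))
            if d in (dest, lhs, rhs):
                still_valid = False   # value changed; stop reusing it here
        blocks[name] = rewritten
    return replaced


# Hypothetical layout: "top" dominates the other three blocks.
blocks = {
    "top": [("dcse", "+", "b", "c")],
    "first_if": [("a", "+", "b", "c")],
    "else_part": [("e", "+", "g", "h")],
    "second_if": [("d", "+", "b", "c")],
}
dominated = {"top": ["first_if", "else_part", "second_if"]}
count = dynamic_cse(blocks, dominated, "top", ("dcse", "+", "b", "c"))
```

Running the pass after each code motion, rather than once before scheduling, is what makes the elimination "dynamic": the opportunity exists only once the operation has reached a dominating block.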

14 SPARK High Level Synthesis Framework

15 Experimentation
- Experiments for several transformations
  - Pre-synthesis transformations: loop-invariant code motion, CSE
  - Speculative code motions
  - Dynamic CSE
- We have used Spark to synthesize designs derived from several industrial designs
  - MPEG-1, MPEG-2, GIMP image processing software
- Scheduling results
  - Number of states in FSM
  - Cycles on longest path through design
- VHDL: logic synthesis
  - Critical path length (ns)
  - Unit area

16 Target Applications
Design            # of Ifs  # of Loops  # Non-Empty Basic Blocks  # of Operations
MPEG-1 pred1      4         2           17                        123
MPEG-1 pred2      11        6           45                        287
MPEG-2 dp_frame   18        4           61                        260
GIMP tiler        11        2           35                        150

17 Code Motions: Logic Synthesis Results
[Chart legend: within basic blocks & across hierarchical blocks; + speculation; + reverse speculation & early condition execution; conditional speculation]

18 CSE/Dynamic CSE Results
[Chart legend: all code motions enabled; + only CSE; + only dynamic CSE; + CSE & dynamic CSE]

19 Conclusions
- Parallelizing code transformations enable a new range of HLS transformations
  - Can provide the needed improvement in quality of HLS results for them to be competitive against manually designed circuits
  - Synthesis approach can dominate SOC embedded systems design
  - Can enable productivity improvements in microelectronic design
- Built a synthesis system with a range of code transformations
  - Platform for applying coarse- and fine-grain optimizations
  - Code transformations address complex control flow
  - Tool-box approach where transformations and heuristics can be developed
  - Enables finding the right synthesis script for different application domains
- Performance improvements of 60-70% across a number of designs
- We have also shown its effectiveness on an Intel design

20 Publications
- Dynamic Conditional Branch Balancing during the High-Level Synthesis of Control-Intensive Designs. S. Gupta, N.D. Dutt, R.K. Gupta, A. Nicolau. To appear in DATE, March 2003.
- SPARK: A High-Level Synthesis Framework For Applying Parallelizing Compiler Transformations. S. Gupta, N.D. Dutt, R.K. Gupta, A. Nicolau. VLSI Design 2003. Best Paper Award.
- Dynamic Common Sub-Expression Elimination during Scheduling in High-Level Synthesis. S. Gupta, M. Reshadi, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. ISSS 2002.
- Coordinated Transformations for High-Level Synthesis of High Performance Microprocessor Blocks. S. Gupta, T. Kam, M. Kishinevsky, S. Rotem, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. DAC 2002.
- Conditional Speculation and its Effects on Performance and Area for High-Level Synthesis. S. Gupta, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. ISSS 2001.
- Speculation Techniques for High Level Synthesis of Control Intensive Designs. S. Gupta, N. Savoiu, S. Kim, N.D. Dutt, R.K. Gupta, A. Nicolau. DAC 2001.
- Analysis of High-level Address Code Transformations for Programmable Processors. S. Gupta, M. Miranda, F. Catthoor, R.K. Gupta. DATE 2000.
- Synthesis of Testable RTL Designs using Adaptive Simulated Annealing Algorithm. C.P. Ravikumar, S. Gupta, A. Jajoo. Intl. Conf. on VLSI Design, 1998. Best Student Paper Award.
- Book chapter: ASIC Design. S. Gupta, R.K. Gupta. Chapter 64, The VLSI Handbook, edited by Wai-Kai Chen.

