Center for Embedded Computer Systems University of California, Irvine Coordinated Coarse Grain and Fine Grain Optimizations.

Coordinated Coarse Grain and Fine Grain Optimizations for High-Level Synthesis
Sumit Gupta
Center for Embedded Computer Systems, University of California, Irvine
Supported by Semiconductor Research Corporation

High Level Synthesis
- Transform behavioral descriptions to RTL/gate level
- From C to CDFG to architecture
Example description:
    x = a + b;
    c = a < b;
    if (c) then d = e - f;
    else g = h + i;
    j = d × g;
    l = e + x;
[Figure: the CDFG of this description, with an If node on c and T/F branches, mapped to an architecture of memory, ALU, control, and data path.]

Our Approach to HLS
- Optimizing compiler and parallelizing compiler transformations applied at source level (pre-synthesis) and during scheduling
- Source-level code refinement using pre-synthesis transformations
- Code restructuring by speculative code motions
- Operation replication to improve concurrency
- Transformations applied dynamically during scheduling to exploit new opportunities due to code motions
- Extract a high degree of parallelization using extensive code transformations
- Improve resource utilization and increase code compaction
- Reduce impact of programming style and control constructs on HLS results
- Our approach is particularly suited to descriptions with nested conditionals and loops
[Figure: flow from C input to VHDL output; labels: Original CDFG, Source-Level Compiler Transformations, Optimized CDFG, Scheduling & Binding, Scheduling Compiler Transformations.]

Hierarchical Intermediate Representation
- We use Hierarchical Task Graphs (HTGs)
- Maintain structured view of design description
- Consists of hierarchy of basic blocks and HTG nodes
- 3 types of HTG nodes:
  - Single: no sub-nodes
  - Compound: sub-nodes
  - Loop: encapsulate loops
- Augmented by data dependency graphs
- Enable coarse-grain transformations
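The hierarchy described on this slide can be sketched as a small data structure. This is a minimal illustration, not SPARK's actual representation: the class names `BasicBlock` and `HTGNode`, the `kind` strings, and the choice to let a single node wrap exactly one basic block are all assumptions made for the example. The design it builds is the small conditional from the High Level Synthesis slide.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class BasicBlock:
    """A straight-line sequence of operations."""
    name: str
    ops: List[str] = field(default_factory=list)


@dataclass
class HTGNode:
    """Hypothetical HTG node: kind is 'single', 'compound', or 'loop'."""
    name: str
    kind: str
    block: Optional[BasicBlock] = None            # used only by single nodes
    children: List["HTGNode"] = field(default_factory=list)


def basic_blocks(node: HTGNode) -> Iterator[BasicBlock]:
    """Yield every basic block under a node, in program order."""
    if node.kind == "single":
        if node.block is not None:
            yield node.block
    else:
        for child in node.children:
            yield from basic_blocks(child)


def op_count(node: HTGNode) -> int:
    """Number of operations under a node, at any depth."""
    return sum(len(bb.ops) for bb in basic_blocks(node))


# The If node is a compound node holding the two branch blocks.
if_node = HTGNode("if", "compound", children=[
    HTGNode("n1", "single", block=BasicBlock("BB1", ["d = e - f"])),
    HTGNode("n2", "single", block=BasicBlock("BB2", ["g = h + i"])),
])
design = HTGNode("design", "compound", children=[
    HTGNode("n0", "single", block=BasicBlock("BB0", ["x = a + b", "c = a < b"])),
    if_node,
    HTGNode("n3", "single", block=BasicBlock("BB3", ["j = d * g", "l = e + x"])),
])
block_names = [bb.name for bb in basic_blocks(design)]
```

Because the If node is one object, a coarse-grain transformation can treat it as a unit (for example, asking `op_count(if_node)`) without walking its basic blocks one by one.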

Trailblazing: Hierarchical Code Motion Technique
- Can move operations across large pieces of code without visiting each node in between

Speculative Code Motions
- Operation movement to reduce impact of programming style on quality of HLS results
- Reverse speculation
- Conditional speculation
- Code motion across hierarchical blocks
- Early condition execution: evaluates conditions as soon as possible
[Figure: operations a, b, and c moved around an If node with T/F branches.]
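As a rough illustration of the basic idea behind speculation, the sketch below hoists the two branch operations of a conditional above the condition, so that they no longer wait for it. The function names and the use of temporaries `t_sub` and `t_add` are assumptions for this example; a real HLS scheduler would also have to confirm that spare functional units are available and that the hoisted operations have no side effects.

```python
def original(c, e, f, h, i):
    """Each operation executes only inside the branch that needs it."""
    d = g = None
    if c:
        d = e - f
    else:
        g = h + i
    return d, g


def speculated(c, e, f, h, i):
    """Both operations run unconditionally, before the branch.

    Their results go to temporaries; the branch only commits one of them.
    """
    t_sub = e - f   # speculated: result may be discarded
    t_add = h + i   # speculated: result may be discarded
    d = g = None
    if c:
        d = t_sub
    else:
        g = t_add
    return d, g


# Both versions agree on every path through the conditional.
results_match = all(
    original(c, 7, 2, 3, 4) == speculated(c, 7, 2, 3, 4)
    for c in (True, False)
)
```

In hardware terms, the two speculated operations can share a cycle with the evaluation of the condition, which shortens the schedule at the cost of possibly wasted work.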

Scheduling Heuristic
- Get available ops: a, b, c, d
- Determine code motions required
- Assign cost to each operation
  - Cost is based on data dependency chain
- Schedule op with lowest cost
[Figure: basic blocks BB 0 through BB 9 with operations a, b, c, d; labels: Speculate, Speculate Across HTG.]
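The loop on this slide resembles priority-based list scheduling, which can be sketched as follows. Several things here are assumptions rather than details from the slides: the dependencies between a, b, c, and d, the exact cost function (an operation heading a longer data dependency chain is treated as cheaper), and the `units_per_step` resource limit. The "determine code motions required" step is not modeled.

```python
def chain_length(op, succs, memo=None):
    """Length of the longest data dependency chain that starts at op."""
    if memo is None:
        memo = {}
    if op not in memo:
        memo[op] = 1 + max(
            (chain_length(s, succs, memo) for s in succs.get(op, [])),
            default=0,
        )
    return memo[op]


def schedule(ops, succs, units_per_step=1):
    """Return a list of steps; each step is the list of ops scheduled in it."""
    preds = {op: set() for op in ops}
    for op, outs in succs.items():
        for s in outs:
            preds[s].add(op)
    done, steps = set(), []
    while len(done) < len(ops):
        # Get available ops: unscheduled, with all producers already done.
        available = [op for op in ops if op not in done and preds[op] <= done]
        # Assumed cost: longer dependency chain means lower cost (ties by name).
        available.sort(key=lambda op: (-chain_length(op, succs), op))
        step = available[:units_per_step]
        steps.append(step)
        done.update(step)
    return steps


# Hypothetical dependencies: a feeds c, c feeds d, b is independent.
deps = {"a": ["c"], "c": ["d"]}
one_unit = schedule(["a", "b", "c", "d"], deps, units_per_step=1)
two_units = schedule(["a", "b", "c", "d"], deps, units_per_step=2)
```

With one unit per step the chain a, c is served before the independent operation b; with two units, b fills the otherwise idle slot in the first step.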

Scheduling Heuristic
[Figure: two views of the basic-block graph (BB 0 through BB 9) with operations a, b, c, d; label: Speculate Across HTG.]

Increasing the Scope of Code Motions
[Figure: original design and scheduled design of an unbalanced conditional. An If node with T/F branches spans BB 0 to BB 4 with operations a (+), b (+), c (-), d (-), and e (-); the schedule uses states S0 to S3. Labels: Resource Allocation, Original Design, Scheduled Design, Unbalanced Conditional.]

Insert New Scheduling Step in Shorter Branch
[Figure: the same If node design (BB 0 to BB 4, operations a, b, c, d, e) with states S0 to S2; a new scheduling step in the shorter branch accommodates operation e. Label: Resource Allocation.]

Common Sub-Expression Elimination
C description:
    a = b + c;
    cd = b < c;
    if (cd) d = b + c;
    else e = g + h;
HTG representation: a = b + c precedes an If node whose true branch holds d = b + c and whose false branch holds e = g + h (basic blocks BB 0 to BB 4).
After CSE: the true branch becomes d = a; a = b + c and e = g + h are unchanged.
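The transformation on this slide can be sketched as a single pass over one dominating path, here the path through the true branch. This is an illustrative sketch, not SPARK's implementation: the tuple encoding of statements and the `"copy"` marker are assumptions, and the condition is assumed to be stored in its own variable (`cd`), because an assignment to `c` itself would invalidate `b + c`. Commutativity (`b + c` versus `c + b`) is ignored.

```python
def cse(stmts):
    """Eliminate common sub-expressions along one dominating path.

    stmts: list of (dest, op, lhs, rhs). A recomputed expression is
    replaced by a copy, encoded as (dest, "copy", source, None).
    """
    available = {}   # (op, lhs, rhs) -> variable currently holding the value
    out = []
    for dest, op, lhs, rhs in stmts:
        key = (op, lhs, rhs)
        if key in available:
            out.append((dest, "copy", available[key], None))
        else:
            out.append((dest, op, lhs, rhs))
        # Writing dest kills expressions that read dest or are held in dest.
        available = {k: v for k, v in available.items()
                     if dest not in (k[1], k[2]) and v != dest}
        # Record the expression unless dest overwrote one of its operands.
        if key not in available and dest not in (lhs, rhs):
            available[key] = dest
    return out


# True-branch path with the condition in its own variable.
path = [("a", "+", "b", "c"), ("cd", "<", "b", "c"), ("d", "+", "b", "c")]
optimized = cse(path)

# Same path, but the comparison overwrites c: b + c is no longer available.
killed = [("a", "+", "b", "c"), ("c", "<", "b", "c"), ("d", "+", "b", "c")]
not_optimized = cse(killed)
```

In `optimized` the last statement becomes a copy from `a`, matching the slide's `d = a`; in `not_optimized` it is left as a fresh addition.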

New Opportunities for “Dynamic” CSE Due to Speculative Code Motions
[Figure: before speculation, a = b + c and d = b + c sit in separate basic blocks among BB 0 to BB 8. After speculation, dcse = b + c is computed once, and the two operations become a = dcse and d = dcse.]
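The effect in this figure can be reproduced with a small dominance-based sketch. The control-flow shape (two consecutive conditionals), the immediate-dominator table `IDOM`, and the placement of the operations in BB2, BB6, and BB0 are assumptions chosen for illustration; the slide text does not fix them. The sketch also assumes that `b` and `c` are never reassigned and that the hoisted operation is legal to speculate.

```python
# Assumed immediate dominators: BB0 -> BB1 (if) -> BB2/BB3 -> BB4
# -> BB5 (if) -> BB6/BB7 -> BB8.
IDOM = {"BB1": "BB0", "BB2": "BB1", "BB3": "BB1", "BB4": "BB1",
        "BB5": "BB4", "BB6": "BB5", "BB7": "BB5", "BB8": "BB5"}


def dominates(a, b):
    """True if block a dominates block b (a block dominates itself)."""
    while b is not None:
        if a == b:
            return True
        b = IDOM.get(b)
    return False


def cse(ops):
    """ops: list of (block, dest, expr) in program order.

    expr is a tuple for a computation or a variable name for a copy.
    """
    out = []
    for block, dest, expr in ops:
        src = next((d for blk, d, e in out
                    if isinstance(e, tuple) and e == expr
                    and dominates(blk, block)), None)
        out.append((block, dest, src if src else expr))
    return out


def speculate(ops, dest, target_block, temp):
    """Hoist the computation of dest to target_block under the name temp.

    Assumes target_block comes first in program order.
    """
    idx = next(i for i, op in enumerate(ops) if op[1] == dest)
    block, _, expr = ops[idx]
    rest = ops[:idx] + [(block, dest, temp)] + ops[idx + 1:]
    return [(target_block, temp, expr)] + rest


before = [("BB2", "a", ("+", "b", "c")), ("BB6", "d", ("+", "b", "c"))]
static_result = cse(before)                        # BB2 does not dominate BB6
dynamic_result = cse(speculate(before, "d", "BB0", "dcse"))
```

Before the code motion, neither block dominates the other, so ordinary CSE finds nothing. Once the scheduler hoists the addition to `BB0`, that block dominates `BB2`, and the second addition collapses into a copy of `dcse`.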

SPARK High Level Synthesis Framework

Experimentation
- Experiments for several transformations
  - Pre-synthesis transformations: loop invariant code motions, CSE
  - Speculative code motions
  - Dynamic CSE
- We have used SPARK to synthesize designs derived from several industrial designs
  - MPEG-1, MPEG-2, GIMP image processing software
- Scheduling results
  - Number of states in FSM
  - Cycles on longest path through design
- VHDL: logic synthesis
  - Critical path length (ns)
  - Unit area

Target Applications
[Table: columns are Design, # of Ifs, # of Loops, # of Non-Empty Basic Blocks, # of Operations; rows are MPEG-1 pred, MPEG-1 pred, MPEG-2 dp_frame, GIMP tiler.]

Code Motions: Logic Synthesis Results
[Chart: results as code motions are enabled cumulatively: within basic blocks and across hierarchical blocks; + speculation; + reverse speculation and early condition execution; + conditional speculation.]

CSE/Dynamic CSE Results
[Chart: four configurations: all code motions enabled; + only CSE; + only dynamic CSE; + CSE and dynamic CSE.]

Conclusions
- Parallelizing code transformations enable a new range of HLS transformations
- Can provide the needed improvement in quality of HLS results for them to be competitive against manually designed circuits
- Synthesis approach can dominate SOC embedded systems design
- Can enable productivity improvements in microelectronic design
- Built a synthesis system with a range of code transformations
- Platform for applying coarse and fine-grain optimizations
- Code transformations address complex control flow
- Tool-box approach where transformations and heuristics can be developed
- Enables finding the right synthesis script for different application domains
- Performance improvements of % across a number of designs
- We have also shown its effectiveness on an Intel design

Publications
- Dynamic Conditional Branch Balancing during the High-Level Synthesis of Control-Intensive Designs. S. Gupta, N.D. Dutt, R.K. Gupta, A. Nicolau. To appear in DATE, March 2003.
- SPARK: A High-Level Synthesis Framework for Applying Parallelizing Compiler Transformations. S. Gupta, N.D. Dutt, R.K. Gupta, A. Nicolau. VLSI Design 2003. Best Paper Award.
- Dynamic Common Sub-Expression Elimination during Scheduling in High-Level Synthesis. S. Gupta, M. Reshadi, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. ISSS 2002.
- Coordinated Transformations for High-Level Synthesis of High Performance Microprocessor Blocks. S. Gupta, T. Kam, M. Kishinevsky, S. Rotem, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. DAC 2002.
- Conditional Speculation and its Effects on Performance and Area for High-Level Synthesis. S. Gupta, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. ISSS 2001.
- Speculation Techniques for High Level Synthesis of Control Intensive Designs. S. Gupta, N. Savoiu, S. Kim, N.D. Dutt, R.K. Gupta, A. Nicolau. DAC 2001.
- Analysis of High-level Address Code Transformations for Programmable Processors. S. Gupta, M. Miranda, F. Catthoor, R.K. Gupta. DATE 2000.
- Synthesis of Testable RTL Designs using Adaptive Simulated Annealing Algorithm. C.P. Ravikumar, S. Gupta, A. Jajoo. Intl. Conf. on VLSI Design, 1998. Best Student Paper Award.
Book Chapter
- ASIC Design. S. Gupta, R.K. Gupta. Chapter 64, The VLSI Handbook, edited by Wai-Kai Chen.