08/31/2001 Copyright CECS & The Spark Project. Center for Embedded Computer Systems, University of California, Irvine. Conditional.

Presentation transcript:

08/31/2001. Copyright CECS & The Spark Project.
Center for Embedded Computer Systems, University of California, Irvine
Conditional Speculation and its Effects on Performance and Area for High-Level Synthesis
Sumit Gupta, Nikil Dutt, Nick Savoiu, Rajesh Gupta, Alex Nicolau
SPARK High Level Synthesis System
Supported by Semiconductor Research Corporation

2. High Level Synthesis: From C to CDFG to Architecture

3. Scheduling with Given Resource Allocation
[Figure: operations scheduled under resource constraints; resources shown: +, <]
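Scheduling under a fixed resource allocation can be illustrated with a small list scheduler. The sketch below is not the SPARK scheduler; it is a minimal example that assumes single-cycle operations, a priority order equal to the order in which operations are listed, and hypothetical operation names (`t1`, `t2`, `t3`, `c`).

```python
from collections import defaultdict


def list_schedule(ops, deps, resources):
    """Assign each operation to a cycle without exceeding the resource counts.

    ops:       dict mapping operation name -> operator type, e.g. "+"
    deps:      dict mapping operation name -> set of predecessor names
    resources: dict mapping operator type -> number of available units
    Every operation is assumed to take exactly one cycle.
    """
    cycle_of = {}
    remaining = list(ops)  # listing order doubles as priority order
    cycle = 0
    while remaining:
        used = defaultdict(int)
        chosen = []
        for op in remaining:
            # Ready only if every predecessor finished in an earlier cycle.
            ready = all(p in cycle_of for p in deps.get(op, ()))
            if ready and used[ops[op]] < resources.get(ops[op], 0):
                used[ops[op]] += 1
                chosen.append(op)
        if not chosen:
            raise ValueError("no operation can be scheduled: "
                             "missing resource or cyclic dependency")
        for op in chosen:
            cycle_of[op] = cycle
        remaining = [op for op in remaining if op not in cycle_of]
        cycle += 1
    return cycle_of


def schedule_length(cycle_of):
    """Number of cycles used by a schedule."""
    return max(cycle_of.values()) + 1


# Hypothetical data flow: two additions, a subtraction, and a comparison.
ops = {"t1": "+", "t2": "+", "t3": "-", "c": "<"}
deps = {"t3": {"t1"}, "c": {"t3"}}

one_adder = list_schedule(ops, deps, {"+": 1, "-": 1, "<": 1})
two_adders = list_schedule(ops, deps, {"+": 2, "-": 1, "<": 1})
print(one_adder, schedule_length(one_adder))
print(two_adders, schedule_length(two_adders))
```

With a single adder, the second addition has to wait for a later cycle even though it has no predecessors. In this particular example the dependency chain still determines the overall schedule length.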

4. Conditional Speculation
[Figure: operations +, +, -, < scheduled across a conditional; resource constraints: +, <; labels: "Conditionally Speculate", "Unused/idle resource slots"]

5. Conditional Speculation
[Figure: operations +, +, -, < after conditional speculation, under the same resource constraints]
- Higher resource utilization inside conditionals
- Shorter schedule lengths

6. Creation of idle slots by Speculation
[Figure: control flow graph with basic blocks BB 0, BB 1, BB 2, BB 3; operations -, +, -, <; variables x, y, a, a, z; label: "Speculate"]

7. After Speculation
[Figure: same control flow graph (BB 0 to BB 3); variables x, y, a, a, z, b, c; assignments a = b and a = c; label: "Conditionally Speculate"]

8. After Conditional Speculation
[Figure: same control flow graph (BB 0 to BB 3); variables x, y, b, c; assignments a = b and a = c; the operation producing z is duplicated as z1 and z2]
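The sequence on slides 6 to 8 can be mimicked in ordinary code. The sketch below is only an illustration: the names `a`, `b`, `c`, `z`, `z1`, and `z2` follow the slides, while the inputs `p`, `q`, `r` and the concrete arithmetic in each block are hypothetical. Each function models one stage, and the final loop checks that the transformations preserve the result.

```python
def original(p, q, r):
    cond = p < q        # BB 0: evaluate the condition
    if cond:
        a = p + r       # BB 1
    else:
        a = q - r       # BB 2
    z = a - p           # BB 3: runs only after the branches join
    return z


def after_speculation(p, q, r):
    # Both branch computations are hoisted into BB 0 and executed
    # unconditionally; their results are kept in new variables b and c.
    b = p + r
    c = q - r
    cond = p < q
    if cond:
        a = b           # BB 1 is now only a copy, leaving an idle slot
    else:
        a = c           # BB 2 is now only a copy, leaving an idle slot
    z = a - p           # BB 3
    return z


def after_conditional_speculation(p, q, r):
    b = p + r
    c = q - r
    cond = p < q
    if cond:
        a = b
        z1 = b - p      # copy of the BB 3 operation placed in BB 1
        z = z1
    else:
        a = c
        z2 = c - p      # copy of the BB 3 operation placed in BB 2
        z = z2
    return z            # BB 3 no longer holds the subtraction


# All three versions must compute the same value for every input.
for p in range(-3, 4):
    for q in range(-3, 4):
        for r in range(-3, 4):
            expected = original(p, q, r)
            assert after_speculation(p, q, r) == expected
            assert after_conditional_speculation(p, q, r) == expected
```

In hardware terms, the duplicated subtraction uses a functional unit slot that would otherwise sit idle inside each branch, so the block after the join needs no cycle of its own for it.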

9. Generalized Code Motions
[Figure: If node with true (T) and false (F) branches; code motions shown: Conditional Speculation, Reverse Speculation, Speculation, Across Hierarchical Blocks]

10. Recent Related Work
- Code motions in the presence of conditionals
  - Condition Vector List Scheduling [Wakabayashi 89]
  - Symbolic Scheduling [Radivojevic 96]
  - WaveSched Scheduler [Lakshminarayana 98]
  - Basic Block Control Graph Scheduling [Santos 99]
- Limitations
  - Arbitrary nesting of conditionals and loops not handled or handled poorly
  - Ad hoc optimizations
    - Not part of a complete synthesis system
  - Limited analysis of logic and control costs

11. The Spark High-Level Synthesis Methodology
- Developed a set of speculative code motions along with supporting transformations
- Implemented in a comprehensive synthesis framework
  - Input: behavioral description in ANSI-C
  - Output: synthesizable register-transfer level VHDL
- Quality of results measured in terms of
  - Scheduling results: cycles in longest path
  - Controller size: number of states in FSM
  - Logic synthesis results: critical path length, unit area

12. The Spark High-Level Synthesis Framework
- Experiments performed using two benchmarks: ADPCM Encode and MPEG-1 Prediction Block

13. Improvements of up to 50 % in Number of States in FSM and Cycles on Longest Path due to Code Motions
[Chart; legend: Allowed Code Motions]
- Within Basic Blocks
- Within BBs, Across Hierarchical Blocks
- Within BBs, Across Hier Blocks, Speculation
- Within BBs, Across Hier Blocks, Spec, Early Condition Execution
- Within BBs, Across Hier Blocks, Spec, Early Cond Exec, Conditional Speculation
Conditional Speculation leads to improvements of between 10 and 30 %.

14. Synthesis Results using Synopsys Design Compiler
[Chart; legend: Allowed Code Motions]
- Within Basic Blocks
- Within BBs, Across Hierarchical Blocks, Speculation
- Within BBs, Across Hier Blocks, Spec, Early Condition Execution
- Within BBs, Across Hier Blocks, Spec, Early Cond Exec, Conditional Speculation
Conditional Speculation leads to
- Reduced circuit delays: between 7 and 35 %
- Increased area: between 4 and 8 %
  - Area figures are high in absolute terms

15. Increasing sizes of steering logic and associated control logic
[Figure: Control Logic; ALU; operations: +, +]
Code motions lead to
- Higher resource sharing and utilization
- Larger multiplexors
- Larger control circuits

16. Increasing sizes of steering logic and associated control logic
[Figure: Control Logic; ALU; operations: +, +, +]
Code motions lead to
- Higher resource sharing and utilization
- Larger multiplexors
- Larger control circuits

17. Increasing sizes of steering logic and associated control logic
[Figure: Control Logic; ALU; operations: +, +, +, +]
Code motions lead to
- Higher resource sharing and utilization
- Larger multiplexors
- Larger control circuits

18. Interconnect minimization by resource binding
- Minimize the complexity of steering logic
  - Multiplexors and demultiplexors
- Bind operations with the same inputs or outputs to the same functional units
- Bind variables that are inputs/outputs to the same functional units to the same registers
- Both of these binding problems have been formulated as network flow problems

19. Reduction in Area by Interconnect Minimizing Resource Binding
[Chart: Critical Path, Total Delay, and Unit Area, shown for Naïve Resource Binding and for Interconnect Minimizing Resource Binding]
- Reductions in area of between %
- Fairly constant critical path lengths and circuit delay

20. Conclusions
- Synthesis results after code motions
  - Considerable gain in execution cycles and controller size
  - Large area costs due to interconnect (multiplexors)
- Interconnect minimizing resource binding leads to significant area reductions
- Benchmarks used are large real-life applications
Future Work:
- Develop better cost models for code motions
  - Consider effects on interconnect while scheduling
  - Create a notion of global cost of the design

21. Thank you! Please do drop by during the Poster session.

23. Additional Slides

24. Reducing Interconnections by Improved Operation Binding
[Figure: operations 1: a=b+c, 2: d=e+f, 3: g=e+d, 4: h=a+c on ALU 1 and ALU 2; registers: (b,d,h), (f,a,g), (c), (e); operation groups: "3: g=e+d; 4: h=a+c" and "1: a=b+c; 2: d=e+f"]

25. Reduced Interconnections after Operation Binding
[Figure: operations 1: a=b+c, 2: d=e+f, 3: g=e+d, 4: h=a+c on ALU 1 and ALU 2; registers: (b,d,g), (f,a,h), (c), (e)]

26. Reducing Interconnections by Improved Variable Binding
[Figure: operations 1: a=b+c, 2: d=e+f, 3: g=e+d, 4: h=a+c on ALU 1 and ALU 2; registers: (b,d,g), (f,a,h), (c), (e)]

27. Reduced Interconnections due to Improved Resource Binding
[Figure: operations 1: a=b+c, 2: d=e+f, 3: g=e+d, 4: h=a+c on ALU 1 and ALU 2; registers: (b,a,h), (f,d,g), (c), (e)]
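The effect of binding on steering logic in slides 24 to 27 can be checked by counting how many distinct registers feed each ALU input port; a port fed by more than one register needs a multiplexor. The sketch below uses the four operations and the register groupings that appear in the slide text. Which operation runs on which ALU, and the left/right operand order, cannot be recovered from the transcript, so `naive_binding` and `improved_binding` are assumptions made for illustration. This is a simple counting model, not the network flow formulation mentioned on slide 18.

```python
# Operations from the slides: name -> (result, left operand, right operand).
OPS = {
    1: ("a", "b", "c"),
    2: ("d", "e", "f"),
    3: ("g", "e", "d"),
    4: ("h", "a", "c"),
}


def registers(*groups):
    """Map each variable to a register index, one register per group."""
    return {var: index for index, group in enumerate(groups) for var in group}


def port_sources(op_to_alu, var_to_reg):
    """Return {(alu, port): set of registers driving that input port}."""
    sources = {}
    for op, (_result, left, right) in OPS.items():
        alu = op_to_alu[op]
        sources.setdefault((alu, "left"), set()).add(var_to_reg[left])
        sources.setdefault((alu, "right"), set()).add(var_to_reg[right])
    return sources


def mux_count(op_to_alu, var_to_reg):
    """Count ALU input ports that are driven by more than one register."""
    return sum(1 for regs in port_sources(op_to_alu, var_to_reg).values()
               if len(regs) > 1)


# Assumed operation bindings (not taken from the slides).
naive_binding = {1: "ALU1", 3: "ALU1", 2: "ALU2", 4: "ALU2"}
improved_binding = {1: "ALU1", 4: "ALU1", 2: "ALU2", 3: "ALU2"}

# Register groupings as listed on slides 24, 25, and 27.
regs_slide_24 = registers("bdh", "fag", "c", "e")
regs_slide_25 = registers("bdg", "fah", "c", "e")
regs_slide_27 = registers("bah", "fdg", "c", "e")

before = mux_count(naive_binding, regs_slide_24)
after_op_binding = mux_count(improved_binding, regs_slide_25)
after_var_binding = mux_count(improved_binding, regs_slide_27)
print(before, after_op_binding, after_var_binding)
```

Under these assumptions, placing operations 1 and 4 on one ALU lets them share operand `c`, and placing 2 and 3 on the other lets them share `e`. Grouping `b` with `a` and `f` with `d` in the same registers then removes the remaining multi-source input ports.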

28. Improvements of up to 50 % in Number of States in FSM and Cycles on Longest Path due to Code Motions
[Chart; legend: Allowed Code Motions]
- Within Basic Blocks
- Within BBs, Across Hierarchical Blocks
- Within BBs, Across Hier Blocks, Speculation
- Within BBs, Across Hier Blocks, Spec, Early Condition Execution
- Within BBs, Across Hier Blocks, Spec, Early Cond Exec, Conditional Speculation

29. Synthesis Results using Synopsys Design Compiler
[Chart; legend: Allowed Code Motions]
- Within Basic Blocks
- Within BBs, Across Hierarchical Blocks, Speculation
- Within BBs, Across Hier Blocks, Spec, Early Condition Execution
- Within BBs, Across Hier Blocks, Spec, Early Cond Exec, Conditional Speculation