Center for Embedded Computer Systems: Dynamic Conditional Branch Balancing during the High-Level Synthesis of Control-Intensive Designs

Presentation transcript:

1 Dynamic Conditional Branch Balancing during the High-Level Synthesis of Control-Intensive Designs
Sumit Gupta (1), Nikil Dutt (1), Rajesh Gupta (2), Alex Nicolau (1)
(1) School of Information and Computer Science, University of California, Irvine
(2) Department of Computer Science and Engineering, University of California, San Diego
Center for Embedded Computer Systems
Supported by Semiconductor Research Corporation

2 High Level Synthesis: From Behavior to Hardware
Behavioral description:
    x = a + b;
    c = a < b;
    if (c) then d = e - f;
    else g = h + i;
    j = d × g;
    l = e + x;
[Figure: the description drawn as a graph with an If Node and true/false branches, mapped to hardware made up of memory, ALU, control and data path.]
Our approach targets descriptions with nested conditionals and loops.

3 Synthesizing Control-Intensive Designs
- Programming style and control constructs have tremendous impact on the quality of HLS results.
- Operation placement is for programming convenience: it is not optimized for synthesis.
- Restructure and duplicate code using parallelizing compiler transformations: speculative code motions.
- We present heuristics that carefully guide and increase the scope of the speculative code motions, particularly operation duplication (Conditional Speculation).

4 Toolbox Approach to Scheduling
[Figure: scheduling framework. Heuristics: Scheduling, Code Motion, Dynamic CSE, Loop Transformations. Transformations Toolbox: Percolation/Trailblazing, Speculative Code Motions, CSE/IVA/Copy Prop, Operation Chaining, Loop Transformations.]
Scheduling heuristics employ code transformations from the Transformations Toolbox.

5 Scheduling using Speculative Code Motions
[Figure: a design with basic blocks BB0 to BB9 and addition operations a, b, c and d, shown before and after code motion under the given resource allocation. Code motions illustrated: Speculate and Across If Block.]

6 Scheduling using Speculative Code Motions
[Figure: the same design with basic blocks BB0 to BB9 and operations a, b, c and d under the given resource allocation. Code motions illustrated: Across If Block and Conditional Speculation, with operation d duplicated.]
Conditional Speculation duplicates operations into the branches of a conditional block.
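The duplication performed by Conditional Speculation can be pictured with a minimal Python sketch. The `BasicBlock` class, the block names and the `conditionally_speculate` helper are hypothetical and are not taken from SPARK; the sketch also ignores data dependencies and resource constraints, which a real scheduler would have to check first.

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    """Hypothetical basic block: a name plus the operations it holds."""
    name: str
    ops: list = field(default_factory=list)


def conditionally_speculate(op, join_block, branches):
    """Move `op` out of the block after the conditional and copy it
    into every branch of the conditional (assumes the move is legal)."""
    if op not in join_block.ops:
        raise ValueError(f"{op} is not in {join_block.name}")
    join_block.ops.remove(op)
    for branch in branches:
        branch.ops.append(op)  # one copy per branch


true_bb = BasicBlock("BB_true", ["a"])
false_bb = BasicBlock("BB_false", ["b"])
join_bb = BasicBlock("BB_join", ["d", "e"])

conditionally_speculate("d", join_bb, [true_bb, false_bb])
print(true_bb.ops, false_bb.ops, join_bb.ops)
```

After the call, `d` executes in whichever branch is taken and no longer occupies a step after the conditional.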

7 Increasing the Scope of Code Motions
[Figure: original design and scheduled design (steps S0 to S3) of an If Node with true/false branches over basic blocks BB0 to BB4 and operations a to e, under the given resource allocation. The scheduled design contains an unbalanced conditional; the longest path is marked.]
- A scheduling step is a set of concurrent operations inside a basic block.
- A basic block is a sequence of scheduling steps with no control branches/merges between them.

8 Insert New Scheduling Step in Shorter Branch
[Figure: original design and scheduled design (steps S0 to S3) of the If Node example with basic blocks BB0 to BB4 and operations a to e, under the given resource allocation.]

9 Insert New Scheduling Step in Shorter Branch
[Figure: the same example after a new scheduling step is inserted in the shorter branch, with operation e placed inside the conditional branches.]
- Insert scheduling steps into the shorter conditional branch.
- Enables further code compaction.
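As a rough illustration of what balancing means, the Python sketch below pads the shorter branch of a conditional with empty scheduling steps. The representation (a branch as a list of steps, a step as a list of operation names) and the `balance_branches` name are assumptions made for this example only.

```python
def balance_branches(branches):
    """Pad shorter branches with empty scheduling steps so that every
    branch has as many steps as the longest one.

    `branches` maps a (hypothetical) basic block name to its list of
    scheduling steps; each step is a list of operation names.
    """
    longest = max(len(steps) for steps in branches.values())
    for steps in branches.values():
        while len(steps) < longest:
            steps.append([])  # new, still empty, scheduling step
    return branches


# True branch needs two steps, false branch only one.
conditional = {
    "BB_true": [["a"], ["b"]],
    "BB_false": [["c"]],
}
balance_branches(conditional)
print(conditional)
```

The empty step in the shorter branch costs nothing on the longest path, and it gives later code motions a free slot into which an operation can be moved or duplicated.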

10 Organization of Scheduling Heuristics
The Scheduler is organized into four components:
- IR Walker: traverses the design to find the next basic block to schedule.
- Candidate Provider: traverses the design to find candidate operations to schedule.
- Scheduling Heuristic: chooses one of the candidate operations to schedule.
- Candidate Mover: moves, duplicates and schedules the chosen operation.

11 Organization of Scheduling Heuristics
[Figure: the Scheduler (Scheduling Heuristic, Candidate Mover, Candidate Provider, IR Walker) extended with the two branch balancing techniques.]
- BBDT: Branch Balancing During Traversal.
- BBDCM: Branch Balancing During Code Motion. Check if BBDCM will enable code motion.

12 BBDT: Get Next Step to Schedule
- Schedule the design starting from the first basic block in the design.
- On each call, return the next step in the current basic block.
- If the last step in the basic block is reached:
  - If the current BB is a branch of a conditional C:
    - If basic blocks in other branches of C are scheduled and have more scheduling steps, insert a new step in currBB.
  - Traverse the design and get the next basic block.
  - Return the first step from the next basic block.
[Figure: If Node with true/false branches over basic blocks BB0 to BB5 and operations a to f.]

13 BBDT: Insert Scheduling Steps while Getting Next Step to Schedule
[Figure: the example of slide 12 with the unbalanced conditional marked.]

14 BBDT: Insert Scheduling Steps while Getting Next Step to Schedule
[Figure: the example of slide 12 after the scheduling step has been inserted.]
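The BBDT steps can be sketched as a small step iterator in Python. Everything here is hypothetical (the `BasicBlock` and `StepWalker` classes, the `siblings` field, the block names), and the step counts are preset rather than produced by real scheduling; each block is assumed to have at least one step. The sketch only shows where the extra step gets inserted.

```python
class BasicBlock:
    def __init__(self, name, num_steps=1):
        self.name = name
        self.steps = [[] for _ in range(num_steps)]  # each step: list of ops
        self.scheduled = False
        self.siblings = []  # other branches of the same conditional


class StepWalker:
    """Returns one scheduling step per call, balancing branches on the way."""

    def __init__(self, blocks):
        self.blocks = blocks  # blocks in traversal order
        self.bb_index = 0
        self.step_index = -1

    def next_step(self):
        bb = self.blocks[self.bb_index]
        if self.step_index + 1 < len(bb.steps):
            self.step_index += 1
            return bb.name, self.step_index
        # Last step of bb reached: balance if all other branches are
        # already scheduled and one of them has more steps.
        others = bb.siblings
        if (others and all(o.scheduled for o in others)
                and max(len(o.steps) for o in others) > len(bb.steps)):
            bb.steps.append([])  # insert new step in current BB
            self.step_index += 1
            return bb.name, self.step_index
        bb.scheduled = True
        if self.bb_index + 1 == len(self.blocks):
            return None  # whole design traversed
        self.bb_index += 1
        self.step_index = 0
        return self.blocks[self.bb_index].name, 0


bb0, bb_true = BasicBlock("BB0"), BasicBlock("BB_true", num_steps=2)
bb_false, bb_join = BasicBlock("BB_false"), BasicBlock("BB_join")
bb_true.siblings, bb_false.siblings = [bb_false], [bb_true]

walker = StepWalker([bb0, bb_true, bb_false, bb_join])
order = list(iter(walker.next_step, None))
print(order)
```

In this run the false branch starts with one step; once its last step is reached and the longer true branch is already scheduled, the walker inserts and returns a second step for `BB_false` before moving on to `BB_join`.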

15 Scope of BBDT
[Figure: nested conditionals If Node 1 and If Node 2 over basic blocks BB0 to BB7 with operations a to d and f, under the given resource constraints; blocks are marked as Scheduled or Being Scheduled.]
- Scheduling order: BB1, BB3 and BB4.
- After scheduling the step in BB4, BBDT adds one new step in BB3 and BB4, since the number of steps in BB1 is larger.

16 Scope of BBDT
[Figure: the same example after the new steps are inserted.]
- The new step in BB4 is now scheduled.
- Operation f can be duplicated up into BB1, BB3 and BB4.

17 Scope of BBDT
- Inserts steps (balances branches) after scheduling the last branch of a conditional.
  - Needs all other branches to be scheduled already,
  - to get an accurate picture of the number of steps and the resource utilization in all the conditional branches.
- Continues to schedule the new scheduling step in the last branch.

18 Limitations of BBDT
[Figure: nested conditionals If Node 1 and If Node 2 over basic blocks BB0 to BB7 with operations a to f, under the given resource constraints; blocks are marked as Scheduled or Being Scheduled.]
- Scheduling order: BB1, BB3 and BB4.
- BBDT can add new steps/balance branches after scheduling the last branch in a conditional.
- BBDT adds a new step in BB3 after scheduling BB4, so we cannot do CS of operation f.

19 Limitations of BBDT
[Figure: the same example as slide 18.]
- No backtracking/re-scheduling of basic blocks/branches that have already been scheduled.

20 BBDCM: Branch Balancing During Code Motion
- Operates while checking if an operation can be conditionally speculated into the scheduling step under consideration.
- Checks resource utilization of all the basic blocks that the operation will be duplicated in.
- Tries to find idle resources in all branches of the conditional.
- Inserts a new step whenever possible if there is no idle resource: this is the modification to balance branches.

21 Idle Resources
A resource is said to be idle in a scheduling step if:
- There is no operation scheduled on it in that step.
- For multi-cycle resources:
  - there is no operation scheduled on it in the previous step(s), and
  - there is no operation scheduled on it in the next step(s).
[Figure: If Node with true/false branches over basic blocks BB0 to BB5 with operations a, d and e; a 2-cycle multiplier is shown with its previous and next steps marked.]
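The idle-resource test can be written as a short predicate. This is a sketch under assumptions not stated on the slide: a resource is described by the set of steps at which operations start on it and by a fixed latency in steps, and `is_idle` is a hypothetical name.

```python
def is_idle(start_steps, latency, step):
    """Return True if a resource can accept a new operation starting at `step`.

    start_steps: set of step indices where an operation already starts
                 on this resource (assumed representation).
    latency:     number of consecutive steps an operation occupies.
    """
    # An operation started up to latency-1 steps earlier is still running;
    # one starting up to latency-1 steps later would overlap the new one.
    window = range(step - latency + 1, step + latency)
    return not any(s in start_steps for s in window)


# Single-cycle adder with an operation in step 1.
print(is_idle({1}, latency=1, step=0))
print(is_idle({1}, latency=1, step=1))

# 2-cycle multiplier with an operation starting in step 1 (busy in 1 and 2).
print([is_idle({1}, latency=2, step=s) for s in range(4)])
```

For a latency of 1 the window collapses to the step itself, which matches the first condition on the slide; for the 2-cycle multiplier both the previous and the next step are checked.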

22 BBDCM: Allow Conditional Speculation?
[Figure: nested conditionals If Node 1 and If Node 2 over basic blocks BB0 to BB7 with operations a to f, under the given resource constraints; blocks are marked as Scheduled or Being Scheduled.]
- While considering CS in the last conditional BB:
  - Find idle resources in each basic block to duplicate in.
  - If there is no idle resource in any BB:
    - If the number of steps in that BB ≤ the number of steps in any other BB:
      - Insert a new scheduling step.

23 BBDCM: Allow Conditional Speculation?
[Figure: the example of slide 22 with the new scheduling steps inserted.]

24 BBDCM: Allow Conditional Speculation?
[Figure: the example of slide 22 with operation f duplicated into the branches.]
- BBDCM inserts new scheduling steps while applying code motions if it enables Conditional Speculation.
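A simplified Python sketch of the BBDCM check follows. It assumes a single kind of single-cycle resource with `capacity` units per step, reads "≤ the number of steps in any other BB" as "at least one other branch has as many or more steps", and places the operation in the first free step rather than a specific step under consideration. The function names are hypothetical; none of this is SPARK's actual code.

```python
def find_idle_step(steps, capacity):
    """Index of the first step with a free resource unit, or None."""
    for i, step in enumerate(steps):
        if len(step) < capacity:
            return i
    return None


def allow_conditional_speculation(op, branches, capacity=1):
    """Try to duplicate `op` into every branch; return True on success.

    `branches` maps a block name to its list of steps (lists of op names).
    Nothing is modified unless every branch can take the operation.
    """
    placement, needs_new_step = {}, []
    for name, steps in branches.items():
        slot = find_idle_step(steps, capacity)
        if slot is None:
            other_lengths = [len(s) for n, s in branches.items() if n != name]
            if not any(len(steps) <= n for n in other_lengths):
                return False  # branch is already the longest: disallow CS
            slot = len(steps)  # index of the step to be inserted
            needs_new_step.append(name)
        placement[name] = slot
    for name in needs_new_step:
        branches[name].append([])  # insert new scheduling step
    for name, slot in placement.items():
        branches[name][slot].append(op)
    return True


ok_case = {"BB3": [["a"], []], "BB4": [["c"]]}
print(allow_conditional_speculation("f", ok_case), ok_case)

full_case = {"BB3": [["a"], ["b"]], "BB4": [["c"]]}
print(allow_conditional_speculation("f", full_case), full_case)
```

In the first case `BB4` has no idle resource but is shorter than `BB3`, so a step is inserted and `f` is duplicated into both blocks; in the second case `BB3` is full and already the longest branch, so the speculation is rejected and the schedule is left untouched.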

25 Implementation: SPARK High Level Synthesis Framework
- C input => RTL VHDL output.
- VHDL => logic synthesis results.
- Customizable scheduler:
  - Modular toolbox of transformations.
  - Heuristics select transformations.
- Branch balancing algorithms integrate with the scheduling heuristics.

26 Experiments: Target Applications
[Table: columns are Design, # of Ifs, # of Loops, # of Non-Empty Basic Blocks, # of Operations; rows are two MPEG-1 pred designs and GIMP tiler. The table values are not preserved in the transcript.]
- MPEG-1 Prediction Block
- GIMP image processing software

27 Experimental Results
[Figure: chart comparing five configurations:
  1. All code motions except CS
  2. + Conditional Speculation (CS)
  3. CS + BBDT: add steps during traversal
  4. CS + BBDCM: add steps during CS
  5. All code motions + CS + BBDT + BBDCM]
- Effectiveness of Conditional Speculation is limited without the branch balancing algorithms.
- Inserting scheduling steps during traversal (BBDT) and while applying Conditional Speculation (BBDCM) improves scheduling results by [value missing] %.

28 Conclusions
- Two branch balancing techniques to increase the effectiveness of code motions, specifically Conditional Speculation:
  - Manage resource utilization of multiple basic blocks.
  - Insert scheduling steps in unbalanced conditional branches dynamically during scheduling.
- Implemented in a comprehensive High-Level Synthesis framework: synthesizes behavioral C to RTL VHDL.
- Demonstrated effectiveness on large industrial applications.
- With profiling information: insert steps in less-taken conditional branches.

29 Recent Related Work
- Scheduling designs with conditionals:
  - Condition Vector List Scheduling [Wakabayashi 89]
  - Path Based Scheduling [Camposano 91]
  - Symbolic Scheduling [Radivojevic 96]
  - WaveSched Scheduler [Lakshminarayana 98]
  - Basic Block Control Graph Scheduling [Santos 99]
- Early work was on data-intensive DSP algorithms:
  - Pipelining, algorithmic transformations

30 Thank You