Center for Embedded Computer Systems University of California, Irvine Coordinated Coarse-Grain and Fine-Grain Optimizations.

Presentation transcript:

Coordinated Coarse-Grain and Fine-Grain Optimizations for High-Level Synthesis
Topic Defense
Sumit Gupta
Center for Embedded Computer Systems, University of California, Irvine

High Level Synthesis
- Transform behavioral descriptions to RTL/gate level
- From C to CDFG to Architecture

Example behavioral description:
  x = a + b;
  c = a < b;
  if (c) then d = e - f; else g = h + i;
  j = d * g;
  l = e + x;

[Figure: the CDFG for this description (an If node with true and false branches) and the target architecture with memory, ALU, control, and data path]
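To make the "C to CDFG" step concrete, here is a minimal sketch in Python (chosen for brevity; the slides themselves use C as the input language). It records the slide's six operations with an assumed basic-block label each and derives the read-after-write data dependencies. The block names and the tuple layout are illustrative assumptions, not the representation used by any particular tool.

```python
# Operations from the slide's example: (basic block, destination, sources).
# Block labels are hypothetical names chosen for this sketch.
OPS = [
    ("BB0", "x", ("a", "b")),       # x = a + b
    ("BB0", "c", ("a", "b")),       # c = a < b
    ("BB_true", "d", ("e", "f")),   # d = e - f   (runs when c is true)
    ("BB_false", "g", ("h", "i")),  # g = h + i   (runs when c is false)
    ("BB_join", "j", ("d", "g")),   # j = d * g
    ("BB_join", "l", ("e", "x")),   # l = e + x
]


def data_dependencies(ops):
    """Map each destination to the earlier results it reads.

    Assumes every variable is written at most once, which holds here.
    """
    producers = {dest for _, dest, _ in ops}
    return {dest: {s for s in srcs if s in producers} for _, dest, srcs in ops}


deps = data_dependencies(OPS)
```

In this sketch `d` and `g` have no data dependencies, which is what makes them candidates for being moved ahead of the comparison.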

High-level Synthesis
- Well-researched area: from the early 1980s. So what's new?
- Level of design entry has moved up from schematic entry to coding in hardware description languages (VHDL, Verilog, C)
- No comprehensive synthesis framework
  - Few and scattered optimizations: mostly algebraic and at operation level of granularity
  - Results presented for scheduling
  - Effects on logic synthesis not understood
  - Small, synthetic benchmarks: primarily data-intensive DSP algorithms
- Quality of synthesis results severely affected by complex control flow
  - Nested ifs and loops not handled or handled poorly
- Poor understanding of the interaction between source-level and fine-grain "compiler" transformations

Focus of this Work
- Target Applications:
  - Behavioral descriptions with complex and nested conditionals and loops; for example:
    - mixed data and control-intensive multimedia and image processing applications
    - control-intensive microprocessor blocks: resource rich, few highly packed cycles
- Objectives:
  - Improve quality of HLS results by concurrency enhancement
  - Improve controllability of the HLS solutions

Characteristics of Target Applications
- Moderately Control-intensive behaviors
  - Operations that execute under conditions
  - Entire behaviors within nested loops
- Programming styles significantly affect quality of results:
  - Placement of operations and control-flow
  - Choice of control flow: nesting of ifs and loops
- A need for high-level and compiler transformations
  - To overcome the variance due to programming style
  - Increase resource utilization in the presence of conditionals
  - Exploit mutual exclusivity of operations to enhance resource sharing
  - Maximally parallelize operations under given resource constraints

Recent Related Work
- Code motions in the presence of conditionals
  - Condition Vector List Scheduling [Wakabayashi 89]
  - Symbolic Scheduling [Radivojevic 96]
  - WaveSched Scheduler [Lakshminarayana 98]
  - Basic Block Control Graph Scheduling [Santos 99]
- Limitations
  - Arbitrary nesting of conditionals and loops not handled or handled poorly
  - Ad hoc optimizations
  - Not part of a complete synthesis system
  - Limited analysis of logic and control costs

Parallelizing Compiler Background
- Scheduling for increasing instruction-level parallelism
  - Percolation Scheduling
    - Can produce optimal schedule given enough resources
  - Trailblazing
    - Hierarchical Code Motion Technique
  - Trace Scheduling, Superblock and Hyperblock Scheduling
- Loop Transformations
  - Loop Invariant Code Motion
  - Loop Pipelining
  - Induction Variable Analysis
  - Loop fusion, interchange, distribution
- Partial evaluation
  - CSE, Copy Propagation, Constant Folding

In the Context of High-Level Synthesis
- Cost Models are different
  - Operation and Resource Models
  - Non-sequential designs
- Transformations have implications on hardware
  - Non-trivial control costs
  - Operation duplication leads to flexible scheduling; however, it can lead to higher control costs
- Mutual exclusivity of operations
  - Resource Sharing

Coarse and Fine-Grain Code Optimizations
- Beyond Basic Block Code Motions
  - Speculation
  - Reverse Speculation
  - Early Condition Execution
  - Conditional Speculation
- Dynamic Common Sub-expression Elimination
- Loop Unrolling
- Loop Index Variable Elimination
- Chaining Operations across Conditionals
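As an illustration of the first code motion in this list, the sketch below shows speculation in Python: an operation that originally executes under a condition is computed before the condition is known, and only its result is committed conditionally. The function names and the temporary `t` are assumptions made for this example.

```python
def original(a, b, e, f, d):
    """Subtraction executes only after the comparison has been evaluated."""
    c = a < b
    if c:
        d = e - f
    return d


def speculated(a, b, e, f, d):
    """Subtraction is hoisted above the conditional into a temporary.

    In hardware the subtraction can now share a cycle with the comparison;
    only the write to d remains conditional.
    """
    t = e - f  # speculated: computed whether or not c turns out true
    c = a < b
    if c:
        d = t
    return d
```

Both versions return the same value for every input; the speculated form trades a possibly wasted subtraction for a shorter dependency chain.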

Concurrency Enhancement by Code Motions
[Figure: Hierarchical Task Graph representation of a Control-Data Flow Graph with an If node (true and false branches) and operations a, b, c, showing Reverse Speculation, Conditional Speculation, and code motion Across Hierarchical Blocks, alongside resource utilization]
- Leads to Higher Resource Utilization
- Shorter Schedule Lengths
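A minimal Python sketch of conditional speculation, assuming the usual meaning of the term: an operation that follows a conditional block is duplicated into the branches of that block, where a functional unit may otherwise sit idle. The variables and function names are hypothetical.

```python
def original(c, a, b, p, q):
    """y is computed after the branches rejoin, costing an extra step."""
    if c:
        x = a + b
    else:
        x = a - b
    y = p + q
    return x, y


def conditionally_speculated(c, a, b, p, q):
    """y = p + q is duplicated into both branches.

    Only one copy ever executes, so the copies are mutually exclusive
    and can share one adder.
    """
    if c:
        x = a + b
        y = p + q  # copy in the true branch
    else:
        x = a - b
        y = p + q  # copy in the false branch
    return x, y
```

The duplication is what the slides elsewhere flag as a cost: more flexible scheduling, but more control logic to steer the copies.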

Scheduling Heuristic
- Get Available Ops (in the figure: a, b, c, d)
- Determine Code Motions Required
- Assign Cost to each Operation
- Schedule Op with lowest Cost
[Figure: HTG with basic blocks BB 0 to BB 6; operations a, b, c, d are annotated with the code motion each would need (Speculate, Across HTG)]
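The four steps on this slide resemble a priority-based list scheduler. The Python sketch below is a simplified stand-in under stated assumptions: dependencies form a DAG, every operation takes one step, and a caller-supplied `cost` dictionary (hypothetical) replaces the "determine code motions required" analysis, which is not modeled here.

```python
def list_schedule(ops, deps, cost, units_per_step):
    """Greedy list scheduling.

    ops: iterable of operation names
    deps: {op: set of ops that must finish in an earlier step}
    cost: {op: number}; the available op with the lowest cost goes first
    units_per_step: how many ops may be scheduled in one step (>= 1)
    Returns {op: step}.
    """
    scheduled = {}
    remaining = set(ops)
    step = 0
    while remaining:
        # An op is available once all its predecessors ran in earlier steps.
        available = [
            op for op in remaining
            if all(p in scheduled and scheduled[p] < step
                   for p in deps.get(op, ()))
        ]
        # Lowest cost first; ties broken by name for determinism.
        available.sort(key=lambda op: (cost[op], op))
        for op in available[:units_per_step]:
            scheduled[op] = step
            remaining.discard(op)
        step += 1
    return scheduled


schedule = list_schedule(
    ops=["a", "b", "c", "d"],
    deps={"d": {"a"}},
    cost={"a": 1, "b": 3, "c": 2, "d": 0},
    units_per_step=2,
)
```

With two units per step, `a` and `c` are chosen first because they are the cheapest available operations; `d` has the lowest cost overall but must wait for `a`.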

Scheduling Heuristic (continued)
[Figure: the same HTG (BB 0 to BB 6) before and after scheduling operations a, b, c, d, with the moves labeled Across HTG and Conditional Speculation]

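Dynamic Common Sub-expression Elimination
[Figure: HTG with basic blocks BB 0 to BB 7. Before: a = b + c and d = b + c appear in separate basic blocks, and one operation is marked Speculate. After: dcse = b + c is computed once, and the original operations become a = dcse and d = dcse]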
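The before/after pair on this slide can be sketched in Python as follows. The exact placement of the operations in basic blocks is not recoverable from the transcript, so the two independent conditionals used here are an assumption; the names `a`, `d`, `b`, `c`, and `dcse` come from the slide.

```python
def before(b, c, cond1, cond2):
    """b + c is computed in two blocks; neither block dominates the other,
    so a static CSE pass cannot merge the two additions."""
    a = d = 0
    if cond1:
        a = b + c
    if cond2:
        d = b + c
    return a, d


def after(b, c, cond1, cond2):
    """Once the addition is speculated above both conditionals, the
    scheduler can reuse its result for both a and d."""
    a = d = 0
    dcse = b + c  # speculated once, ahead of both uses
    if cond1:
        a = dcse
    if cond2:
        d = dcse
    return a, d
```

The opportunity only appears during scheduling, after the code motion has happened, which is why the elimination is described as dynamic.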

Interconnect minimization by resource binding
- Minimize the complexity of steering logic
  - Multiplexors and demultiplexors
- Introduce additional interconnect constraints/costs during resource binding
- Operation and Variable binding have been formulated as network flow problems

Operation Binding
- Bind operations with the same inputs or outputs to the same functional unit
[Figure: two additions over the variables a, b, c and e, b, f, bound to a single ALU]
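To see why sharing inputs reduces steering logic, the Python sketch below counts the distinct variables feeding each functional unit's two input ports under a given binding. It assumes operations bound to the same unit run in different control steps. The operations `op1` and `op2` follow the slide's variables (assuming c = a + b and f = e + b); `op3` and the unit names are hypothetical additions for comparison.

```python
def mux_inputs(binding, ops):
    """Count distinct sources per functional unit.

    binding: {op: unit}
    ops: {op: (left_input, right_input, output)}
    Returns {unit: distinct left inputs + distinct right inputs}.
    """
    ports = {}
    for op, unit in binding.items():
        left, right, _ = ops[op]
        left_set, right_set = ports.setdefault(unit, (set(), set()))
        left_set.add(left)
        right_set.add(right)
    return {u: len(l) + len(r) for u, (l, r) in ports.items()}


OPS = {
    "op1": ("a", "b", "c"),  # c = a + b
    "op2": ("e", "b", "f"),  # f = e + b  (shares input b with op1)
    "op3": ("g", "h", "k"),  # hypothetical unrelated addition
}

shared = mux_inputs({"op1": "ALU0", "op2": "ALU0", "op3": "ALU1"}, OPS)
unshared = mux_inputs({"op1": "ALU0", "op3": "ALU0", "op2": "ALU1"}, OPS)
```

Binding `op1` and `op2` together leaves the right port of `ALU0` with a single source, so it needs no multiplexer; pairing `op1` with the unrelated `op3` needs one on both ports.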

Variable Binding
- Bind variables that are inputs or outputs to the same functional unit to the same registers
[Figure: an ALU with the variables a, b, c, e, f grouped into shared registers]

Implementation SPARK High Level Synthesis Framework

Experimental Setup
- Benchmarks derived from several industrial designs
  - MPEG-1 Prediction Block
  - ADPCM Encoder
  - Several image processing passes from GIMP software
- Synthesized using Spark
  - Number of States in FSM
  - Cycles on Longest Path in Design
- RTL VHDL from Spark synthesized using Synopsys
  - Critical Path Length (ns) => dictates Clock Period
  - Unit Area (in terms of synthesis library used)

HLS Results for Code Motions
[Chart: Number of States in FSM Controller and Cycles on Longest Path through Design, plotted for each set of allowed code motions]
Allowed Code Motions:
- Within Basic Blocks
- Within BBs, Across Hierarchical Blocks
- Within BBs, Across Hier Blocks, Speculation
- Within BBs, Across Hier Blocks, Speculation, Early Condition Execution
- Within BBs, Across Hier Blocks, Speculation, Early Cond Exec, Conditional Speculation
Overall performance gains of up to 50% in controller size and longest path cycles

Logic Synthesis Results for Code Motions
Allowed Code Motions:
- Within Basic Blocks
- Within BBs, Across Hierarchical Blocks, Speculation
- Within BBs, Across Hier Blocks, Speculation, Early Condition Execution
- Within BBs, Across Hier Blocks, Speculation, Early Cond Exec, Conditional Speculation
Enabling all code motions leads to
- Reduced circuit delays: up to 50%
- Increased area/interconnect costs:
  - Reduced by interconnect-aware resource binding

Results after Interconnect Minimization
[Chart: Critical Path, Total Delay, and Unit Area for Naïve Resource Binding versus Interconnect Minimizing Resource Binding]
- Reductions in area of between % [percentage range not preserved]
- Fairly constant critical path lengths and circuit delay

Synthesis Results with Dynamic CSE
[Chart comparing four configurations: No CSE, With CSE, With Dynamic CSE, With CSE & Dynamic CSE]

DCSE Synthesis Results: Pred0
[Chart comparing four configurations: No CSE, With CSE, With Dynamic CSE, With CSE & Dynamic CSE]
- Delays reduce by up to 40%
- Area reduces by up to 35%
- Register usage reduces!

Summary of Work Done
- Priority-based List Scheduling Heuristic
  - Allows control of Code Motions employed
  - Dynamic application of CSE and Copy Propagation
- Speculative Code Motions
- Code Motion Techniques
  - Trailblazing
- Compiler Passes
  - Copy & Constant Propagation
  - Dead Code Elimination
  - Common SubExpression Elimination
  - Dynamic Renaming
  - Loop Unrolling
  - Loop Index Variable Elimination
- Chaining across Conditional blocks
- Interconnect Minimizing Resource Binding
- FSM Generation
  - Non-trivial in the presence of chaining across conditionals and multi-cycle operations
- VHDL Generation

Future Directions
- Interactive GUI: ability to
  - Specify scheduling decisions
  - Timing Constraints
- Loop Pipelining Heuristic
- Loop Transformations
  - Loop Fusion
  - Loop Invariant Code Motion
- Analysis of Effects of Code Motions on Power
- Ability to model Complex Resources
  - Pipelined Resources
- More Transformations targeting Microprocessor Functional Blocks

Thank You

Publications
- Dynamic Common Sub-Expression Elimination during Scheduling in High-Level Synthesis. S. Gupta, M. Reshadi, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. To appear in the International Symposium on System Synthesis, October 2002
- Coordinated Transformations for High-Level Synthesis of High Performance Microprocessor Blocks. S. Gupta, T. Kam, M. Kishinevsky, S. Rotem, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. Design Automation Conference, June 2002
- Conditional Speculation and its Effects on Performance and Area for High-Level Synthesis. S. Gupta, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. ISSS 2001
- Speculation Techniques for High Level Synthesis of Control Intensive Designs. S. Gupta, N. Savoiu, S. Kim, N.D. Dutt, R.K. Gupta, A. Nicolau. DAC 2001
- Analysis of High-level Address Code Transformations for Programmable Processors. S. Gupta, M. Miranda, F. Catthoor, R.K. Gupta. DATE 2000
Book Chapter:
- ASIC Design. S. Gupta, R.K. Gupta. Chapter 64, The VLSI Handbook, edited by Wai-Kai Chen
Under Submission to Journal:
- Using Global Code Motions to Improve the Quality of Results for High-Level Synthesis. S. Gupta, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. Submitted to TCAD

Additional Slides

SPARK Core Strengths
- Focus on
  - Transformations that increase amount of parallelism available in the source description
  - Tightly integrate with parallelizing compiler transformations
  - Provide a HLS "toolbox" for the micro-architect
- Develop transformations that
  - Limit effects of control-flow
    - Generalized code motions
  - Reduce data dependencies
    - Renaming, loop unrolling, loop index variable elimination
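A small Python sketch of how loop unrolling combined with loop index variable elimination removes dependencies: once a fixed-trip-count loop is fully unrolled, each use of the index can be replaced by a constant, so no index register or incrementer is needed. The loop body is a made-up example, not taken from the benchmarks.

```python
def rolled(x):
    """Every iteration depends on the loop index i."""
    acc = 0
    for i in range(4):
        acc += x[i] * (i + 1)
    return acc


def unrolled(x):
    """Fully unrolled with i replaced by the constants 0..3.

    The index variable has disappeared, leaving only the chain on acc.
    """
    acc = 0
    acc += x[0] * 1
    acc += x[1] * 2
    acc += x[2] * 3
    acc += x[3] * 4
    return acc
```

The remaining chain on `acc` is a true data dependency; the index updates that previously sat between iterations are gone.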

SPARK Framework
- Customizable extensible scheduler
- Range of transformations in modular toolbox
  - Percolation, trailblazing, loop pipelining (RDLP)
- Selected under heuristics and/or user control
  - Code motion, loop transformations
- Input in C and output to synthesizable RTL VHDL
  - Flow from architecture design to synthesis
- Quality of results measured in terms of
  - Scheduling results: cycles in longest path
  - Controller size: number of states in FSM
  - Logic synthesis results: critical path length, unit area

Summary of Work Done
- Developed a set of code transformations targeted towards HLS
- Implemented in a complete high-level synthesis framework
  - Implemented supporting compiler passes
  - Produce synthesizable VHDL output from input C
- Analyzed effects of transformations on final logic synthesis results
- Applied to moderately complex industrial benchmarks

Ongoing Work
- Loop Transformations
  - Loop Invariant Code Motion
  - Loop Pipelining Heuristics
  - Loop Fusion
- High-level Power analysis of transformations
  - Can power consumption be reduced despite increased resource utilization?

Scheduler Heuristic
[Figure: HTG with basic blocks BB 0 to BB 6 and operations a, b, c, d, annotated with the code motions Speculate, Across HTG, Conditional Speculation (with copies a1 and a2), and Reverse Speculate]