
Slide 1: SPARK: A C-to-VHDL Parallelizing High-Level Synthesis Framework
Center for Embedded Computer Systems, University of California, Irvine and San Diego (http://www.cecs.uci.edu/~spark)
Sumit Gupta, with Rajesh Gupta, Nikil Dutt, Alex Nicolau. Supported by the Semiconductor Research Corporation & Intel Inc.

Slide 2 (Copyright Sumit Gupta 2004): System-Level Synthesis
- Flow: system-level model -> task analysis -> HW/SW partitioning onto a platform of ASIC, FPGA, processor core, memory, and I/O.
- The hardware behavioral description goes through high-level synthesis; the software behavioral description goes through a software compiler.

Slide 3: High-Level Synthesis
- Transforms behavioral descriptions to the RTL/gate level: from C to CDFG to architecture (memory, ALU, control, data path).
- Example behavioral description: x = a + b; c = a < b; if (c) then d = e - f; else g = h + i; j = d x g; l = e + x;
- Problem #1: poor quality of HLS results beyond straight-line behavioral descriptions.
- Problem #2: poor or no controllability of the HLS results.
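The behavioral fragment on this slide can be written out as compilable C. A minimal sketch; the wrapper function, input values, and combined return value are illustrative additions, and the slide's "d x g" is multiplication, written d * g here:

```c
/* The slide's behavioral description, made compilable. Only one branch
   defines d or g, so both are initialized to 0. */
int behavior(int a, int b, int e, int f, int h, int i) {
    int c, x, j, l;
    int d = 0, g = 0;
    x = a + b;
    c = a < b;
    if (c)
        d = e - f;
    else
        g = h + i;
    j = d * g;            /* "d x g" on the slide */
    l = e + x;
    return j + l;         /* combine results so the function is checkable */
}
```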

Slide 4: Outline
- Motivation and background
- Our approach to parallelizing high-level synthesis
- Code transformation techniques for PHLS: parallelizing transformations, dynamic transformations
- The PHLS framework and experimental results: multimedia and image processing applications; case study: Intel Instruction Length Decoder
- Conclusions and future work
- Compilation for coarse-grain reconfigurable architectures

Slide 5: High-Level Synthesis
- A well-researched area since the early 1980s; renewed interest due to new system-level design methodologies.
- A large number of synthesis optimizations have been proposed, either at the operation level (algebraic transformations on DSP code) or at the logic level (don't-care-based control optimizations).
- In contrast, compiler transformations operate at both the operation level (fine-grain) and the source level (coarse-grain). Parallelizing compiler transformations have different optimization objectives and cost models than HLS.
- Our aim: develop synthesis and parallelizing compiler transformations that are "useful" for HLS, i.e., beyond scheduling results, in circuit area and delay, for large designs with complex control flow (nested conditionals and loops).

Slide 6: Our Approach: Parallelizing HLS (PHLS)
- Flow: C input -> original CDFG -> (source-level compiler transformations) -> optimized CDFG -> scheduling & binding (scheduling compiler and dynamic transformations) -> VHDL output.
- Optimizing and parallelizing compiler transformations are applied at the source level (pre-synthesis) and during scheduling.
- Source-level code refinement using pre-synthesis transformations.
- Code restructuring by speculative code motions; operation replication to improve concurrency.
- Dynamic transformations: exploit new opportunities during scheduling.

Slide 7: PHLS Transformations Organized into Four Groups
1. Pre-synthesis: loop-invariant code motions, loop unrolling, CSE.
2. Scheduling: speculative code motions, multi-cycling, operation chaining, loop pipelining.
3. Dynamic: transformations applied dynamically during scheduling: dynamic CSE, dynamic copy propagation, dynamic branch balancing.
4. Basic compiler transformations: copy propagation, dead code elimination.
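Two of the pre-synthesis transformations can be shown on a small C kernel. This is a hand-applied before/after sketch on an invented function, not SPARK output: loop-invariant code motion hoists x * y out of the loop, and CSE computes a + b once:

```c
/* before: naive kernel with a repeated expression and a loop-invariant op */
int before_presynth(int a, int b, int x, int y, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) {
        int u = a + b;        /* common subexpression */
        int v = a + b;        /* ...computed twice      */
        int w = x * y;        /* loop-invariant         */
        s += u + v + w;
    }
    return s;
}

/* after: loop-invariant code motion + CSE applied by hand */
int after_presynth(int a, int b, int x, int y, int n) {
    int w = x * y;            /* hoisted out of the loop */
    int u = a + b;            /* computed once, reused   */
    int s = 0;
    for (int i = 0; i < n; i++)
        s += u + u + w;
    return s;
}
```

Both versions compute the same result; the second needs fewer operations per iteration, which matters once the loop body is scheduled onto limited functional units.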

Slide 8: Speculative Code Motions
- Diagram: an if-node with operations a (+), b (+), c (-); arrows show speculation, reverse speculation, conditional speculation, and movement across hierarchical blocks.
- Operation movement reduces the impact of programming style on the quality of HLS results.

Slide 9: Speculative Code Motions (continued)
- Same diagram as slide 8: operation movement (speculation, reverse speculation, conditional speculation, across hierarchical blocks) to reduce the impact of programming style on the quality of HLS results.
- Early condition execution: evaluates conditions as soon as possible.
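Plain speculation, the simplest of these motions, can be illustrated in C. A sketch on invented operands: both branch bodies are computed before the condition resolves, then the condition only selects a result. In hardware the two speculated operations can run in parallel on separate units:

```c
/* original: the operation executes only on its branch */
int orig_branch(int e, int f, int h, int i, int c) {
    int r;
    if (c)
        r = e - f;
    else
        r = h + i;
    return r;
}

/* speculated: both operations moved above the if-node */
int speculated(int e, int f, int h, int i, int c) {
    int t1 = e - f;   /* speculated from the true branch  */
    int t2 = h + i;   /* speculated from the false branch */
    return c ? t1 : t2;
}
```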

Slide 10: Dynamic Transformations
- Called "dynamic" since they are applied during scheduling (versus as a pass before or after scheduling).
- Dynamic branch balancing: increases the scope of code motions; reduces the impact of programming style on HLS results.
- Dynamic CSE and dynamic copy propagation: exploit the operation movement and duplication due to speculative code motions; create new opportunities to apply these transformations; reduce the number of operations.

Slide 11: Dynamic Branch Balancing
- Diagram: an if-node with an unbalanced conditional (operations +a, +b, -c, -d, -e over basic blocks BB0-BB4), shown as the original design and the scheduled design over steps S0-S3 under the given resource allocation.
- The longest path runs through the longer branch of the unbalanced conditional.

Slide 12: Insert New Scheduling Step in Shorter Branch
- Diagram: the same design, original vs. scheduled; the shorter branch of the conditional has room for a new scheduling step under the same resource allocation.

Slide 13: Insert New Scheduling Step in Shorter Branch (continued)
- Diagram: dynamic branch balancing inserts new scheduling steps in the shorter branch (here, operation -e is duplicated into the balanced branches).
- This enables conditional speculation and leads to further code compaction.

Slide 14: Dynamic CSE: Going Beyond Traditional CSE
- C description: a = b + c; cd = b < c; if (cd) d = b + c; else e = g + h;
- HTG representation: a = b + c is computed before the if-node (BB0); the true branch (BB2) computes d = b + c, the false branch (BB3) computes e = g + h, joining in BB4.
- After traditional CSE: the recomputation in the true branch becomes d = a.

Slide 15: Dynamic CSE: Going Beyond Traditional CSE (continued)
- Same example as slide 14: after CSE, d = b + c becomes d = a.
- We use the notion of dominance of basic blocks: basic block BBi dominates BBj if all control paths from the initial basic block of the design graph leading to BBj go through BBi.
- We can eliminate an operation opj in BBj using the common expression in opi if BBi dominates BBj.
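The dominance test that gates this elimination can be sketched with the classic iterative dataflow computation. This is a minimal illustration on a tiny hard-coded CFG shaped like the slide's example, not SPARK's own code: BB0 branches to BB1 (true) and BB2 (false), and both join in BB3.

```c
#include <stdbool.h>

#define NBB 4
/* predecessor lists; -1 marks an unused slot */
static const int preds[NBB][2] = { {-1, -1}, {0, -1}, {0, -1}, {1, 2} };

static unsigned dom[NBB];     /* bit i set => BBi dominates this block */
static bool computed = false;

static void compute_dominators(void) {
    for (int b = 0; b < NBB; b++)
        dom[b] = (b == 0) ? 1u : (1u << NBB) - 1;   /* entry dominated only by itself */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = 1; b < NBB; b++) {
            unsigned d = (1u << NBB) - 1;
            for (int p = 0; p < 2; p++)
                if (preds[b][p] >= 0)
                    d &= dom[preds[b][p]];          /* intersect over predecessors */
            d |= 1u << b;                           /* a block dominates itself */
            if (d != dom[b]) { dom[b] = d; changed = true; }
        }
    }
}

/* CSE of an operation in BBj against a common expression computed in
   BBi is safe only if BBi dominates BBj. */
bool dominates(int i, int j) {
    if (!computed) { compute_dominators(); computed = true; }
    return (dom[j] >> i) & 1u;
}
```

On this CFG the entry block dominates everything, but neither conditional branch dominates the join, which is exactly why traditional CSE cannot merge expressions that sit in sibling branches.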

Slide 16: New Opportunities for "Dynamic" CSE Due to Code Motions
- Example: a = b + c in BB2 and d = b + c in BB6. CSE is not possible, since BB2 does not dominate BB6.
- The scheduler decides to speculate: dcse = b + c is moved into BB0, and the original operation becomes a = dcse.
- CSE is possible now, since BB0 dominates BB6.

Slide 17: New Opportunities for "Dynamic" CSE Due to Code Motions (continued)
- After the speculation, d = b + c in BB6 is also replaced by d = dcse, since BB0 dominates BB6.
- Rule: if the scheduler moves or duplicates an operation op, apply CSE on the remaining operations using op.
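The effect of this rule can be shown as a before/after pair in C. This is an illustrative sketch, not the slide's exact CFG: in before_cm() the two b + c computations sit in blocks where neither dominates the other, so classical CSE cannot merge them; in after_cm() the scheduler has speculated the computation into the entry block as dcse, which dominates everything, so both uses become copies:

```c
/* before: two occurrences of b + c in sibling conditional blocks */
int before_cm(int b, int c, int sel) {
    int a = 0, d = 0;
    if (sel)
        a = b + c;        /* in one conditional block            */
    if (!sel)
        d = b + c;        /* in another: no dominance, no CSE    */
    return a + d;
}

/* after: the scheduler speculated b + c into the entry block */
int after_cm(int b, int c, int sel) {
    int dcse = b + c;     /* speculated computation              */
    int a = 0, d = 0;
    if (sel)
        a = dcse;         /* dynamic CSE: entry dominates here   */
    if (!sel)
        d = dcse;
    return a + d;
}
```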

Slide 18: Conditional Speculation & Dynamic CSE
- Example: a = b + c in BB4 and d = b + c later in BB7.
- The scheduler decides to conditionally speculate: a' = b + c is duplicated into the conditional branches BB1 and BB2, and BB4 gets a = a'.

Slide 19: Conditional Speculation & Dynamic CSE (continued)
- Dynamic CSE then replaces d = b + c with d = a'.
- This uses the notion of dominance by groups of basic blocks: all control paths leading up to BB8 come from either BB1 or BB2, so BB1 and BB2 together dominate BB8.

Slide 20: Loop Shifting: An Incremental Loop Pipelining Technique
- Diagram: a loop node (BB0-BB4 with a loop exit) whose body contains operations a (+), c (-) followed by b (+), d (-).
- Loop shifting moves the leading operations: one copy executes before the loop, and the operations are re-inserted at the end of the loop body.

Slide 21: Loop Shifting (continued)
- Diagram: after shifting, compaction schedules the shifted operations (a +, c -) in parallel with the remaining body operations (b +, d -), shortening the loop body.
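Loop shifting can be mimicked in software. A sketch on an invented kernel: the first operation of the body ("a") is peeled once into the loop header and re-issued at the end of each iteration, so the shifted body can later be compacted into fewer scheduling steps; both versions compute the same sum:

```c
#define N 8
static const int x[N] = {0, 1, 2, 3, 4, 5, 6, 7};

/* original body: op "a" followed by the operation that uses it */
int run_original(void) {
    int s = 0;
    for (int i = 0; i < N; i++) {
        int a = x[i] + 1;     /* op "a": first operation of the body */
        s += 2 * a;           /* rest of the body depends on a       */
    }
    return s;
}

/* shifted body: a copy of op "a" before the loop, and the op
   re-issued at the end of the body for the next iteration */
int run_shifted(void) {
    int s = 0;
    int a = x[0] + 1;                 /* shifted copy in the loop header */
    for (int i = 0; i < N; i++) {
        s += 2 * a;                   /* body now starts with the use    */
        if (i + 1 < N)
            a = x[i + 1] + 1;         /* op "a" moved to end of body     */
    }
    return s;
}
```

The payoff is in hardware, not software: once the use and the next iteration's op "a" sit in the same body, the scheduler can compact them into the same scheduling step on separate functional units.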

Slide 22: SPARK High-Level Synthesis Framework (framework overview diagram).

Slide 23: SPARK Parallelizing HLS Framework
- C input and synthesizable RTL VHDL output.
- Tool-box of transformations and heuristics: each can be developed independently of the others; script-based control over transformations and heuristics.
- Hierarchical intermediate representation (HTGs): retains structural information about the design (conditional blocks, loops); enables efficient and structured application of transformations.
- Complete HLS tool: does binding, control synthesis, and back-end VHDL generation; interconnect-minimizing resource binding.
- Enables graphical visualization of the design description and intermediate results.
- 100,000+ lines of C++ code.

Slide 24: Synthesizable C
- ANSI-C front end from Edison Design Group (EDG).
- Features of C not supported for synthesis: pointers (however, arrays and passing by reference are supported), recursive function calls, gotos.
- Features for which support has not been implemented: multi-dimensional arrays, structs, continues, breaks.
- A hardware component is generated for each function; a called function is instantiated as a hardware component in the calling function.
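A concrete illustration of the subset: a hypothetical kernel (invented here, not from the SPARK distribution) that stays within the stated restrictions, using fixed-size arrays and pass-by-reference output but no pointers, structs, recursion, gotos, breaks, or continues:

```c
/* Hypothetical SPARK-style synthesizable kernel: arrays and
   pass-by-reference are allowed; pointers, structs, recursion,
   gotos, breaks, and continues are avoided. */
void saturating_add(int a[4], int b[4], int out[4]) {
    for (int i = 0; i < 4; i++) {
        int s = a[i] + b[i];
        if (s > 255)
            s = 255;          /* clamp, as an image kernel might */
        out[i] = s;
    }
}

/* small driver so the kernel's behavior is checkable */
int saturating_add_demo(void) {
    int a[4] = {1, 200, 3, 4};
    int b[4] = {2, 100, 4, 5};
    int out[4];
    saturating_add(a, b, out);
    return out[0] + out[1] + out[3];   /* 3 + 255 + 9 */
}
```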

Slide 25: Graph Visualization of the HTG and DFG (tool screenshot).

Slide 26: Scheduling and the Resource Utilization Graph (tool screenshot).

Slide 27: Example of Complex HTG
- Example of a real design: the MPEG-1 pred2 function, with multiple nested loops and conditionals.
- Just for demonstration; you are not expected to read the text.

Slide 28: Experiments
- Results presented here for: pre-synthesis transformations, speculative code motions, dynamic CSE.
- We used SPARK to synthesize designs derived from several industrial designs: MPEG-1, MPEG-2, and GIMP image processing software, plus a case study of the Intel Instruction Length Decoder.
- Scheduling results: number of states in the FSM; cycles on the longest path through the design.
- VHDL logic synthesis results: critical path length (ns); unit area.

Slide 29: Target Applications

Design           | # of Ifs | # of Loops | # Non-Empty Basic Blocks | # of Operations
MPEG-1 pred1     |    4     |     2      |           17             |      123
MPEG-1 pred2     |   11     |     6      |           45             |      287
MPEG-2 dp_frame  |   18     |     4      |           61             |      260
GIMP tiler       |   11     |     2      |           35             |      150

Slide 30: Scheduling & Logic Synthesis Results
- Chart comparing: non-speculative code motions (within BBs and across hierarchical blocks), + speculative code motions, + pre-synthesis transforms, + dynamic CSE.
- Chart annotations: 42%, 10%, 36%, 8%, 39%.
- Overall: 63-66% improvement in delay, with almost constant area.

Slide 31: Scheduling & Logic Synthesis Results (continued)
- Same comparison for a second set of designs; chart annotations: 14%, 20%, 1%, 33%, 41%, 52%.
- Overall: 48-76% improvement in delay, with almost constant area.

Slide 32: Case Study: Intel Instruction Length Decoder
- Diagram: a stream of instructions from the instruction buffer feeds the instruction length decoder, which identifies the first, second, and third instructions.

Slide 33: Example Design: ILD Block from Intel
- Case study: a design derived from the Instruction Length Decoder of the Intel Pentium® class of processors.
- Decodes the length of instructions streaming from memory; has to look at up to 4 bytes at a time; has to execute in one cycle and decode about 64 bytes of instructions.
- Characteristics of microprocessor functional blocks: low latency (single- or dual-cycle implementation); consist of several small computations; intermix of control and data logic.

Slide 34: ILD Synthesis: Resulting Architecture
- Transformations applied: speculate operations, fully unroll the loop, eliminate the loop index variable.
- Result: from a multi-cycle sequential architecture to a single-cycle parallel architecture.
- Our toolbox approach enables us to develop a script to synthesize applications from different domains.
- The final design looks close to the actual implementation done by Intel.

Slide 35: Conclusions
- Parallelizing code transformations enable a new range of HLS transformations: they provide the needed improvement in the quality of HLS results, make it possible to be competitive against manually designed circuits, and can enable productivity improvements in microelectronic design.
- Built a synthesis system with a range of code transformations: a platform for applying coarse- and fine-grain optimizations; a tool-box approach where transformations and heuristics can be developed independently; enables the designer to find the right synthesis script for different application domains.
- Performance improvements of 60-70% across a number of designs; effectiveness shown on an Intel design.

Slide 36: Acknowledgements
- Advisors: Professors Rajesh Gupta, Nikil Dutt, Alex Nicolau.
- Contributors to the SPARK framework: Nick Savoiu, Mehrdad Reshadi, Sunwoo Kim.
- Intel Strategic CAD Labs (SCL): Timothy Kam, Mike Kishinevsky.
- Supported by the Semiconductor Research Corporation and Intel SCL.

Slide 37: SPARK Release
- Binaries for Linux, Solaris, and Windows are available at http://www.cecs.uci.edu/~spark
- Includes a user manual and a tutorial based on an MPEG-1 player.

Slide 38: Compilation for Coarse-Grain Reconfigurable Architectures (section title).

Slide 39: Use of Coarse-Grain Reconfigurable Fabrics as Co-Processors
- Diagram: a system with CPU, DSP, DMA, cache ($), I/O, and a co-processor.
- Focus of this work: build a compiler framework to map applications to coarse-grain reconfigurable architectures.
- These architectures consist of processing elements (PEs) connected together in a network.

Slide 40: Different Topology Traversals
- (a) Zig-Zag, (b) Reverse-S, (c) Spiral.
- Spiral traversal performs best because: each PE traversed is adjacent to the previous PE, and PEs with more neighbors/connections are traversed first.

Slide 41: Ongoing Work
- Different ways to traverse the topology during scheduling.
- Effect of different connection delay models: as technology improves, processing elements (PEs) will be much faster than the connections between them.
- Effects of different PE configurations: different numbers of functional units per PE; different cycle times for adds versus multiplies.
- Taking memory bandwidth and registers into consideration during scheduling.

Slide 42: Thank You.

Slide 43: Case Study: Implementation of MPEG-1 Prediction Block on an FPGA Platform
- Developed a novel memory mapping algorithm to fit the memory elements/application onto the FPGA platform; in collaboration with Manev Luthra.
- Flow: C input (MPEG-1 pred block) -> execution profiling -> manual HW/SW partitioning; the software C description goes through a software compiler to the processor core, and the hardware C description goes through SPARK high-level synthesis to the FPGA (with memory and I/O on the platform).

Slide 44: Recent Related Work
- Mostly related to code scheduling in the presence of conditionals: Condition Vector List Scheduling [Wakabayashi 89]; Symbolic Scheduling [Radivojevic 96]; WaveSched Scheduler [Lakshminarayana 98]; Basic Block Control Graph Scheduling [Santos 99].
- Limitations: arbitrary nesting of conditionals and loops not handled or handled poorly; ad hoc optimizations applied in isolation; limited or no analysis of logic and control costs; not clear whether an optimization has a positive impact beyond scheduling.

Slide 45: Basic Instruction Length Decoder: Initial Description
- Diagram: byte 1 yields length contribution 1; "need byte 2?" gates length contribution 2 from byte 2, and likewise "need byte 3?" and "need byte 4?" for bytes 3 and 4; the total length of the instruction is the sum of the contributions.

Slide 46: Instruction Length Decoder: Decoding the 2nd Instruction
- After decoding the length of an instruction: start looking from the next byte, and again examine up to 4 bytes to determine the length of the next instruction (diagram: bytes 3-6 following the first instruction).

Slide 47: Instruction Length Decoder: Parallelized Description
- Speculatively calculate the length contributions of all 4 bytes at a time.
- Determine the actual total length of the instruction based on this data.
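The parallelized step can be sketched in C. The contrib() and need_next() functions below are invented stand-ins for the real x86 length-decode logic, which is far more involved; the point is only the structure: all four contributions are computed up front (in parallel in hardware), and the "need byte" chain merely selects how many of them count:

```c
/* hypothetical per-byte decode helpers, NOT the real x86 tables */
static int contrib(unsigned char b)   { return b & 0x3; }
static int need_next(unsigned char b) { return b & 0x80; }

int insn_length(const unsigned char b[4]) {
    /* all four contributions computed speculatively, up front */
    int c1 = contrib(b[0]), c2 = contrib(b[1]),
        c3 = contrib(b[2]), c4 = contrib(b[3]);
    int len = c1;
    if (need_next(b[0])) {
        len += c2;
        if (need_next(b[1])) {
            len += c3;
            if (need_next(b[2]))
                len += c4;
        }
    }
    return len;
}
```

In the sequential form of slide 45, each contribution is computed only after the previous "need byte" check; here the checks collapse into selection logic over already-computed values, which is what lets the whole decode fit in one cycle.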

Slide 48: ILD: Extracting Further Parallelism
- Speculatively calculate the length of instructions assuming a new instruction starts at each byte; do this calculation for all bytes in parallel.
- Then traverse from the 1st byte to the last, determine the lengths of the instructions starting from the 1st till the last, and discard the unused calculations.

Slide 49: Initial: Multi-Cycle Sequential Architecture
- Diagram: the sequential form of the length decoder, with the "need byte 2/3/4?" checks chaining the four length contributions.

Slide 50: How Code Motions Are Employed by the Scheduler
- Diagram: operations +a, +b, +c, +d in a design with basic blocks BB0-BB9; the scheduler speculates operations, moving them up across the HTG.

Slide 51: How Code Motions Are Employed by the Scheduler (continued)
- Diagram: in addition to speculation and movement across HTG blocks, conditional speculation duplicates operations (+a, +d) into the branches of a conditional.

Slide 52: Integrating Transformations into the Scheduler
- Components: IR Walker, Candidate Fetcher (with Candidate Walker and Candidate Validater), Candidate Chooser, Candidate Mover, and Dynamic Transforms.
- The fetcher determines the code motions required to schedule an operation from the list of available operations; branch balancing is applied both during traversal and during code motion; the mover applies the speculative code motions.

Slide 53: Architecture of the PHLS Scheduler
- IR Walker: traverses the design to find the next basic block to schedule.
- Candidate Fetcher: traverses the design to find candidate operations to schedule.
- Candidate Chooser: calculates the cost of operations and chooses the operation with the lowest cost for scheduling.
- Candidate Mover: moves, duplicates, and schedules the chosen operation.
- Dynamic Transforms: dynamically apply transformations such as CSE on the remaining candidate operations using the scheduled operation.

Slide 54: Publications
- Dynamic Conditional Branch Balancing during the High-Level Synthesis of Control-Intensive Designs. S. Gupta, N.D. Dutt, R.K. Gupta, A. Nicolau. DATE, March 2003.
- SPARK: A High-Level Synthesis Framework For Applying Parallelizing Compiler Transformations. S. Gupta, N.D. Dutt, R.K. Gupta, A. Nicolau. VLSI Design 2003. Best Paper Award.
- Dynamic Common Sub-Expression Elimination during Scheduling in High-Level Synthesis. S. Gupta, M. Reshadi, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. ISSS 2002.
- Coordinated Transformations for High-Level Synthesis of High Performance Microprocessor Blocks. S. Gupta, T. Kam, M. Kishinevsky, S. Rotem, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. DAC 2002.
- Conditional Speculation and its Effects on Performance and Area for High-Level Synthesis. S. Gupta, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. ISSS 2001.
- Speculation Techniques for High Level Synthesis of Control Intensive Designs. S. Gupta, N. Savoiu, S. Kim, N.D. Dutt, R.K. Gupta, A. Nicolau. DAC 2001.
- Analysis of High-level Address Code Transformations for Programmable Processors. S. Gupta, M. Miranda, F. Catthoor, R.K. Gupta. DATE 2000.
- Synthesis of Testable RTL Designs using Adaptive Simulated Annealing Algorithm. C.P. Ravikumar, S. Gupta, A. Jajoo. Intl. Conf. on VLSI Design, 1998. Best Student Paper Award.
- Book chapter: ASIC Design. S. Gupta, R.K. Gupta. Chapter 64, The VLSI Handbook, edited by Wai-Kai Chen.
- 2 journal papers and 1 conference paper under submission.

