08/31/2001 Copyright CECS & The Spark Project
Center for Embedded Computer Systems, University of California, Irvine
http://www.cecs.uci.edu/~spark
High-Level Synthesis of High Performance Microprocessor Blocks
Nick Savoiu, Nikil Dutt, Rajesh Gupta, Alex Nicolau
SPARK High Level Synthesis System
Supported by Semiconductor Research Corporation and Intel
Timothy Kam, Michael Kishinevsky, Steve Haynal, Abdallah Tabbara, Sumit Gupta
Strategic CAD Labs, Design Technologies, Intel Inc, Hillsboro
http://www.intel.com/research/scl

2 Copyright CECS & The Spark Project Overview
- Brief background
  - Spark High-Level Synthesis Framework
  - Previous work in Spark framework
- High-level synthesis for Microprocessor blocks
- Instruction Length Decoder
  - Design Behavior
  - Steps involved in synthesis
- Work done this summer at SCL
- Future Plans

3 Copyright CECS & The Spark Project High Level Synthesis From C to CDFG to Architecture

4 Copyright CECS & The Spark Project Scheduling with Given Resource Allocation
[Figure label: Resource Constraints]

5 Copyright CECS & The Spark Project The Spark High-Level Synthesis Framework

6 Copyright CECS & The Spark Project Limitations of high-level synthesis targeted by Spark
- Quality of synthesis results severely affected by complex control flow
  - Control flow style affects the effectiveness of optimizations
  - Nested ifs and loops not handled or handled poorly
- Poor understanding (much less integration) of the interaction between source-level and fine grain “compiler” transformations
- No comprehensive synthesis framework
  - Few and scattered optimizations
  - Results presented for scheduling
    - Effects on logic synthesis not understood
  - Small, synthetic benchmarks

7 Copyright CECS & The Spark Project Generalized Code Motions
[Figure: an If Node with T and F branches; code motions labeled Speculation, Reverse Speculation, Conditional Speculation, Across Hierarchical Blocks]

8 Copyright CECS & The Spark Project Characteristics of ASIC Design
- Large designs such as MPEG
- Multi-cycle implementation
- Resource constrained
Implications on transformations applied
- Extraction of parallelism constrained by area limitations
  - Speculation may lead to additional registers
- More conservative with transformations such as loop unrolling

9 Copyright CECS & The Spark Project Characteristics of Microprocessor Blocks
- Smaller designs
- Single or dual cycle implementation
- High performance
  - Extract maximal parallelism
  - Area constraints are more lax
Implications on transformations applied
- Operations within behavior are chained together with no latching
- All loops can be unrolled

10 Copyright CECS & The Spark Project Simplified Instruction Length Decoder Byte 0 Byte 1 Byte 2 Byte 3 Length Contribution NeedNextByte

11 Copyright CECS & The Spark Project Simplified Instruction Length Decoder Byte 0 Byte 1 Byte 2 Byte 3 Length Contribution NeedNextByte First Instruction

12 Copyright CECS & The Spark Project Behavioral Description in C

    NextStartByte = 0;
    for (i=0; i < n; i++) {
        len[i] = CalculateLength(i);
        if (i == NextStartByte) {
            NextStartByte = len[i];
            Mark[i] = 1;
        }
    } /* for (i=0; i < n; i++) */

    int CalculateLength(i) {
        lc1 = LengthContribution(i);
        need1 = need_next_byte(i);
        if (need1) {
            lc2 = LengthContribution(i+1);
            need2 = need_next_byte(i+1);
            if (need2) {
                lc3 = LengthContribution(i+2);
                need3 = need_next_byte(i+2);
                if (need3) {
                    lc4 = LengthContribution(i+3);
                    Length = lc1 + lc2 + lc3 + lc4;
                }
                else Length = lc1 + lc2 + lc3;
            }
            else Length = lc1 + lc2;
        }
        else Length = lc1;
        return Length;
    }

13 Copyright CECS & The Spark Project Speculate Maximally

    NextStartByte = 0;
    for (i=0; i < n; i++) {
        len[i] = CalculateLength(i);
        if (i == NextStartByte) {
            NextStartByte = len[i];
            Mark[i] = 1;
        }
    } /* for (i=0; i < n; i++) */

    int CalculateLength(i) {
        /* Data Calculation */
        lc1 = LengthContribution(i);
        need1 = need_next_byte(i);
        lc2 = LengthContribution(i+1);
        need2 = need_next_byte(i+1);
        lc3 = LengthContribution(i+2);
        need3 = need_next_byte(i+2);
        lc4 = LengthContribution(i+3);
        TempLength1 = lc1 + lc2 + lc3 + lc4;
        TempLength2 = lc1 + lc2 + lc3;
        TempLength3 = lc1 + lc2;
        /* Control Logic */
        if (need1) {
            if (need2) {
                if (need3) {
                    Length = TempLength1;
                }
                else Length = TempLength2;
            }
            else Length = TempLength3;
        }
        else Length = lc1;
        return Length;
    }
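The speculation step above can be checked with a small runnable sketch. The byte encoding used by `length_contribution` and `need_next_byte` below is hypothetical (the slides do not define either function), and `check_equivalence` is an illustrative helper, not part of Spark. The sketch only shows that the nested and the maximally speculated forms of `CalculateLength` return the same value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte encoding, assumed only for this sketch:
 * (low two bits + 1) is the length contribution, bit 7 is "need next byte". */
static int length_contribution(const uint8_t *buf, int i) { return (buf[i] & 0x03) + 1; }
static int need_next_byte(const uint8_t *buf, int i) { return (buf[i] & 0x80) != 0; }

/* Nested form: byte i+k is examined only if byte i+k-1 asked for it. */
static int calculate_length_nested(const uint8_t *buf, int i)
{
    int length = length_contribution(buf, i);
    if (need_next_byte(buf, i)) {
        length += length_contribution(buf, i + 1);
        if (need_next_byte(buf, i + 1)) {
            length += length_contribution(buf, i + 2);
            if (need_next_byte(buf, i + 2))
                length += length_contribution(buf, i + 3);
        }
    }
    return length;
}

/* Speculated form: every contribution and partial sum is computed up front;
 * the conditions only select one of the precomputed values. */
static int calculate_length_speculated(const uint8_t *buf, int i)
{
    int lc1 = length_contribution(buf, i);
    int lc2 = length_contribution(buf, i + 1);
    int lc3 = length_contribution(buf, i + 2);
    int lc4 = length_contribution(buf, i + 3);
    int need1 = need_next_byte(buf, i);
    int need2 = need_next_byte(buf, i + 1);
    int need3 = need_next_byte(buf, i + 2);
    int temp1 = lc1 + lc2 + lc3 + lc4;
    int temp2 = lc1 + lc2 + lc3;
    int temp3 = lc1 + lc2;

    if (need1) {
        if (need2)
            return need3 ? temp1 : temp2;
        return temp3;
    }
    return lc1;
}

/* Both forms must agree wherever bytes i..i+3 are readable. */
static void check_equivalence(const uint8_t *buf, int size)
{
    for (int i = 0; i + 3 < size; i++)
        assert(calculate_length_nested(buf, i) == calculate_length_speculated(buf, i));
}
```

One consequence worth noting: the speculated form reads bytes `i+1` to `i+3` unconditionally, so a software model needs three readable bytes past the last decoded position.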

14 Copyright CECS & The Spark Project Inlining (Done Earlier)

    NextStartByte = 0;
    for (i=0; i < n; i++) {
        Results(i) = DataCalculation(i, i+1, i+2, i+3);
        Length(i) = ControlLogic(Results(i));
        len[i] = Length(i);
        if (i == NextStartByte) {
            NextStartByte = len[i];
            Mark[i] = 1;
        }
    } /* for (i=0; i < n; i++) */

    int CalculateLength(i) {
        /* Data Calculation */
        lc1 = LengthContribution(i);
        need1 = need_next_byte(i);
        lc2 = LengthContribution(i+1);
        need2 = need_next_byte(i+1);
        lc3 = LengthContribution(i+2);
        need3 = need_next_byte(i+2);
        lc4 = LengthContribution(i+3);
        TempLength1 = lc1 + lc2 + lc3 + lc4;
        TempLength2 = lc1 + lc2 + lc3;
        TempLength3 = lc1 + lc2;
        /* Control Logic */
        if (need1) {
            if (need2) {
                if (need3) {
                    Length = TempLength1;
                }
                else Length = TempLength2;
            }
            else Length = TempLength3;
        }
        else Length = lc1;
        return Length;
    }

15 Copyright CECS & The Spark Project Unroll Loop Completely (Shown For Only 2 Unrolls)

    NextStartByte = 0;
    i=0;

    Results(i) = DataCalculation(i, i+1, i+2, i+3);
    Length(i) = ControlLogic(Results(i));
    len[i] = Length(i);
    if (i == NextStartByte) {
        NextStartByte = len[i];
        Mark[i] = 1;
    }

    Results(i+1) = DataCalculation(i+1, i+2, i+3, i+4);
    Length(i+1) = ControlLogic(Results(i+1));
    len[i+1] = Length(i+1);
    if (i+1 == NextStartByte) {
        NextStartByte = len[i+1];
        Mark[i+1] = 1;
    }

16 Copyright CECS & The Spark Project Propagate Constant: Loop Index

    NextStartByte = 0;

    Results(0) = DataCalculation(0, 1, 2, 3);
    Length(0) = ControlLogic(Results(0));
    len[0] = Length(0);
    if (0 == NextStartByte) {
        NextStartByte = len[0];
        Mark[0] = 1;
    }

    Results(1) = DataCalculation(1, 2, 3, 4);
    Length(1) = ControlLogic(Results(1));
    len[1] = Length(1);
    if (1 == NextStartByte) {
        NextStartByte = len[1];
        Mark[1] = 1;
    }

17 Copyright CECS & The Spark Project Maximally Parallelize/Compact

    /* Data Calculation */
    Results(0) = DataCalculation(0,1,2,3);
    Results(1) = DataCalculation(1,2,3,4);
    …
    Results(n) = DataCalculation(n, n+1, n+2, n+3);

    /* Control Logic */
    Length(0) = ControlLogic(Results(0));
    Length(1) = ControlLogic(Results(1));
    …
    Length(n) = ControlLogic(Results(n));

    len[0] = Length(0);
    len[1] = Length(1);
    …
    len[n] = Length(n);

    /* Ripple Control Logic */
    NextStartByte = 0;
    if (0 == NextStartByte) {
        NextStartByte = len[0];
        Mark[0] = 1;
    }
    if (1 == NextStartByte) {
        NextStartByte = len[1];
        Mark[1] = 1;
    }
    …
    if (n == NextStartByte) {
        NextStartByte = len[n];
        Mark[n] = 1;
    }

18 Copyright CECS & The Spark Project Final Design Architecture

    /* Data Calculation */
    Results(0) = DataCalculation(0,1,2,3);
    Results(1) = DataCalculation(1,2,3,4);
    …
    Results(n) = DataCalculation(n, n+1, n+2, n+3);

    /* Control Logic */
    Length(0) = ControlLogic(Results(0));
    Length(1) = ControlLogic(Results(1));
    …
    Length(n) = ControlLogic(Results(n));

    /* Ripple Control Logic */
    if (0 == NextStartByte) {
        NextStartByte = len[0];
        Mark[0] = 1;
    }
    …
    if (n == NextStartByte) {
        NextStartByte = len[n];
        Mark[n] = 1;
    }

[Figure blocks: Instruction Buffer, Data Calculation, Control Logic, Ripple Logic]
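The three-stage structure of the final design can be modeled in software roughly as follows. This is a sketch under stated assumptions, not the Spark output: the byte encoding inside `length_at`, the buffer size `N`, and the names `length_at` and `decode` are all hypothetical. The sketch also advances the start position with `i + len[i]` so that it stays an absolute byte index, whereas the slide writes `NextStartByte = len[i]`.

```c
#include <assert.h>
#include <stdint.h>

enum { N = 8 };  /* hypothetical number of byte positions decoded per pass */

/* Data calculation plus control logic for one byte position.
 * Hypothetical encoding: (low two bits + 1) is the length contribution,
 * bit 7 means the next byte belongs to the same instruction. */
static int length_at(const uint8_t *buf, int i)
{
    int lc[4], need[3];
    for (int k = 0; k < 4; k++)
        lc[k] = (buf[i + k] & 0x03) + 1;
    for (int k = 0; k < 3; k++)
        need[k] = (buf[i + k] & 0x80) != 0;

    /* All candidate sums are computed; the conditions only select one. */
    int t2 = lc[0] + lc[1];
    int t3 = t2 + lc[2];
    int t4 = t3 + lc[3];
    return !need[0] ? lc[0] : !need[1] ? t2 : !need[2] ? t3 : t4;
}

/* buf must hold N + 3 bytes because of the speculative look-ahead. */
static void decode(const uint8_t *buf, int *len, int *mark)
{
    assert(buf && len && mark);

    /* No iteration depends on another, so hardware can evaluate
     * all N positions side by side. */
    for (int i = 0; i < N; i++)
        len[i] = length_at(buf, i);

    /* Ripple logic: the only serial chain in the design. */
    int next_start = 0;
    for (int i = 0; i < N; i++) {
        mark[i] = (i == next_start);
        if (mark[i])
            next_start = i + len[i];  /* assumption: absolute byte position */
    }
}
```

In this model the first loop corresponds to the replicated Data Calculation and Control Logic blocks, and the second loop to the ripple chain that remains after unrolling.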

19 Copyright CECS & The Spark Project ILD Tasks Achieved This Summer
- Chaining across conditional boundaries
  - Enables single cycle schedules
  - Useful as general high-level synthesis transformation as well
  - Had implications on other things such as VHDL generation
- Complete unrolling of loops
  - Was implemented previously
- Constant Propagation
  - Useful for loop index propagation after unrolling

20 Copyright CECS & The Spark Project Other Interaction within SCL
- Interfacing with HLD team via XML
  - Implemented XML generation pass
  - Creates a path from C for NexSiS and the rest of HLD flow
  - Being driven by requirements from Abdallah
- Analyzed some other designs
  - Whitney: 3-D design
  - FAX: Willamette floating point unit

21 Copyright CECS & The Spark Project Future Plans
- Continue to work on ILD with the more complicated (complete) design
- Look at similar designs
  - Detect first 3 zeros in 32 bit vector
- Develop a set of transformations targeted to such high performance blocks
- Expand interaction with HLD Design flow
  - Do some transformations before handing over CDFG via XML to Symbolic Scheduling
  - For example: transformations that lead to node duplication, source-to-source transformations, some loop transformations
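To give an idea of what a "detect first 3 zeros in 32 bit vector" behavior might look like as C input, here is a minimal sketch. The slides do not specify the design, so the scan direction (bit 0 upward), the output format, and the name `first_three_zeros` are assumptions. Like the ILD loop, it has a fixed trip count and a small serial dependence (`found`), which is the shape the unrolling and speculation steps above are aimed at.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores the positions of up to three zero bits of v in pos[],
 * scanning from bit 0 upward (assumed order), and returns how many
 * zero bits were found (0 to 3). */
static int first_three_zeros(uint32_t v, int pos[3])
{
    assert(pos != NULL);

    int found = 0;
    for (int bit = 0; bit < 32 && found < 3; bit++) {
        if (((v >> bit) & 1u) == 0)
            pos[found++] = bit;
    }
    return found;
}
```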

22 Copyright CECS & The Spark Project Additional Slides

23 Copyright CECS & The Spark Project Spark’s Methodology
- Applies coarse and fine grain compiler optimizations
  - Targets control flow transformations
  - “Fine grain” loop optimization techniques for multiple and nested loops
  - Mixed IR suitable for fine and coarse grain compiler transformations (similar to other systems such as SUIF)
- Synthesis from C provides
  - Flow from architecture design to synthesis
  - Opportunity to apply coarse grain optimizations
- Compiler transformations modified to target HLS
  - Multiple mutually-exclusive operations can be scheduled on the same resource in the same cycle
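The last point, mutually exclusive operations sharing one resource in one cycle, can be illustrated with a small sketch. The function names are hypothetical; the second function models, in software terms, a single adder whose inputs are steered by the condition.

```c
#include <assert.h>

/* Two additions sit in mutually exclusive branches: at most one runs. */
static int add_in_branches(int c, int a, int b, int d, int e)
{
    int x;
    if (c)
        x = a + b;
    else
        x = d + e;
    return x;
}

/* The same behavior with one adder: the condition only selects
 * which operands reach it (multiplexers in hardware). */
static int add_shared(int c, int a, int b, int d, int e)
{
    int in0 = c ? a : d;
    int in1 = c ? b : e;
    return in0 + in1;
}

/* Both forms agree for either value of the condition. */
static void check_shared_adder(void)
{
    for (int c = 0; c <= 1; c++)
        assert(add_in_branches(c, 1, 2, 3, 4) == add_shared(c, 1, 2, 3, 4));
}
```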

24 Copyright CECS & The Spark Project Spark’s Methodology
- Customizable extensible scheduler
  - Range of transformations in modular toolbox
    - Percolation, trailblazing, loop pipelining (RDLP), inlining
  - Selected under heuristics and/or user control
    - Code motion, loop transformations
- Ability to generate synthesizable RTL VHDL
  - Integrates with current IC design flows
  - Code generation at various levels:
    - Behavioral C
    - Behavioral VHDL
    - Structural VHDL

25 Copyright CECS & The Spark Project Generalized Code Motions
- Hierarchical code motions
  - Operations are moved across entire conditional structures
- Speculation to improve resource utilization
  - Has to be controlled to limit impact on number of registers
- Reverse speculation
  - Moves operations down into conditional branches
- Early condition execution
  - Evaluates conditionals as soon as the corresponding operation has been executed
- Conditional Speculation
  - Duplicates operations up into conditional branches
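Three of the code motions listed above can be shown as before/after pairs at the C level. These are hand-written illustrations with hypothetical function names, not output from Spark; each pair computes the same result, and only the position of the operation relative to the conditional changes.

```c
#include <assert.h>

/* Speculation: the add moves above the test; its result is kept in a
 * temporary and committed only if c holds. */
static int spec_before(int c, int a, int b) { int x = 0; if (c) x = a + b; return x; }
static int spec_after(int c, int a, int b)  { int t = a + b; int x = 0; if (c) x = t; return x; }

/* Reverse speculation: the multiply is needed only when c holds,
 * so it moves down into that branch. */
static int rev_before(int c, int a, int b)
{
    int t = a * b;
    int x;
    if (c) x = t + 1; else x = a;
    return x;
}
static int rev_after(int c, int a, int b)
{
    int x;
    if (c) { int t = a * b; x = t + 1; } else x = a;
    return x;
}

/* Conditional speculation: the operation that followed the conditional
 * is duplicated up into both branches. */
static int cond_before(int c, int a, int b)
{
    int x, y;
    if (c) x = a; else x = b;
    y = x + 1;
    return y;
}
static int cond_after(int c, int a, int b)
{
    int x, y;
    if (c) { x = a; y = x + 1; } else { x = b; y = x + 1; }
    return y;
}

/* Each transformed form must match its original for both branches. */
static void check_code_motions(void)
{
    for (int c = 0; c <= 1; c++) {
        assert(spec_before(c, 2, 3) == spec_after(c, 2, 3));
        assert(rev_before(c, 2, 3) == rev_after(c, 2, 3));
        assert(cond_before(c, 2, 3) == cond_after(c, 2, 3));
    }
}
```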

26 Copyright CECS & The Spark Project Scheduling Results on MPEG Prediction Block

27 Copyright CECS & The Spark Project Scheduling Results on ADPCM Encoder

28 Copyright CECS & The Spark Project Scheduling Results
- Synthesis results after scheduling by Spark show
  - Considerable gain in execution cycles
  - Critical path decreases marginally
  - Area can increase significantly
- Benchmarks used are large real-life applications
  - Well written; no gains due to sloppy code

29 Copyright CECS & The Spark Project Interconnect minimization by resource binding
- Minimize the complexity of steering logic
  - Multiplexors and demultiplexors
- Bind operations with same inputs and outputs to same functional units
- Bind variables, which are inputs/outputs to same functional units, to the same registers

30 Copyright CECS & The Spark Project Results after Binding

31 Copyright CECS & The Spark Project Results after Binding: ADPCM

32 Copyright CECS & The Spark Project Future Plans
- Synthesis for high-performance microprocessor blocks
  - Single cycle behavioral descriptions
- Timing analysis and time budgeting
  - Introducing time constrained synthesis
- Loop Transformations
  - Parallelizing compiler transformations: loop interchange, exchange, splitting, fusion
- Resource versus Throughput analysis
- Cost models for code motions

33 Copyright CECS & The Spark Project The Intermediate Representation
[Figure labels: HTG/CDFG, EDG, AST]
- Hierarchical Task Graph (HTG) is the main structure in the intermediate representation (IR)
- Maintains information on:
  - Code structure (IFs, LOOPs)
  - Loop bounds, type (FOR, WHILE)
- Array accesses are not lowered to address calculation followed by memory access
- Is complete
  - Can regenerate input C code

34 Copyright CECS & The Spark Project IR Examples

35 Copyright CECS & The Spark Project
[Figure panels: C code, CDFG, HTG]

36 Copyright CECS & The Spark Project Scheduling

37 Copyright CECS & The Spark Project The Scheduler Framework
- Scheduler framework philosophy
  - Modularity, reusability
  - Allow designer to write new scheduling algorithms with minimal effort
- Toolbox approach
  - Core transformations: percolation, trailblazing, RDLP
  - Heuristics to decide which transformations are to be applied

38 Copyright CECS & The Spark Project The Scheduler Framework
- Designed to be completely customizable in terms of the scheduling algorithms and heuristics used
- An instance of a scheduling algorithm consists of a set of
  - IR traversal algorithms
  - Code motion algorithms
  - Scheduling heuristics
- The designer can use predefined algorithms and heuristics or design new ones
  - Enabled by toolbox approach
[Figure labels: Algorithm, Scheduling Heuristics, Candidate Validators, Candidate Provider, IR Walkers]

39 Copyright CECS & The Spark Project Extracting Parallelism with Speculation

40 Copyright CECS & The Spark Project Reverse Speculation
- Moves operations into conditionals
- Only moves to branches which require result
- Moves operations with lower priority

41 Copyright CECS & The Spark Project Early Condition Execution
- Evaluates conditions ASAP
- Moves all unscheduled operations into conditionals
- Uses reverse speculation to achieve this

42 Copyright CECS & The Spark Project Conditional Speculation

43 Copyright CECS & The Spark Project RDLP Example
Loop body operations: A: i = i + 1; B: j = i + h; C: k = i + g; D: l = j + 1
[Figure panels: Original Loop, Compact, Shift and Pipeline, Unroll and compact. The flattened schedules show groupings such as B : C and D : A.]
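The shift step of this example can be sketched in C. The schedule itself is not recoverable from the flattened figure, so this reads the "D : A" grouping as D of one iteration placed alongside A of the next, which is an assumption. The `State` struct and the function names are hypothetical. The sketch shows only that shifting A across the loop back-edge, with a prologue and an epilogue, leaves the final values unchanged.

```c
#include <assert.h>

typedef struct { int i, j, k, l; } State;

/* Original loop: A, B, C, D execute in order in every iteration. */
static State run_original(State s, int h, int g, int n)
{
    assert(n >= 0);
    for (int it = 0; it < n; it++) {
        s.i = s.i + 1;   /* A */
        s.j = s.i + h;   /* B */
        s.k = s.i + g;   /* C */
        s.l = s.j + 1;   /* D */
    }
    return s;
}

/* Shifted loop: A of the next iteration is moved up next to D of the
 * current one. D reads j and A writes i, so the two are independent
 * and could share a control step in hardware. */
static State run_shifted(State s, int h, int g, int n)
{
    assert(n >= 0);
    if (n == 0)
        return s;

    s.i = s.i + 1;                        /* prologue: A of iteration 0 */
    for (int it = 0; it < n - 1; it++) {
        s.j = s.i + h;                    /* B */
        s.k = s.i + g;                    /* C */
        s.l = s.j + 1;                    /* D of this iteration */
        s.i = s.i + 1;                    /* A of the next iteration */
    }
    s.j = s.i + h;                        /* epilogue: B, C, D of the last iteration */
    s.k = s.i + g;
    s.l = s.j + 1;
    return s;
}
```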
