1
High-Level Synthesis of High Performance Microprocessor Blocks
SPARK High Level Synthesis System
Nick Savoiu, Nikil Dutt, Rajesh Gupta, Alex Nicolau
Center for Embedded Computer Systems, University of California, Irvine
http://www.cecs.uci.edu/~spark
Timothy Kam, Michael Kishinevsky, Steve Haynal, Abdallah Tabbara
Sumit Gupta
Strategic CAD Labs, Design Technologies, Intel Inc, Hillsboro
http://www.intel.com/research/scl
Supported by Semiconductor Research Corporation and Intel
08/31/2001
Copyright CECS & The Spark Project
2
Overview
- Brief background
  - Spark High-Level Synthesis Framework
  - Previous work in Spark framework
- High-level synthesis for Microprocessor blocks
- Instruction Length Decoder
  - Design Behavior
  - Steps involved in synthesis
- Work done this summer at SCL
- Future Plans
3
High Level Synthesis
From C to CDFG to Architecture
4
Scheduling with Given Resource Allocation
[Figure: scheduling under resource constraints; resources shown: +, <]
5
The Spark High-Level Synthesis Framework
6
Limitations of high-level synthesis targeted by Spark
- Quality of synthesis results severely affected by complex control flow
  - Control flow style affects the effectiveness of optimizations
  - Nested ifs and loops not handled or handled poorly
- Poor understanding (much less integration) of the interaction between source-level and fine grain "compiler" transformations
- No comprehensive synthesis framework
  - Few and scattered optimizations
  - Results presented for scheduling
    - Effects on logic synthesis not understood
  - Small, synthetic benchmarks
7
Generalized Code Motions
[Figure: an If node with true (T) and false (F) branches, illustrating Speculation, Reverse Speculation, Conditional Speculation, and motion Across Hierarchical Blocks]
8
Characteristics of ASIC Design
- Large designs such as MPEG
- Multi-cycle implementation
- Resource constrained
Implications on transformations applied
- Extraction of parallelism constrained by area limitations
  - Speculation may lead to additional registers
- More conservative with transformations such as loop unrolling
9
Characteristics of Microprocessor Blocks
- Smaller Designs
- Single or Dual Cycle implementation
- High performance
  - Extract maximal parallelism
  - Area constraints are more lax
Implications on transformations applied
- Operations within behavior are chained together with no latching
- All loops can be unrolled
10
Simplified Instruction Length Decoder
[Figure: Byte 0 through Byte 3, each producing a Length Contribution and a NeedNextByte signal]
11
Simplified Instruction Length Decoder
[Figure: Byte 0 through Byte 3, each producing a Length Contribution and a NeedNextByte signal; the First Instruction is highlighted]
12
Behavioral Description in C

    NextStartByte = 0;
    for (i=0; i < n; i++) {
      len[i] = CalculateLength(i);
      if (i == NextStartByte) {
        NextStartByte = len[i];
        Mark[i] = 1;
      }
    } /* for (i=0; i < n; i++) */

    int CalculateLength(i) {
      lc1 = LengthContribution(i);
      need1 = need_next_byte(i);
      if (need1) {
        lc2 = LengthContribution(i+1);
        need2 = need_next_byte(i+1);
        if (need2) {
          lc3 = LengthContribution(i+2);
          need3 = need_next_byte(i+2);
          if (need3) {
            lc4 = LengthContribution(i+3);
            Length = lc1 + lc2 + lc3 + lc4;
          } else Length = lc1 + lc2 + lc3;
        } else Length = lc1 + lc2;
      } else Length = lc1;
      return Length;
    }
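The behavioral description above can be exercised as a small C model. This is only a sketch under stated assumptions: the tables `lc_table` and `need_table` are hypothetical stand-ins for whatever `LengthContribution` and `need_next_byte` decode from the instruction bytes, the buffer size `N` and the wrapper `MarkInstructions` are invented for illustration, and the model advances `NextStartByte` to `i + len[i]`, whereas the slide writes `NextStartByte = len[i]`.

```c
#include <assert.h>
#include <string.h>

#define N 8     /* assumed buffer size; the slides leave n open */
#define PAD 3   /* CalculateLength may look up to 3 bytes past byte i */

/* Hypothetical per-byte decode results (not from the slides). */
static int lc_table[N + PAD];
static int need_table[N + PAD];

static int LengthContribution(int i) { return lc_table[i]; }
static int need_next_byte(int i)     { return need_table[i]; }

/* Same nesting as the slide: a byte is examined only if the
 * previous byte asked for it. */
static int CalculateLength(int i)
{
    assert(i >= 0 && i < N);
    int length = LengthContribution(i);
    if (need_next_byte(i)) {
        length += LengthContribution(i + 1);
        if (need_next_byte(i + 1)) {
            length += LengthContribution(i + 2);
            if (need_next_byte(i + 2))
                length += LengthContribution(i + 3);
        }
    }
    return length;
}

/* Marks the first byte of every instruction in the buffer. */
static void MarkInstructions(int len[N], int Mark[N])
{
    int NextStartByte = 0;
    memset(Mark, 0, N * sizeof Mark[0]);
    for (int i = 0; i < N; i++) {
        len[i] = CalculateLength(i);
        if (i == NextStartByte) {
            NextStartByte = i + len[i];  /* assumption: offset from byte i */
            Mark[i] = 1;
        }
    }
}
```

Note that `len[i]` is computed for every byte, including bytes that turn out not to start an instruction; only the `Mark` update depends on the previous iteration.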
13
Speculate Maximally

    NextStartByte = 0;
    for (i=0; i < n; i++) {
      len[i] = CalculateLength(i);
      if (i == NextStartByte) {
        NextStartByte = len[i];
        Mark[i] = 1;
      }
    } /* for (i=0; i < n; i++) */

    int CalculateLength(i) {
      /* Data Calculation */
      lc1 = LengthContribution(i);
      need1 = need_next_byte(i);
      lc2 = LengthContribution(i+1);
      need2 = need_next_byte(i+1);
      lc3 = LengthContribution(i+2);
      need3 = need_next_byte(i+2);
      lc4 = LengthContribution(i+3);
      TempLength1 = lc1 + lc2 + lc3 + lc4;
      TempLength2 = lc1 + lc2 + lc3;
      TempLength3 = lc1 + lc2;
      /* Control Logic */
      if (need1) {
        if (need2) {
          if (need3) {
            Length = TempLength1;
          } else Length = TempLength2;
        } else Length = TempLength3;
      } else Length = lc1;
      return Length;
    }
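A quick way to convince oneself that maximal speculation preserves the result is to compare the nested and speculated forms over every combination of the three need flags. The sketch below assumes the four length contributions and three need flags are passed in as plain arrays (`lc[4]`, `need[3]`); those parameters and the function names are illustrative, not part of the original slides.

```c
#include <assert.h>

/* Nested form: later bytes are examined only when needed. */
static int length_nested(const int lc[4], const int need[3])
{
    if (need[0]) {
        if (need[1]) {
            if (need[2])
                return lc[0] + lc[1] + lc[2] + lc[3];
            return lc[0] + lc[1] + lc[2];
        }
        return lc[0] + lc[1];
    }
    return lc[0];
}

/* Speculated form: every sum is computed unconditionally (data
 * calculation), then one is selected (control logic). */
static int length_speculated(const int lc[4], const int need[3])
{
    int TempLength1 = lc[0] + lc[1] + lc[2] + lc[3];
    int TempLength2 = lc[0] + lc[1] + lc[2];
    int TempLength3 = lc[0] + lc[1];

    if (!need[0]) return lc[0];
    if (!need[1]) return TempLength3;
    if (!need[2]) return TempLength2;
    return TempLength1;
}

/* Asserts that both forms agree for all 8 need-flag combinations. */
static void check_speculation(const int lc[4])
{
    for (int bits = 0; bits < 8; bits++) {
        int need[3] = { bits & 1, (bits >> 1) & 1, (bits >> 2) & 1 };
        assert(length_nested(lc, need) == length_speculated(lc, need));
    }
}
```

The speculated form does more work per call, which is acceptable in the microprocessor-block setting described earlier because the additions no longer wait on the need flags.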
14
Inlining (Done Earlier)

    NextStartByte = 0;
    for (i=0; i < n; i++) {
      Results(i) = DataCalculation(i, i+1, i+2, i+3);
      Length(i) = ControlLogic(Results(i));
      len[i] = Length(i);
      if (i == NextStartByte) {
        NextStartByte = len[i];
        Mark[i] = 1;
      }
    } /* for (i=0; i < n; i++) */

    int CalculateLength(i) {
      /* Data Calculation */
      lc1 = LengthContribution(i);
      need1 = need_next_byte(i);
      lc2 = LengthContribution(i+1);
      need2 = need_next_byte(i+1);
      lc3 = LengthContribution(i+2);
      need3 = need_next_byte(i+2);
      lc4 = LengthContribution(i+3);
      TempLength1 = lc1 + lc2 + lc3 + lc4;
      TempLength2 = lc1 + lc2 + lc3;
      TempLength3 = lc1 + lc2;
      /* Control Logic */
      if (need1) {
        if (need2) {
          if (need3) {
            Length = TempLength1;
          } else Length = TempLength2;
        } else Length = TempLength3;
      } else Length = lc1;
      return Length;
    }
15
Unroll Loop Completely (shown for only 2 unrolls)

    NextStartByte = 0;
    i = 0;
    Results(i) = DataCalculation(i, i+1, i+2, i+3);
    Length(i) = ControlLogic(Results(i));
    len[i] = Length(i);
    if (i == NextStartByte) {
      NextStartByte = len[i];
      Mark[i] = 1;
    }
    Results(i+1) = DataCalculation(i+1, i+2, i+3, i+4);
    Length(i+1) = ControlLogic(Results(i+1));
    len[i+1] = Length(i+1);
    if (i+1 == NextStartByte) {
      NextStartByte = len[i+1];
      Mark[i+1] = 1;
    }
16
Propagate Constant: Loop Index

    NextStartByte = 0;
    Results(0) = DataCalculation(0, 1, 2, 3);
    Length(0) = ControlLogic(Results(0));
    len[0] = Length(0);
    if (0 == NextStartByte) {
      NextStartByte = len[0];
      Mark[0] = 1;
    }
    Results(1) = DataCalculation(1, 2, 3, 4);
    Length(1) = ControlLogic(Results(1));
    len[1] = Length(1);
    if (1 == NextStartByte) {
      NextStartByte = len[1];
      Mark[1] = 1;
    }
17
Maximally Parallelize/Compact

    /* Data Calculation */
    Results(0) = DataCalculation(0,1,2,3);
    Results(1) = DataCalculation(1,2,3,4);
    …
    Results(n) = DataCalculation(n, n+1, n+2, n+3);

    /* Control Logic */
    Length(0) = ControlLogic(Results(0));
    Length(1) = ControlLogic(Results(1));
    …
    Length(n) = ControlLogic(Results(n));
    len[0] = Length(0);
    len[1] = Length(1);
    …
    len[n] = Length(n);

    /* Ripple Control Logic */
    NextStartByte = 0;
    if (0 == NextStartByte) { NextStartByte = len[0]; Mark[0] = 1; }
    if (1 == NextStartByte) { NextStartByte = len[1]; Mark[1] = 1; }
    …
    if (n == NextStartByte) { NextStartByte = len[n]; Mark[n] = 1; }
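Once the lengths are available for every byte, the only remaining sequential dependence is the `NextStartByte` chain. The following sketch compares a looped ripple stage with its fully unrolled, constant-propagated form for an assumed buffer of `N = 4` bytes. The function names are hypothetical, and as an assumption the update is written as `i + len[i]` rather than the slide's `len[i]`.

```c
#include <assert.h>

#define N 4   /* assumed buffer size for this sketch */

/* Reference: the ripple logic written as a loop. */
static void ripple_loop(const int len[N], int Mark[N])
{
    int NextStartByte = 0;
    for (int i = 0; i < N; i++) {
        Mark[i] = 0;
        if (i == NextStartByte) {
            NextStartByte = i + len[i];  /* assumption: offset from byte i */
            Mark[i] = 1;
        }
    }
}

/* The same logic after complete unrolling and propagation of the
 * loop index as a constant. */
static void ripple_unrolled(const int len[N], int Mark[N])
{
    int NextStartByte = 0;
    Mark[0] = Mark[1] = Mark[2] = Mark[3] = 0;
    if (0 == NextStartByte) { NextStartByte = 0 + len[0]; Mark[0] = 1; }
    if (1 == NextStartByte) { NextStartByte = 1 + len[1]; Mark[1] = 1; }
    if (2 == NextStartByte) { NextStartByte = 2 + len[2]; Mark[2] = 1; }
    if (3 == NextStartByte) { NextStartByte = 3 + len[3]; Mark[3] = 1; }
}

/* Asserts that both forms mark the same bytes. */
static void check_ripple(const int len[N])
{
    int a[N], b[N];
    ripple_loop(len, a);
    ripple_unrolled(len, b);
    for (int i = 0; i < N; i++)
        assert(a[i] == b[i]);
}
```

In the unrolled form each comparison is against a constant, so what remains is a chain of compare-and-select steps; the length computations feeding it have no dependences on one another.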
18
Final Design Architecture

    /* Data Calculation */
    Results(0) = DataCalculation(0,1,2,3);
    Results(1) = DataCalculation(1,2,3,4);
    …
    Results(n) = DataCalculation(n, n+1, n+2, n+3);

    /* Control Logic */
    Length(0) = ControlLogic(Results(0));
    Length(1) = ControlLogic(Results(1));
    …
    Length(n) = ControlLogic(Results(n));

    /* Ripple Control Logic */
    if (0 == NextStartByte) { NextStartByte = len[0]; Mark[0] = 1; }
    …
    if (n == NextStartByte) { NextStartByte = len[n]; Mark[n] = 1; }

[Figure: block diagram with Instruction Buffer, Data Calculation, Control Logic, and Ripple Logic]
19
ILD Tasks Achieved This Summer
- Chaining across conditional boundaries
  - Enables single cycle schedules
  - Useful as general high-level synthesis transformation as well
  - Had implications on other things such as VHDL generation
- Complete unrolling of loops
  - Was implemented previously
- Constant Propagation
  - Useful for loop index propagation after unrolling
20
Other Interaction within SCL
- Interfacing with HLD team via XML
  - Implemented XML generation pass
  - Creates a path from C for NexSiS and the rest of HLD flow
  - Being driven by requirements from Abdallah
- Analyzed some other designs
  - Whitney: 3-D design
  - FAX: Willamette floating point unit
21
Future Plans
- Continue to work on ILD with the more complicated (complete) design
- Look at similar designs
  - Detect first 3 zeros in 32 bit vector
- Develop a set of transformations targeted to such high performance blocks
- Expand interaction with HLD Design flow
  - Do some transformations before handing over CDFG via XML to Symbolic Scheduling
  - For example: transformations that lead to node duplication, source-to-source transformations, some loop transformations
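As an illustration of the "detect first 3 zeros" design mentioned above, here is one possible behavioral reading in C. It assumes "first" means lowest bit index and that the desired output is the bit positions; the function name and interface are hypothetical. The found-count carried across iterations plays a role similar to `NextStartByte` in the ILD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the positions of up to three lowest-indexed zero bits of v
 * into pos[] and returns how many were found (0 to 3). */
static int first_three_zeros(uint32_t v, int pos[3])
{
    assert(pos != NULL);
    int found = 0;
    for (int i = 0; i < 32 && found < 3; i++) {
        if (((v >> i) & 1u) == 0)
            pos[found++] = i;
    }
    return found;
}
```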
22
Additional Slides
23
Spark's Methodology
- Applies coarse and fine grain compiler optimizations
  - Targets control flow transformations
  - "Fine grain" loop optimization techniques for multiple and nested loops
  - Mixed IR suitable for fine and coarse grain compiler transformations (similar to other systems such as SUIF)
- Synthesis from C provides
  - Flow from architecture design to synthesis
  - Opportunity to apply coarse grain optimizations
- Compiler transformations modified to target HLS
  - Multiple mutually-exclusive operations can be scheduled on the same resource in the same cycle
24
Spark's Methodology
- Customizable extensible scheduler
  - Range of transformations in modular toolbox
    - Percolation, trailblazing, loop pipelining (RDLP), inlining
  - Selected under heuristics and/or user control
    - Code motion, loop transformations
- Ability to generate synthesizable RTL VHDL
  - Integrates with current IC design flows
  - Code generation at various levels:
    - Behavioral C
    - Behavioral VHDL
    - Structural VHDL
25
Generalized Code Motions
- Hierarchical code motions
  - Operations are moved across entire conditional structures
- Speculation to improve resource utilization
  - Has to be controlled to limit impact on number of registers
- Reverse speculation
  - Moves operations down into conditional branches
- Early condition execution
  - Evaluates conditionals as soon as the corresponding operation has been executed
- Conditional Speculation
  - Duplicates operations up into conditional branches
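The code motions listed above can be pictured at the source level. The fragments below are hypothetical examples, not taken from Spark: each pair of functions computes the same result and differs only in where one operation sits relative to the conditional. Reverse speculation is the first pair read in the opposite direction.

```c
#include <assert.h>

/* Speculation: the addition starts inside the true branch... */
static int spec_before(int c, int a, int b)
{
    int r;
    if (c) { int t = a + b; r = t; }
    else   { r = a; }
    return r;
}

/* ...and is hoisted above the condition, so it executes whether or
 * not its result is used. Going from this form back to spec_before
 * is reverse speculation. */
static int spec_after(int c, int a, int b)
{
    int t = a + b;
    return c ? t : a;
}

/* Conditional speculation: the increment starts after the join... */
static int cond_spec_before(int c, int a, int b)
{
    int x;
    if (c) x = a; else x = b;
    return x + 1;
}

/* ...and is duplicated up into both branches. */
static int cond_spec_after(int c, int a, int b)
{
    int r;
    if (c) { int x = a; r = x + 1; }
    else   { int x = b; r = x + 1; }
    return r;
}

/* Asserts that each transformed form matches its original. */
static void check_code_motions(int a, int b)
{
    for (int c = 0; c <= 1; c++) {
        assert(spec_before(c, a, b) == spec_after(c, a, b));
        assert(cond_spec_before(c, a, b) == cond_spec_after(c, a, b));
    }
}
```

In `spec_after` the value `t` is live across the condition, which is the register cost the slide warns about.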
26
Scheduling Results on MPEG Prediction Block
27
Scheduling Results on ADPCM Encoder
28
Scheduling Results
- Synthesis results after scheduling by Spark show
  - Considerable gain in execution cycles
  - Critical path decreases marginally
  - Area can increase significantly
- Benchmarks used are large real-life applications
  - Well written; no gains due to sloppy code
29
Interconnect minimization by resource binding
- Minimize the complexity of steering logic
  - Multiplexors and demultiplexors
- Bind operations with same inputs and outputs to same functional units
- Bind variables, which are inputs/outputs to same functional units, to the same registers
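To make the binding guideline above concrete, the sketch below counts how many distinct source variables reach the input ports of a functional unit under a given binding. This count is used here as a rough proxy for multiplexer size; it is not Spark's cost model. The `Op` structure, the integer variable ids, and the function name are all assumptions made for illustration.

```c
#include <assert.h>

#define MAX_OPS 8

/* An operation reads one variable on each of two input ports. */
struct Op { int src_a, src_b; };

/* Sums, over both input ports of functional unit `fu`, the number of
 * distinct variables routed to that port by the ops bound to `fu`. */
static int mux_inputs(const struct Op ops[], const int binding[],
                      int n_ops, int fu)
{
    assert(n_ops <= MAX_OPS);
    int total = 0;
    for (int port = 0; port < 2; port++) {
        int seen[MAX_OPS];
        int n_seen = 0;
        for (int i = 0; i < n_ops; i++) {
            if (binding[i] != fu)
                continue;
            int src = (port == 0) ? ops[i].src_a : ops[i].src_b;
            int dup = 0;
            for (int k = 0; k < n_seen; k++)
                if (seen[k] == src)
                    dup = 1;
            if (!dup)
                seen[n_seen++] = src;
        }
        total += n_seen;
    }
    return total;
}
```

With four operations where ops 0 and 1 read variables (0, 1) and ops 2 and 3 read (2, 3), binding ops with equal inputs to the same unit gives 2 sources per unit, while mixing them gives 4 per unit.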
30
Results after Binding
31
Results after Binding: ADPCM
32
Future Plans
- Synthesis for high-performance microprocessor blocks
  - Single cycle behavioral descriptions
- Timing analysis and time budgeting
  - Introducing time constrained synthesis
- Loop Transformations
  - Parallelizing compiler transformations: loop interchange, exchange, splitting, fusion
- Resource versus Throughput analysis
- Cost models for code motions
33
The Intermediate Representation
[Figure: IR layers labeled HTG/CDFG, EDG, AST]
- Hierarchical Task Graph (HTG) is the main structure in the intermediate representation (IR)
- Maintains information on:
  - Code structure (IFs, LOOPs)
  - Loop bounds, type (FOR, WHILE)
- Array accesses are not lowered to address calculation followed by memory access
- Is complete
  - Can regenerate input C code
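A hierarchical representation of this kind can be pictured with a small C data structure. The sketch below is purely illustrative and is not Spark's actual IR: the type names, fields, and the `count_ops` traversal are assumptions chosen to show how IF and LOOP structure and loop bounds can be kept explicit.

```c
#include <assert.h>
#include <stddef.h>

enum HtgKind  { HTG_BASIC_BLOCK, HTG_IF, HTG_LOOP };
enum LoopType { LOOP_FOR, LOOP_WHILE };

struct HtgNode {
    enum HtgKind kind;
    int n_ops;                   /* HTG_BASIC_BLOCK: operations in the block */
    struct HtgNode *then_part;   /* HTG_IF: true branch */
    struct HtgNode *else_part;   /* HTG_IF: false branch */
    enum LoopType loop_type;     /* HTG_LOOP: FOR or WHILE */
    int lower, upper;            /* HTG_LOOP: bounds, when known */
    struct HtgNode *body;        /* HTG_LOOP: loop body */
    struct HtgNode *next;        /* next node at the same hierarchy level */
};

/* Counts operations by walking the hierarchy recursively. */
static int count_ops(const struct HtgNode *n)
{
    int total = 0;
    for (; n != NULL; n = n->next) {
        switch (n->kind) {
        case HTG_BASIC_BLOCK:
            total += n->n_ops;
            break;
        case HTG_IF:
            total += count_ops(n->then_part) + count_ops(n->else_part);
            break;
        case HTG_LOOP:
            total += count_ops(n->body);
            break;
        default:
            assert(0 && "unknown HTG node kind");
        }
    }
    return total;
}
```

Because conditionals and loops stay as single nodes, a transformation can treat an entire conditional as one unit, for example when moving an operation across it.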
34
IR Examples
35
[Figure: panels labeled C code, CDFG, HTG]
36
Scheduling
37
The Scheduler Framework
- Scheduler framework philosophy
  - Modular, reusable
  - Allow designer to write new scheduling algorithms with minimal effort
- Toolbox approach
  - Core transformations: percolation, trailblazing, RDLP
  - Heuristics to decide which transformations are to be applied
38
The Scheduler Framework
- Designed to be completely customizable in terms of the scheduling algorithms and heuristics used
- An instance of a scheduling algorithm consists of a set of
  - IR traversal algorithms
  - Code motion algorithms
  - Scheduling heuristics
- The designer can use predefined algorithms and heuristics or design new ones
  - Enabled by toolbox approach
[Figure: an Algorithm composed of Scheduling Heuristics, Candidate Validators, Candidate Provider, and IR Walkers]
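The toolbox idea above, an algorithm assembled from interchangeable pieces, can be sketched with function pointers. Everything below is hypothetical (the `Op` fields, the `SchedulerAlgorithm` structure, and the two sample plug-ins) and only illustrates the composition; the IR walker is reduced to a linear scan for brevity.

```c
#include <assert.h>
#include <stddef.h>

struct Op { int id; int priority; int ready; int scheduled; };

/* Candidate validator: may this operation be scheduled now? */
typedef int (*Validator)(const struct Op *op);
/* Scheduling heuristic: nonzero if a should be preferred over b. */
typedef int (*Heuristic)(const struct Op *a, const struct Op *b);

struct SchedulerAlgorithm {
    Validator validate;
    Heuristic prefer;
};

/* Candidate provider: scans the operations and returns the preferred
 * valid, unscheduled one, or NULL if there is none. */
static struct Op *pick_next(const struct SchedulerAlgorithm *alg,
                            struct Op ops[], int n)
{
    assert(alg != NULL && alg->validate != NULL && alg->prefer != NULL);
    struct Op *best = NULL;
    for (int i = 0; i < n; i++) {
        struct Op *op = &ops[i];
        if (op->scheduled || !alg->validate(op))
            continue;
        if (best == NULL || alg->prefer(op, best))
            best = op;
    }
    return best;
}

/* Two sample plug-ins. */
static int is_ready(const struct Op *op) { return op->ready; }
static int higher_priority(const struct Op *a, const struct Op *b)
{
    return a->priority > b->priority;
}
```

Swapping `higher_priority` for another comparison changes the scheduling policy without touching `pick_next`.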
39
Extracting Parallelism with Speculation
40
Reverse Speculation
- Moves operations into conditionals
- Only moves to branches which require result
- Moves operations with lower priority
41
Early Condition Execution
- Evaluates conditions ASAP
- Moves all unscheduled operations into conditionals
- Uses reverse speculation to achieve this
42
Conditional Speculation
43
RDLP Example
Loop body operations:
  A: i = i + 1
  B: j = i + h
  C: k = i + g
  D: l = j + 1
[Figure: the loop schedule shown in four stages: Original Loop, Compact, Shift and Pipeline, Unroll and compact]
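The shift step in the example above can be illustrated at the source level. The sketch below is one reading of the slide, not a reconstruction of its figure: operation A is shifted into the previous iteration, which leaves a prologue and an epilogue and places D next to the following iteration's A, on which it does not depend. The `State` structure, function names, and iteration count `n` are assumptions.

```c
#include <assert.h>

struct State { int i, j, k, l; };

/* Original loop body: A, B, C, D in order. */
static struct State loop_original(struct State s, int h, int g, int n)
{
    for (int it = 0; it < n; it++) {
        s.i = s.i + 1;   /* A */
        s.j = s.i + h;   /* B */
        s.k = s.i + g;   /* C */
        s.l = s.j + 1;   /* D */
    }
    return s;
}

/* Shifted loop: A of iteration it+1 now sits at the end of iteration it. */
static struct State loop_shifted(struct State s, int h, int g, int n)
{
    assert(n >= 1);
    s.i = s.i + 1;                  /* prologue: A of the first iteration */
    for (int it = 0; it < n - 1; it++) {
        s.j = s.i + h;   /* B */
        s.k = s.i + g;   /* C */
        s.l = s.j + 1;   /* D: independent of the A below */
        s.i = s.i + 1;   /* A of the next iteration */
    }
    s.j = s.i + h;                  /* epilogue: B, C, D of the last iteration */
    s.k = s.i + g;
    s.l = s.j + 1;
    return s;
}
```

Since D reads only `j` and the shifted A writes only `i`, a scheduler is free to place them in the same step, which is what makes the shifted body more compact.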