Presentation is loading. Please wait.

Presentation is loading. Please wait.

High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida.

Similar presentations


Presentation on theme: "High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida."— Presentation transcript:

1 High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

2 Existing FPGA Tool Flow Register-transfer (RT) synthesis Specify RT structure (muxes, registers, etc) + Allows precise specification - But, time consuming, difficult, error prone HDL Netlist Bitfile Processor FPGA RT Synthesis Physical Design Technology Mapping Placement Routing

3 Future FPGA Tool Flow? HDL Netlist Bitfile Processor FPGA RT Synthesis Physical Design Technology Mapping Placement Routing High-level Synthesis C/C++, Java, etc.

4 High-level Synthesis Wouldn’t it be nice to write high-level code? Ratio of C to VHDL developers (10000:1 ?) + Easier to specify + Separates function from architecture + More portable - Hardware potentially slower Similar to assembly code era Programmers could always beat compiler But, no longer the case Hopefully, high-level synthesis will catch up to manual effort

5 High-level Synthesis More challenging than compilation Compilation maps behavior into assembly instructions Architecture is known to compiler High-level synthesis creates a custom architecture to execute behavior Huge hardware exploration space Best solution may include microprocessors Should handle any high-level code Not all code appropriate for hardware

6 High-level Synthesis First, consider how to manually convert high-level code into circuit Steps 1) Build FSM for controller 2) Build datapath based on FSM acc = 0; for (i=0; i < 128; i++) acc += a[i];

7 Manual Example Build a FSM (controller) Decompose code into states acc = 0; for (i=0; i < 128; i++) acc += a[i]; if (i < 128) acc=0, i = 0 load a[i] acc += a[i] i++ Done

8 Manual Example Build a datapath Allocate resources for each state acc i if (i < 128) acc=0, i = 0 load a[i] acc += a[i] i++ Done < addr a[i] + + + 1 128 1 acc = 0; for (i=0; i < 128; i++) acc += a[i];

9 Manual Example Build a datapath Determine register inputs acc i if (i < 128) acc=0, i = 0 load a[i] acc += a[i] i++ Done < addr a[i] + + + 1 128 2x1 0 0 1 &a In from memory acc = 0; for (i=0; i < 128; i++) acc += a[i];

10 Manual Example Build a datapath Add outputs acc i if (i < 128) acc=0, i = 0 load a[i] acc += a[i] i++ Done < addr a[i] + + + 1 128 2x1 0 0 1 &a In from memory Memory address acc acc = 0; for (i=0; i < 128; i++) acc += a[i];

11 Manual Example Build a datapath Add control signals acc i if (i < 128) acc=0, i = 0 load a[i] acc += a[i] i++ Done < addr a[i] + + + 1 128 2x1 0 0 1 &a In from memory Memory address acc acc = 0; for (i=0; i < 128; i++) acc += a[i];

12 Manual Example Combine controller+datapath acc i < addr a[i] + + + 1 128 2x1 0 0 1 &a In from memory Memory address acc Done Memory Read Controller acc = 0; for (i=0; i < 128; i++) acc += a[i];

13 Manual Example Alternatives Use one adder (plus muxes) acc i < addr a[i] + 128 2x1 0 0 1 &a In from memory Memory address acc MUX

14 Manual Example Comparison with high-level synthesis Determining when to perform each operation => Scheduling Allocating resource for each operation => Resource allocation Mapping operations onto resources => Binding

15 Another Example Your turn x=0; for (i=0; i < 100; i++) { if (a[i] > 0) x ++; else x --; a[i] = x; } //output x Steps 1) Build FSM (do not perform if conversion) 2) Build datapath based on FSM

16 High-Level Synthesis High-level Code Custom Circuit High-Level Synthesis Could be C, C++, Java, Perl, Python, SystemC, ImpulseC, etc. Usually a RT VHDL description, but could as low level as a bit file

17 High-Level Synthesis acc = 0; for (i=0; i < 128; i++) acc += a[i]; acc i < addr a[i] + + + 1 128 2x1 0 0 1 &a In from memory Memory address acc Done Memory Read Controller

18 Main Steps Syntactic Analysis Optimization Scheduling Resource Allocation Binding High-level Code Intermediate Representation Controller + Datapath Converts code to intermediate representation - allows all following steps to use common format. Determines when each operation will execute Determines physical resources to be used Maps operations onto physical resources Front-end Back-end

19 Syntactic Analysis Definition: Analysis of code to verify syntactic correctness Converts code into intermediate representation 2 steps 1) Lexical analysis (Lexing) 2) Parsing Syntactic Analysis High-level Code Intermediate Representation Lexical Analysis Parsing

20 Lexical Analysis Lexical analysis (lexing) breaks code into a series of defined tokens Token: defined language constructs x = 0; if (y < z) x = 1; Lexical Analysis ID(x), ASSIGN, INT(0), SEMICOLON, IF, LPAREN, ID(y), LT, ID(z), RPAREN, ID(x), ASSIGN, INT(1), SEMICOLON

21 Lexing Tools Define tokens using regular expressions - outputs C code that lexes input Common tool is “lex” /* braces and parentheses */ "[" { YYPRINT; return LBRACE; } "]" { YYPRINT; return RBRACE; } "," { YYPRINT; return COMMA; } ";" { YYPRINT; return SEMICOLON; } "!" { YYPRINT; return EXCLAMATION; } "{" { YYPRINT; return LBRACKET; } "}" { YYPRINT; return RBRACKET; } "-" { YYPRINT; return MINUS; } /* integers [0-9]+ { yylval.intVal = atoi( yytext ); return INT; }

22 Parsing Analysis of token sequence to determine correct grammatical structure Languages defined by context-free grammar Program = Exp Exp = Stmt SEMICOLON | IF LPAREN Cond RPAREN Exp | Exp Exp Cond = ID Comp ID Stmt = ID ASSIGN INT x = 0; if (y < z) x = 1; x = 0; x = 0; y = 1; if (a < b) x = 10; if (var1 != var2) x = 10; x = 0; if (y < z) x = 1; y = 5; t = 1; Grammar Correct Programs Comp = LT | NE

23 Parsing Program = Exp Exp = S SEMICOLON | IF LPAREN Cond RPAREN Exp | Exp Exp Cond = ID Comp ID S = ID ASSIGN INT Grammar Comp = LT | NE x = y; x = 3 + 5; x = 5;; if (x+5 > y) x = 2; Incorrect Programs x = 5

24 Parsing Tools Define grammar in special language Automatically creates parser based on grammar Popular tool is “yacc” - yet-another-compiler- compiler program: functions { $$ = $1; } ; functions: function { $$ = $1; } | functions function { $$ = $1; } ; function: HEXNUMBER LABEL COLON code { $$ = $2; } ;

25 Intermediate Representation Parser converts tokens to intermediate representation Usually, an abstract syntax tree x = 0; if (y < z) x = 1; d = 6; Assign if cond assign x 0 x 1 d 6 y < z

26 Intermediate Representation Why use intermediate representation? Easier to analyze/optimize than source code Theoretically can be used for all languages Makes synthesis back end language independent Syntactic Analysis C Code Intermediate Representation Syntactic Analysis Java Syntactic Analysis Perl Back End Scheduling, resource allocation, binding, independent of source language - sometimes optimizations too

27 Intermediate Representation Different Types Abstract Syntax Tree Control/Data Flow Graph (CDFG) Sequencing Graph Etc. We will focus on CDFG Combines control flow graph (CFG) and data flow graph (DFG)

28 Control flow graphs CFG Represents control flow dependencies of basic blocks Basic block is section of code that always executes from beginning to end I.e. no jumps into or out of block acc = 0; for (i=0; i < 128; i++) acc += a[i]; if (i < 128) acc=0, i = 0 acc += a[i] i ++ Done

29 Control flow graphs Your turn Create a CFG for this code i = 0; while (j < 10) { if (x < 5) y = 2; else if (z < 10) y = 6; }

30 Data Flow Graphs DFG Represents data dependencies between operations x = a+b; y = c*d; z = x - y; + * - a b c d x y z

31 Control/Data Flow Graph Combines CFG and DFG Maintains DFG for each node of CFG acc = 0; for (i=0; i < 128; i++) acc += a[i]; if (i < 128) acc=0; i=0; acc += a[i] i ++ Done acc 0 i 0 + a[i] acc + i 1 i

32 High-Level Synthesis: Optimization

33 Synthesis Optimizations After creating CDFG, high-level synthesis optimizes graph Goals Reduce area Improve latency Increase parallelism Reduce power/energy 2 types Data flow optimizations Control flow optimizations

34 Data Flow Optimizations Tree-height reduction Generally made possible from commutativity, associativity, and distributivity + + + + + + a b cd a b cd + + * a b cd + * + a b cd

35 Data Flow Optimizations Operator Strength Reduction Replacing an expensive (“strong”) operation with a faster one Common example: replacing multiply/divide with shift b[i] = a[i] * 8; b[i] = a[i] << 3; a = b * 5; c = b << 2; a = b + c; 1 multiplication 0 multiplications a = b * 13; c = b << 2; d = b << 3; a = c + d + b;

36 Data Flow Optimizations Constant propagation Statically evaluate expressions with constants x = 0; y = x * 15; z = y + 10; x = 0; y = 0; z = 10;

37 Data Flow Optimizations Function Specialization Create specialized code for common inputs Treat common inputs as constants If inputs not known statically, must include if statement for each call to specialized function int f (int x) { y = x * 15; return y + 10; } for (I=0; I < 1000; I++) f(0); … } int f_opt () { return 10; } for (I=0; I < 1000; I++) f_opt(0); … } Treat frequent input as a constant int f (int x) { y = x * 15; return y + 10; }

38 Data Flow Optimizations Common sub-expression elimination If expression appears more than once, repetitions can be replaced a = x + y;...... b = c * 25 + x + y; a = x + y;...... b = c * 25 + a; x + y already determined

39 Data Flow Optimizations Dead code elimination Remove code that is never executed May seem like stupid code, but often comes from constant propagation or function specialization int f (int x) { if (x > 0 ) a = b * 15; else a = b / 4; return a; } int f_opt () { a = b * 15; return a; } Specialized version for x > 0 does not need else branch - “dead code”

40 Data Flow Optimizations Code motion (hoisting/sinking) Avoid repeated computation for (I=0; I < 100; I++) { z = x + y; b[i] = a[i] + z ; } z = x + y; for (I=0; I < 100; I++) { b[i] = a[i] + z ; }

41 Control Flow Optimizations Loop Unrolling Replicate body of loop May increase parallelism for (i=0; i < 128; i++) a[I] = b[I] + c[I]; for (i=0; i < 128; i+=2) { a[I] = b[I] + c[I]; a[I+1] = b[I+1] + c[I+1] }

42 Control Flow Optimizations Function Inlining Replace function call with body of function Common for both SW and HW SW - Eliminates instructions/branches HW - Eliminates unnecessary control states for (i=0; i < 128; i++) a[I] = f( b[I], c[I] );.. int f (int a, int b) { return a + b * 15; } for (i=0; i < 128; i++) a[I] = b[I] + c[I] * 15;

43 Control Flow Optimizations Conditional Expansion Replace if with logic expression Execute if/else bodies in parallel y = ab; If (a) x = b+d; Else x =bd; y = ab; x = a(b+d) + a’bd; y = ab; x = y + d(a+b); [DeMicheli] Can be further optimized to:

44 Example Optimize this x = 0; y = a + b; if (x < 15) z = a + b - c; else z = x + 12; output = z * 12;

45 High-Level Synthesis: Scheduling

46 Scheduling Scheduling assigns a start time to each operation Start times must not violate dependencies in DFG Start times must meet performance constraints Alternatively, resource constraints

47 Examples + + + + + + a b cd a b cd + + + a b cd Cycle1 Cycle2 Cycle3 Cycle1 Cycle2 Cycle1 Cycle2

48 Scheduling Problems Several types of scheduling problems Usually some combination of performance and resource constraints Problems: Unconstrained Not very useful, every schedule is valid Minimum latency Latency constrained Mininum-latency, resource constrained i.e. Find the scheduling with the shortest latency, that uses less than a specified # of resources NP-Complete Mininum-resource, latency constrained i.e. find the schedule that meets the latency constraint (which may be anything), and uses the minimum # of resources NP-Complete

49 Minimum Latency Scheduling ASAP (as soon as possible) algorithm Find a node whose predecessors are scheduled Schedule node one cycle later than max cycle of predecessor Repeat until all nodes scheduled + + * a b cd * - < e f g h Cycle1 Cycle2 Cycle3 + Cycle4 Minimum possible latency - 4 cycles

50 Minimum Latency Scheduling ALAP (as late as possible) algorithm Run ASAP, get minimum latency L Find a node whose successors are scheduled Schedule node one cycle before than min cycle of predecessor Nodes with no successors scheduled to cycle L Repeat until all nodes scheduled + + * a b cd * - < e f g h Cycle1 Cycle2 Cycle3 + Cycle4 Cycle3 *

51 Minimum Latency Scheduling ALAP Has to run ASAP first, seems pointless But, many heuristics need the mobility/slack of each operation ASAP gives the earliest possible time for an operation ALAP gives the latest possible time for an operation Slack = difference between earliest and latest possible schedule Slack = 0 implies operation has to be done in the current scheduled cycle The larger the slack, the more options a heuristic has to schedule the operation

52 Latency-Constrained Scheduling Instead of finding the minimum latency, find latency less than L Solutions: Use ASAP, verify that minimum latency less than L Use ALAP starting with cycle L instead of minimum latency (don’t need ASAP)


Download ppt "High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida."

Similar presentations


Ads by Google