1
Layout Driven Data Communication Optimization for High Level Synthesis
Ryan Kastner, Wenrui Gong, Xin Hao, Forrest Brewer
Dept. of Electrical and Computer Engineering, University of California, Santa Barbara
Adam Kaplan, Philip Brisk and Majid Sarrafzadeh
Computer Science Department, University of California, Los Angeles
2
High Level Synthesis
Input: an application description written in a C-like language (C, SystemC, HandelC, SpecC), for example the internal filter of an image convolver (excerpt):
```c
for (y_pos=ygrid_start-y_fmid-1,res_pos=0; y_pos<0; y_pos+=ygrid_step) {
    for (x_pos=xgrid_start-x_fmid-1; x_pos<0; x_pos+=xgrid_step,res_pos++) {
        (*reflect)(filt,x_fdim,y_fdim,x_pos,y_pos,temp,FILTER);
        sum=0.0;
        for (y_filt_lin=x_fdim,x_filt=y_im_lin=0; y_filt_lin<=filt_size; y_im_lin+=x_dim,y_filt_lin+=x_fdim)
            for (im_pos=y_im_lin; x_filt<y_filt_lin; x_filt++,im_pos++)
                sum+=image[im_pos]*temp[x_filt];
        result[res_pos] = sum;
    }
    first_col = x_pos+1;
    (*reflect)(filt,x_fdim,y_fdim,0,y_pos,temp,FILTER);
```
The description is compiled into an SSA control/data flow graph (CDFG).
Goal: maximize “performance” (area, latency, power, …) subject to the input constraints.
Output: “hardware” (an RTL specification).
3
Target Architectures
“Spatial” architectures:
- local control within the data path, global data flow between control nodes
- many distributed computational units and memories
- coarse- and fine-grained reconfigurable architectures
The techniques could be used for other architectures, but may not make sense there: our design flow has little resource sharing.
[Figures: fine-grain configurable platform; coarse-grain programmable platform]
4
Obligatory Design Flow Slide
Application specification → SUIF (syntactic and semantic analysis) → AST → Machine SUIF (compiler backend) → SSA CDFG, then:
1. Create the interface
2. Transform the instruction list to a dataflow graph
3. Transform the dataflow graph to behavioral HDL code (basic block entity: entity basic_block is … architecture behavioral of basic_block …)
4. Synthesize behavioral HDL code to RTL code (behavioral synthesis)
5. Create the CFG interface entity (entity cfg is … architecture behavioral of cfg …)
6. Determine structural control and data communication between the basic block entities
7. Generate synthesizable RTL code
8. Synthesize the RTL code (logical and physical synthesis)
5
Design Example: “FAST” function from MediaBench
```c
int FAST(real *b, int n)
{
    real fn;
    int i, in, nn, n2pow, n4pow, nthpo;

    n2pow = fastlog2(n);
    if(n2pow <= 0) return 0;
    nthpo = n;
    fn = nthpo;
    n4pow = n2pow / 2;

    /* radix 2 iteration required; do it now */
    if(n2pow % 2) {
        nn = 2;
        in = n / nn;
        FR2TR(in, b, b + in);
    }
    else nn = 1;

    /* perform radix 4 iterations */
    for(i = 1; i <= n4pow; i++) {
        nn *= 4;
        in = n / nn;
        FR4TR(in, nn, b, b + in, b + 2 * in, b + 3 * in);
    }

    /* perform inplace reordering */
    FORD1(n2pow, b);
    FORD2(n2pow, b);

    /* take conjugates */
    for(i = 3; i < n; i += 2) b[i] = -b[i];

    return 1;
}
```
[Figure: control flow graph of FAST with Nodes 1 through 10; some nodes with simple computation are merged into others and not shown; lines between nodes show data communication]
6
Characterizing Data Communication
Examples of data communication schemes:
- Distributed: control nodes 2, 3 and 4 communicate directly; data communication = wire
- Centralized: control nodes 2, 3 and 4 communicate through a memory (register bank, RAM) over a bus; data communication = memory access
7
Identifying Data Communication
Determine the relationship between the place(s) where data is defined and where it is used.
[Figure: example CFG with definitions and uses of variables a, b and c]
Naïve method: every use point of a variable depends on every definition of that variable.
But not every definition actually reaches every use point.
Analysis is needed to minimize the amount of data communication.
Example: global data communication = 5 variables
8
Use of SSA in Compilation
Must determine the relationship between where data is generated and where it is used.
Problem formulations:
- [DAC03]: minimize the total number of bits communicated between all pairs of control nodes
- Today: minimize overall wirelength
SSA (Static Single Assignment):
- changes each variable to have a unique definition point
- must add Φ-nodes to merge definitions
[Figure: the example CFG in SSA form, with renamed definitions a1 to a4, b1, b2, c1 and the merge a4 ← Φ(a2, a3)]
9
SSA Fundamentals
SSA algorithms:
- find the locations of Φ-nodes
- rename variables
Three main SSA algorithms: minimal and pruned (Cytron et al.), semi-pruned (Briggs et al.). They differ in the number and location of Φ-nodes:
- Minimal: insert Φ-nodes at the iterated dominance frontier (IDF)
- Semi-pruned: insert a Φ-node at the IDF only if the variable is live outside some basic block
- Pruned: insert a Φ-node at the IDF only if the variable is live at that point
[Figure: the example CFG in each form. Minimal: a4 ← Φ(a2, a3), b3 ← Φ(b1, b2), c2 ← Φ(c1). Semi-pruned: a4 ← Φ(a2, a3), b3 ← Φ(b1, b2). Pruned: a4 ← Φ(a2, a3).]
10
Results: SSA for Data Communication Minimization
Edge weight w(i,j): number of bits communicated from node i to node j
Total edge weight (TEW): the sum of the edge weights; corresponds to the amount of data communication
[Chart: TEW results on the MediaBench benchmarks]
11
Further Minimizing Data Communication
Current SSA algorithms place Φ-nodes temporally.
In software compilation, live ranges should be short. Is that appropriate in hardware?
[Figure: the example CFG with a4 ← Φ(a2, a3) under temporal Φ-node distribution (TEW = 4) and spatial Φ-node distribution (TEW = 3)]
12
Spatial Φ-node Distribution Algorithm
d: number of uses of the Φ-node destination
s: number of Φ-node source values
[Figure: a3 ← Φ(a0, a1, a2) with s = 3 and d = 2, comparing the number of temporal links with the number of spatial links]
Optimal assuming an “ideal” n-dimensional floorplan
13
Physically Aware Compiler Transforms
- Consider layout information during compilation: modify transforms to consider physical information
- Ideal: full physical synthesis, which is extremely accurate but far too time-consuming
- Approximation: floorplanning, which is much faster and gives a “good enough” high-level physical picture
- Our previous data communication work used no physical information, which can lead to negative results
[Figure: application → hardware compilation → floorplanner, in place of full physical synthesis]
Let's Get Physical!
14
Physically Aware Data Communication
Modify the placement of Φ-functions to consider wirelength.
Φ-Placement Algorithm
```text
Given a CFG G_cfg(V_cfg, E_cfg)
perform_ssa(G_cfg)
calculate_def_use_chains(G_cfg)
remove_back_edges(G_cfg)
topological_sort(G_cfg)
foreach vertex v ∈ V_cfg
    foreach Φ-node φ ∈ v
        s ← φ.sources
        d ← |def_use_chain(φ.dest)|
        IDF ← iterated_dominance_frontier(s)
        PossiblePlacements ← findPlacementOptions(IDF)
        place(φ) ← selectBest(PossiblePlacements)
        distribute/duplicate φ to place(φ)
```
FindPlacementOptions Algorithm
```text
Given a set of CFG nodes R
Φ-options ← ∅
insert(R) into Φ-options
foreach instruction i ∈ R
    if (i is a destination of Φ-function f)
        return Φ-options
temp_Φ-options ← ∅
foreach non-dominated child c of R
    temp_Φ-options ← crossProductJoin(temp_Φ-options, findPlacementOptions(c))
return Φ-options ∪ temp_Φ-options
```
15
Algorithm in Action
- Evaluate all options for Φ-nodes
- Replicate when necessary
- Limit the amount of replication: it most often leads to more wirelength
- Tricks can limit redundant placements
[Figure: candidate placements of a4 ← Φ(a2, a3) in the example CFG, each labeled traditional (temporal) or spatial [DAC03]]
Any of these options could yield the best wirelength; the choice is highly dependent on the floorplan.
16
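Algorithm in Action
[Figure: part of the CFG of the FAST function from the MediaBench test suite, showing nodes N3 and N9, true (T) and false (F) branch edges, and the Φ-nodes for nn_4, i_2 and nn_5, i_3]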
17
Algorithm in Action
[Figure: alternative placements of the Φ-nodes for nn_4, i_2 and nn_5, i_3 around nodes N3 and N9 of FAST]
18
Full Floorplanning Results
Simple iterative approach:
1. Initial optimization minimizes data communication
2. Full simulated annealing (SA) based floorplanning
3. Reoptimization based on the floorplan to minimize wirelength
4. Full SA-based floorplanning
Spectacularly negative results
[Figure: hardware compilation → full floorplanner, in place of full physical synthesis]
19
Incremental Floorplanning
Incremental placement [Coudert et al.]: given an optimized placement and a set of changes to the netlist (e.g., due to technology remapping), modify the placement to improve it.
Equally applicable to floorplanning: given perturbations to the floorplan modules (e.g., due to Φ-function movement), modify the existing floorplan.
[Figure: initial floorplan, perturbations, modified floorplan]
20
Our Incremental Floorplanner
[Figure: slicing tree with each node annotated area/room (e.g., 2/2.3 up to 32/36), with the initial floorplan, perturbations, and resulting incremental floorplan]
21
Our Incremental Floorplanner
1. Calculate the area and room of each node: bottom-up slicing tree traversal
2. Area redistribution: top-down traversal; increase area if necessary (not enough space at the root, or aspect ratios become too distorted)
[Figure: slicing tree, incremental floorplan and modified floorplan]
Simple, yet effective; other, more complicated algorithms might work better.
22
MediaBench Functions
Benchmark | Blocks Links Weight Initial WL
1 adpcm coder | 333154268835568
2 adpcm decoder | 262344195221588
3 internal filter | 101436017088411637
4 internal expand | 1019425714336317031
5 compress output | 341760236829114
6 mpeg2dec block | 621366227234510
7 mpeg2dec vector | 1642610244366
8 FAST | 144157043714
9 FR4TR | 7787155704340697
10 det | 1251379363772
23
Incremental Floorplanning Results
[Chart: normalized wirelength per benchmark, with the average]
“Optimal” approach: 12% overall wirelength reduction, 25% Φ-node wirelength reduction
Our approach: 6% overall wirelength reduction, 8% Φ-node wirelength reduction
24
Related Work
Hardware compilation projects using SSA: PDG+SSA form [UCSB], CASH [CMU], SA-C [UCR], Sea Cucumber [BYU]
Physically aware behavioral synthesis techniques:
- SA for scheduling, binding and floorplanning [Prabhakaran97]
- SA for binding and floorplanning [Yung-Ming94]
- Scheduling, allocation and binding [Dougherty00]
- Fasolt: bus topology [Knapp92]
- High level synthesis [Tarafdar00]
Incremental CAD:
- Problem overview/challenges [Coudert00]
- Floorplanning [Crenshaw99]
25
Conclusions
It’s been a long strange trip…
SSA is a nice IR for hardware compilation:
- it explicitly shows data flow
- it is useful for exploiting parallelism
Compiler techniques applied to hardware design can reduce wirelength, but:
- they must be aware of physical information
- they must use incremental floorplanning
26
Questions? (and cue for applause)