
Physically Aware Data Communication Optimization for Hardware Synthesis Ryan Kastner, Wenrui Gong, Xin Hao, Forrest Brewer Dept. of Electrical and Computer.




1 Physically Aware Data Communication Optimization for Hardware Synthesis
Ryan Kastner, Wenrui Gong, Xin Hao, Forrest Brewer, Dept. of Electrical and Computer Engineering, University of California, Santa Barbara
Adam Kaplan, Philip Brisk and Majid Sarrafzadeh, Computer Science Department, University of California, Los Angeles

2 Hardware Compilation
[Figure: application specified in a high-level language → Compiler → HDL (behavioral, structural) → Synthesis and Physical Design → chip, bitstream, …]
- We focus our efforts on mapping an application written in a high-level language to a hardware description.
- We want this mapping to have optimal characteristics (area, latency, etc.).
- In this talk, we focus on the problem of minimizing data communication in the final hardware.

3 Obligatory Design Flow Slide
[Figure: Application Specification → SUIF: Syntactic & Semantic Analysis (AST) → Machine SUIF: Compiler Backend (SSA CDFG) → Behavioral Synthesis → Logical & Physical Synthesis]
1. Create interface
2. Transform instruction list to dataflow graph
3. Transform dataflow graph to behavioral HDL code (basic block entity: entity basic_block is … architecture behavioral of basic_block …)
4. Synthesize behavioral HDL code to RTL code
5. Create CFG interface (CFG entity: entity cfg is … architecture behavioral of cfg …)
6. Determine structural control and data communication between basic block entities (Entity 1 to Entity 4)
7. Generate synthesizable RTL code
8. Synthesize RTL code
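Step 2 of this flow turns a basic block's instruction list into a dataflow graph. The sketch below is only an illustration: it assumes a hypothetical three-address tuple format `(dest, op, srcs)`, tracks only read-after-write dependences, and `build_dfg` is an illustrative name rather than a SUIF or Machine SUIF API.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical instruction format: (destination, operator, source operands)
Instr = Tuple[str, str, List[str]]

def build_dfg(block: List[Instr]) -> Tuple[List[Tuple[int, int]], Set[str], Dict[str, int]]:
    """Build a dataflow graph for one basic block.

    Returns (edges, live_ins, final_defs):
      edges      -- (producer index, consumer index) read-after-write edges
      live_ins   -- operands read before any definition in the block; these
                    must arrive from other blocks (the block's interface inputs)
      final_defs -- last defining instruction of each variable in the block;
                    whether it is really live-out needs global analysis
    """
    last_def: Dict[str, int] = {}
    edges: List[Tuple[int, int]] = []
    live_ins: Set[str] = set()
    for idx, (dest, _op, srcs) in enumerate(block):
        for src in srcs:
            if src in last_def:
                edges.append((last_def[src], idx))  # value produced inside the block
            else:
                live_ins.add(src)                   # value flows in from another block
        last_def[dest] = idx                        # later definitions shadow earlier ones
    return edges, live_ins, last_def

block = [("t1", "+", ["a", "b"]),
         ("t2", "*", ["t1", "c"]),
         ("t3", "*", ["t1", "t2"])]
edges, live_ins, final_defs = build_dfg(block)
print(edges)             # [(0, 1), (0, 2), (1, 2)]
print(sorted(live_ins))  # ['a', 'b', 'c']
```

The operands read before any local definition (`a`, `b`, `c` here) are the values that must cross a block boundary, i.e. the data communication that the rest of the talk tries to minimize.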

4 Characterizing Data Communication
- Examples of data communication schemes among four control nodes:
  - Distributed: data communication = wire
  - Centralized, through a memory (register bank, RAM) and a bus: data communication = memory access

5 Identifying Data Communication
- Determine the relationship between the place(s) where data is defined and where data is used
[Figure: example CFG with statements b ← …, a ← …, … ← a, a ← …, c ← …, b ← …, … ← b, … ← c; annotated "Global Data Communication = 5 variables"]
- Naïve method: all use points of a variable depend on all definitions of that variable
- Not all use points "use" every definition of a variable
- Analysis is needed to minimize the amount of data communication
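The gap between the naïve method and an actual analysis can be sketched with classic reaching-definitions dataflow analysis (a standard compiler technique, not necessarily the analysis used in this work). All of the following are assumptions for illustration: the CFG is given as successor lists, each block defines a variable at most once, and uses in a block read the value live at block entry. `reaching_definitions` and `count_links` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

Def = Tuple[str, str]  # (variable, block that defines it)

def reaching_definitions(blocks: List[str], succs: Dict[str, List[str]],
                         defs: Dict[str, Set[str]]) -> Dict[str, Set[Def]]:
    """Iterative reaching-definitions analysis; returns IN[b] for each block."""
    preds: Dict[str, List[str]] = {b: [] for b in blocks}
    for b in blocks:
        for s in succs.get(b, []):
            preds[s].append(b)
    gen = {b: {(v, b) for v in defs.get(b, set())} for b in blocks}
    IN: Dict[str, Set[Def]] = {b: set() for b in blocks}
    OUT: Dict[str, Set[Def]] = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_in = set().union(*(OUT[p] for p in preds[b]))
            # A definition in b kills every other definition of the same variable.
            survivors = {(v, d) for (v, d) in new_in if v not in defs.get(b, set())}
            new_out = gen[b] | survivors
            if new_in != IN[b] or new_out != OUT[b]:
                IN[b], OUT[b] = new_in, new_out
                changed = True
    return IN

def count_links(blocks, succs, defs, uses) -> Tuple[int, int]:
    """Count inter-block (variable, def block, use block) links: naive vs. analyzed."""
    IN = reaching_definitions(blocks, succs, defs)
    # Naive: every use depends on every definition of the same variable.
    naive = {(v, d, b) for b in blocks for v in uses.get(b, set())
             for d in blocks if v in defs.get(d, set()) and d != b}
    # Analyzed: a use depends only on definitions that actually reach it.
    precise = {(v, d, b) for b in blocks for v in uses.get(b, set())
               for (w, d) in IN[b] if w == v and d != b}
    return len(naive), len(precise)

# Hypothetical diamond CFG: B1 defines a; B2 and B4 use a; B3 redefines a.
blocks = ["B1", "B2", "B3", "B4"]
succs = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"]}
defs = {"B1": {"a"}, "B3": {"a"}}
uses = {"B2": {"a"}, "B4": {"a"}}
print(count_links(blocks, succs, defs, uses))  # (4, 3)
```

In this diamond the naïve method links the use of `a` in B2 to the definition in B3, although no path leads from B3 to B2; the analysis drops that link.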

6 Use of SSA in Compilation
[Figure: the example CFG before and after SSA conversion; in SSA form: b1 ← …, a2 ← …, … ← a4, a3 ← …, a1 ← …, c1 ← …, b2 ← …, … ← b1, … ← c1, with a4 ← Φ(a2, a3)]
- Must determine the relationship between where data is generated and where data is used
- Problem formulations:
  - [DAC02]: minimize the total number of bits communicated between all pairs of control nodes
  - Today: minimize overall wirelength
- SSA (Static Single Assignment):
  - Changes each variable to have a unique definition point
  - Must add Φ-nodes to merge definitions
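Where the Φ-nodes go is classically determined by the iterated dominance frontier of a variable's definition blocks (Cytron et al.); the frontier computation below follows the well-known Cooper-Harvey-Kennedy formulation. This is a sketch of that standard technique, not the talk's implementation: the function names are illustrative, and it assumes every block is reachable and the entry block has no predecessors.

```python
from typing import Dict, List, Optional, Set

def dominator_sets(nodes: List[str], succs: Dict[str, List[str]], entry: str):
    """Iterative dominator computation (assumes all nodes are reachable)."""
    preds: Dict[str, List[str]] = {n: [] for n in nodes}
    for n in nodes:
        for s in succs.get(n, []):
            preds[s].append(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = set(nodes)
            for p in preds[n]:
                new &= dom[p]
            new.add(n)
            if new != dom[n]:
                dom[n], changed = new, True
    return dom, preds

def dominance_frontiers(nodes, succs, entry) -> Dict[str, Set[str]]:
    dom, preds = dominator_sets(nodes, succs, entry)
    # Strict dominators form a chain; the immediate dominator is the closest one,
    # i.e. the strict dominator that itself has the largest dominator set.
    idom: Dict[str, Optional[str]] = {entry: None}
    for n in nodes:
        if n != entry:
            idom[n] = max(dom[n] - {n}, key=lambda d: len(dom[d]))
    df: Dict[str, Set[str]] = {n: set() for n in nodes}
    for b in nodes:
        if len(preds[b]) >= 2:            # only join points appear in frontiers
            for p in preds[b]:
                runner = p
                while runner != idom[b]:  # walk up the dominator tree
                    df[runner].add(b)
                    runner = idom[runner]
    return df

def iterated_dominance_frontier(df: Dict[str, Set[str]], def_blocks: Set[str]) -> Set[str]:
    """Blocks that need a Φ-node for a variable defined in def_blocks."""
    result: Set[str] = set()
    worklist = list(def_blocks)
    while worklist:
        x = worklist.pop()
        for y in df[x]:
            if y not in result:
                result.add(y)
                worklist.append(y)  # a Φ-node is itself a new definition
    return result

# Hypothetical CFG: a diamond B1 -> {B2, B3} -> B4 with a back edge B4 -> B1.
nodes = ["entry", "B1", "B2", "B3", "B4"]
succs = {"entry": ["B1"], "B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": ["B1"]}
df = dominance_frontiers(nodes, succs, "entry")
print(sorted(iterated_dominance_frontier(df, {"B2", "B3"})))  # ['B1', 'B4']
```

Because the Φ-node at B4 is itself a new definition that reaches the loop header B1 through the back edge, the iteration places a Φ-node at B1 as well.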

7 Physically Aware Compiler Transforms
- Consider layout information during compilation
  - Modify transforms to consider physical information
- Ideal: full physical synthesis; extremely accurate, but far too time-consuming
- Approximate using floorplanning
  - Much faster
  - Gives a "good enough" high-level physical picture
- Previous data communication work
  - No physical information
  - Can lead to negative results
[Figure: application → Hardware Compilation → Physical Synthesis, with a Floorplanner]
Let's Get Physical!

8 Physically Aware Data Communication
- Modify the placement of Φ-functions to consider wirelength

Φ-Placement Algorithm
1. Given a CFG G_cfg(V_cfg, E_cfg)
2. perform_ssa(G_cfg)
3. calculate_def_use_chains(G_cfg)
4. remove_back_edges(G_cfg)
5. topological_sort(G_cfg)
6. foreach vertex v ∈ V_cfg
7.     foreach Φ-node φ ∈ v
8.         s ← φ.sources
9.         d ← |def_use_chain(φ.dest)|
10.        IDF ← iterated_dominance_frontier(s)
11.        PossiblePlacements ← findPlacementOptions(IDF)
12.        place(φ) ← selectBest(PossiblePlacements)
13.        distribute/duplicate φ to place(φ)

FindPlacementOptions Algorithm
1. Given a set of CFG nodes R
2. Φ-options ← ∅
3. insert(R) into Φ-options
4. foreach instruction i ∈ R
5.     if (i is a destination of Φ-function f)
6.         return Φ-options
7. temp_Φ-options ← ∅
8. foreach non-dominated child c of R
9.     temp_Φ-options ← crossProductJoin(temp_Φ-options, findPlacementOptions(c))
10. return Φ-options ∪ temp_Φ-options
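The FindPlacementOptions recursion can be sketched in Python under one interpretation of the pseudocode, which is an assumption: the "non-dominated children" are supplied as a plain mapping from each node to the nodes a Φ could be pushed into, and the cross-product join takes the union of one option per child. The cost passed to `select_best` is a placeholder for the floorplan-based wirelength estimate, and the coordinates are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List

Placement = FrozenSet[str]

def find_placement_options(node: str, children: Dict[str, List[str]],
                           holds_phi_dest: Callable[[str], bool]) -> List[Placement]:
    """Enumerate candidate placements for one Φ-function.

    Option 1: keep the Φ at `node`.
    Option 2: unless `node` holds the Φ's destination or has no children,
    push the Φ into every child (duplicating it) and combine the children's
    own options with a cross-product join.
    """
    options: List[Placement] = [frozenset([node])]
    kids = children.get(node, [])
    if holds_phi_dest(node) or not kids:
        return options
    per_child = [find_placement_options(c, children, holds_phi_dest) for c in kids]
    for combo in product(*per_child):            # cross-product join
        options.append(frozenset().union(*combo))
    return options

def select_best(options: List[Placement], cost: Callable[[Placement], float]) -> Placement:
    return min(options, key=cost)

# Hypothetical example: N3 splits into N4 and N5; the Φ's destination is in N6.
children = {"N3": ["N4", "N5"], "N5": ["N6"]}
options = find_placement_options("N3", children, lambda n: n == "N6")
print(sorted(sorted(o) for o in options))  # [['N3'], ['N4', 'N5'], ['N4', 'N6']]

# Placeholder cost: Manhattan distance from each Φ copy to the destination's
# (hypothetical) floorplan position; a real model would use the floorplan.
pos = {"N3": (0, 0), "N4": (1, 5), "N5": (0, 4), "N6": (0, 5)}

def wirelength(p: Placement) -> float:
    dx, dy = pos["N6"]
    return sum(abs(pos[n][0] - dx) + abs(pos[n][1] - dy) for n in p)

print(sorted(select_best(options, wirelength)))  # ['N4', 'N6']
```

Under this placeholder cost the duplicated placement {N4, N6} wins; with a floorplan-derived wirelength model the choice could differ.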

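9 Algorithm in Action
- FAST function from the MediaBench test suite
[Figure: CFG fragment with true/false (T/F) branches, nodes N3 and N9, and SSA values nn_4, i_2 and nn_5, i_3 on the two incoming paths]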

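10 Algorithm in Action
[Figure: two views of the same FAST CFG fragment (T/F branches, nodes N3 and N9, SSA values nn_4, i_2 and nn_5, i_3), illustrating the placement of the Φ-functions for i and nn]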

11 Full Floorplanning Results
- Simple iterative approach:
  1. Initial optimization minimizes data communication
  2. Full SA-based (simulated annealing) floorplanning
  3. Reoptimization based on the floorplan to minimize wirelength
  4. Full SA-based floorplanning
- Spectacularly negative results
[Figure: Hardware Compilation, Full Floorplanner, and Physical Synthesis]

12 Incremental Floorplanning
- Incremental placement [Coudert et al.]:
  - Given an optimized placement and a set of changes to the netlist (e.g., due to technology remapping), modify the placement to improve it
  - Equally applicable to floorplanning
[Figure: initial floorplan; perturbations to the floorplan modules (e.g., due to Φ-function movement); modified floorplan]

13 Our Incremental Floorplanner
[Figure: initial floorplan, perturbations and modified floorplan, with the slicing tree (| and - cuts) whose nodes carry value pairs 2/2.3, 9/10.1, 11/12.4, 16/18, 5/5.6, 27/30.4, 32/36, and the resulting incremental floorplan]

14 Our Incremental Floorplanner
1. Calculate the area & room of each node: bottom-up slicing tree traversal
2. Area redistribution: top-down traversal
   - Increase area if necessary:
     - Not enough space at the root
     - Aspect ratios become too distorted
[Figure: slicing tree annotated with area/room values, modified floorplan, and incremental floorplan]
- Simple, yet effective
- Other, more complicated algorithms might work better
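The two passes above can be sketched on a slicing tree. This is a minimal illustration under stated assumptions: room is split among children in proportion to their area (one possible redistribution policy, not necessarily the authors'), the root's room grows only when the modules no longer fit, and aspect-ratio checks are left out. `SliceNode` and the function names are hypothetical; the `cut` field only records the slicing direction and is not used by the area arithmetic.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SliceNode:
    name: str
    cut: Optional[str] = None   # '|' or '-' for internal nodes, None for a module (leaf)
    children: List["SliceNode"] = field(default_factory=list)
    area: float = 0.0           # leaf: module area; internal: filled in bottom-up
    room: float = 0.0           # space the current floorplan allots to this node

def compute_area(node: SliceNode) -> float:
    """Bottom-up pass: an internal node needs the sum of its children's areas."""
    if node.children:
        node.area = sum(compute_area(c) for c in node.children)
    return node.area

def redistribute(node: SliceNode, room: float) -> None:
    """Top-down pass: split a node's room among its children in proportion to area."""
    node.room = room
    for c in node.children:
        redistribute(c, room * c.area / node.area)

def incremental_update(root: SliceNode) -> float:
    """Re-fit the tree after module areas change; returns the root's room.

    The root grows only when there is not enough space for the modules.
    Aspect-ratio checks are omitted in this sketch.
    """
    compute_area(root)
    redistribute(root, max(root.room, root.area))
    return root.room

m1, m2, m3 = SliceNode("m1", area=2.0), SliceNode("m2", area=9.0), SliceNode("m3", area=5.0)
top = SliceNode("top", cut="|", children=[m1, m2])
root = SliceNode("root", cut="-", children=[top, m3], room=18.0)
print(incremental_update(root), m1.room, m2.room, m3.room)  # 18.0 2.25 10.125 5.625

m2.area = 15.0  # hypothetical perturbation, e.g. after Φ-function movement
print(incremental_update(root), m2.room)  # 22.0 15.0
```

Growing `m2` from 9 to 15 units leaves too little space at the root (22 > 18), so the root's room is enlarged before the top-down redistribution.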

15 MediaBench Functions
Columns: Benchmark | Blocks | Links | Weight | Initial WL (wirelength)
1 adpcm coder 333154268835568
2 adpcm decoder 262344195221588
3 internal filter 101436017088411637
4 internal expand 1019425714336317031
5 compress output 341760236829114
6 mpeg2dec block 621366227234510
7 mpeg2dec vector 1642610244366
8 FAST 144157043714
9 FR4TR 7787155704340697
10 det 1251379363772

16 Incremental Floorplanning Results
[Figure: normalized wirelength per benchmark]
- "Optimal" approach: 12% overall wirelength reduction, 25% Phi-node wirelength reduction
- Our approach: 6% overall wirelength reduction, 8% Phi-node wirelength reduction

17 Related Work
- Hardware compilation projects using SSA
  - PDG+SSA form [UCSB]
  - CASH [CMU]
  - SA-C [UCR]
  - Sea Cucumber [BYU]
- Physically aware behavioral synthesis techniques
  - SA for scheduling, binding and floorplanning [Prabhakaran97]
  - SA for binding and floorplanning [Yung-Ming94]
  - Scheduling, allocation and binding [Dougherty00]
  - Fasolt: bus topology [Knapp92]
  - High-level synthesis [Tarafdar00]
- Incremental CAD
  - Problem overview/challenges [Coudert00]
  - Floorplanning [Crenshaw99]

18 Conclusions
- It's been a long strange trip…
- SSA is a nice IR for hardware compilation
  - Explicitly shows data flow
  - Useful for exploiting parallelism
- Compiler techniques applied to hardware design can reduce wirelength
  - They must be aware of physical information
  - They must use incremental floorplanning

