Physically Aware Data Communication Optimization for Hardware Synthesis

Ryan Kastner, Wenrui Gong, Xin Hao, Forrest Brewer
Dept. of Electrical and Computer Engineering, University of California, Santa Barbara

Adam Kaplan, Philip Brisk and Majid Sarrafzadeh
Computer Science Department, University of California, Los Angeles
Hardware Compilation
[Figure: application specified in a high-level language → compiler → HDL (behavioral, structural) → synthesis and physical design → chip, bitstream, …]
- We focus our efforts on mapping an application written in a high-level language to a hardware description.
- We want this mapping to have optimal characteristics (area, latency, etc.).
- In this talk, we focus on the problem of minimizing data communication in the final hardware.
Obligatory Design Flow Slide
[Figure: Application Specification → SUIF: Syntactic & Semantic Analysis → AST → Machine SUIF: Compiler Backend → SSA CDFG → Behavioral Synthesis → Logical & Physical Synthesis]
1. Create interface.
2. Transform instruction list to dataflow graph.
3. Transform dataflow graph to behavioral HDL code (basic block entity: entity basic_block is … architecture behavioral of basic_block …).
4. Synthesize behavioral HDL code to RTL code.
5. Create CFG interface (CFG entity: entity cfg is … architecture behavioral of cfg …).
6. Determine structural control and data communication between basic block entities (Entity 1 … Entity 4).
7. Generate synthesizable RTL code.
8. Synthesize RTL code.
Characterizing Data Communication
Examples of data communication schemes:
- Centralized: control nodes 1 through 4 communicate over a bus through a memory (register bank, RAM). Data communication = memory access.
- Distributed: control nodes 1 through 4 communicate directly with each other. Data communication = wire.
Identifying Data Communication
- Determine the relationship between the place(s) where data is defined and the place(s) where it is used.
[Figure: CFG with definitions and uses of variables a, b and c spread across control nodes.]
- Naïve method: every use point of a variable depends on every definition of that variable.
- Not every use point actually receives every definition, so analysis is needed to minimize the amount of data communication.
- Global data communication in the example = 5 variables.
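The gap between the naïve rule and a real def-use analysis can be sketched with classic iterative reaching definitions. This is a minimal Python sketch, not the authors' implementation: the CFG encoding (`succ`, `defs`, `uses` dictionaries keyed by control node) is hypothetical, it assumes every use inside a node reads the value live on entry to that node, and the toy CFG is not the one in the slide's figure.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str, str]  # (variable, defining node, using node)

def naive_links(defs: Dict[str, Set[str]], uses: Dict[str, Set[str]]) -> Set[Link]:
    """Naive rule: every use of a variable depends on every definition of it."""
    return {(v, d, u)
            for u, used in uses.items() for v in used
            for d, defined in defs.items() if v in defined and d != u}

def reaching_def_links(succ: Dict[str, List[str]], defs: Dict[str, Set[str]],
                       uses: Dict[str, Set[str]]) -> Set[Link]:
    """Keep only (def, use) pairs where the definition can actually reach the use."""
    preds = {n: [p for p in succ if n in succ[p]] for n in succ}
    out: Dict[str, Set[Tuple[str, str]]] = {n: set() for n in succ}

    def reach_in(n: str) -> Set[Tuple[str, str]]:
        # Definitions (variable, node) reaching the entry of n.
        return set().union(*(out[p] for p in preds[n]))

    changed = True
    while changed:  # iterate to a fixed point (also handles loops)
        changed = False
        for n in succ:
            # gen: n's own definitions; pass through inherited ones that n does not kill.
            new_out = ({(v, n) for v in defs[n]}
                       | {(v, d) for v, d in reach_in(n) if v not in defs[n]})
            if new_out != out[n]:
                out[n], changed = new_out, True
    # A link exists when a definition from another node reaches a use.
    return {(v, d, n) for n in succ for v, d in reach_in(n) if v in uses[n] and d != n}

# Hypothetical diamond CFG: N1 -> {N2, N3} -> N4.
succ = {"N1": ["N2", "N3"], "N2": ["N4"], "N3": ["N4"], "N4": []}
defs = {"N1": {"a", "b"}, "N2": {"a"}, "N3": {"c"}, "N4": set()}
uses = {"N1": set(), "N2": {"b"}, "N3": {"a"}, "N4": {"a", "c"}}

print(len(naive_links(defs, uses)), len(reaching_def_links(succ, defs, uses)))  # 6 5
```

The only extra link the naïve rule creates is (a, N2, N3): N2's definition of a sits on the other branch and can never reach N3.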
Use of SSA in Compilation
[Figure: the same CFG before and after SSA conversion; definitions are renamed a_1 … a_4, b_1, b_2 and c_1, and a_4 = Φ(a_2, a_3) is inserted where two definitions of a merge.]
- SSA (Static Single Assignment) changes each variable to have a unique definition point.
- Φ-nodes must be added to merge definitions.
- We must determine the relationship between where data is generated and where it is used.
- Problem formulations: [DAC02] minimizes the total number of bits communicated between all pairs of control nodes; today we minimize overall wirelength.
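Where the Φ-nodes go in the first place follows the standard minimal-SSA construction of Cytron et al.: a variable needs a Φ at every node in the iterated dominance frontier of its definition sites. The Python sketch below assumes a CFG given as a hypothetical successor map with every node reachable from `entry`, and computes dominators with the simple iterative set algorithm rather than a fast one.

```python
from typing import Dict, List, Optional, Set

def dominance_frontiers(succ: Dict[str, List[str]], entry: str) -> Dict[str, Set[str]]:
    nodes = list(succ)
    preds = {n: [p for p in nodes if n in succ[p]] for n in nodes}
    # Iterative dominator sets: dom(n) = {n} | intersection of dom(p) over predecessors p.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry or not preds[n]:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    # Dominators form a chain, so the immediate dominator is the strict
    # dominator with the largest dominator set.
    idom: Dict[str, Optional[str]] = {entry: None}
    for n in nodes:
        if n != entry:
            idom[n] = max(dom[n] - {n}, key=lambda d: len(dom[d]))
    # Walk up the dominator tree from each predecessor of a join node.
    df: Dict[str, Set[str]] = {n: set() for n in nodes}
    for b in nodes:
        if len(preds[b]) >= 2:
            for p in preds[b]:
                runner = p
                while runner is not None and runner != idom[b]:
                    df[runner].add(b)
                    runner = idom[runner]
    return df

def place_phis(succ: Dict[str, List[str]], entry: str,
               def_sites: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Per variable, the nodes needing a Φ (iterated dominance frontier of its defs)."""
    df = dominance_frontiers(succ, entry)
    phis: Dict[str, Set[str]] = {}
    for var, sites in def_sites.items():
        has_phi: Set[str] = set()
        work = list(sites)
        while work:
            n = work.pop()
            for y in df[n]:
                if y not in has_phi:
                    has_phi.add(y)
                    if y not in sites:  # a new Φ is itself a definition of var
                        work.append(y)
        phis[var] = has_phi
    return phis

# Hypothetical loop: E -> H, H -> {B, X}, B -> H; i is defined in E and in the body B.
loop = {"E": ["H"], "H": ["B", "X"], "B": ["H"], "X": []}
print(place_phis(loop, "E", {"i": {"E", "B"}}))  # {'i': {'H'}}
```

The worklist is what makes the frontier "iterated": a Φ inserted at a node counts as a new definition, so that node's own frontier is examined too.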
Physically Aware Compiler Transforms
- Consider layout information during compilation: modify transforms to use physical information.
- Ideal: full physical synthesis. Extremely accurate, but far too time-consuming.
- Approximation: floorplanning. Much faster, and gives a "good enough" high-level physical picture.
[Figure: application → hardware compilation, coupled to a floorplanner instead of full physical synthesis]
- Previous data communication work used no physical information, which can lead to negative results.
Let's Get Physical!
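To make "overall wirelength" concrete at the floorplanning level, one common approximation charges each inter-module connection its bit width times the Manhattan distance between module centers. The slides do not say which estimate was used, so this Python sketch (hypothetical names `estimated_wirelength`, `Rect`, and a bit-weighted net list) is an assumption, not the authors' metric.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height) of a floorplan module
Net = Tuple[str, str, int]                # (source module, sink module, bits communicated)

def center(r: Rect) -> Tuple[float, float]:
    x, y, w, h = r
    return (x + w / 2, y + h / 2)

def estimated_wirelength(floorplan: Dict[str, Rect], nets: List[Net]) -> float:
    """Sum of bits * center-to-center Manhattan distance over all nets (assumed metric)."""
    total = 0.0
    for a, b, bits in nets:
        (xa, ya), (xb, yb) = center(floorplan[a]), center(floorplan[b])
        total += bits * (abs(xa - xb) + abs(ya - yb))
    return total

floorplan = {"N1": (0, 0, 2, 2), "N2": (4, 0, 2, 2), "N3": (0, 4, 2, 2)}
nets = [("N1", "N2", 32), ("N1", "N3", 8)]
print(estimated_wirelength(floorplan, nets))  # 160.0
```

Because the estimate only needs module rectangles, it can be recomputed cheaply every time a compiler transform changes which modules talk to each other.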
Physically Aware Data Communication
Modify the placement of Φ-functions to consider wirelength.

Φ-Placement Algorithm
Given a CFG G_cfg(V_cfg, E_cfg):
  perform_ssa(G_cfg)
  calculate_def_use_chains(G_cfg)
  remove_back_edges(G_cfg)
  topological_sort(G_cfg)
  foreach vertex v ∈ V_cfg
    foreach Φ-node φ ∈ v
      s ← φ.sources
      d ← |def_use_chain(φ.dest)|
      IDF ← iterated_dominance_frontier(s)
      PossiblePlacements ← findPlacementOptions(IDF)
      place(φ) ← selectBest(PossiblePlacements)
      distribute/duplicate φ to place(φ)

FindPlacementOptions Algorithm
Given a set of CFG nodes R:
  Φ-options ← ∅
  insert(R) into Φ-options
  foreach instruction i ∈ R
    if (i is a destination of Φ-function f)
      return Φ-options
  temp_Φ-options ← ∅
  foreach non-dominated child c of R
    temp_Φ-options ← crossProductJoin(temp_Φ-options, findPlacementOptions(c))
  return Φ-options ∪ temp_Φ-options
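FindPlacementOptions can be sketched in Python as a recursion that either keeps the Φ at the current node or pushes a copy into every child, combining the children's alternatives with a cross-product join. Every detail below is a hypothetical stand-in: R is a single node; `children` replaces the slides' "non-dominated children", whose exact definition is not given; `uses_dest` marks nodes that use the Φ's destination (our reading of "destination"); the temporary options start as the set holding the empty placement so the first join is non-empty; and `phi_cost` is a made-up Manhattan cost in place of the authors' wirelength model.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Placement = FrozenSet[str]  # CFG nodes that each hold a copy of the Φ

def cross_product_join(left: List[Placement], right: List[Placement]) -> List[Placement]:
    """Pick one option from each side and merge them into a single placement."""
    return [a | b for a, b in product(left, right)]

def find_placement_options(r: str, children: Dict[str, List[str]],
                           uses_dest: Set[str]) -> List[Placement]:
    options = [frozenset([r])]  # option 1: place the Φ right here
    if r in uses_dest or not children[r]:
        return options          # cannot (or need not) push it any further
    temp: List[Placement] = [frozenset()]
    for c in children[r]:       # option 2: push a copy into each child
        temp = cross_product_join(temp, find_placement_options(c, children, uses_dest))
    return options + temp

def select_best(options: List[Placement], cost: Callable[[Placement], float]) -> Placement:
    return min(options, key=cost)

def manhattan(p: Tuple[int, int], q: Tuple[int, int]) -> int:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def phi_cost(option: Placement, pos: Dict[str, Tuple[int, int]],
             sources: List[str], uses: List[str]) -> int:
    """Hypothetical cost: wire every source to every copy, then each use to its nearest copy."""
    feed = sum(manhattan(pos[s], pos[p]) for p in option for s in sources)
    serve = sum(min(manhattan(pos[p], pos[u]) for p in option) for u in uses)
    return feed + serve

# Hypothetical example: N3 branches to A and B, and both use the Φ's result.
children = {"N3": ["A", "B"], "A": [], "B": []}
opts = find_placement_options("N3", children, {"A", "B"})
pos = {"S1": (18, 0), "S2": (18, 10), "N3": (5, 5), "A": (20, 0), "B": (20, 10)}
best = select_best(opts, lambda o: phi_cost(o, pos, ["S1", "S2"], ["A", "B"]))
print(sorted(best))  # ['A', 'B']
```

With the sources placed next to A and B, duplicating the Φ into both children (cost 28) beats keeping one copy at N3 (cost 76), which is the "distribute/duplicate" case of the algorithm.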
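Algorithm in Action
FAST function from the MediaBench test suite.
[Figure: CFG of FAST with true (T) and false (F) branch edges; one incoming path carries nn_4, i_2 and the other nn_5, i_3, which are merged by Φ-functions; nodes N3 and N9 are labeled.]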
Full Floorplanning Results
Simple iterative approach (hardware compilation coupled to a full floorplanner):
1. Initial optimization minimizes data communication.
2. Full simulated annealing (SA) based floorplanning.
3. Reoptimization based on the floorplan, to minimize wirelength.
4. Full SA-based floorplanning.
Result: spectacularly negative.
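For reference, "full SA-based floorplanning" re-solves the layout from scratch with simulated annealing. The Python sketch below is a generic, textbook-style annealer on a deliberately simplified, hypothetical model (equal-size modules on a grid of slots, swap moves, bit-weighted Manhattan cost, geometric cooling); it is not the floorplanner used in the talk, but it shows that every run is free to rearrange the whole chip.

```python
import math
import random
from typing import Dict, List, Tuple

Net = Tuple[str, str, int]           # (module, module, bits communicated)
Slots = Dict[str, Tuple[int, int]]   # module -> (column, row)

def wire_cost(slots: Slots, nets: List[Net]) -> int:
    return sum(bits * (abs(slots[a][0] - slots[b][0]) + abs(slots[a][1] - slots[b][1]))
               for a, b, bits in nets)

def to_slots(order: List[str], cols: int) -> Slots:
    # Module order[k] occupies grid slot (k % cols, k // cols).
    return {m: (k % cols, k // cols) for k, m in enumerate(order)}

def anneal(modules: List[str], cols: int, nets: List[Net], seed: int = 0,
           t0: float = 10.0, alpha: float = 0.95, t_min: float = 0.01,
           moves_per_t: int = 50) -> Tuple[Slots, int]:
    rng = random.Random(seed)
    order = list(modules)
    cur = best = wire_cost(to_slots(order, cols), nets)
    best_order = order[:]
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            i, j = rng.sample(range(len(order)), 2)
            order[i], order[j] = order[j], order[i]      # move: swap two modules
            new = wire_cost(to_slots(order, cols), nets)
            # Metropolis rule: always accept improvements, sometimes accept uphill moves.
            if new <= cur or rng.random() < math.exp((cur - new) / t):
                cur = new
                if new < best:
                    best, best_order = new, order[:]
            else:
                order[i], order[j] = order[j], order[i]  # reject: undo the swap
        t *= alpha                                       # geometric cooling schedule
    return to_slots(best_order, cols), best

nets = [("A", "D", 8), ("B", "C", 1)]
slots, cost = anneal(["A", "B", "C", "D"], cols=2, nets=nets)
print(cost)  # 9
```

In the initial order A and D sit on a diagonal (cost 18); the annealer moves them next to each other, which is only possible by relocating other modules as well.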
Incremental Floorplanning
- Incremental placement [Coudert et al.]: given an optimized placement and a set of changes to the netlist (e.g., due to technology remapping), modify the existing placement to improve it rather than placing from scratch.
- Equally applicable to floorplanning: perturbations to the floorplan modules (e.g., due to Φ-function movement) are applied to the initial floorplan to produce a modified floorplan.
[Figure: initial floorplan + perturbations → modified floorplan]
Our Incremental Floorplanner
[Figure: the initial floorplan and the perturbations are fed to the incremental floorplanner, which produces the modified floorplan.]
Our Incremental Floorplanner
1. Calculate the area and room of each node: bottom-up slicing-tree traversal.
2. Area redistribution: top-down traversal, increasing a node's area where necessary.
Limitations: there may not be enough space at the root, and aspect ratios can become too distorted.
[Figure: slicing tree and the resulting modified floorplan.]
Simple, yet effective. Other, more complicated algorithms might work better.
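The two traversals can be sketched on a slicing tree in Python. Assumptions not stated in the slides: "room" is taken to mean allocated area minus required area; the top-down pass gives each child its required area plus a share of the parent's spare area proportional to the child's old room, so modules that grew take area only from siblings that had slack; and `max_aspect` is a hypothetical threshold for flagging distorted modules.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    name: str
    cut: Optional[str] = None   # 'V': children side by side, 'H': stacked; None: leaf module
    children: List["Node"] = field(default_factory=list)
    old_area: float = 0.0       # leaf: area it occupied in the initial floorplan
    need: float = 0.0           # leaf: area it requires after the perturbation
    room: float = 0.0           # set by bottom_up: old_area - need
    rect: Tuple[float, float, float, float] = (0.0, 0.0, 0.0, 0.0)

def bottom_up(n: Node) -> None:
    """Pass 1: aggregate required area and room from the leaves to the root."""
    for c in n.children:
        bottom_up(c)
    if n.children:
        n.need = sum(c.need for c in n.children)
        n.old_area = sum(c.old_area for c in n.children)
    n.room = n.old_area - n.need

def top_down(n: Node, x: float, y: float, w: float, h: float, max_aspect: float) -> List[str]:
    """Pass 2: split the parent's rectangle among children; return distorted leaf names."""
    n.rect = (x, y, w, h)
    if not n.children:
        return [n.name] if max(w / h, h / w) > max_aspect else []
    spare = w * h - n.need                  # >= 0: the parent was granted at least n.need
    slack = [max(0.0, c.room) for c in n.children]
    total = sum(slack)
    shares = [s / total for s in slack] if total > 0 else [c.need / n.need for c in n.children]
    distorted: List[str] = []
    for c, share in zip(n.children, shares):
        frac = (c.need + spare * share) / (w * h)  # child's fraction of the parent's area
        if n.cut == "V":
            distorted += top_down(c, x, y, w * frac, h, max_aspect)
            x += w * frac
        else:
            distorted += top_down(c, x, y, w, h * frac, max_aspect)
            y += h * frac
    return distorted

def incremental_floorplan(root: Node, width: float, height: float,
                          max_aspect: float = 3.0) -> List[str]:
    bottom_up(root)
    if root.need > width * height:
        raise ValueError("not enough space at root")
    return top_down(root, 0.0, 0.0, float(width), float(height), max_aspect)

# Hypothetical perturbation: A grows (e.g. a Φ-function moved into it), B shrinks.
a = Node("A", old_area=50, need=60)
b = Node("B", old_area=50, need=30)
root = Node("root", cut="V", children=[a, b])
print(incremental_floorplan(root, 10, 10))  # []
```

Here A widens from 5 to 6 units and B narrows from 5 to 4; nothing exceeds the aspect-ratio limit, and only the sibling with slack (B) gives up area.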
MediaBench Functions
[Table: Benchmark, Blocks, Links, Weight, Initial WL for each benchmark below]
Benchmarks: adpcm coder, adpcm decoder, internal filter, internal expand, compress output, mpeg2dec block, mpeg2dec vector, FAST, FR4TR, det.
Incremental Floorplanning Results
[Figure: normalized wirelength per benchmark.]
- "Optimal" approach: 12% overall wirelength reduction, 25% Φ-node wirelength reduction.
- Our approach: 6% overall wirelength reduction, 8% Φ-node wirelength reduction.
Related Work
- Hardware compilation projects using SSA: PDG+SSA form [UCSB], CASH [CMU], SA-C [UCR], Sea Cucumber [BYU].
- Physically aware behavioral synthesis techniques: SA for scheduling, binding and floorplanning [Prabhakaran97]; SA for binding and floorplanning [Yung-Ming94]; scheduling, allocation and binding [Dougherty00]; Fasolt: bus topology [Knapp92]; high-level synthesis [Tarafdar00].
- Incremental CAD: problem overview/challenges [Coudert00]; floorplanning [Crenshaw99].
Conclusions
- It's been a long, strange trip…
- SSA is a nice IR for hardware compilation: it explicitly shows data flow and is useful for exploiting parallelism.
- Compiler techniques applied to hardware design can reduce wirelength, but they must be aware of physical information and must use incremental floorplanning.