Physical Synthesis Comes of Age
Chuck Alpert, IBM Corp.
Chris Chu, Iowa State University
Paul Villarrubia, IBM Corp.

2 Physical Synthesis Family Tree
Roles of layout as a parent:
- Clean up the mess created by physical synthesis (implement the netlist generated by physical synthesis)
- Provide guidance to physical synthesis so that it will do things right
Is layout mature enough to serve the role?
Is there still room for layout to grow?
[Figure: family tree relating Synthesis, Layout, and Physical Synthesis]

3 New Requirements of Placement
1. Super fast
   - 4 to 8 million objects now
   - Provide quick feedback to physical synthesis to refine the netlist
2. Stable in handling incremental placement
   - Physical synthesis constantly makes changes to the netlist
3. Flexible objective function
   - Timing, power, routability
4. Handle mixed-size modules
   - Hierarchical design and use of IP blocks are common

4 Placement As a Baby
Simulated annealing based placement
- Popularized by TimberWolf [DAC-86]
Greedy algorithm: "You only have 1 chance. If you get stuck, I will terminate you!"
Simulated annealing: "OK to make mistakes. Keep trying! Evaluation/feedback is important."
Strength:
- Good quality for small designs
- Easy to consider different objective functions
- Handles incremental changes well
Weakness:
- Very slow (crawling)
- Non-trivial to handle modules of different sizes

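The contrast between greedy search and annealing can be sketched in a few lines. This is a minimal illustration, not TimberWolf itself: the swap move, the geometric cooling schedule, and every name and parameter (`anneal`, `t_start`, `alpha`, `moves_per_t`) are assumptions made for the example. Recomputing the full wirelength after each move is deliberately naive and hints at why the approach is slow on large designs.

```python
import math
import random

def hpwl(nets, pos):
    """Half-perimeter wirelength: sum of bounding-box half-perimeters of all nets."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def anneal(nets, pos, t_start=10.0, t_end=0.01, alpha=0.95, moves_per_t=200, seed=0):
    """Swap-based annealing; returns (best placement seen, its cost)."""
    rng = random.Random(seed)
    pos = dict(pos)
    cells = sorted(pos)
    cost = hpwl(nets, pos)
    best, best_cost = dict(pos), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            a, b = rng.sample(cells, 2)
            pos[a], pos[b] = pos[b], pos[a]        # propose: swap two cells
            new_cost = hpwl(nets, pos)
            delta = new_cost - cost
            # Always accept improvements; accept uphill moves with prob. exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(pos), cost
            else:
                pos[a], pos[b] = pos[b], pos[a]    # reject: undo the swap
        t *= alpha                                 # cool down
    return best, best_cost

# Hypothetical 4-cell chain a-b-c-d placed out of order on a row.
nets = [("a", "b"), ("b", "c"), ("c", "d")]
start = {"a": (0, 0), "b": (3, 0), "c": (1, 0), "d": (2, 0)}
placement, cost = anneal(nets, start)
```

A greedy variant would drop the `exp(-delta / t)` branch and accept only improving swaps, which is exactly the "one chance" behavior the slide describes.
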
5 Placement As a Kid
Min-cut placement (or partitioning-based placement)
- An old idea [Breuer, DAC-77]
- Capo [DAC-00] leverages the breakthrough in partitioning from multilevel techniques (e.g., hMetis [DAC-97], MLFM [DAC-97])
- Dragon [ICCAD-00] combines hierarchical partitioning with annealing
Strength:
- Efficient and scalable
- Very good wirelength, but can we do better?
Weakness:
- More difficult to handle other objectives
- Not stable in handling incremental changes
- Poor white space management
[Figure: a circuit mapped onto the placement region]

6 White Space in Min-Cut Placement
[Figure: two placements of adaptec2. Courtesy: IBM]
- Capo (min-cut): HPWL = 9955
- APlace (analytical): HPWL = 8715

7 Placement Maturing
Analytical placement
- Used by 4 of the top 5 placers in the ISPD-05 Placement Contest and the top 5 placers in the ISPD-06 Placement Contest
Strength:
- Fastest and scalable
- Best wirelength
- Robust framework to incorporate different objectives and constraints
- Stable in handling incremental changes
- Good white space management
Why would analytical placement work so well?
- It can see the big picture
Why was it not popular in the past?
- Hard to spread modules evenly in the placement region

8 Attempt Still Relying on Partitioning
Gordian: Global Optimization and Rectangle Dissection [TCAD-91]
- Artificial center-of-mass constraints disturb the globally optimal solution too drastically
[Figure: centers of mass]

9 Another Partitioning-based Spreading
Quadratic optimization with quadrisection [Vygen, DAC-97]
[Figure. Courtesy: IBM]

10 Spreading by Density-based Force
Kraftwerk [DAC-98]
- Quadratic wirelength minimization
- Spread cells by additional forces
- Density-based force pushes cells away from dense regions toward sparse regions
Great idea:
- Spreads cells smoothly
- Very good wirelength
But not very fast:
- Constant force makes convergence hard to control
- Density-based force is expensive to compute

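For reference, the two formulas this slide names are commonly written as follows. The symbols are assumptions, not taken from the slide: $C$ is the connectivity matrix of the movable cells, $\mathbf{d}$ collects the contributions of fixed pins, and $\mathbf{e}$ is the additional spreading force. The same equations apply to the $y$ coordinates.

```latex
% Quadratic wirelength minimization (x direction)
\min_{\mathbf{x}} \; \Phi(\mathbf{x})
  = \tfrac{1}{2}\,\mathbf{x}^{T} C\,\mathbf{x} + \mathbf{d}^{T}\mathbf{x}
\qquad\Longrightarrow\qquad
C\,\mathbf{x} + \mathbf{d} = \mathbf{0}

% Spreading: an additional force vector e shifts the equilibrium
C\,\mathbf{x} + \mathbf{d} + \mathbf{e} = \mathbf{0}
```

Without $\mathbf{e}$ the solution is the pure wirelength optimum, in which cells tend to overlap heavily; the density-based force moves that equilibrium toward a more even distribution.
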
11 Dramatic Speedup
FastPlace [ISPD-04]
  repeat
    Solve quadratic program to minimize wirelength
    Spread the cells
  until cell distribution is roughly even
  Reduce wirelength by iterative heuristic
Hybrid Net Model
- Speeds up solving of the QP
Cell Shifting
- Simple technique to compute spreading force
- Fast convergence due to the use of pseudo-nets [Hu et al., ISPD-02] instead of constant force
Iterative Local Refinement
- More efficient than using QP to refine the solution
- Minimizes wirelength based on a linear objective

12 Linearization of Quadratic Wirelength
New Kraftwerk [ICCAD-06]
BoundingBox net model for multi-pin nets:
- Needs to know the outermost pins of a net
- Accurately models HPWL
- Faster and uses less memory than the clique model
Two fundamental components of spreading force:
- Hold force: constant force
- Move force: enforced by a pseudo-net to a fixed point
[Figure: BoundingBox model vs. clique model]

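A small one-dimensional sketch shows how a net model built on the outermost pins can reproduce HPWL inside a quadratic objective. The edge set (every pin connected to the two outermost pins) and the weight `1 / ((k - 1) * distance)` are assumptions chosen so that the quadratic cost equals HPWL at the current pin positions; the published model may scale the weights by a constant factor. All function names are hypothetical.

```python
def hpwl_1d(xs):
    """Exact half-perimeter wirelength of one net in one dimension."""
    return max(xs) - min(xs)

def bounding_box_edges(xs):
    """Edges (i, j, weight) connecting every pin to the two outermost pins."""
    k = len(xs)
    lo = min(range(k), key=lambda i: xs[i])
    hi = max(range(k), key=lambda i: xs[i])
    pairs = [(lo, hi)]
    for i in range(k):
        if i != lo and i != hi:
            pairs += [(i, lo), (i, hi)]
    edges = []
    for i, j in pairs:
        dist = abs(xs[i] - xs[j])
        if dist > 0:  # coincident pins add no length; skipping avoids division by zero
            edges.append((i, j, 1.0 / ((k - 1) * dist)))
    return edges

def quadratic_cost(xs, edges):
    """Quadratic wirelength sum of w * (x_i - x_j)^2 over the given edges."""
    return sum(w * (xs[i] - xs[j]) ** 2 for i, j, w in edges)

xs = [0.0, 2.0, 5.0, 9.0]
edges = bounding_box_edges(xs)
cost = quadratic_cost(xs, edges)   # matches hpwl_1d(xs) up to rounding
```

For a k-pin net this uses 2k - 3 edges, compared with k(k - 1)/2 for a clique, which is consistent with the speed and memory claim above.
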
13 Relaxation Rather than Linearization
RQL [DAC-07]
- Adds force vector modulation to the FastPlace framework
- Currently the fastest, with the best wirelength
Force vector modulation:
- Rank modules by spreading force magnitude
- Nullify the spreading force for the top 5-10% of modules
[Plot: spreading force magnitude vs. module index]

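The rank-and-nullify step can be sketched as below. This is only an illustration of the two bullets on the slide, not RQL's actual code; the function name, the `(fx, fy)` force representation, and the `nullify_fraction` parameter are assumptions.

```python
import math

def modulate_forces(forces, nullify_fraction=0.05):
    """Zero the spreading force of the modules with the largest force magnitude.

    forces: list of (fx, fy) tuples, one per module.
    nullify_fraction: share of modules whose force is nullified (e.g. 0.05 to 0.10).
    """
    n_null = int(len(forces) * nullify_fraction)
    # Rank module indices by force magnitude, largest first.
    ranked = sorted(range(len(forces)),
                    key=lambda i: math.hypot(*forces[i]),
                    reverse=True)
    nullified = set(ranked[:n_null])
    return [(0.0, 0.0) if i in nullified else f for i, f in enumerate(forces)]

# Hypothetical forces with magnitudes 1..20; the two largest are nullified at 10%.
forces = [(float(i), 0.0) for i in range(1, 21)]
modulated = modulate_forces(forces, nullify_fraction=0.10)
```
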
14 An Alternative Analytical Approach
APlace [ISPD-04], mPL5 [ISPD-05], NTUPlace3 [ICCAD-06]
- Log-sum-exponential function to approximate HPWL [Naylor et al., US Patent 2001]
- Density constraint is directly formulated into the objective function
- Very competitive wirelength and runtime

Placer   Wirelength model      Spreading force                              Objective function
APlace   Log-sum-exponential   Density potential based (bell-shaped)        Non-linear & non-convex
NTUP3    Log-sum-exponential   Density potential based (bell-shaped)        Non-linear & non-convex
mPL6     Log-sum-exponential   Density potential based (Poisson smoothed)   Non-linear & non-convex
RQL      Quadratic             Fixed-point based                            Quadratic

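The log-sum-exponential idea can be illustrated in one dimension. This is a sketch of the general smoothing technique, not code from any of the placers named above; the function names and the smoothing parameter name `gamma` are assumptions. The max-shift inside `_log_sum_exp` is a standard trick to avoid floating-point overflow.

```python
import math

def _log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) using the max-shift trick."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def lse_wirelength_1d(xs, gamma):
    """Smooth approximation of max(xs) - min(xs).

    It never underestimates the exact value and exceeds it by at most
    2 * gamma * ln(len(xs)), so it tightens as gamma shrinks.
    """
    upper = _log_sum_exp([x / gamma for x in xs])    # smooth max(xs) / gamma
    lower = _log_sum_exp([-x / gamma for x in xs])   # smooth -min(xs) / gamma
    return gamma * (upper + lower)

xs = [0.0, 2.0, 5.0, 9.0]
exact = max(xs) - min(xs)
loose = lse_wirelength_1d(xs, gamma=0.5)
tight = lse_wirelength_1d(xs, gamma=0.1)
```

Because the function is differentiable everywhere, it can be handed to a gradient-based non-linear solver, which the exact HPWL (with its max and min) does not allow directly.
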
15 Placement: Getting Old or Still Young?
- A better approach than the quadratic / analytical approach?
- Massive parallelism to speed up placement
- Better clustering techniques
- Macro placement / floorplanning
- True timing-driven placement

16 Sufficient Parental Guidance?
All physical synthesis gets from placement is distance info
Physical synthesis has a distorted world view!
- Wirelength estimation is inaccurate (especially for nets with high pin count)
- Congestion estimation is inaccurate
- Area estimation is inaccurate
  - Without buffering and gate sizing
- Timing estimation is very inaccurate
[Figure: three panels showing a bus from sources S0-S3 to sinks T0-T3: "Routing of a Bus", "A Simple Solution", "Probabilistic Estimation"]

17 Routing-Driven Physical Synthesis
Need a more integrated approach
- Past: placement-driven physical synthesis
- Future: routing-driven physical synthesis
Main obstacle: runtime
Two possibilities:
1. Construct Steiner trees to guide synthesis and placement
2. Perform global routing to guide synthesis and placement

18 Fast Steiner Tree Construction
FLUTE (Fast LookUp Table Estimation) [ICCAD 04, ISPD 05]
- An extremely fast and accurate rectilinear Steiner tree algorithm
- Very suitable for VLSI applications: optimal up to degree 9, very accurate up to degree 100
[Table: comparison of RMST, RSTT, SPAN, BGA, BI1S, and FLUTE over all 1.57 million nets in 18 IBM circuits [ISPD 98]]

19 Is Steiner Tree Sufficient?
- Steiner trees do not consider detours due to routing congestion or buffering congestion
- Can we predict the impact of congestion on routing?
- There is no way for generic estimators to accurately estimate the congestion of arbitrary global routers!
[Table: #cong and #match on the ibm benchmark circuits for Labyrinth (70%), Labyrinth (50%), and Chi Dispersion]
[Figure: congestion by router 1, congestion by router 2, and the match between them]

20 Traditional Global Routing
Simultaneous approach (e.g., ILP)
- Very slow
Sequential approach
- Net-by-net routing, rip-up and reroute
- Maze routing for a net: Lee’s, Dijkstra’s, A*-search algorithms
- Reasonably fast
- Reasonably good quality
Is it good enough to handle the demand of physical synthesis?

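Maze routing for a single two-pin connection can be sketched with Lee-style breadth-first wave expansion. This is a minimal illustration on a unit-cost grid with hard blockages; the grid encoding (`1` means blocked) and the function name are assumptions, and a production router would use edge capacities and congestion costs instead.

```python
from collections import deque

def lee_route(grid, src, dst):
    """Shortest path on a grid via breadth-first wave expansion.

    grid[r][c] == 1 marks a blocked cell. Returns a list of (row, col)
    cells from src to dst, or None if dst is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}          # visited set and back-pointers in one dict
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:     # backtrace from dst to src
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None

# A wall across row 1 forces a detour through the rightmost column.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = lee_route(grid, (0, 0), (2, 0))
```

The wave visits every reachable cell in the worst case, which is why maze routing gives good quality but dominates runtime when applied to every net.
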
21 Progress in Global Routing
Pattern routing [Kastner et al., ICCAD-00]
- L-shaped, Z-shaped routes
- Faster
Better cost functions for maze routing [Hadsell & Madden, DAC-03; Pan & Chu, ICCAD-06]
- Reduce overflow significantly
Congestion-driven Steiner tree construction [Pan & Chu, ICCAD-06]
- Much faster because of much less reliance on maze routing
Negotiated congestion by PathFinder [FPGA-95]
- Used by BoxRouter [ICCAD-07], FGA [ICCAD-07], Archer [ICCAD-07]
- Excellent routing ability
- Very slow because it takes a long time to build congestion history
Wanted: techniques that are both fast and high quality

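L-shaped pattern routing can be sketched as choosing between the only two one-bend routes for a two-pin connection. This is an illustrative simplification, not the cited algorithm: congestion is modeled as a per-cell usage count in a dictionary, and all names are hypothetical. Real global routers track demand and capacity on grid edges.

```python
def l_shaped_routes(src, dst):
    """Return the two L-shaped candidate routes as lists of (x, y) grid cells."""
    (x1, y1), (x2, y2) = src, dst

    def span(a, b):
        # Inclusive integer range from a to b in either direction.
        step = 1 if b >= a else -1
        return list(range(a, b + step, step))

    # Horizontal first, then vertical (bend at (x2, y1)).
    hv = [(x, y1) for x in span(x1, x2)] + [(x2, y) for y in span(y1, y2)][1:]
    # Vertical first, then horizontal (bend at (x1, y2)).
    vh = [(x1, y) for y in span(y1, y2)] + [(x, y2) for x in span(x1, x2)][1:]
    return hv, vh

def pattern_route(src, dst, usage):
    """Pick the L-shaped route whose cells have the lowest total usage."""
    def cost(route):
        return sum(usage.get(cell, 0) for cell in route)
    return min(l_shaped_routes(src, dst), key=cost)

# Cell (1, 0) is heavily used, so the vertical-first route is chosen.
usage = {(1, 0): 5}
route = pattern_route((0, 0), (2, 2), usage)
```

Only two candidates are evaluated per connection, instead of a search over the whole grid, which is where the speed advantage over maze routing comes from.
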
22 What Should We Do Next?
Integration of global routing into placement
- An initial attempt: IPR [DAC-07]
  - Integration of FastPlace, FastDP, FLUTE and FastRoute
  - Significantly improves routability and wirelength in good runtime
Incorporate buffering and gate sizing into integrated placement and routing
- Much more accurate timing information
- Should also help congestion and placement density control
Integration with logic synthesis
In other words, we need:
- Better basic algorithms: placement, Steiner tree, global routing, buffering, gate sizing, etc.
- Clever ways of integration
It is an (EDA) family problem. Let’s work together!

Thank You