Multi-objective Placement Optimization for High-performance Nanoscale Integrated Circuits Igor L. Markov August 20, 2012.

A Traditional VLSI Design Flow
■ Flow stages: System Specification, Functional Design (HDL), Logic Synthesis and Technology Mapping, Physical Design, Fabrication, Packaging and Testing, Chip
■ RTL example: module foo(a,b,c,o1,o2); input a, b, c; output o1, o2; reg o2; assign o1 = a & b; always @(a, b, c) o2 = a | c; …
■ Physical Design steps: Partitioning, Chip Floorplanning, Placement, Clock Network Synthesis, Global Routing, Detailed Routing
■ Highlighted loop: Placement, Global Routing, Timing Closure? (with Timing Opt. Transforms)

Global Placement: Motivation
■ Interconnect lags in performance while transistors continue scaling
− Circuit delay, power dissipation and area are dominated by interconnect
− Routing quality is largely determined by placement
■ Placement remains one of the most influential optimizations
− It has attracted attention from both industry and academia: ISPD wirelength-driven placement contests [2005][2006]; ISPD and DAC routability-driven placement contests [2011][2012]
− A consistent WL improvement > 2% in placement is considered significant
Figure labels: Unloaded, Coupling, IR drop, RC delay

Prior Work in Interconnect-driven Analytical Placement
■ Ideal Placer
− Fast runtime without sacrificing solution quality
− Simplicity and easy integration with other optimizations
■ Chart: Speed vs. Solution Quality
− Non-convex optimization: mPL6, APlace2, NTUPlace3
− Quadratic and force-directed: mFAR, Kraftwerk2, FastPlace3, RQL
− Ideal placer

The SimPL Family of Placement Algorithms
■ A common mathematical foundation: SimPL (ICCAD’10, TCAD’12), ComPLx (DAC’12)
■ Family members: SimPLR (ICCAD’11), Ripple (ICCAD’11), MAPLE (ISPD’12), Lopper (ISPD’11), PADÉ (DAC’12), SAPT (ISPD’12), NCTU@ (ASPDAC’12)
■ Extension themes: routability, clock-tree codesign and power optimization, multilevel optimization, datapath awareness, thermal awareness

ComPLx: a Competitive Primal-Dual Lagrange Optimization [DAC 2012]
■ Analysis and comparisons of placement algorithms have been mostly empirical, with little formal justification
− Generalizes SimPL to handle arbitrary interconnect models
− Illustrates how to add new constraints
− Extends to macro and timing-driven placement
■ A projected subgradient primal-dual Lagrange optimization for global placement
− Decomposes the original non-convex problem into “more convex” sub-problems
− Lends mathematical substantiation to placement algorithms derived from SimPL

Revised Formulation of Placement
■ Given a netlist N and net weights $w_{i,j}$
■ Objective: weighted Half-Perimeter Wirelength (wHPWL)
■ Sample intermediate objective: quadratic approximation of HPWL
■ Constraints in placement
− Resource-type constraints: legality, target utilization, routability, etc.
− Other constraints: region, alignment, power density, thermal
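
The weighted HPWL objective named on this slide can be illustrated with a minimal sketch. It assumes one weight per net (the slide's $w_{i,j}$ notation may instead denote pairwise weights), and the names `weighted_hpwl`, `netlist` and `positions` are hypothetical, chosen only for this example.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def weighted_hpwl(nets: List[Tuple[float, List[str]]],
                  pos: Dict[str, Point]) -> float:
    """Sum over nets of weight * (bounding-box width + bounding-box height)."""
    total = 0.0
    for weight, pins in nets:
        xs = [pos[p][0] for p in pins]
        ys = [pos[p][1] for p in pins]
        total += weight * ((max(xs) - min(xs)) + (max(ys) - min(ys)))
    return total

# Hypothetical three-cell example: one 2-pin net and one 3-pin net.
positions = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 1.0)}
netlist = [(1.0, ["a", "b"]), (2.0, ["a", "b", "c"])]
total_wl = weighted_hpwl(netlist, positions)
```

Both nets share the same bounding box here, so the result is simply the half-perimeter scaled by the sum of the two weights.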

ComPLx: Overall Flow
■ Initial HPWL Optimization
■ Global Placement Iterations (loop exits when the convergence check answers yes)
− Constraint handling can consider: legality, routability, region, resource-type constraints
− Unconstrained optimization can be modified to consider: timing/power-criticality
■ Legalization
■ Detailed Placement

Review: Lagrangian Relaxation
■ Given: optimization problem with constraints
a) Convert constraints to penalties
b) Add penalties to the original objective
− New variable for each penalty: Lagrange multiplier λ
c) Solving an unconstrained problem solves the original problem
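
For reference, steps a) to c) can be written in the standard textbook form for a single inequality constraint. The symbols $f$ and $g$ are generic placeholders, not notation taken from the slides, and the last line is weak duality, which holds in general; equality requires additional conditions.

```latex
\begin{align*}
\text{Constrained problem:}\quad & \min_{x} f(x) \quad \text{subject to} \quad g(x) \le 0 \\
\text{Lagrangian:}\quad & \mathcal{L}(x, \lambda) = f(x) + \lambda\, g(x), \qquad \lambda \ge 0 \\
\text{Weak duality:}\quad & \max_{\lambda \ge 0} \min_{x} \mathcal{L}(x, \lambda) \;\le\; \min_{x:\, g(x) \le 0} f(x)
\end{align*}
```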

Review: Projected Subgradient Methods
■ Solve the constrained optimization problem: minimize $f(x)$, where $f : \mathbb{R}^n \to \mathbb{R}$, subject to $x \in C$, where $C \subseteq \mathbb{R}^n$
■ The projected subgradient method iterates $x^{(k+1)} = P\left(x^{(k)} - \alpha_k g^{(k)}\right)$, where $P$ is a projection onto $C$ (feasible solutions), $\alpha_k$ is the step size, and $g^{(k)}$ is a subgradient of $f$ at $x^{(k)}$
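
The iteration above can be sketched on a small toy problem. Everything in this example is an assumption made for illustration: the objective is the L1 distance to a fixed `TARGET`, the feasible set C is the unit box, and the step size is 1/k. For brevity the sketch returns the last iterate, whereas practical subgradient methods usually track the best iterate seen.

```python
from typing import Callable, List

Vector = List[float]

def projected_subgradient(subgrad: Callable[[Vector], Vector],
                          project: Callable[[Vector], Vector],
                          x0: Vector, steps: int = 100) -> Vector:
    """Iterate x <- P(x - a_k * g_k) with diminishing steps a_k = 1/k."""
    x = project(list(x0))
    for k in range(1, steps + 1):
        g = subgrad(x)
        step = 1.0 / k
        x = project([xi - step * gi for xi, gi in zip(x, g)])
    return x

TARGET = [2.0, -0.5]  # hypothetical point outside the feasible box

def l1_subgradient(x: Vector) -> Vector:
    """A subgradient of f(x) = sum_i |x_i - TARGET_i| (componentwise sign)."""
    return [float((xi > ti) - (xi < ti)) for xi, ti in zip(x, TARGET)]

def clip_to_unit_box(x: Vector) -> Vector:
    """Euclidean projection onto C = [0, 1]^n."""
    return [min(1.0, max(0.0, xi)) for xi in x]

x_best = projected_subgradient(l1_subgradient, clip_to_unit_box, [0.5, 0.5])
```

In this toy case the iterate settles at the corner of the box nearest to `TARGET`.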

Converting Constraints to Penalties
■ Challenge: Working with supply-demand inequalities directly is difficult because they are specified algorithmically, not as closed-form expressions in (x, y)
■ Our solution: Work with subgradients pointing to a closest C-feasible solution, found by a feasibility projection
■ We approximate the penalty term by the $L_1$-distance from (x, y) to a closest C-feasible solution (when Φ represents HPWL, λ remains dimensionless)
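
A minimal one-dimensional sketch of this penalty follows. The projection used here, `project_to_slots`, is a hypothetical stand-in that assigns cells to unit-width slots in left-to-right order; it is not the feasibility projection from the talk and does not guarantee a closest feasible solution.

```python
from typing import List

def project_to_slots(xs: List[float], slot_width: float = 1.0) -> List[float]:
    """Toy feasibility projection: one cell per slot, left-to-right order kept."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    projected = [0.0] * len(xs)
    for rank, i in enumerate(order):
        projected[i] = (rank + 0.5) * slot_width  # slot centers 0.5, 1.5, ...
    return projected

def l1_penalty(xs: List[float], projected: List[float]) -> float:
    """L1 distance between a placement and its feasible counterpart."""
    return sum(abs(a - b) for a, b in zip(xs, projected))

# Three cells overlap near x = 0.5; the fourth already sits in its slot.
xs = [0.5, 0.6, 0.7, 3.5]
anchors = project_to_slots(xs)
penalty = l1_penalty(xs, anchors)
```

Cells that already occupy their slots contribute nothing to the penalty, so the value measures only how far the overlapping cells must move.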

Feasibility Projection in ComPLx
■ Purpose: Approximating the penalty term allows one to replace the nonconvex Lagrangian by a convex one
■ We define the feasibility projection $P_C$ to find a closest C-feasible approximation (pseudo-legalization)
a. The feasibility projection is C-feasible
b. The feasibility projection is Lipschitz continuous
■ The cost of the projected placement must generally decrease, providing upper bounds on final placement cost

ComPLx: Feasibility Projection on adaptec1
Cells are spread over the region’s expanse, avoiding obstacles while minimizing displacement for a given solution

ComPLx: Primal-dual Lagrangian Relaxation
■ Alternates minimization over the primal variables (x, y) with maximization over the dual variable λ
■ The constrained minimum can be found by sequential unconstrained optimization, where
− Subsequent iterations increase the sensitivity of the Lagrangian to the penalty
− The minimization of the Lagrangian affects the penalty more, and the penalty decreases while the interconnect cost Φ increases over iterations

ComPLx: Progression of Key Quantities

Unconstrained Optimization in ComPLx
■ The minimization of the Lagrangian
− Starts after finding C-feasible anchor locations
− The simplified Lagrangian can be minimized, with λ and the anchor locations held fixed, by solving for (x, y): a system of linear equations when Φ is quadratic
■ The ComPLx framework re-solves the minimization and the feasibility projection until convergence
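
The alternation between anchor computation and Lagrangian minimization can be sketched in one dimension. This is a toy under several assumptions, not the ComPLx implementation: every cell connects only to a single fixed pad, so the linear system is diagonal; `project_to_slots` is a hypothetical order-preserving stand-in for the feasibility projection; the L1 penalty is linearized with weights 1 / (|x - a| + eps); and the multiplier grows linearly, since the slides' update rule is not preserved here.

```python
from typing import List, Tuple

def project_to_slots(xs: List[float], slot_width: float = 1.0) -> List[float]:
    """Toy feasibility projection: one cell per slot, left-to-right order kept."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    anchors = [0.0] * len(xs)
    for rank, i in enumerate(order):
        anchors[i] = (rank + 0.5) * slot_width
    return anchors

def complx_toy(num_cells: int = 4, pad: float = 2.0, iterations: int = 30,
               eps: float = 1e-3) -> Tuple[List[float], List[float], List[float]]:
    # Initial wirelength optimization: every cell collapses onto the pad.
    xs = [pad] * num_cells
    anchors = project_to_slots(xs)
    gaps = []
    for k in range(1, iterations + 1):
        # Upper-bound step: the projection yields feasible anchor locations.
        anchors = project_to_slots(xs)
        gaps.append(sum(abs(x - a) for x, a in zip(xs, anchors)))
        # Dual step (assumed schedule): grow the multiplier linearly.
        lam = 0.1 * k
        # Lower-bound step: per cell, minimize (x - pad)^2 + mu * (x - a)^2,
        # where mu = lam / (|x - a| + eps) linearizes the L1 penalty.
        new_xs = []
        for x, a in zip(xs, anchors):
            mu = lam / (abs(x - a) + eps)
            new_xs.append((pad + mu * a) / (1.0 + mu))
        xs = new_xs
    return xs, anchors, gaps

xs, anchors, gaps = complx_toy()
```

In this toy the recorded L1 gap between the unconstrained solution and its projection shrinks as the multiplier grows, mirroring the convergence criterion stated in the talk. A real placer would solve a sparse linear system at the lower-bound step rather than a per-cell closed form.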

Convergence of ComPLx
■ Controlling Lagrange multipliers (update formulas not preserved in this transcript)
■ Convergence criteria
− $L_1$-distance to the feasibility projection stops decreasing
− Duality gap becomes small enough

ComPLx: Closing the Gap between Two Bounds
■ Feasibility projection provides upper bounds
■ The legal solution is formed between the two bounds
− Upper bounds are found by feasibility projection
− Lower bounds are found by minimization of the Lagrangian

ComPLx Iterations on adaptec1 (1)
Figure panels: Iteration=0 (Init WL Opt.), Iteration=1 (Upper Bound), Iteration=2 (Lower Bound), Iteration=3 (Upper Bound); fixed macros are marked

ComPLx Iterations on adaptec1 (2)
Figure panels: Iteration=10 (Lower Bound), Iteration=11 (Upper Bound), Iteration=20 (Lower Bound), Iteration=21 (Upper Bound); fixed macros are marked

ComPLx Iterations on adaptec1 (3)
Figure panels: Iteration=30 (Lower Bound), Iteration=31 (Upper Bound), Iteration=40 (Lower Bound), Iteration=41 (Upper Bound); fixed macros are marked

Motivation for Macro Placement
■ The traditional “sea-of-gates” IC design style is being replaced by the “sea-of-hard-macros” design style
− Reuse predesigned IP modules / macros
− Reduce the design cost, deal with increasing complexity
− Previously performed in floorplanning at designers’ discretion
■ The boundary between placement and floorplanning is increasingly blurred
Figure courtesy of EE Times

ComPLx: Macro Placement by Macro Shredding (1)
■ Observation
− Feasibility projection largely preserves the relative placement
− Arrays of cells are transformed into similar shapes
■ Revised “macro shredding”: a one-stage approach for simultaneous standard-cell and macro placement
− Macro cells are divided into equal-sized cells (shreds) only for the feasibility projection
− $P_C$ on macros is calculated by averaging the $P_C$ locations of their shreds
− Linear systems remain unchanged (limiting complexity increases)
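
The shred-and-average step can be sketched as follows. The helper names, the square-ish shred size and the uniform shift used in place of a real feasibility projection are all assumptions made for this illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def shred_macro(x: float, y: float, width: float, height: float,
                shred_size: float) -> List[Point]:
    """Split a macro (lower-left corner at x, y) into equal-sized shreds
    and return the shred centers."""
    cols = max(1, round(width / shred_size))
    rows = max(1, round(height / shred_size))
    dx, dy = width / cols, height / rows
    return [(x + (c + 0.5) * dx, y + (r + 0.5) * dy)
            for r in range(rows) for c in range(cols)]

def macro_center_from_shreds(shreds: List[Point]) -> Point:
    """Average the projected shred locations to get the macro's location."""
    n = len(shreds)
    return (sum(p[0] for p in shreds) / n, sum(p[1] for p in shreds) / n)

shreds = shred_macro(0.0, 0.0, 4.0, 2.0, 1.0)
# Stand-in for applying the projection to the shreds: a uniform shift.
moved = [(sx + 3.0, sy + 1.0) for sx, sy in shreds]
center = macro_center_from_shreds(moved)
```

Because the macro is shredded only for the projection, the averaged center can be fed back as the macro's anchor while the linear system keeps a single variable pair per macro.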

ComPLx: Macro Placement by Macro Shredding (2)
Figure steps: $P_C$ is applied to shreds; $P_C$ on macros is found by averaging the $P_C$ locations of shreds; minimization of the Lagrangian

ComPLx: Experiments on ISPD 2005 benchmarks
■ ComPLx is 10% faster than FastPlace, 2.8x and 7.2x faster than NTUPlace3 and mPL6 respectively, and more than 2.3x faster than RQL

ComPLx: Experiments on ISPD 2006 benchmarks
■ Scaled HPWL = HPWL * (1 + density_overflow_penalty)
■ Demonstrates fast convergence and strong spreading quality
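
The scaled-HPWL metric is a one-line computation. This sketch assumes the penalty is already expressed as a fraction (0.05 meaning 5%); how the contest scripts derive that penalty from bin densities is not covered here, and the function name is hypothetical.

```python
def scaled_hpwl(hpwl: float, density_overflow_penalty: float) -> float:
    """Scaled HPWL = HPWL * (1 + density_overflow_penalty)."""
    if density_overflow_penalty < 0.0:
        raise ValueError("penalty is assumed to be non-negative")
    return hpwl * (1.0 + density_overflow_penalty)

# A placement with no overflow keeps its HPWL; overflow inflates it.
no_overflow = scaled_hpwl(100.0, 0.0)
with_overflow = scaled_hpwl(100.0, 0.05)
```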

Timing-driven Placement
■ Extensions for timing- and power-driven placement traditionally rely on net weights
− Assign higher weights to nets with high activity factors or timing criticality

Extending ComPLx to Routability-driven Placement
Figure labels: ComPLx, SimPLR

ComPLx: Baseline Wirelength-driven Placement
■ Initial HPWL Optimization
■ Global Placement Iterations (loop exits when the convergence check answers yes)
− Constraint handling considers: legality, target utilization
− Unconstrained optimization
■ Legalization
■ Detailed Placement

ComPLx: Routability-driven Placement
■ Initial HPWL Optimization
■ Global Placement Iterations (loop exits when the convergence check answers yes)
− Constraint handling considers: legality, target utilization, routability
− Enables early routability prediction, so the placer can respond early and often
■ Legalization
■ Congestion-aware Detailed Placement

SimPLR Illustration (1)

Congestion-aware Detailed Placement: Illustration
Figure panels: After Global Placement; Congestion-unaware DP; Congestion-aware DP

Congestion Map Improvement due to SimPLR
Figure panels: Best in ISPD 2011 Contest; SimPLR

SAPT Illustration: Alignment Constraints
Figure labels: Manual Placement; Automated Placement; Skewed Net Weighting + Anchor Alignment

Conclusions
■ A common mathematical foundation: SimPL (ICCAD’10, TCAD’12), ComPLx (DAC’12)
■ Family members: SimPLR (ICCAD’11), Ripple (ICCAD’11), MAPLE (ISPD’12), Lopper (ISPD’11), PADÉ (DAC’12), SAPT (ISPD’12), NCTU@ (ASPDAC’12)
■ Extension themes: routability, clock-tree codesign and power optimization, multilevel optimization, datapath awareness, thermal awareness
■ As of Aug 2012, MAPLE is used at IBM as the default option for all ASIC and CPU designs

Thank you!

Relevant Publications
■ M.-C. Kim, D.-J. Lee and I. L. Markov, “SimPL: An Effective Placement Algorithm,” ICCAD 2010, pp. 649-656.
■ M.-C. Kim*, J. Hu*, D.-J. Lee and I. L. Markov, “A SimPLR method for Routability-driven Placement,” ICCAD 2011, pp. 67-73.
■ M.-C. Kim, D.-J. Lee and I. L. Markov, “SimPL: An Effective Placement Algorithm,” IEEE TCAD 31(1), pp. 50-60, 2012.
■ M.-C. Kim, N. Viswanathan, C. J. Alpert, I. L. Markov and S. Ramji, “MAPLE: Multilevel Adaptive PLacEment for Mixed-Size Designs,” ISPD 2012, pp. 193-200.
■ M.-C. Kim and I. L. Markov, “ComPLx: A Competitive Primal-dual Lagrange Optimization for Global Placement,” Design Automation Conference (DAC) 2012.
■ S. Ward, M.-C. Kim, N. Viswanathan, Z. Li, C. J. Alpert, E. Swartzlander and D. Z. Pan, “Keep it Straight: Teaching Placement how to Better Handle Designs with Datapaths,” ISPD 2012, pp. 79-86.

SimPLR Empirical Results vs. SimPL and ISPD’11 Contest
■ Overflow is reported by running a full-fledged global router
■ Versus HPWL-driven placement:
− Average of 3.81x better overflow (7 of 8 best) at the cost of 4% routed wirelength
■ Versus other routability-driven placers in the ISPD’11 Contest:
− Average of 2.04x better overflow (8 of 8 best) with 1% better routed wirelength

SimPLR Empirical Results: Ca-DP
■ Versus HPWL-driven detailed placement:
− Average of 18% better overflow (7 of 8 best) at the cost of 1% in routed wirelength

SAPT Experimental Results: Hybrids
■ Hybrid designs integrate datapaths into a larger netlist
− A mixture of datapaths and random-logic standard cells
− We used industrial hybrid designs from IBM
■ We report a 5.8% improvement in total StWL compared to SimPL
