Architecture and Details of a High Quality, Large-Scale Analytical Placer Andrew B. Kahng, Sherief Reda and Qinke Wang VLSI CAD Lab University of California,


Architecture and Details of a High Quality, Large-Scale Analytical Placer Andrew B. Kahng, Sherief Reda and Qinke Wang VLSI CAD Lab University of California, San Diego Work partially supported by the MARCO Gigascale Systems Research Center. ABK is currently with Blaze DFM, Inc., Sunnyvale, CA.

2 Outline History of APlace From APlace1.0 to APlace2.0 Anatomy of APlace2.0 New techniques in APlace2.0 Experimental Results Conclusions and Future Work

3 History of APlace Research to study Synopsys patent –Naylor et al., US Patent 6,301,693 (2001) Extensible foundation: APlace1.0 –Timing-driven placement –Mixed-size placement –Area-I/O placement ISPD-2005 placement contest → APlace2.0 –Many parts of APlace rewritten –Superior performance

4 Outline History of APlace From APlace1.0 to APlace2.0 Anatomy of APlace2.0 New techniques in APlace2.0 Experimental Results Conclusions and Future Work

5 APlace Problem Formulation Constrained Nonlinear Optimization: Divide the layout area into uniform bins, and seek to minimize HPWL etc. so that total cell area in every bin is equalized –D_g(x, y): density function that equals the total cell area in a global bin g –D: average cell area over all global bins

6 Nonlinear Optimization Smooth approximation of placement objectives: wirelength, density function, etc. Quadratic Penalty method –Solve a sequence of unconstrained minimization problems for a sequence of µ → 0 Conjugate Gradient (CG) solver –Useful for finding an unconstrained minimum of a high-dimensional function –Adaptable to large-scale placement problems: memory requirement is linear in problem size
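The formulation on slides 5 and 6 can be sketched in one place. The notation is an assumption made here for illustration: WL(x, y) stands for the smoothed wirelength objective and D_g(x, y) for the density of global bin g. The 1/(2µ) scaling is the textbook quadratic-penalty form; the slides do not state the exact scaling used in APlace.

```latex
\begin{align}
  &\min_{x,y}\; WL(x,y)
    \quad \text{s.t.} \quad D_g(x,y) = D \;\; \text{for every global bin } g, \\
  &\min_{x,y}\; WL(x,y) + \frac{1}{2\mu} \sum_{g} \bigl(D_g(x,y) - D\bigr)^2,
    \qquad \mu \to 0 .
\end{align}
```

The second line is the unconstrained problem handed to the CG solver for each value of µ in the decreasing sequence.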

7 Wirelength Approximation Half-Perimeter Wirelength (HPWL) –Half-perimeter of net’s bounding box –Simple, close measure of routing congestion –Not strictly convex, or everywhere differentiable Log-Sum-Exp approximation –Naylor et al., US Patent 6,301,693 (2001) –Precise, closer to HPWL when α → 0 –Strictly convex, continuously differentiable
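A minimal sketch of the log-sum-exp approximation for one net, assuming pins are given as (x, y) tuples. The function names are hypothetical and the max is factored out only for numerical stability; this is an illustration of the formula, not APlace's code.

```python
import math

def lse(values, alpha):
    """alpha * log(sum(exp(v / alpha))), computed stably by factoring out the max."""
    m = max(values)
    return m + alpha * math.log(sum(math.exp((v - m) / alpha) for v in values))

def lse_wirelength(pins, alpha):
    """Smooth estimate of a net's HPWL from its (x, y) pin positions."""
    xs = [p[0] for p in pins]
    ys = [p[1] for p in pins]
    # Smooth (max - min) per axis: the smooth min is minus the smooth max of negated values.
    wx = lse(xs, alpha) + lse([-x for x in xs], alpha)
    wy = lse(ys, alpha) + lse([-y for y in ys], alpha)
    return wx + wy

def hpwl(pins):
    """Exact half-perimeter wirelength of the net's bounding box."""
    xs = [p[0] for p in pins]
    ys = [p[1] for p in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))
```

The estimate never falls below the exact HPWL and tightens toward it as α shrinks, which matches the "closer to HPWL when α → 0" bullet.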

8 α: Smoothing Parameter “Significance criterion” for choosing nets with large wirelength to minimize –Larger gradients for longer nets –Minimize long nets more efficiently than short nets Two-pin net: partial gradient for x_1 is –close to 0 when net length |x_1 − x_2| is small compared to α –close to 1 or −1 otherwise
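The two-pin behaviour can be checked numerically. For a two-pin net the log-sum-exp length along x simplifies to |x1 − x2| + 2α·log(1 + e^(−|x1 − x2|/α)), and its partial derivative with respect to x1 works out to tanh((x1 − x2)/(2α)). The sketch below is a derivation aid with hypothetical function names, not code from APlace.

```python
import math

def two_pin_lse_x(x1, x2, alpha):
    """Log-sum-exp length of a two-pin net along x (a smooth |x1 - x2|)."""
    d = abs(x1 - x2)
    # alpha*log(e^(x1/a) + e^(x2/a)) + alpha*log(e^(-x1/a) + e^(-x2/a)), simplified.
    return d + 2.0 * alpha * math.log1p(math.exp(-d / alpha))

def two_pin_grad_x1(x1, x2, alpha):
    """Closed-form partial derivative of two_pin_lse_x with respect to x1."""
    return math.tanh((x1 - x2) / (2.0 * alpha))
```

With α = 1, a net of length 0.01 gets a gradient of about 0.005, while a net of length 10 gets a gradient above 0.99, so long nets dominate the minimization.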

9 Area Potential Function Overlap area –computed from the overlap along the x and y directions –0/1 function with cell size ignored Area potential function: defines an “area potential” exerted by a cell to nearby grids –smooth bell-shaped function for standard cells [Naylor et al., US Patent 6,301,693 (2001)]

10 Module Area Potential Function Mixed-size placement: decide scope of area potential based on module's dimension p(d): potential function –d: distance from module to grid –radius r = w/2 + 2w_g for block with width w –p(d) = 1 − a·d² on the inner segment and b·(r − d)² on the outer segment (curve labels from the slide's plot of p(d) against d) –convex curve for d < w/2 + w_g –concave curve for w/2 + w_g < d < w/2 + 2w_g –smooth at d = w/2 + w_g
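A sketch of the bell-shaped potential in one dimension. The slide gives the two curve pieces and the smoothness requirement but not the coefficients; here a and b are derived by requiring the value and the slope to match at d = w/2 + w_g, which gives a = 1/((w/2 + w_g)·r) and b = 1/(w_g·r). The function name and argument names are hypothetical.

```python
def bell_potential(d, w, wg):
    """Area potential of a block of width w at distance d from a grid of width wg."""
    d = abs(d)
    d0 = w / 2.0 + wg         # junction between the two curve pieces
    r = w / 2.0 + 2.0 * wg    # radius: the potential is zero beyond r
    a = 1.0 / (d0 * r)        # derived from value and slope continuity at d0
    b = 1.0 / (wg * r)
    if d <= d0:
        return 1.0 - a * d * d
    if d <= r:
        return b * (r - d) ** 2
    return 0.0
```

For w = 4 and w_g = 1 the pieces meet at d = 3 with value 0.25, and the potential reaches zero at d = 4.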

11 Changes: APlace1.0 → APlace2.0 Strong scalability from new clustering algorithm Dynamic adjustment of weights for wirelength and overlap penalty during global placement Improvements to legalization, detailed placement –whitespace compaction –cell reordering algorithms –global greedy cell movement APlace2.0 vs. APlace1.0: up to 19% WL reduction, 1.5-2x speedup

12 IBM BigBlue4 Placement 2.1M instances, HPWL = (value lost in extraction), CPU = 23h

13 Outline History of APlace From APlace1.0 to APlace2.0 Anatomy of APlace2.0 New techniques in APlace2.0 Experimental Results Conclusions and Future Work

14 Anatomy of APlace 2.0 Flow diagram blocks: Clustering, Adaptive APlace engine, Unclustering, Legalization, Global moving, WS arrangement, Cell order polishing; grouped into a Global Phase and a Detailed Phase

15 New Feature 1: Multi-Level Clustering Objective: cluster to reduce runtime and allow scalable implementations with no compromise to quality –Multi-level approach using best-choice clustering (ISPD’05) –Clustering ratio ≈ 10 –#Top-level clusters ≈ 2000 –Wirelength calculation: assume modules located at cluster center; only consider inter-cluster parts of nets Flowchart: netlist → reduce netlist size by 10x → size ~ 2000? (no: reduce again; yes: global placement) → uncluster → flat? (no: global placement again; yes: Legalization)

16 Best-Choice Clustering Each clustering level uses the best-choice heuristic with lazy updates and tight area control
For each clustering level:
–Calculate the clustering score of each node to its neighbors based on the number of connections and areas
–Sort all nodes based on their best scores using a heap
–Until target clustering ratio is reached:
  –If top node of heap is “valid” then cluster it with its closest neighbor
  –Else recalculate the top node score and reinsert in heap; continue
  –Calculate the clustering score of the new node and reinsert into the heap
  –Update netlist and mark all neighbors of the new node as invalid
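The heap-with-lazy-updates loop can be sketched as follows. Several simplifications are assumptions made for illustration: the netlist is a weighted graph rather than a hypergraph, the score is edge weight divided by combined area, and `max_area` stands in for the "tight area control". All names are hypothetical.

```python
import heapq
from itertools import count

def best_choice_cluster(areas, edges, target_count, max_area=float("inf")):
    """areas: {node: area}; edges: {(u, v): weight}.
    Returns {cluster_or_node_id: set of original nodes}."""
    area = dict(areas)
    members = {n: {n} for n in areas}
    adj = {n: {} for n in areas}
    for (u, v), w in edges.items():
        adj[u][v] = adj[u].get(v, 0) + w
        adj[v][u] = adj[v].get(u, 0) + w
    valid, heap, tie = {}, [], count()

    def push(u):
        """Score u against its neighbors and insert its best pairing into the heap."""
        cands = [(w / (area[u] + area[v]), v) for v, w in adj[u].items()
                 if area[u] + area[v] <= max_area]
        if cands:
            score, v = max(cands, key=lambda c: c[0])
            heapq.heappush(heap, (-score, next(tie), u, v))
        valid[u] = True

    for n in areas:
        push(n)
    new_ids = count()
    while len(area) > target_count and heap:
        _, _, u, v = heapq.heappop(heap)
        if u not in area:
            continue                  # stale entry: u was merged away earlier
        if not valid[u] or v not in area:
            push(u)                   # lazy update: rescore and reinsert
            continue
        c = ("cluster", next(new_ids))
        area[c] = area.pop(u) + area.pop(v)
        members[c] = members.pop(u) | members.pop(v)
        adj[c] = {}
        for old in (u, v):
            for n, w in adj.pop(old).items():
                if n in (u, v):
                    continue
                del adj[n][old]
                adj[c][n] = adj[c].get(n, 0) + w
                adj[n][c] = adj[c][n]
                valid[n] = False      # neighbor scores are now out of date
            del valid[old]
        push(c)
    return members
```

Invalid nodes are rescored only when they reach the top of the heap, which is the point of the lazy update: most neighbors of a new cluster never need an immediate recomputation.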

17 Two Clustering Concerns Mark boundaries of clustering hierarchy at each clustering level → allow exact reversal of clustering during unclustering Meet target number of objects by avoiding “saturation” → bypass small fixed objects during clustering (figure: a cluster next to a fixed object, with the fixed objects bypassed)

18 Multiple Levels of Grids Adaptive grid size based on average cluster size Better global optimization –use solution of placement problem constrained with coarser grids as initial solution for problem constrained with finer grids Better scalability –larger grid size spreads modules faster Different levels of relaxation for density constraints –According to grid size

19 New Feature 2: Adaptive WL Weight Important to QOR Initial weight value –For each cluster level and grid level –Based on wirelength and density partial derivatives –Goal: Magnitudes of gradients roughly equal Decrease WL weight by half whenever CG solver obtains a stable solution
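A small sketch of the weight rule, assuming the objective has the form weight × wirelength + density penalty and that "magnitude" means the sum of absolute partial derivatives. Both are assumptions for illustration, as are the function names.

```python
def initial_wl_weight(wl_grad, density_grad):
    """Wirelength weight that makes the two gradient magnitudes roughly equal."""
    wl_mag = sum(abs(g) for g in wl_grad)
    den_mag = sum(abs(g) for g in density_grad)
    return den_mag / wl_mag if wl_mag > 0 else 1.0

def weight_schedule(w0, rounds):
    """Weights used after successive stable CG solutions: halve each time."""
    return [w0 / (2 ** k) for k in range(rounds)]
```

In this sketch the initial weight would be recomputed at every cluster level and grid level, then halved each time the CG solver settles.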

20 New Feature 3: Legalization and Detailed Placement Variant of greedy legalization algorithm (Hill’01): 1. Sort all cells from left to right: move each cell in order to the closest legal position 2. Sort all cells from right to left: move each cell in order to the closest legal position(s) 3. Pick the better of (1) and (2) Detailed Placement Components: –Global cell movement (Goto81, KenningsM98 BoxPlace, FP…) –Whitespace compaction (KahngTZ’99, KahngMR’04) –Cell order polishing (similar to rowIroning, FS detailed placer): intra-row cell reordering, inter-row cell reordering
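The two-pass greedy idea can be sketched for a single row. This is a simplification under stated assumptions: positions are continuous rather than snapped to sites, there is one row, the cells fit in the row, and "better" means smaller total displacement. The right-to-left pass reuses the left-to-right routine on mirrored coordinates. Function names are hypothetical.

```python
def pack_left_to_right(cells, row_width):
    """cells: list of (name, x, width). Returns {name: legal_x} without overlaps."""
    order = sorted(cells, key=lambda c: c[1])
    remaining = sum(c[2] for c in order)
    cursor, placed = 0.0, {}
    for name, x, w in order:
        # Closest position to x that clears placed cells and leaves room for the rest.
        pos = min(max(x, cursor), row_width - remaining)
        placed[name] = pos
        cursor = pos + w
        remaining -= w
    return placed

def legalize_row(cells, row_width):
    """Run both sweep directions and keep the one with less total displacement."""
    ltr = pack_left_to_right(cells, row_width)
    mirrored = [(n, row_width - (x + w), w) for n, x, w in cells]
    widths = {n: w for n, _, w in cells}
    rtl = {n: row_width - (p + widths[n])
           for n, p in pack_left_to_right(mirrored, row_width).items()}

    def cost(sol):
        return sum(abs(sol[n] - x) for n, x, _ in cells)

    return ltr if cost(ltr) <= cost(rtl) else rtl
```

For three width-2 cells wanting x = 5, 5.5 and 6 in a row of width 10, the left-to-right sweep moves them by 3.5 in total and the right-to-left sweep by 4.5, so the first result is kept.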

21 Global Moving Move cell to “optimal” location among available whitespace –improve quality when utilization is low Two steps –search for available location in optimal region of a cell’s placement –search for available location in “best” bin: divide placement area into uniform bins; choose “best” bin according to available whitespace and cost of moving cell to bin center; assume normal distribution of whitespace with width and estimate if an available location exists

22 WhiteSpace (WS) Compaction –Each chain represents the possible placement sites for each cell –The cost on the arrows is the change in HPWL when the cell moves to each site –The order of chains corresponds to the order of cells from left to right in a row –A shortest path from source to sink gives the best way to compact WS (figure: one chain of sites per cell, cell 1 through cell n, between a row start node and an end node)
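Because the graph is layered (one chain per cell, in row order), the shortest path can be computed with a dynamic program over cells and sites. In this sketch `cost(i, s)` is a caller-supplied function standing in for the HPWL change on the arrows, cell widths are in sites, and per-cell costs are treated as independent. The names are hypothetical and a feasible row is assumed.

```python
def compact_row(widths, num_sites, cost):
    """widths[i]: width in sites of the i-th cell in left-to-right order;
    cost(i, s): cost of placing cell i with its left edge at site s.
    Returns (total_cost, [site per cell]) with the cell order preserved."""
    INF = float("inf")
    n = len(widths)
    # best[i][s]: minimum cost of placing cells 0..i with cell i at site s.
    best = [[INF] * num_sites for _ in range(n)]
    back = [[-1] * num_sites for _ in range(n)]
    for i in range(n):
        run_min, run_arg = (0.0, -1) if i == 0 else (INF, -1)
        for s in range(num_sites):
            if i > 0:
                p = s - widths[i - 1]   # rightmost site the previous cell may use
                if p >= 0 and best[i - 1][p] < run_min:
                    run_min, run_arg = best[i - 1][p], p
            if s + widths[i] <= num_sites and run_min < INF:
                best[i][s] = run_min + cost(i, s)
                back[i][s] = run_arg
    s = min(range(num_sites), key=lambda k: best[n - 1][k])
    total = best[n - 1][s]
    sites = [0] * n
    for i in range(n - 1, -1, -1):      # walk the back-pointers to recover the path
        sites[i] = s
        s = back[i][s]
    return total, sites
```

The running minimum over the previous chain keeps the work proportional to (number of cells) × (number of sites).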

23 Cell Order Polishing Permute a small window of neighboring cells in order to improve wirelength –MetaPlacer’s rowIroning: up to 15 cells in one row assuming equal whitespace distribution –FengShui's cell ordering: six objects in one or more rows regarding whitespace as pseudo cells Branch-and-bound algorithm –four nearby cells in one or multiple rows –consider optimal placement for each permutation –more accurate, overlap-free permutations and no cell shifting

24 Single-Row Cell Ordering Cost of placing first j cells of a permutation –cost = wirelength increase when placing a cell –ΔWL ≠ 0 only if cell is leftmost or rightmost –remaining cells placed to the right of first j cells –unrelated to order or placement of remaining cells B&B algorithm –construct permutations in lexicographic order: next permutation has same prefix as the previous one, so beginning rows of DP table can be reused as much as possible –cut branch when minimum cost of placing first j cells > best cost till now
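A sketch of branch-and-bound over the permutations of a small window. It uses a simpler bound than the slide's prefix cost: the x-span of each net over fixed pins and already placed cells only, which can only grow as more cells are placed and is therefore a valid lower bound. Cells are packed without gaps and pins are assumed to sit at cell centers. All of these are assumptions for illustration, and the names are hypothetical.

```python
def best_order(cells, nets, x_start):
    """cells: {name: width}; nets: list of (fixed_pin_xs, [cell names]).
    Returns (best_cost, best_order) for cells packed from x_start."""
    names = sorted(cells)
    best = [float("inf"), None]

    def span_cost(pos):
        """x-span summed over nets, counting fixed pins and placed cells only."""
        total = 0.0
        for fixed_xs, net_cells in nets:
            xs = list(fixed_xs) + [pos[c] for c in net_cells if c in pos]
            if len(xs) >= 2:
                total += max(xs) - min(xs)
        return total

    def dfs(prefix, pos, cursor):
        cost = span_cost(pos)           # lower bound: bounding boxes only grow
        if cost >= best[0]:
            return                      # cut this branch
        if len(prefix) == len(names):
            best[0], best[1] = cost, list(prefix)
            return
        for n in names:                 # lexicographic order of permutations
            if n in pos:
                continue
            pos[n] = cursor + cells[n] / 2.0   # pin assumed at the cell center
            prefix.append(n)
            dfs(prefix, pos, cursor + cells[n])
            prefix.pop()
            del pos[n]

    dfs([], {}, x_start)
    return best[0], best[1]
```

Because consecutive permutations share a prefix, the depth-first search reuses the partial placement instead of rebuilding it, which mirrors the slide's remark about reusing the beginning rows of the table.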

25 Two- or Three-Row Cell Ordering DP algorithm –decide how many cells are assigned to each row, from top to bottom –construct a permutation in lexicographic order –find “optimal” placement within the window Y-cost of placing first j cells: accurate –remaining cells placed lower than first j cells X-cost of placing first j cells: inaccurate when a net connects placed and unplaced cells –results show it is still effective with a small set of cells and a small window

26 Outline Introduction Clustering Global Placement Detailed Placement Experimental Results –IBM ISPD04 –IBM-PLACE v2 –IBM ICCAD04 –IBM ISPD05 Conclusions and Future Work

27 IBM ISPD04 Test basic placer performance with standard cells. Table columns: APlace2.0, mPL5, Capo9.0, Dragon3, FP1, FS2.6; rows: ibm circuits and Average (circuit indices and values lost in extraction). 3% better than the best other placer, mPL5 (ISPD05)

28 IBM Place V2 Test placer under whitespace presence and routability. Table columns: Circuit, APlace2.0, Vias, mPL+WSA; rows: ibm09-easy, ibm09-hard, ibm10-easy, ibm10-hard, ibm11-easy, ibm11-hard, ibm12-easy, ibm12-hard, Average (values lost in extraction). Better on average than mPL-R+WSA (ICCAD04); the percentage is lost in extraction

29 IBM ICCAD04 Test placer performance with cells and blocks (floorplacement). Table columns: APlace2.0, FS2.6, Capo9.0; rows: ibm circuits and Average (circuit indices and values lost in extraction). Better than FS and Capo by (percentage lost in extraction) and 19%, respectively

30 IBM ISPD05 Test placer performance with cells and movable/fixed blocks. Table columns: adaptec2, adaptec4, BB1, BB2, BB3, BB4, AVG; rows: APlace, mFAR, Dragon, mPL, FastPlace, Capo, NTUP, FengShui, KW (values lost in extraction). 6% better than the best other placer (mFAR)

31 APlace2.0 Conclusions 60 days + clean sheet of paper + Qinke Wang + Sherief Reda. Scalable implementation. State-of-the-art clustering and global placement engines. Improved detailed placement engine. Better than best published results by: 3% on ISPD’04 suite, 14% on ICCAD’04, 12% on IBMPLACE V.2, 6% on ISPD’05 Placement Contest. Recent applications (other than restoring functionality): IR-drop driven placement (ICCD-2005 Best Paper); lens aberration-aware placement (DATE-2006). Toward APlace3.0: ?

32 Thank You Questions?

33 Goals and Plan Goals: Build a new placer to win the competition Scalable, robust, high-quality implementation Leave no stone unturned / QOR on the table Plan and Schedule: Work within most promising framework: APlace 30 days for coding + 30 days for tuning

34 Philosophy Respect the competition: well-funded groups with decades of experience –ABKGroup’s Capo, MLPart, APlace = all unfunded side projects –No placement-related industry interactions. QOR target: 24-26% better than Capo v9r6 on all known benchmarks –Nearly pulled out 10 days before competition. Work smart: solve scalability and speed basics first –Slimmed-down data structure, -msse compiler options, etc. Ordered list of ~15 QOR ideas to implement. Daily regressions on all known benchmarks. Synthetic testcases to predict bb3, bb4, etc.

35 Implementation Framework APlace weaknesses: weak clustering; poor legalization / detailed placement. New APlace: 1. New clustering 2. Adaptive parameter setting for scalability 3. New legalization + iterative detailed placement. New APlace Flow (diagram blocks): Clustering, Adaptive APlace engine, Unclustering, Legalization, Global moving, WS arrangement, Cell order polishing; grouped into a Global Phase and a Detailed Phase

36 Parameterization and Parallelizing Tuning Knobs:  Clustering ratio, # top-level clusters, cluster area constraints  Initial wirelength weight, wirelength weight reduction ratio  Max # CG iterations for each wirelength weight  Target placement discrepancy  Detailed placement parameters, etc. Resources:  SDSC ROCKS Cluster: 8 Xeon CPUs at 2.8GHz  Michigan Prof. Sylvester’s Group: 8 various CPUs  UCSD FWGrid: 60 Opteron CPUs at 1.6GHz  UCSD VLSICAD Group: 8 Xeon CPUs at 2.4GHz Wirelength Improvement after Tuning : 2-3%

37 Artificial Benchmark Synthesis –Synthetic benchmarks to test code scalability and performance –Rapid response to broadcast of s00-nam.pdf –Created synthetic versions of bigblue3 and bigblue4 within 48 hours –Mimicked fixed-block layout diagrams in the artificial benchmark creation –This process was useful: we identified (and solved) a problem with clustering in the presence of many small fixed blocks

38 Results Table columns: Circuit, GP HPWL, Leg HPWL, DP HPWL, CPU (h); rows: four adaptec and four bigblue circuits (circuit indices and values lost in extraction)

39 Conclusions –ISPD05 = an exercise in process and philosophy –At end, we were still 4% short of where we wanted –Not happy with how we handled 5-day time frame –Auto-tuning → first results ~ best results –During competition, wrote but then left out “annealing” DP improvements that gained another 0.5% –Students and IBM ARL did a really, really great job –Currently restoring capabilities (congestion, timing-driven, etc.) and cleaning (antecedents in Naylor patent)