Published by Blaise Brown. Modified over 9 years ago.
1
Constraint-Driven Large Scale Circuit Placement Algorithms
Advisor: Prof. Jason Cong
Student: Min Xie
September 2006
2
UCLA VLSICAD LAB
Outline
- Chapter 1. Introduction
- Chapter 2. Optimality and scalability study of existing placement algorithms
- Chapter 3. Routability driven multilevel global placement and white space allocation
- Chapter 4. A robust legalization scheme for mixed-size placement
- Chapter 5. Applications of mixed-size placement legalization
- Chapter 6. “Global” localized preprocessing for detailed placement
- Chapter 7. Heterogeneous placement for FPGAs
- Chapter 8. Conclusions and future work
3
Publication List
- Cong J., Xie M., and Zhang Y., “An Enhanced Multilevel Routing System,” Proceedings of ICCAD, pp. 51-58, 2002.
- Chang C., Cong J., and Xie M., “Optimality and Scalability of Existing Placement Algorithms,” Proceedings of ASPDAC, pp. 621-627, 2003.
- Cong J., Romesis M., and Xie M., “Optimality, Scalability and Stability Study of Existing Partitioning and Placement Algorithms,” Proceedings of ISPD, pp. 88-94, 2003.
- Cong J., Romesis M., and Xie M., “Optimality and Stability Study of Timing-driven Placement Algorithms,” Proceedings of ICCAD, pp. 472-478, 2003.
- Cong J., Kong T., Shinnerl J., Xie M., and Yuan X., “Large-Scale Circuit Placement: Gap and Promise,” Proceedings of ICCAD, pp. 883-890, 2003.
- Chang C., Cong J., Romesis M., and Xie M., “Optimality and Scalability of Existing Placement Algorithms,” IEEE TCAD, vol. 23, no. 4, pp. 537-549, 2004.
4
Publication List
- Li C., Xie M., Koh C.K., Cong J., and Madden P., “Routability-driven Placement and White Space Allocation,” Proceedings of ICCAD, pp. 883-890, 2004.
- J. Cong, J. Fang, M. Xie, and Y. Zhang, "MARS - A Multilevel Full-Chip Gridless Routing System," IEEE TCAD, Vol. 24, No. 3, pp. 382-394, March 2005.
- J. Cong, T. Kong, J. Shinnerl, M. Xie, and X. Yuan, "Large Scale Circuit Placement," ACM TODAES, Vol. 10, No. 2, pp. 389-430, April 2005.
- Li C., Xie M., Koh C.K., Cong J., and Madden P., “Routability-driven Placement and White Space Allocation,” IEEE TCAD, to appear.
- T. Chan, J. Cong, M. Romesis, J. Shinnerl, K. Sze, and M. Xie, “mPL6: A Robust Multilevel Mixed-size Placement Engine,” Proceedings of ISPD, pp. 227-229, April 2005.
- Cong J. and Xie M., “A Robust Detailed Placement Algorithm for Mixed-size IC Designs,” Proceedings of ASPDAC, pp. 188-194, 2006.
- J. Cong, T. Chan, J. Shinnerl, K. Sze, and M. Xie, "mPL6: Enhanced Multilevel Mixed-size Placement," Proceedings of ISPD, pp. 212-214, April 2006.
5
A Brief History of mPL
[Figure: timeline of mPL versions plotted as relative wirelength against year, 2000 to 2004, with regions marked uniform cell size and non-uniform cell size]
- mPL 1.0 [ICCAD00]: recursive ESC clustering; NLP at coarsest level; Goto discrete relaxation; slot assignment legalization; Domino detailed placement
- mPL 1.1: FC-clustering; added partitioning to legalization
- mPL 2.0: RDFL relaxation; primal-dual netlist pruning
- mPL 3.0 [ICCAD 03]: QRS relaxation; AMG interpolation; multiple V-cycles; cell-area fragmentation
- mPL 4.0: improved DP; better coarsening; backtracking V-cycle
- mPL5, mPL6: multilevel force-directed
6
Multiscale Optimization Framework
[Figure: V-cycle starting from the given problem, with coarsening (clustering) on the way down as problem size decreases, and interpolation and relaxation (optimization) on the way up]
- Explores different scales of the solution space at different levels
- Supports VERY FAST and SCALABLE methods
- Supports inclusion of complicated objectives and constraints
- Successful across MANY DIVERSE applications
7
mPL6 – Generalized Force Directed Refinement
- Log-sum wirelength
- Average bin density
- Equality constraint: average bin density = utilization ratio
[Figure: cells $v_1$ to $v_7$ overlapping a grid of bins; $a_{13}(v_7)$ = fractional area of cell $v_7$ in bin $B_{13}$]
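As a rough illustration of the log-sum wirelength term named on this slide, the sketch below computes the commonly used log-sum-exp smoothing of half-perimeter wirelength (HPWL) for a single net. The smoothing parameter `gamma` and all function names are assumptions made for illustration; the slide does not give the exact formulation used in mPL6.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def hpwl(xs, ys):
    """Exact half-perimeter wirelength of one net from its pin coordinates."""
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def logsum_wirelength(xs, ys, gamma=1.0):
    """Smooth, differentiable upper approximation of HPWL for one net.

    gamma (hypothetical name) controls smoothness: smaller values track
    HPWL more closely but make the function less smooth.
    """
    total = 0.0
    for coords in (xs, ys):
        pos = [c / gamma for c in coords]    # approximates max(coords)
        neg = [-c / gamma for c in coords]   # approximates -min(coords)
        total += gamma * (logsumexp(pos) + logsumexp(neg))
    return total

xs, ys = [0.0, 3.0, 10.0], [0.0, 4.0, 2.0]
exact = hpwl(xs, ys)
smooth = logsum_wirelength(xs, ys, gamma=0.1)
```

The smoothed value never falls below the exact HPWL and exceeds it by at most `4 * gamma * log(k)` for a net with `k` pins, which is why it can stand in for HPWL inside a gradient-based solver.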
8
mPL6 – Iterative Flow
[Figure: multilevel V-cycles over Level 3, Level 2, and Level 1, with steps labeled C, I, and C+I]
- Bestchoice clustering [Alpert et al., ISPD05]
- AMG declustering [Chen et al., DAC03; Chan et al., ICCAD03]
- Multiple V-cycles with distance-based reclustering [Chan et al., ICCAD03]
9
Outline
- Chapter 1. Introduction
- Chapter 2. Optimality and scalability study of existing placement algorithms
- Chapter 3. Routability driven multilevel global placement and white space allocation
  - Motivation and previous work
  - Routability-driven multilevel placement
  - Experiment results
  - Conclusions and future work
- Chapter 4. A robust legalization scheme for mixed-size placement
- Chapter 5. Applications of mixed-size placement legalization
- Chapter 6. “Global” localized preprocessing for detailed placement
- Chapter 7. Heterogeneous placement for FPGAs
- Chapter 8. Conclusions and future work
10
Motivation
- mPL does not consider routing congestion
  - Aggressive HPWL minimization != routability
- Routability-driven placement
  - Routability modeling
  - Routability optimization
11
Previous Work -- Routability Modeling
- Topology-free methods
  - Dragon [Yang et al., TCAD03]
  - Sparse [Hu et al., ICCAD02]
  - BonnPlace [Brenner & Rohe, ISPD02]
- Topology-based methods
  - [Mayrhofer & Lauther, ICCAD90]
  - mPG [Chang et al., ISPD02]
12
Previous Work -- Routability Optimization
- Cell weighting
  - Cell inflation based on congestion
  - Constructive and iterative methods
    - Dragon [Yang et al., TCAD03]
    - BonnPlace [Brenner & Rohe, ISPD02]
- Net weighting
  - Translate into bin weights and optimize weighted wirelength
  - Iterative methods
    - Sparse [Hu & Sadowska, ICCAD02]
    - mPG [Chang et al., ISPD02]
13
Routability-Driven Multilevel Placement
- Global placement
  - Congestion estimation by a fast LZ router
  - Congestion-driven cell re-placement based on weighted wirelength
- Hierarchical top-down white space allocation
  - Geometric-based slicing tree
  - Congestion estimation on tree
  - Cutline adjustment
14
mPL-R Congestion Estimation with LZ Router
- Use LZ-Router [Chang et al., ISPD02] for fast congestion analysis on each level
- Binary search on V-stem (or H-stem)
  - Initialize left region and right region to cover the bounding box
  - Repeat:
    - Query wire usage on both regions
    - Select the region with less congestion
[Figure: bounding box split into a left region and a right region, with HVH and VHV routes and regions marked less congested and more congested]
15
mPL-R Congestion-Driven Re-Placement
- Pick cells to move whose incident nets cross congested regions
- Start from the optimal location for HPWL
- Search adjacent bins within a certain window
- Choose the bin based on weighted WL
[Figure: bins labeled 0.5, 1.2, and 2.0; candidate locations with WLc = 15.5 and WLc = 9.2]
16
White Space Allocation -- Slicing Tree Construction
- Recursively bipartition the chip region from top to bottom.
- Group cells into children nodes according to their location relative to the cutline.
- Estimate congestion on leaf nodes. Congestion on other nodes can be computed from bottom to top.
- Node attributes: cut direction, cut location, node area, congestion
[Figure: chip region divided into regions A to H and the corresponding slicing tree under the root]
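The bottom-up step can be sketched as a post-order traversal. This is a minimal sketch that assumes an internal node's cell area and congestion are the sums of its children's values; the `Node` class, its field names, and the leaf names and values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    name: str
    cell_area: float = 0.0
    congestion: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def aggregate(node: Node) -> Tuple[float, float]:
    """Fill in cell_area and congestion of internal nodes from the leaves up."""
    if node.left is None and node.right is None:
        return node.cell_area, node.congestion   # leaf: values were estimated directly
    area = cong = 0.0
    for child in (node.left, node.right):
        if child is not None:
            a, c = aggregate(child)
            area += a
            cong += c
    node.cell_area, node.congestion = area, cong  # assumption: simple summation
    return area, cong

# Four illustrative leaves (cell area, congestion).
leaves = [Node("AB", 54, 9), Node("CD", 62, 19), Node("EF", 58, 34), Node("GH", 66, 26)]
left = Node("left", left=leaves[0], right=leaves[1])
right = Node("right", left=leaves[2], right=leaves[3])
root = Node("root", left=left, right=right)
aggregate(root)
```

After the call, every internal node carries the totals of its subtree, so a later top-down pass can read them without revisiting the leaves.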
17
White Space Allocation – Cutline Adjustment
Adjust cut locations from top to bottom such that the white space given to each child node is proportional to its congestion.
[Figure: chip regions A to H and the slicing tree, with nodes labeled cell area/congestion: root 240/88, left child 116/28, right child 124/60]
Assuming chip area of root = 300:
- Total WS area = 300 – 240 = 60
- WS area for left child = 60*28/(28+60) = 19.1
- WS area for right child = 40.9
- Chip area for left child = 116 + 19.1 = 135.1
- Chip area for right child = 124 + 40.9 = 164.9
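The arithmetic on this slide can be reproduced with a few lines of code. The function name, argument layout, and the even split used when both children have zero congestion are assumptions for illustration, not details taken from the slides.

```python
def split_white_space(chip_area, left, right):
    """Divide a node's white space between its two children.

    left and right are (cell_area, congestion) pairs.
    Returns the chip area assigned to the left and right child.
    """
    total_cell = left[0] + right[0]
    total_cong = left[1] + right[1]
    white_space = chip_area - total_cell
    if total_cong == 0:
        share_left = 0.5  # assumption: split evenly when neither child is congested
    else:
        share_left = left[1] / total_cong
    ws_left = white_space * share_left
    ws_right = white_space - ws_left
    return left[0] + ws_left, right[0] + ws_right

left_area, right_area = split_white_space(300, (116, 28), (124, 60))
print(round(left_area, 1), round(right_area, 1))  # 135.1 164.9
```

Applying the same function recursively to each child, using the chip area it was just assigned, gives the top-down pass over the rest of the tree.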
18
White Space Allocation – Cutline Adjustment
Adjust cut locations from top to bottom such that the white space given to each child node is proportional to its congestion.
[Figure: chip regions A to H and the slicing tree, with nodes labeled cell area/congestion: root 240/88; second level 116/28 and 124/60; third level 54/9, 62/19, 58/34, and 66/26]
19
White Space Allocation – Cutline Adjustment
Adjust cut locations from top to bottom such that the white space given to each child node is proportional to its congestion.
[Figure: chip regions A to H and the slicing tree, with nodes labeled cell area/congestion: root 240/88; second level 116/28 and 124/60]
20
Experiment Setup
- 16 IBM version 2 examples
  - 5% to 15% white space
- Three state-of-the-art routability-driven placers
  - Dragon-fd 3.01 [Yang et al., TCAD03]
    - Simulated annealing with bin swapping
    - Two-step white space allocation
  - Capo 10.0 [Roy et al., ISPD06]
    - Fast Steiner tree approximation
    - Congestion-based cutline shifting
  - Fengshui 5.1 [Agnihotri et al., ISPD05]
    - Recursive bisection
    - Similar white space allocation method incorporated
- Magma router for evaluation
21
Routability-Driven Placement Tools Comparison
- mPL-R+WSA is the only flow that routes all examples successfully
- mPL-R+WSA produces the shortest wirelength
22
Routability Optimization Techniques Comparison
- mPL
  - Latest pure WL-driven version
  - No consideration of routing congestion
- mPL-R
- mPL-I
  - Cell inflation + dummy density assignment
  - Highest quality in ISPD06 contest [Nam ISPD06]
  - Density target set as utilization
- mPL+WSA
- mPL-R+WSA
23
Routability Optimization Techniques Comparison
- mPL-I with heuristic penalty term does not perform very well
- Both mPL-R and WSA improve routability significantly
- Combined workflow gives the highest completion rate
24
Outline
- Chapter 1. Introduction
- Chapter 2. Optimality and scalability study of existing placement algorithms
- Chapter 3. Routability driven multilevel global placement and white space allocation
- Chapter 4. A robust legalization scheme for mixed-size placement
- Chapter 5. Applications of mixed-size placement legalization
  - Enhancement for macro legalization algorithm
  - Additional experiment results
- Chapter 6. “Global” localized preprocessing for detailed placement
- Chapter 7. Heterogeneous placement for FPGAs
- Chapter 8. Conclusions and future work
25
Enhancement for Macro Legalization
- Constraint graph reduction
  - Original constraint graph
    - One edge for each pair of macros
    - $O(n^2)$ edges in total
  - Reduced constraint graph
    - Edge inserted only when it is not already implied by transitive closure
    - Significant reduction of memory consumption
[Figure: macros A, B, and C with a candidate constraint edge marked "?"]
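To make the reduction concrete, the sketch below drops every constraint edge that is already implied by a longer path, which is the generic transitive reduction of a directed acyclic graph. It is an illustration under that assumption and not necessarily the incremental procedure used in the thesis; the function name and the edge representation are hypothetical.

```python
def reduce_constraints(edges):
    """Return the edges of a DAG that are not implied by a longer path.

    edges is a list of (u, v) pairs meaning "u must come before v".
    """
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)

    def implied(u, v):
        # Is v reachable from u without using the direct edge u -> v?
        stack = [w for w in succ.get(u, ()) if w != v]
        seen = set(stack)
        while stack:
            w = stack.pop()
            if w == v:
                return True
            for x in succ.get(w, ()):
                if x not in seen:
                    seen.add(x)
                    stack.append(x)
        return False

    return [(u, v) for u, v in edges if not implied(u, v)]

# Three macros in a row: A left of B, B left of C, and the redundant A left of C.
full = [("A", "B"), ("B", "C"), ("A", "C")]
reduced = reduce_constraints(full)
```

For `n` macros lined up in a single row, the full graph has `n(n-1)/2` edges while the reduced graph keeps only the `n-1` edges between neighbors, which is where the memory saving comes from.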
26
Experiment Result with ICCAD04-MS
- 84% reduction of constraint edges
- No degradation of solution quality
27
Enhancement for Macro Legalization
[Figure: diagram with labels $f_{ij}$, x, and $H_{ij}$]
- Used in ISPD 2006 placement contest
28
ISPD05 Examples
- Bigger problem size
- Suitable to test scalability
29
Scalability Comparison on ISPD05 -- Global Placements by APlace
- XDP produces 1% longer WL, but is 10X faster
30
Scalability Comparison on ISPD05 -- Global Placements by mPL
- XDP can be 10X faster with comparable quality
31
Impact of Gradual Macro Legalization – ISPD05
- 12% WL reduction is possible when macros are movable
32
Outline
- Chapter 1. Introduction
- Chapter 2. Optimality and scalability study of existing placement algorithms
- Chapter 3. Routability driven multilevel global placement and white space allocation
- Chapter 4. A robust legalization scheme for mixed-size placement
- Chapter 5. Applications of mixed-size placement legalization
- Chapter 6. “Global” localized preprocessing for detailed placement
- Chapter 7. Heterogeneous placement for FPGAs
  - Motivation and previous works
  - Multilevel heterogeneous placement – mPL-H
  - Experiment results
  - Conclusions and future work
- Chapter 8. Conclusions and future work
33
Motivation
- Popularity of FPGAs
  - Ease of use
  - Low cost for small to medium production
- Modern FPGA placement imposes heterogeneous constraints
  - Memory blocks of different capacities, DSP blocks
  - Each block should only be placed on sites of the same type
34
Example FPGA Chip
[Figure taken from Altera Stratix Handbook]
35
Previous Works -- Academia
- Simulated annealing
  - VPR [Betz & Rose, FPL97; Marquardt et al., FPGA00]
  - PATH [Kong, ICCAD02]
  - SPCD [Chen & Cong, FPL04, FPGA05]
- Partitioning
  - PPFF [Maidee et al., DAC03]
- Graph embedding
  - CAPRI [Gopalakrishnan et al., DAC06]
- Multilevel
  - Ultrafast-VPR [Sankar & Rose, FPGA99]
  - mPG-ms [Cong & Yuan, ASPDAC03]
- None of them handles heterogeneous constraints
36
Previous Works -- Industry
- Quartus II by Altera Corporation
  - Stratix, Stratix II, etc.
- ISE by Xilinx Corporation
  - Virtex II, Virtex II Pro, etc.
- Do have heterogeneous capability
  - Only for proprietary chip architecture
  - Algorithms and techniques not publicly documented
37
Multilevel Heterogeneous Placement – mPL-H
- Based on multilevel generalized force-directed placement
- Multi-layered placement to handle heterogeneous placement
- Filler cells to enhance quality and stability
- Gradual carry chain legalization
38
Limitations of mPL for Heterogeneous Placement
- Does not consider heterogeneous constraints
  - Any block can be placed anywhere
- Requires density to be uniform everywhere
  - Penalizes wirelength for low utilization
39
mPL-H -- Global Placement (I)
- Multiple layers, one layer for each resource type
  - DSP layer
  - M-RAM layer
  - LAB layer
  - M4K layer
  - M512 layer
- Forbidden regions blocked by obstacles
- Uniform wirelength computation
[Figure: stacked placement layers labeled DSP, M-RAM, and LAB]
40
mPL-H -- Global Placement (II)
- Filler cell
  - Occupy the residual capacity
  - Transform inequality into equality
  - Density computed independently on each layer
  - Granularity may not be fine enough
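One way to read "transform inequality into equality" is that a per-layer capacity constraint of the form used area <= capacity becomes used area + filler area = capacity once fillers absorb the slack. The sketch below computes that filler area per layer; the function name, the dictionary layout, and the numbers are hypothetical and serve only as an illustration.

```python
def filler_area_per_layer(capacity, used):
    """Return the filler area each layer needs so that used + filler == capacity.

    capacity and used map a layer name to an area; layers are handled
    independently, mirroring the per-layer density computation.
    """
    fillers = {}
    for layer, cap in capacity.items():
        residual = cap - used.get(layer, 0.0)
        if residual < 0:
            raise ValueError(f"layer {layer!r} exceeds its capacity")
        fillers[layer] = residual
    return fillers

# Illustrative numbers only.
capacity = {"LAB": 100.0, "DSP": 10.0, "M4K": 20.0}
used = {"LAB": 80.0, "DSP": 10.0, "M4K": 5.0}
fillers = filler_area_per_layer(capacity, used)
```

How this total area is divided into individual filler cells is a separate choice, and the slide's remark on granularity suggests that the cell size matters.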
41
mPL-H -- Legalization (I)
- DSP and memory blocks
  - Domains do not overlap
    - Legalized independently
  - Uniform size for the same type
    - Linear assignment, $O(n^3)$
    - Cost as distance
[Figure: cells matched to sites]
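A minimal sketch of the linear assignment step, assuming Manhattan distance as the cost and a textbook Hungarian (Kuhn-Munkres) implementation with cubic running time. The function names and the example coordinates are hypothetical; the slides do not say which solver or distance metric mPL-H uses.

```python
def linear_assignment(cost):
    """Hungarian algorithm on an n x m cost matrix with n <= m.

    Returns (assignment, total) where assignment[i] is the column given to row i.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need at least as many sites as cells")
    INF = float("inf")
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (m + 1)          # column potentials
    p = [0] * (m + 1)            # p[j]: row matched to column j (1-based, 0 = free)
    way = [0] * (m + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:           # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment, sum(cost[i][assignment[i]] for i in range(n))

def legalize_blocks(cells, sites):
    """Assign each cell (x, y) to a distinct site (x, y), minimizing total Manhattan distance."""
    cost = [[abs(cx - sx) + abs(cy - sy) for sx, sy in sites] for cx, cy in cells]
    return linear_assignment(cost)

cells = [(0, 0), (5, 5)]
sites = [(5, 4), (1, 0), (9, 9)]
assignment, total = legalize_blocks(cells, sites)
```

Because the domains of different block types do not overlap, a routine like this could be called once per type with that type's cells and sites.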
42
mPL-H -- Legalization (II)
- Carry chains
  - Vary in length
  - Legalized in descending order of length
  - Partition each column into segments of the same size
  - Assign chains of the same length using linear assignment
43
mPL-H -- Legalization (III)
- Column-wise rearrangement of carry chains
  - $P(n,m)$ is the minimum perturbation of assigning $(v_1, \dots, v_n)$ to sites $(s_1, s_2, \dots, s_m)$
  - $P(1,j) = d(1,j)$, where $d(1,j)$ is the perturbation of assigning $v_1$ to site $s_j$
  - $P(i,j) = \min\{P(i-1, j-h_i),\ P(i, j-1)\}$
  - Can be solved more efficiently for some special cases
    - Quadratic distance
    - No site constraint
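The dynamic program can be sketched as follows. This is a variant written under stated assumptions rather than a transcription of the slide: chains keep their relative order within the column, chain i has height `heights[i]` and original start site `orig[i]`, the table entry covers the first j sites, and the branch that places a chain adds its displacement cost, a term the recurrence as shown on the slide does not spell out. All names are hypothetical.

```python
def min_perturbation(heights, orig, m, dist=lambda start, o: abs(start - o)):
    """Minimum total displacement when packing ordered chains into m sites of one column.

    P[i][j] is the best cost of placing the first i chains within the first j sites.
    Returns float('inf') if the chains do not fit.
    """
    n = len(heights)
    INF = float("inf")
    P = [[INF] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        P[0][j] = 0.0                      # no chains placed, no cost
    for i in range(1, n + 1):
        h = heights[i - 1]
        for j in range(1, m + 1):
            best = P[i][j - 1]             # leave site j-1 unused by chain i
            if j >= h and P[i - 1][j - h] < INF:
                start = j - h              # chain i occupies sites start .. j-1
                best = min(best, P[i - 1][j - h] + dist(start, orig[i - 1]))
            P[i][j] = best
    return P[n][m]

# Two chains that overlap in the global placement: heights 2 and 3, starts 1 and 2.
cost = min_perturbation([2, 3], [1, 2], m=6)
```

Passing `dist=lambda start, o: (start - o) ** 2` switches the cost to a quadratic distance, one of the special cases the slide mentions as solvable more efficiently.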
44
Experiment Setting
[Figure: experiment flow connecting the following elements: Verilog netlist, Quartus_map, clustered .vqm netlist, chip type, architecture description XML, Quartus_fitter, mPL-H, .qsf placement, Quartus_router]
45
QUIP Suite
46
Wirelength Comparison
- mPL-H is 3% better in HPWL and 2% better in routed WL than Quartus II v5.0
47
Runtime Comparison
- mPL-H can be 2X faster than Quartus II v5.0 when the circuit becomes sufficiently large
48
Optimality Study of mPL-H
- PEKO-H construction
  - Populate all sites with corresponding resource type
  - Generate each net with optimal wirelength
  - Extract the netlist in the end
49
Experiment Results with PEKO-H
- mPL-H produces HPWL 34% longer than the optima
50
Displacement of PEKO-H13
51
Displacement of PEKO-H16
- Swirls are difficult for local refinement to recover
52
Conclusions
- First analytical work for heterogeneous placement
- Compared to leading-edge Quartus II v5.0 for Stratix
  - 3% shorter HPWL, 2% shorter routed WL
  - Can be 2X faster when the example becomes sufficiently large
- Optimality study with PEKO-H
  - Displacement observed from the optima
  - 34% longer HPWL than the optima
53
Future Work
- Accurate timing analysis
  - Only point-to-point delay table released
    - OK for overlap-free intermediate results
    - Not accurate enough for analytical placer
  - Guide timing-driven placement
- Routing congestion
  - Proprietary routing resource information not publicly available
54
The End Thank You!