VLSI Physical Design Automation


VLSI Physical Design Automation Placement (1) Prof. David Pan dpan@ece.utexas.edu Office: ACES 5.434

Problem formulation
Input:
- Blocks (standard cells and macros) B1, ..., Bn
- Shapes and pin positions for each block Bi
- Nets N1, ..., Nm
Output:
- Coordinates (xi, yi) for each block Bi
- No overlaps between blocks
- The total wire length is minimized
- The area of the resulting block is minimized, or a fixed die is given
Other considerations: timing, routability, clock, buffering, and interaction with physical synthesis

Different Wire Length

Different Routability/Chip Area

Placement can Make a Difference
MCNC benchmark circuit e64 (contains 230 4-LUTs), placed onto an FPGA.
[Figure panels: Random Initial Placement; Final Placement; After Detailed Routing]

Importance of Placement
- Placement is a fundamental problem for physical design
- Glue of the physical synthesis
- Has become very active again in recent years:
  - Many new academic placers for wirelength (WL) minimization since 2000
  - Many other publications handling timing, routability, etc.
- Reasons:
  - Serious interconnect issues (delay, routability, noise) in deep-submicron design
    - Placement determines interconnect to the first order
    - Placement information is needed even in early design stages (e.g., logic synthesis)
  - The placement problem has become significantly larger
  - Cong et al. [ASPDAC-03, ISPD-03, ICCAD-03] point out that existing placers are far from optimal, not scalable, and not stable

Design Types
ASICs
- Lots of fixed I/Os, few macros, millions of standard cells
- Placement densities: 40-80% (IBM)
- Flat and hierarchical designs
SoCs
- Many more macro blocks, cores
- Datapaths + control logic
- Can have very low placement densities: < 40%
Microprocessor (μP) Random Logic Macros (RLM)
- Hierarchical partitions are placement instances (5-30K)
- High placement densities: 80%-98% (low whitespace)
- Many fixed I/Os, relatively few standard cells

Requirements for Placers (1)
- Must handle 4-10M cells, 1000s of macros
  - 64 bits + near-linear asymptotic complexity
  - Scalable/compact design database (OpenAccess)
- Accept fixed ports/pads/pins + fixed cells
- Place macros, esp. with variable aspect ratios
  - Non-trivial heights and widths (e.g., height = 2 rows)
- Honor targets and limits for net length
- Respect floorplan constraints
- Handle a wide range of placement densities (from <25% to 100% occupied), ICCAD '02

Requirements for Placers (2)
- Add / delete filler cells and Nwell contacts
- Ignore clock connections
- ECO placement
  - Fix overlaps after logic restructuring
  - Place a small number of unplaced blocks
- Datapath planning services
  - E.g., for cores
- Provide placement dialog services to enable cooperation across tools
  - E.g., between placement and synthesis

Optimal Relative Order: B C

To spread ... A B C

.. or not to spread A B C

Place to the left A B C

… or to the right A B C

Optimal Relative Order: B C
Without “free” space the problem is dominated by order.

Placement Footprints: Standard Cell: Data Path: IP - Floorplanning

Placement Footprints: Core Control IO Reserved areas Mixed Data Path & sea of gates:

Placement Footprints: Perimeter IO Area IO

Unconstrained Placement

Floorplanned Placement

VLSI Global Placement Examples
[Figure panels: bad placement; good placement]

Major Placement Techniques
- Simulated Annealing
  - TimberWolf package [JSSC-85, DAC-86]
  - Dragon [ICCAD-00]
- Partitioning-Based Placement
  - Capo [DAC-00]
  - Fengshui [DAC-2001]
- Analytical Placement
  - Gordian [TCAD-91]
  - Kraftwerk [DAC-98]
  - FastPlace [ISPD-04]
- Hall’s Quadratic Placement
- Genetic Algorithm

Outline
- Wire length driven placement
- Main methods
  - Simulated Annealing
    - Gate-Array: TimberWolf package
    - Standard-Cell: TimberWolf package, Dragon
  - Partition-based methods
  - Analytical methods
- Timing, congestion and other considerations
- Global placement (rough location)
- Detailed placement (legalization)

A down-to-earth method: cluster growth
- Select unplaced components and place them in slots
- SELECT: choose the unplaced component that is most strongly connected to all (or any single one) of the placed components
- PLACE: place the selected component in a slot such that a certain “cost” of the partial placement is minimized
- Simple and fast: ideal for initial placement
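The SELECT/PLACE loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular tool: nets are sets of block names, slots are grid coordinates, "connectivity" counts the nets shared with already placed blocks, and the "cost" is the summed Manhattan distance to connected placed blocks. The function name `cluster_growth` and all parameters are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Slot = Tuple[int, int]

def cluster_growth(nets: List[Set[str]], slots: List[Slot], seed: str) -> Dict[str, Slot]:
    """Greedy constructive placement; assumes len(slots) >= number of blocks."""
    blocks = sorted(set().union(*nets))
    placed: Dict[str, Slot] = {seed: slots[0]}   # the seed block takes the first slot
    free = list(slots[1:])

    def connectivity(b: str) -> int:
        # SELECT criterion: number of nets b shares with at least one placed block
        return sum(1 for net in nets if b in net and any(p in placed for p in net))

    def cost(b: str, slot: Slot) -> int:
        # PLACE criterion: Manhattan distance from slot to every placed block on b's nets
        total = 0
        for net in nets:
            if b in net:
                for p in net:
                    if p in placed:
                        px, py = placed[p]
                        total += abs(px - slot[0]) + abs(py - slot[1])
        return total

    while len(placed) < len(blocks):
        unplaced = [b for b in blocks if b not in placed]
        b = max(unplaced, key=connectivity)          # most strongly connected block
        slot = min(free, key=lambda s: cost(b, s))   # cheapest free slot for it
        placed[b] = slot
        free.remove(slot)
    return placed

placement = cluster_growth([{"a", "b"}, {"b", "c"}], [(0, 0), (1, 0), (2, 0)], seed="a")
```

On this chain-shaped example the blocks end up in chain order along the row, which is the behaviour one would hope for from an initial placement.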

Simulated Annealing Based Placement (I)
- “The TimberWolf Placement and Routing Package”, Sechen, Sangiovanni; IEEE Journal of Solid-State Circuits, vol. SC-20, no. 2 (1985), 510-522
- “TimberWolf 3.2: A New Standard Cell Placement and Global Routing Package”, Sechen, Sangiovanni; 23rd DAC, 1986, 432-439
TimberWolf
- Stage 1
  - Modules are moved between different rows as well as within the same row
  - Module overlaps are allowed
  - When the temperature is reduced below a certain value, stage 2 begins
- Stage 2
  - Remove overlaps
  - The annealing process continues, but only interchanges adjacent modules within the same row

Solution Space
All possible arrangements of modules into rows, possibly with overlaps.
[Figure: rows of modules, with overlaps marked]

Neighboring Solutions
Three types of moves:
- M1: Displace a module to a new location
- M2: Interchange two modules
- M3: Change the orientation of a module
[Figure: axis of reflection, with labels 1-4 shown before and after reflection]

Move Selection
- TimberWolf first tries to select a move between M1 and M2: Prob(M1) = 4/5, Prob(M2) = 1/5
- If a move of type M1 is chosen (for a certain module) and it is rejected, then a move of type M3 (for the same module) will be chosen with probability 1/10
- Restrictions on:
  - How far a module can be displaced
  - Which pairs of modules can be interchanged
[Figure labels: M1: Displacement; M2: Interchange; M3: Reflection]
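The move-selection probabilities can be expressed directly as code. The sketch below only reproduces the 4/5, 1/5 and 1/10 probabilities stated on the slide; the function names are hypothetical and the actual TimberWolf implementation may organise this differently.

```python
import random

def select_move(rng: random.Random) -> str:
    """Choose M1 (displacement) with probability 4/5, otherwise M2 (interchange)."""
    return "M1" if rng.random() < 4 / 5 else "M2"

def try_reflection_after_rejected_m1(rng: random.Random) -> bool:
    """After a rejected M1, attempt M3 (reflection) on the same module with probability 1/10."""
    return rng.random() < 1 / 10

rng = random.Random(0)  # fixed seed so the run is repeatable
moves = [select_move(rng) for _ in range(10000)]
m1_fraction = moves.count("M1") / len(moves)  # should be close to 0.8
```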

Move Restriction
- Range limiter: a rectangular window R
- At the beginning, R is very large, big enough to contain the whole chip
- The window size shrinks slowly as the temperature decreases. In fact, the height and width of R ∝ log(T)
- Stage 2 begins when the window size is so small that no inter-row module interchanges are possible
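A range limiter whose span is proportional to log(T) might look like the following. The slide gives only the proportionality; the normalisation by log(T0) (so that the window covers the full chip at the starting temperature T0 > 1), the lower bound `min_span`, and both function names are assumptions made for this sketch.

```python
import math

def window_span(T: float, T0: float, full_span: float, min_span: float = 1.0) -> float:
    """Window width or height at temperature T, proportional to log(T).

    Assumes T0 > 1 and T > 0; equals full_span at T = T0 and is
    clamped to the interval [min_span, full_span].
    """
    scale = math.log(T) / math.log(T0)
    return max(min_span, min(full_span, full_span * scale))

def within_window(src, dst, width: float, height: float) -> bool:
    """True if moving from src to dst stays inside a window centred on src."""
    return abs(dst[0] - src[0]) <= width / 2 and abs(dst[1] - src[1]) <= height / 2

span_hot = window_span(1e6, 1e6, full_span=100.0)   # whole chip at the start
span_mid = window_span(1e3, 1e6, full_span=100.0)   # about half the chip
```

A displacement or interchange would then be proposed only if `within_window` accepts the target location.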

Cost Function
$Y = C_1 + C_2 + C_3$
C1: wire length, $C_1 = \sum_{\text{net } i} (a_i w_i + b_i h_i)$
- $w_i$, $h_i$: width and height of the bounding box of net i
- $a_i$, $b_i$ are horizontal and vertical weights, respectively
- $a_i = 1$, $b_i = 1$ gives 1/2 • perimeter of the bounding box
- Critical nets: increase both $a_i$ and $b_i$
- Preferred metal layer routing: if vertical wirings are “cheaper” than horizontal wirings, we can use smaller vertical weights, i.e. $b_i < a_i$

Cost Function (Cont’d)
C2: penalty function for module overlaps, $C_2 = \sum_{i \ne j} \left( O(i,j) + a \right)^2$
- O(i,j) = amount of overlap in the X-dimension between modules i and j
- a: offset parameter to ensure $C_2 \to 0$ when $T \to 0$
C3: penalty function that controls the row lengths, $C_3 = b \sum_{r} \left| l(r) - d(r) \right|$
- Desired row length = d(r)
- l(r) = sum of the widths of the modules in row r
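The three cost terms C1, C2 and C3 can be evaluated with a short Python sketch. Several details here are assumptions rather than facts from the slides: pins are taken to sit at module centres, the overlap term is summed once per unordered pair and only over pairs in the same row that actually overlap, and the row-length penalty uses the absolute difference between l(r) and d(r). All function and parameter names are hypothetical.

```python
from itertools import combinations

def wirelength_cost(nets, centers, weights=None):
    """C1: sum over nets of a_i * w_i + b_i * h_i (bounding-box width and height)."""
    total = 0.0
    for idx, net in enumerate(nets):
        xs = [centers[m][0] for m in net]
        ys = [centers[m][1] for m in net]
        a_i, b_i = weights[idx] if weights else (1.0, 1.0)
        total += a_i * (max(xs) - min(xs)) + b_i * (max(ys) - min(ys))
    return total

def overlap_cost(modules, a):
    """C2: sum of (O(i,j) + a)^2 over overlapping pairs; modules: name -> (x_left, width, row)."""
    total = 0.0
    for (x1, w1, r1), (x2, w2, r2) in combinations(modules.values(), 2):
        if r1 != r2:
            continue
        overlap = min(x1 + w1, x2 + w2) - max(x1, x2)  # X-dimension overlap
        if overlap > 0:
            total += (overlap + a) ** 2
    return total

def row_length_cost(modules, desired, b):
    """C3: b * sum over rows of |l(r) - d(r)|; every module row must appear in desired."""
    lengths = {r: 0.0 for r in desired}
    for _, width, row in modules.values():
        lengths[row] += width
    return b * sum(abs(lengths[r] - desired[r]) for r in desired)

modules = {"m1": (0, 4, 0), "m2": (3, 4, 0), "m3": (0, 4, 1)}
centers = {"m1": (2, 0), "m2": (5, 0), "m3": (2, 1)}
nets = [["m1", "m2"], ["m1", "m3"]]
Y = (wirelength_cost(nets, centers)
     + overlap_cost(modules, a=1.0)
     + row_length_cost(modules, desired={0: 6, 1: 6}, b=2.0))
```

In this example m1 and m2 overlap by one unit in row 0, so all three terms contribute to Y.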

Annealing Schedule
- $T_k = r(k) \cdot T_{k-1}$, k = 1, 2, 3, ...
- r(k) increases from 0.8 to a maximum value of 0.94 and then decreases to 0.1
- At each temperature, a total of K•n attempts is made
  - n = number of modules
  - K = user-specified constant
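The cooling rule can be sketched as follows. The slide gives only the endpoints of r(k) (0.8, 0.94, 0.1), so the piecewise-linear shape, the position of the peak (`peak_frac`) and the total number of temperature steps are assumptions made purely for illustration.

```python
def cooling_ratio(k, n_steps, r_start=0.8, r_max=0.94, r_end=0.1, peak_frac=0.5):
    """r(k) for k = 1..n_steps: rises linearly to r_max, then falls linearly to r_end."""
    peak = max(1, int(peak_frac * n_steps))
    if k <= peak:
        return r_start + (r_max - r_start) * (k - 1) / max(1, peak - 1)
    return r_max + (r_end - r_max) * (k - peak) / max(1, n_steps - peak)

def temperatures(T0, n_steps):
    """Apply T_k = r(k) * T_{k-1} and return [T_1, ..., T_n_steps]."""
    T = T0
    schedule = []
    for k in range(1, n_steps + 1):
        T *= cooling_ratio(k, n_steps)
        schedule.append(T)
    return schedule

def attempts_per_temperature(K, n):
    """Number of move attempts at each temperature: K * n (n = number of modules)."""
    return K * n

schedule = temperatures(T0=1000.0, n_steps=10)
```

Because every ratio is below 1, the temperature decreases at each step; it drops slowly while r(k) is near 0.94 and quickly once r(k) approaches 0.1.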

Dragon2000: Standard-Cell Placement Tool for Large Industry Circuits M. Wang, X. Yang, and M. Sarrafzadeh, ICCAD-2000 pages 260-263

Main Idea
- Simulated annealing based
  - 1.9x faster than iTools 1.4.0 (commercial version of TimberWolf)
  - Comparable wirelength to iTools (i.e., very good)
  - Performs better for larger circuits
  - Still very slow compared with other approaches
  - Also shown to have good routability
- Top-down hierarchical approach
  - hMetis to recursively quadrisect into 4^h bins at level h
  - Swapping of bins at each level by SA to minimize WL
  - Terminates when each bin contains < 7 cells
  - Then swap single cells locally to further minimize WL
- Detailed placement is done by a greedy algorithm

Outline
- Wire length driven placement
- Main methods
  - Simulated Annealing
    - Gate-Array: TimberWolf package
    - Standard-Cell: TimberWolf package, Grover, Dragon
  - Partition-based methods
  - Analytical methods
- Timing and congestion consideration
- Newer trends

Partition based methods
- Partitioning methods
  - FM
  - Multilevel techniques, e.g., hMetis
- Two academic open source placement tools
  - Capo (UCLA/UCSD/Michigan): multilevel FM
  - Feng-shui (SUNY Binghamton): uses hMetis
- Pros and cons
  - Fast
  - Not stable

Partitioning-based Approach Try to group closely connected modules together. Repetitively divide a circuit into sub-circuits such that the cut value is minimized. Also, the placement region is partitioned (by cutlines) accordingly. Each sub-circuit is assigned to one partition of the placement region. Note: Also called min-cut placement approach.

An Example
[Figure: circuit, cutline, and resulting placement]

Variations There are many variations in the partitioning-based approach. They are different in: The objective function used. The partitioning algorithm used. The selection of cutlines.

Partitioning:
Objective: Given a set of interconnected blocks, produce two sets that are of equal size, and such that the number of nets connecting the two sets is minimized.

FM Partitioning:
list_of_sets = entire_chip;
while (any_set_has_2_or_more_objects(list_of_sets)) {
    for_each_set_in(list_of_sets)
        partition_it();
}
/* each time through this loop the number of sets in the list doubles. */
[Figure panels: Initial Random Placement; After Cut 1; After Cut 2]
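The recursive loop in the pseudocode can be made runnable in Python. In this sketch the function `bisect` is a hypothetical stand-in that simply splits a list in half; a real placer would call a min-cut partitioner such as FM at that point. The returned `passes` list records the number of sets after each pass.

```python
def bisect(objs):
    """Stand-in for a real min-cut partitioner (e.g. FM): split the list in half."""
    mid = len(objs) // 2
    return objs[:mid], objs[mid:]

def recursive_bisection(entire_chip):
    """Repeatedly partition every set that still holds 2 or more objects."""
    list_of_sets = [list(entire_chip)]
    passes = []
    while any(len(s) >= 2 for s in list_of_sets):
        next_sets = []
        for s in list_of_sets:
            if len(s) >= 2:
                left, right = bisect(s)
                next_sets.extend([left, right])
            else:
                next_sets.append(s)   # singleton sets are left alone
        list_of_sets = next_sets
        passes.append(len(list_of_sets))
    return list_of_sets, passes

final_sets, passes = recursive_bisection(range(8))
```

With 8 objects the set count goes 2, 4, 8, matching the comment that the number of sets doubles on each pass (for sizes that are not powers of two the doubling is only approximate).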

FM Partitioning:
Moves are made based on object gain.
Object Gain: the amount of change in cut crossings that will occur if an object is moved from its current partition into the other partition
- each object is assigned a gain
- objects are put into a sorted gain list
- the object with the highest gain from the larger of the two sides is selected and moved
- the moved object is "locked"
- gains of "touched" objects are recomputed
- gain lists are resorted
[Figure: example netlist annotated with object gains]
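Object gain can be computed directly from its definition. The sketch below is a plain (unoptimised) calculation, assuming nets are lists of distinct cell names and `side` maps each cell to partition 0 or 1; it does not implement the bucket-based gain lists or the locking step, and the function names are hypothetical.

```python
def cut_size(nets, side):
    """Number of nets with cells on both sides of the cut."""
    return sum(1 for net in nets if len({side[c] for c in net}) > 1)

def gain(cell, nets, side):
    """Reduction in cut size if `cell` moves to the other partition."""
    g = 0
    for net in nets:
        if cell not in net:
            continue
        same = sum(1 for c in net if side[c] == side[cell])  # includes the cell itself
        other = len(net) - same
        if same == 1:
            g += 1   # cell is alone on its side: moving it uncuts this net
        if other == 0:
            g -= 1   # net lies entirely on the cell's side: moving it cuts this net
    return g

nets = [["a", "b"], ["b", "c"], ["c", "d"]]
side = {"a": 0, "b": 1, "c": 0, "d": 1}
gains = {cell: gain(cell, nets, side) for cell in side}
```

Here all three nets are cut; moving b (or c) to the other side uncuts two of them, so those cells have gain 2 and would be picked first.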

[Figures: the FM example continued over successive moves, with object gains updated after each move]

Quadrature Placement Procedure
Very suitable for circuits with high routing density in the centre.
[Figure: numbered cutlines 1, 2, 3b, 4a, 4b]

Bisection Placement Procedure
Good for standard-cell placement.
[Figure: numbered cutlines 1, 2b, 3c, 3d, 4, 5a, 5b, 6a, 6b, 6c, 6d]

Terminal Propagation Algorithm by Dunlop and Kernighan “A Procedure for Placement of Standard-Cell VLSI Circuits”, TCAD, 4(1):92-98, Jan. 1985.

Problem of Partitioning Subcircuits
The costs of these two partitionings are not the same.

Terminal Propagation
- Need to consider nets connecting to external terminals or other modules as well
- Do partitioning in a breadth-first manner (i.e., finish all higher-level partitioning first)
- The dummy terminal will try to pull B to the top partition
[Figure: modules A and B with a dummy terminal]
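One way to picture the dummy terminal is as a fixed pin assigned to whichever half of the region lies closer to the external connection. The sketch below is an illustration under assumptions, not Dunlop and Kernighan's exact procedure: regions are axis-aligned rectangles `(x0, y0, x1, y1)`, the cutline passes through the middle of the region, and the function name is hypothetical.

```python
def dummy_terminal_side(external_xy, region, cut_direction):
    """Return 0 or 1: the half of `region` in which the dummy terminal is fixed.

    cut_direction "horizontal": halves are bottom (0) and top (1).
    cut_direction "vertical":   halves are left (0) and right (1).
    """
    x0, y0, x1, y1 = region
    if cut_direction == "horizontal":
        return 1 if external_xy[1] > (y0 + y1) / 2 else 0
    return 1 if external_xy[0] > (x0 + x1) / 2 else 0

# Block A was placed above the region being partitioned, so a net from A to B
# contributes a dummy terminal in the top half, pulling B upward.
side_for_b = dummy_terminal_side(external_xy=(5, 20), region=(0, 0, 10, 10),
                                 cut_direction="horizontal")
```

The partitioner then treats the dummy terminal as a locked cell on that side, so cutting the net costs the same as it would in the full circuit.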

Terminal Propagation

Creating Circuit Rows
- Terminal propagation reduces overall area by ~30%
- Creating rows: choose α and β, preferably to balance row length (during re-arrangement)

Can Recursive Bisection Alone Produce Routable Placements? (Name of placer: Capo)
Andrew Caldwell, Andrew Kahng, and Igor Markov, DAC-2000

Capo Overview
- Standard cell placement, fixed-die context
- Pure recursive bisectioning placer
  - Several minor techniques to produce good bisections
- Produces good results mainly because of:
  - Improvements in mincut bisection using the multi-level idea in the past few years
  - Attention to details in implementation
- Implementation with good interface (LEF/DEF and GSRC bookshelf) available on the web

Capo Approach
- Recursive bisection framework:
  - Multi-level FM for instances with >200 cells
  - Flat FM for instances with 35-200 cells
  - Branch-and-bound for instances with <35 cells
- Careful handling of partitioning tolerance:
  - Uncorking: prevent large cells from blocking the movement of smaller cells
  - Repartitioning: several FM calls with decreasing tolerance
  - Block splitting heuristics: higher tolerance for vertical cuts
  - Hierarchical tolerance computation: an instance with more whitespace can have a bigger partitioning tolerance

Partitioning:
Pros:
- very fast
- great quality
- scales nearly linearly with problem size
Cons:
- non-trivial to implement
- very directed algorithm, but this limits the ability to deal with miscellaneous constraints
- not stable (if there is a minor change)

Summary for Partition Based Placement
- Improvements in mincut partitioning are conducive to better wirelength and congestion
- Routable placements can be produced in most cases without explicit congestion management
- Explicit congestion control may still be useful in some cases
- Better weighted wirelength often implies better routed wirelength, but not always