Mixed Cell-Height Implementation for Improved Design Quality in Advanced Nodes Sorin Dobre +, Andrew B. Kahng * and Jiajia Li * * UC San Diego VLSI CAD.

Slides:



Advertisements
Similar presentations
OCV-Aware Top-Level Clock Tree Optimization
Advertisements

-1- VLSI CAD Laboratory, UC San Diego Post-Routing BEOL Layout Optimization for Improved Time- Dependent Dielectric Breakdown (TDDB) Reliability Tuck-Boon.
NTHU-CS VLSI/CAD LAB TH EDA De-Shiuan Chiou Da-Cheng Juan Yu-Ting Chen Shih-Chieh Chang Department of CS, National Tsing Hua University, Taiwan Fine-Grained.
An Efficient Technology Mapping Algorithm Targeting Routing Congestion Under Delay Constraints Rupesh S. Shelar Intel Corporation Hillsboro, OR Prashant.
Timing Margin Recovery With Flexible Flip-Flop Timing Model
Xing Wei, Wai-Chung Tang, Yu-Liang Wu Department of Computer Science and Engineering The Chinese University of HongKong
Minimum Implant Area-Aware Gate Sizing and Placement
Meng-Kai Hsu, Sheng Chou, Tzu-Hen Lin, and Yao-Wen Chang Electronics Engineering, National Taiwan University Routability Driven Analytical Placement for.
Ch.7 Layout Design Standard Cell Design TAIST ICTES Program VLSI Design Methodology Hiroaki Kunieda Tokyo Institute of Technology.
Paul Falkenstern and Yuan Xie Yao-Wen Chang Yu Wang Three-Dimensional Integrated Circuits (3D IC) Floorplan and Power/Ground Network Co-synthesis ASPDAC’10.
Ch.3 Overview of Standard Cell Design
UC San Diego / VLSI CAD Laboratory NOLO: A No-Loop, Predictive Useful Skew Methodology for Improved Timing in IC Implementation Tuck-Boon Chan, Andrew.
Toward Better Wireload Models in the Presence of Obstacles* Chung-Kuan Cheng, Andrew B. Kahng, Bao Liu and Dirk Stroobandt† UC San Diego CSE Dept. †Ghent.
Multiobjective VLSI Cell Placement Using Distributed Simulated Evolution Algorithm Sadiq M. Sait, Mustafa I. Ali, Ali Zaidi.
Background: Scan-Based Delay Fault Testing Sequentially apply initialization, launch test vector pairs that differ by 1-bit shift A vector pair induces.
Power-Aware Placement
Reticle Floorplanning With Guaranteed Yield for Multi-Project Wafers Andrew B. Kahng ECE and CSE Dept. University of California San Diego Sherief Reda.
Architectural-Level Prediction of Interconnect Wirelength and Fanout Kwangok Jeong, Andrew B. Kahng and Kambiz Samadi UCSD VLSI CAD Laboratory
Supply Voltage Degradation Aware Analytical Placement Andrew B. Kahng, Bao Liu and Qinke Wang UCSD CSE Department {abk, bliu,
On Modeling and Sensitivity of Via Count in SOC Physical Implementation Kwangok Jeong Andrew B. Kahng.
On Legalization of Row-Based Placements Andrew B. KahngSherief Reda CSE & ECE Departments University of CA, San Diego La Jolla, CA 92093
Fast and Area-Efficient Phase Conflict Detection and Correction in Standard-Cell Layouts Charles Chiang, Synopsys Andrew B. Kahng, UC San Diego Subarna.
On-Line Adjustable Buffering for Runtime Power Reduction Andrew B. Kahng Ψ Sherief Reda † Puneet Sharma Ψ Ψ University of California, San Diego † Brown.
1 UCSD VLSI CAD Laboratory ISQED-2009 Revisiting the Linear Programming Framework for Leakage Power vs. Performance Optimization Kwangok Jeong, Andrew.
Design Bright-Field AAPSM Conflict Detection and Correction C. Chiang, Synopsys A. Kahng, UC San Diego S. Sinha, Synopsys X. Xu, UC San Diego A. Zelikovsky,
A Proposal for Routing-Based Timing-Driven Scan Chain Ordering Puneet Gupta 1 Andrew B. Kahng 1 Stefanus Mantik 2
Detailed Placement for Leakage Reduction Using Systematic Through-Pitch Variation Andrew B. Kahng †‡ Swamy Muddu ‡ Puneet Sharma ‡ CSE † and ECE ‡ Departments,
Layout-based Logic Decomposition for Timing Optimization Yun-Yin Lien* Youn-Long Lin Department of Computer Science, National Tsing Hua University, Hsin-Chu,
Topography-Aware OPC for Better DOF margin and CD control Puneet Gupta*, Andrew B. Kahng*†‡, Chul-Hong Park†, Kambiz Samadi†, and Xu Xu‡ * Blaze-DFM Inc.
SLIP 2000April 9, Wiring Layer Assignments with Consistent Stage Delays Andrew B. Kahng (UCLA) Dirk Stroobandt (Ghent University) Supported.
UC San Diego Computer Engineering VLSI CAD Laboratory UC San Diego Computer Engineering VLSI CAD Laboratory UC San Diego Computer Engineering VLSI CAD.
1 ENTITY test is port a: in bit; end ENTITY test; DRC LVS ERC Circuit Design Functional Design and Logic Design Physical Design Physical Verification and.
Enhanced Metamodeling Techniques for High-Dimensional IC Design Estimation Problems Andrew B. Kahng, Bill Lin and Siddhartha Nath VLSI CAD LABORATORY,
UC San Diego / VLSI CAD Laboratory Reliability-Constrained Die Stacking Order in 3DICs Under Manufacturing Variability Tuck-Boon Chan, Andrew B. Kahng,
-1- UC San Diego / VLSI CAD Laboratory Methodology for Electromigration Signoff in the Presence of Adaptive Voltage Scaling Wei-Ting Jonas Chan, Andrew.
Andrew B. Kahng‡†, Mulong Luo†, Siddhartha Nath†
Dose Map and Placement Co-Optimization for Timing Yield Enhancement and Leakage Power Reduction Kwangok Jeong, Andrew B. Kahng, Chul-Hong Park, Hailong.
Accuracy-Configurable Adder for Approximate Arithmetic Designs
-1- UC San Diego / VLSI CAD Laboratory A Global-Local Optimization Framework for Simultaneous Multi-Mode Multi-Corner Clock Skew Variation Reduction Kwangsoo.
A New Methodology for Reduced Cost of Resilience Andrew B. Kahng, Seokhyeong Kang and Jiajia Li UC San Diego VLSI CAD Laboratory.
Power Reduction for FPGA using Multiple Vdd/Vth
Global Routing.
UC San Diego / VLSI CAD Laboratory Toward Quantifying the IC Design Value of Interconnect Technology Improvement Tuck-Boon Chan, Andrew B. Kahng, Jiajia.
Horizontal Benchmark Extension for Improved Assessment of Physical CAD Research Andrew B. Kahng, Hyein Lee and Jiajia Li UC San Diego VLSI CAD Laboratory.
TSV-Aware Analytical Placement for 3D IC Designs Meng-Kai Hsu, Yao-Wen Chang, and Valerity Balabanov GIEE and EE department of NTU DAC 2011.
UC San Diego / VLSI CAD Laboratory Incremental Multiple-Scan Chain Ordering for ECO Flip-Flop Insertion Andrew B. Kahng, Ilgweon Kang and Siddhartha Nath.
An Efficient Clustering Algorithm For Low Power Clock Tree Synthesis Rupesh S. Shelar Enterprise Microprocessor Group Intel Corporation, Hillsboro, OR.
HDL-Based Layout Synthesis Methodologies Allen C.-H. Wu Department of Computer Science Tsing Hua University Hsinchu, Taiwan, R.O.C {
-1- UC San Diego / VLSI CAD Laboratory Construction of Realistic Gate Sizing Benchmarks With Known Optimal Solutions Andrew B. Kahng, Seokhyeong Kang VLSI.
Kwangsoo Han, Andrew B. Kahng, Hyein Lee and Lutong Wang
Kwangsoo Han‡, Andrew B. Kahng‡† and Hyein Lee‡
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 6: Detailed Routing © KLMH Lienig 1 What Makes a Design Difficult to Route Charles.
NUMERICAL TECHNOLOGIES, INC. Assessing Technology tradeoffs for 65nm logic circuits D Pramanik, M Cote, K Beaudette Numerical Technologies Inc Valery Axelrad.
Outline Introduction: BTI Aging and AVS Signoff Problem
ECE 260B – CSE 241A /UCB EECS Kahng/Keutzer/Newton Physical Design Flow Read Netlist Initial Placement Placement Improvement Cost Estimation Routing.
UC San Diego / VLSI CAD Laboratory Learning-Based Approximation of Interconnect Delay and Slew Modeling in Signoff Timing Tools Andrew B. Kahng, Seokhyeong.
VLSI Floorplanning and Planar Graphs prepared and Instructed by Shmuel Wimer Eng. Faculty, Bar-Ilan University July 2015VLSI Floor Planning and Planar.
Outline Motivation and Contributions Related Works ILP Formulation
-1- UC San Diego / VLSI CAD Laboratory On Potential Design Impacts of Electromigration Awareness Andrew B. Kahng, Siddhartha Nath and Tajana S. Rosing.
1 WireMap FPGA Technology Mapping for Improved Routability Stephen Jang, Xilinx Inc. Billy Chan, Xilinx Inc. Kevin Chung, Xilinx Inc. Alan Mishchenko,
-1- UC San Diego / VLSI CAD Laboratory Optimization of Overdrive Signoff Tuck-Boon Chan, Andrew B. Kahng, Jiajia Li and Siddhartha Nath Tuck-Boon Chan,
-1- Delay Uncertainty and Signal Criticality Driven Routing Channel Optimization for Advanced DRAM Products Samyoung Bang #, Kwangsoo Han ‡, Andrew B.
-1- UC San Diego / VLSI CAD Laboratory Optimal Reliability-Constrained Overdrive Frequency Selection in Multicore Systems Andrew B. Kahng and Siddhartha.
Interconnect Characteristics of 2.5-D System Integration Scheme Yangdong (Steven) Deng & Wojciech P. Maly
Improved Flop Tray-Based Design Implementation for Power Reduction
Kun Young Chung*, Andrew B. Kahng+ and Jiajia Li+
Kristof Blutman† , Hamed Fatemi† , Andrew B
Improved Performance of 3DIC Implementations Through Inherent Awareness of Mix-and-Match Die Stacking Kwangsoo Han, Andrew B. Kahng and Jiajia Li University.
Revisiting and Bounding the Benefit From 3D Integration
Measuring the Gap between FPGAs and ASICs
Presentation transcript:

Mixed Cell-Height Implementation for Improved Design Quality in Advanced Nodes Sorin Dobre +, Andrew B. Kahng * and Jiajia Li * * UC San Diego VLSI CAD Laboratory + Qualcomm Inc.

2 Outline Background and Motivation Problem Statement Related Work Our Methodology Experimental Setup and Results Conclusion

3 Choice of Cell Height ? In standard cell-based implementation … Large cell height  better timing, but larger area and power Small cell height  smaller area and power per gate, but large delay and more #buffers, pin accessibility issue Can we mix cell heights to have better tradeoffs between performance and area/power  better design QoR? Technology: 28nm LP RED: 12T cells = larger area, smaller delay BLUE: 8T cells = smaller area, larger delay

4 Post-synthesis area and timing comparison among 12T-only, 8T-only and mixed cell heights at 28LP 12T-only tends to have large area Weak drive strengths of 8T cells  #buffers↑  area↑ Mixing cell heights achieves >14% area reduction No existing flow offers sub-block level mixed cell-height optimization Our goal: mixed cell-height implementation (!) Motivation of Mixing Cell Heights

5 Outline Background and Motivation Problem Statement Related Work Our Methodology Experimental Setup and Results Conclusion

6 Mixed Cell-Height Placement Problem Given: design (i.e., gate-level netlist), timing constraints, Liberty files, and floorplan Place design such that each cell instance is legally placed in a row with corresponding height Objective: minimum design area with target performance

7 Challenge 1: “Chicken-Egg” Loop Heights of cell rows are defined by floorplan (site map) before placement Choices of cell heights highly depend on placement solution Synthesis Floorplan Placement CTS/Routing Conventional design flow Floorplan Placement

8 Challenge 2: Area Overheads “Breaker cells” must be inserted  area cost Vertical: P/G rail of a cell cannot encroach into adjacent-row cells Horizontal: well-to-well spacing rule Other layout constraints N-well sharing  even number of rows in a region Alignment with routing and poly tracks

9 “Breaker cells” must be inserted  area cost Vertical: P/G rail of a cell cannot encroach into adjacent-row cells Horizontal: well-to-well spacing rule Other layout constraints N-well sharing  even number of rows in a region Alignment with routing and poly tracks Challenge 2: Area Overheads Mixed cell-height implementation must comprehend these challenges

10 Outline Background and Motivation Problem Statement Related Work Our Methodology Experimental Setup and Results Conclusion

11 Related Works No previous work on sub-block level mixed cell-height design Similarity to voltage island placement Assign certain cell attribute with different values (height, Vdd) Area cost (breaker cells, level shifters) [Wu05][Ching06] propose partitioning methods to define power domains [Wu07] considers timing constraints and cell placement [Guo07] embeds voltage-island-aware optimization to partitioning-based placement More challenges in mixed cell-height design Chicken-and-egg loop Area impact of cell height choices

12 Outline Background and Motivation Problem Statement Related Work Our Methodology Experimental Setup and Results Conclusion

13 Overall Flow Logic Synthesis Initial placement Floorplan region definition Placement Legalization Floorplan Update Cell mapping Routing / RoutOpt / STA : with libraries of all cell heights : with revised cell LEF such that all cells have the same height (== min cell height), but scale cell width to maintain same area : based on the partition (floorplan definition) results, with cell rows of actual heights and breaker cells : with commercial P&R, STA tools 8T cell 12T cell

14 Example of Overall Optimization Flow Initial placement (8T/12T cells “freely” placed) Partitioning (Yellow blocks = regions) Legalization New floorplan Mixed-height placement Technology: 28nm LP Design: AES BLUE: 8T cells RED: 12T cells

15 Floorplan Partitioning and Region Definition Partition block into rectangular regions with specific cell heights Dynamic programming formulation cost(x 1, y 1, x 2, y 2 ) = total area of minority cells in region (x 1, y 1, x 2, y 2 ) for k := 1 to U for x 1 := 0 to M, y 1 := 0 to N, x 2 := x 1 +1 to M, y2 := y 1 +1 to N cost (x 1, y 1, x 2, y 2, k) = min (x1 ≤ x ≤ x2, y1 ≤ y ≤ y2, 0 < t < k) { cost(x 1, y 1, x, y 2, t) + cost(x, y 1, x 2, y 2, k-t-1) + breaker_cell_cost, // H cuts cost(x 1, y 1, x 2, y, t) + cost(x 1, y, x 2, y 2, k-t-1) + breaker_cell_cost } // V cuts endfor endfor return min (1 ≤ k ≤ U) cost(0, 0, M, N, k) Runtime complexity = O((M+N)(M∙N∙U) 2 ) 8T 12T 8T Example BLUE: 8T cells RED: 12T cells

16 Timing-Aware Placement Legalization Iterative optimization with two knobs Cell displacement (e.g., move 8T cells from 12T region to 8T region) Cell-height swapping (e.g., size 12T cells to 8T cells in 8T region) Optimization framework Optimizer  Displacement of cells  Swap cells across heights  Area recovery  Timing recovery Internal Timer  Gate delay: Liberty LUT  Gate slew: Liberty LUT  Wire delay: D2M  Wire slew: Peri Trial moves Slacks SoC Encounter  ECOs (displacement, gate sizing, placement legalization, trial routing)  Parasitic extraction  Timing analysis Timing slack Wire RC Routing overflow Moves (= displacement & gate sizing) Cell location TCL socket Optimizer  Displacement of cells  Swap cells across heights  Area recovery  Timing recovery SoC Encounter  ECOs (displacement, gate sizing, placement legalization, trial routing)  Parasitic extraction  Timing analysis Internal Timer  Gate delay: Liberty LUT  Gate slew: Liberty LUT  Wire delay: D2M  Wire slew: Peri

17 Iterative Optimization Flow

18 Mapping Cells to Original Heights/Widths Recall: In initial placement, cells have the same height (== minimum cell height), but scaled widths to maintain the same cell area In the updated floorplan (which have same area but different cell rows), we scale cells back to their original heights/widths and map them to updated cell rows Our method (illustrated on 2D mesh) Embed mesh graphs to larger aspect ratios based on [Ellis91] Maximum (new wirelength / original wirelength) = r + 1 / r (r = original cell height / minimum cell height, e.g., for 12T/8T case, r = 1.5) 1 1 h p / h N h N / h p  Original cell height (h P ) / min cell height (h N ) = 5/4  Partition area does not change  Due to scaled cell heights/widths a 5x4 mesh is embedded into a 4x5 mesh  Circled edge has the maximum WL increase 8T cell 12T cell

19 Cell Mapping Flow (General Cases) Map cells from original floorplan to updated cell rows Placement density on each updated row honors original average row density Mapping procedure for each row R in the updated floorplan while cell density on R is less than required density Map cells in i th row of original floorplan to updated rows // sort cells in increasing order of widths ++i endwhile place mapped cells ordered by their X-coordinates in original floorplan endfor Average wirelength penalty on 23 implementations of four designs is 0.8%

20 Outline Background and Motivation Problem Statement Related Work Our Methodology Experimental Setup and Results Conclusion

21 Experimental Setup Designs: AES, MPEG (from OpenCores website) Technology: 28nm LP, dual-VT, 8T/12T Tools Synthesis: Synopsys Design Compiler vH SP3 P&R & timing analysis: Cadence EDI System 14.1 Power analysis: Synopsys PT-PX vH SP2 Modeling breaker cell costs Horizontal cost: 0.544μm (= four placement sites) Vertical cost: 0.8μm (= eight M2 pitches)

22 Area/Performance Benefits from Mixing Cell Heights Pareto curves of power-area tradeoff Up to 25% area benefit over 12T designs Up to 20% performance benefit over 8T designs

23 Iso-performance power comparison with voltage scaling Re-optimize design if the supply voltage scaling > 30mV Similar power with single-height designs Mixed cell height dominates in area-frequency tradeoff (previous slide) Power penalty is not captured in our cost function Power Benefits from Mixing Cell Heights

24 Outline Background and Motivation Problem Statement Related Work Our Methodology Experimental Setup and Results Conclusion

25 Conclusion Novel physical design optimization flow to mix cells with different heights within a single place-and-route block Address the “chicken-and-egg” loop between floorplan site definition and the post-placement choice of cell heights Comprehend “breaker cells” area overhead and layout constraints Achieve 25% area reduction, while maintaining performance, compared to single-height design flows Future works Mixed cell-height clock tree synthesis flow More comprehensive cost function to trade off performance, power, area and wirelength

26 THANK YOU!