Floorplan and Placement Methodology for Improved Energy Reduction in Stacked Power-Domain Design
Kristof Blutman†, Hamed Fatemi†, Andrew B. Kahng‡, Ajay Kapoor†, Jiajia Li‡ and Jose Pineda de Gyvez†
‡UC San Diego, †NXP Semiconductors

Outline
- Background and Motivation
- Related Work
- Our Methodology
- Experimental Setup and Results
- Conclusion

Stacked Power-Domain Design
- Battery lifetime is critical to IC designs (e.g., IoT, mobile)
- Misalignment between battery and core voltages → energy inefficiency in the voltage regulator
- Improved power delivery efficiency → longer battery lifetime
- Stacked power-domain design connects two domains with balanced currents in series
  → Aligned voltages between battery and core (example: battery voltage = 2V, voltage of each domain = 1V)
  → Recycling of current between the two domains
[Figure: conventional design vs. stacked-domain design. Labels: 2VDD, top domain, VR, VDD, bottom domain, VSS]

Motivation
Battery lifetime: conventional vs. stacked-domain designs
- Conventional design: P_battery_c = P_core / η
- Stacked-domain design: P_battery_s = P_stack + P_VR / η
- Assume the same power consumption, i.e., P_core = P_stack + P_VR
- Battery lifetime ratio: T_s / T_c = P_battery_c / P_battery_s = (2 ∙ I_stack + I_VR) / (2 ∙ η ∙ I_stack + I_VR)
- If current is perfectly balanced (I_VR = 0), the stacked domain provides
  - 1.25X battery lifetime compared to the conventional design if η = 80%
  - 2X battery lifetime compared to the conventional design if η = 50%
Notation
- P_battery_c: battery supply power to the conventional design
- P_core: total power consumption of the core
- η: voltage regulator efficiency
- P_battery_s: supply power to the stacked-domain design
- P_stack: direct power of the stacked domain from the supply
- P_VR: input power of the voltage regulator from the supply
- T_s: battery lifetime of the stacked-domain design
- T_c: battery lifetime of the conventional design
- I_stack / I_VR: currents corresponding to P_stack and P_VR
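As a quick check of the lifetime ratio on this slide, the formula can be evaluated directly. This is a minimal sketch; the function name `lifetime_ratio` and its argument names are illustrative and do not come from the presentation.

```python
def lifetime_ratio(i_stack: float, i_vr: float, eta: float) -> float:
    """Evaluate T_s / T_c = (2*I_stack + I_VR) / (2*eta*I_stack + I_VR).

    i_stack: current recycled through the two stacked domains
    i_vr:    residual (imbalance) current handled by the voltage regulator
    eta:     voltage regulator efficiency, in (0, 1]
    """
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta must be in (0, 1]")
    if i_stack < 0.0 or i_vr < 0.0:
        raise ValueError("currents must be non-negative")
    return (2.0 * i_stack + i_vr) / (2.0 * eta * i_stack + i_vr)


if __name__ == "__main__":
    # Perfect balance (I_VR = 0): the ratio reduces to 1 / eta.
    print(lifetime_ratio(1.0, 0.0, 0.8))
    print(lifetime_ratio(1.0, 0.0, 0.5))
    # Any imbalance current pulls the ratio back toward 1.
    print(lifetime_ratio(1.0, 0.2, 0.8))
```

With I_VR = 0 the expression collapses to 1 / η, which is where the 1.25X (η = 80%) and 2X (η = 50%) figures come from. A nonzero I_VR reduces the benefit, which is why current balancing is the central concern of the flow.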

Problem Formulation
- Given: netlist, timing constraints, level shifter, voltage regulator efficiency, and power information
- Do: partition the netlist into two domains, define the layout region of each domain, and place instances and level shifters
- Objective: maximize battery lifetime
- Considerations:
  - Balance current across multiple operating scenarios
  - Region generation = constrained layout
  - Level shifter → power, area, timing penalties

Related Works
Stacked-domain implementation
- [Cabe10][Liu13] focus on specific designs (memory, IO)
- [Lee15] partitions cores (with no connection) into domains
- [Blutman16] partitions logic and memory into two domains
- No fully automated, general flow for stacked-domain design
Netlist partitioning
- Move-based partitioning: KL, FM and many extensions
- Mathematical programming: [Shih93][Goemans95]
- Flow-based approach: [Yang96][Caldwell00]
- These do not directly apply to stacked-domain partitioning
Our work: flow-based partitioning with multiple extensions for general stacked-domain partitioning

Overall Optimization Flow
1. Trial placement: estimate post-placement timing / current profile
2. Flow-based partitioning: aware of layout, timing, multi-scenario current balancing
3. Domain region definition: one continuous region for each domain + minimized boundary length
4. Re-floorplanning: resize floorplan (if necessary) + power planning
5. Level shifter insertion: minimize wirelength penalty
6. Placement optimization: placement legalization + incremental timing fix

Example of Overall Flow
Design: AES (11K instances); technology: 28LP
[Figure: layout snapshots after three stages]
- Flow-based partitioning (blue = bottom domain, red = top domain)
- Domain region definition (blue region = bottom domain, red region = top domain)
- Re-floorplanning + level shifter insertion (yellow = level shifters)

Flow-Based Partitioning
Current-balanced partitioning with minimized #cuts
Basic flow
1. Construct flow network (cell / cluster → vertex, net → edge, current of cell / cluster → weight)
2. Perform max-flow optimization (Ford–Fulkerson)
3. Cluster the smaller partition together with one neighbor vertex
4. Iterate Steps 2, 3 until the balancing constraint is satisfied
[Figure: example on vertices a to f, where a and b are source and sink and all vertices have the same weight; successive iterations cluster {a, e}, then {a, e, f}]
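The basic flow above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: it uses the BFS (Edmonds-Karp) variant of Ford–Fulkerson, models each net as an undirected two-pin edge, and picks the neighbor to absorb in sorted order. The function names, the `tol` balance parameter, the reserved node names `_S` / `_T`, and the chain-shaped example graph are all assumptions.

```python
from collections import defaultdict, deque


def min_cut_side(vertices, edges, src, snk):
    """Source side of a minimum cut, with src and snk each contracted to one node."""
    def rep(v):
        return "_S" if v in src else "_T" if v in snk else v

    cap = defaultdict(lambda: defaultdict(int))
    for (u, v), c in edges.items():
        a, b = rep(u), rep(v)
        if a != b:  # undirected edge: capacity in both directions
            cap[a][b] += c
            cap[b][a] += c

    while True:
        # BFS for an augmenting path in the residual network.
        parent = {"_S": None}
        queue = deque(["_S"])
        while queue and "_T" not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "_T" not in parent:
            # No augmenting path: vertices still reachable form the source side.
            return {v for v in vertices if rep(v) in parent}
        path, v = [], "_T"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= push
            cap[b][a] += push


def balanced_bipartition(edges, weight, s, t, tol=0.2):
    """Repeat min-cut, growing the lighter side by one neighbor until balanced."""
    total = sum(weight.values())
    src, snk = {s}, {t}
    for _ in range(len(weight)):
        x = min_cut_side(weight, edges, src, snk)
        rest = set(weight) - x
        wx = sum(weight[v] for v in x)
        if abs(2 * wx - total) <= tol * total:
            break
        small, blocked = (x, snk) if 2 * wx < total else (rest, src)
        frontier = sorted(
            v
            for a, b in edges
            for u, v in ((a, b), (b, a))
            if u in small and v not in small and v not in blocked
        )
        if not frontier:
            break  # nothing left to absorb
        if small is x:
            src = x | {frontier[0]}
        else:
            snk = rest | {frontier[0]}
    return x, rest


if __name__ == "__main__":
    # Hypothetical chain a-e-f-c-d-b with unit weights and unit capacities.
    chain = {("a", "e"): 1, ("e", "f"): 1, ("f", "c"): 1, ("c", "d"): 1, ("d", "b"): 1}
    unit = {v: 1 for v in "abcdef"}
    print(balanced_bipartition(chain, unit, "a", "b"))
```

Here `weight` stands in for the per-cell (or per-cluster) current, so the balance test compares the total current of the two sides. Because the source or sink set grows by at least one vertex per iteration, the loop terminates after at most as many rounds as there are vertices.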

Extensions to Flow-Based Partitioning
- Pre-clustering
  - Reduce runtime and improve scalability
  - Heavy-edge matching (HEM) to cluster cells with dense connections
- Multiple operating scenarios
  - Balance current between two domains across different scenarios (e.g., function mode and sleep mode)
  - Use weighted sum of normalized currents
- Critical-path awareness
  - Remove V-shaped vertices after each max-flow optimization
- Layout awareness
  - Detect and remove outliers (= instances from the large-current partition that are located within the small-current region) after each max-flow optimization
[Figures: HEM example; V-shaped vertices; top / bottom domain layout (AES, 28LP)]
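Two of these extensions are easy to illustrate in a few lines. The sketch below shows a textbook form of heavy-edge matching (visit vertices in order and pair each unmatched vertex with its heaviest unmatched neighbor) and one plausible reading of "weighted sum of normalized currents" in which each scenario is normalized by its total current. The slides do not give the exact normalization or scenario weights, so both are assumptions, as are the function names.

```python
from collections import defaultdict


def heavy_edge_matching(vertices, edges):
    """Pair each unmatched vertex with its heaviest unmatched neighbor.

    vertices: visiting order; edges: {(u, v): connection weight}.
    Returns a list of clusters, each a 1- or 2-tuple of vertices.
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        if u != v:
            adj[u][v] = adj[u].get(v, 0) + w
            adj[v][u] = adj[v].get(u, 0) + w

    matched = set()
    clusters = []
    for u in vertices:
        if u in matched:
            continue
        candidates = [(w, v) for v, w in adj[u].items() if v not in matched]
        matched.add(u)
        if candidates:
            _, v = max(candidates)  # heaviest edge; ties broken by vertex name
            matched.add(v)
            clusters.append((u, v))
        else:
            clusters.append((u,))
    return clusters


def combined_current(currents, scenario_weight):
    """Weighted sum over scenarios of each cell's share of the scenario current.

    currents: {scenario: {cell: current}}; scenario_weight: {scenario: weight}.
    Normalizing by the scenario total is an assumption made for this sketch.
    """
    combined = defaultdict(float)
    for scenario, per_cell in currents.items():
        total = sum(per_cell.values())
        for cell, current in per_cell.items():
            combined[cell] += scenario_weight[scenario] * current / total
    return dict(combined)


if __name__ == "__main__":
    print(heavy_edge_matching(["a", "b", "c", "d"],
                              {("a", "b"): 3, ("b", "c"): 5, ("c", "d"): 1}))
    print(combined_current({"func": {"x": 6.0, "y": 2.0}, "sleep": {"x": 1.0, "y": 3.0}},
                           {"func": 0.5, "sleep": 0.5}))
```

Normalizing per scenario keeps the much smaller sleep-mode currents from being drowned out by function-mode currents when a single vertex weight is fed to the partitioner.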

Region Definition
Separated regions for each power domain increase complexity for PDN (power delivery network) design → want one continuous region for each power domain
1. Uniformly divide block area into grids
2. Define outliers and neighbors
   - Outlier: grid outside the largest continuous region of the same domain
   - Neighbor: grid adjacent to the largest continuous region of a different domain
3. Calculate cost to swap pairs of (outlier, neighbor)
   - Cost = HPWL (half-perimeter wirelength) increase due to the swap
4. Select the pair with minimum cost to swap
5. Iterate Steps 2-4 until there is no outlier
[Figure: initial placement (yellow = outliers, green = neighbors) and post-optimization placement]
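Step 2 above amounts to a connected-component search on the grid. The sketch below is an illustration under stated assumptions: the grid is a list of strings with one character per grid cell ("T" for top, "B" for bottom), adjacency is 4-connected, and the swap cost of Steps 3 and 4 is left out because it needs placement and netlist data. Function names are hypothetical.

```python
from collections import deque


def largest_regions(grid):
    """Largest 4-connected region of each domain label, as sets of (row, col)."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    best = {}
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen:
                continue
            dom = grid[r][c]
            comp = {(r, c)}
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen and grid[ny][nx] == dom):
                        seen.add((ny, nx))
                        comp.add((ny, nx))
                        queue.append((ny, nx))
            if len(comp) > len(best.get(dom, ())):
                best[dom] = comp
    return best


def outliers_and_neighbors(grid):
    """Classify grid cells following the slide's two definitions."""
    best = largest_regions(grid)
    outliers, neighbors = [], []
    for r in range(len(grid)):
        for c in range(len(grid[0])):
            dom = grid[r][c]
            # Outlier: outside the largest continuous region of its own domain.
            if (r, c) not in best[dom]:
                outliers.append((r, c))
            # Neighbor: adjacent to the largest region of a different domain.
            adjacent = ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
            if any(p in region for d, region in best.items() if d != dom for p in adjacent):
                neighbors.append((r, c))
    return outliers, neighbors


if __name__ == "__main__":
    example = ["TTBB",
               "TTBB",
               "BTBT"]
    print(outliers_and_neighbors(example))
```

In the example grid, the isolated "B" in the bottom-left corner and the isolated "T" in the bottom-right corner are the outliers. A full implementation would then evaluate the HPWL increase of each (outlier, neighbor) swap and apply the cheapest one until no outlier remains.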

DP-Based Boundary Optimization
- Gap area must be inserted along the boundary between two domains → want to minimize the boundary length
- Dynamic programming-based optimization
  - Constraints: (1) area of each domain remains the same, (2) moved instance area is restricted
  - Objective: minimize boundary length
  - Index turning points from left to right; the DP formulation is then
    Sol(j) = Min(Sol(i).length + seg(i, j).length), for all i < j
[Figure: initial boundary vs. optimized boundary; optimized segment (1, j) = optimized segment (1, i) + simplified segment (i, j)]
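The recurrence Sol(j) = Min(Sol(i).length + seg(i, j).length) can be sketched as a shortest-path style DP over turning points. The slides do not say how the two constraints are checked, so this sketch takes a caller-supplied `feasible(i, j)` predicate as a stand-in; the Manhattan segment length, the span limit in the example, and the sample turning points are all assumptions made for illustration.

```python
def optimize_boundary(n, seg_length, feasible):
    """DP over turning points 0..n-1 (indexed left to right).

    seg_length(i, j): length of the simplified segment replacing points i..j.
    feasible(i, j):   True if that replacement respects the constraints.
    Returns (best total length, list of kept turning-point indices).
    """
    inf = float("inf")
    sol = [inf] * n
    prev = [None] * n
    sol[0] = 0.0
    for j in range(1, n):
        for i in range(j):
            if sol[i] < inf and feasible(i, j):
                cand = sol[i] + seg_length(i, j)
                if cand < sol[j]:
                    sol[j], prev[j] = cand, i
    # Backtrack from the last turning point to recover the kept points.
    path, j = [], n - 1
    while j is not None:
        path.append(j)
        j = prev[j]
    return sol[n - 1], path[::-1]


if __name__ == "__main__":
    # Hypothetical boundary with a bump; original rectilinear length is 10.
    pts = [(0, 2), (2, 2), (2, 4), (4, 4), (4, 2), (6, 2)]

    def manhattan(i, j):
        return abs(pts[i][0] - pts[j][0]) + abs(pts[i][1] - pts[j][1])

    # Stand-in for the moved-area limit: a segment may skip at most 2 points.
    def within_span(i, j):
        return j - i <= 3

    print(optimize_boundary(len(pts), manhattan, within_span))
```

In this example the DP keeps points 0, 1, 4 and 5, cutting the bump and reducing the boundary length from 10 to 6. As long as `feasible(i, i + 1)` is always true, the original boundary remains a valid fallback solution.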

Level Shifter Insertion
- Resize floorplan to make space for level shifter insertion
  - Preserve trial placement solution
- Perform matching-based optimization to insert level shifters → minimize wirelength penalty
  - Enumerate candidate locations between two domains
  - Cost = HPWL increase to place a particular level shifter at a candidate location
[Figure: re-floorplan + level shifter insertion (red = top domain, blue = bottom domain, yellow = level shifters)]
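The matching step can be viewed as a minimum-cost assignment of level shifters to candidate sites. The sketch below makes several simplifying assumptions that are not from the slides: each domain-crossing net gets one level shifter, the cost is the growth of the net's bounding box when the site is added as an extra pin, and the optimum is found by brute-force enumeration, which only works for tiny inputs. A real flow would use a polynomial-time assignment solver such as the Hungarian algorithm.

```python
from itertools import permutations


def hpwl(points):
    """Half-perimeter wirelength of the bounding box of a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def hpwl_increase(net_pins, site):
    """HPWL growth when a level shifter at `site` joins the net (simplified model)."""
    return hpwl(list(net_pins) + [site]) - hpwl(net_pins)


def assign_level_shifters(nets, sites):
    """Return (site index per net, total cost) with minimum total HPWL increase.

    nets:  list of pin-coordinate lists, one per domain-crossing net
    sites: candidate level shifter locations, with len(sites) >= len(nets)
    """
    cost = [[hpwl_increase(pins, site) for site in sites] for pins in nets]

    def total(perm):
        return sum(cost[net][site] for net, site in enumerate(perm))

    # Exhaustive search over one-to-one assignments; fine only for small cases.
    best = min(permutations(range(len(sites)), len(nets)), key=total)
    return list(best), total(best)


if __name__ == "__main__":
    # Two hypothetical vertical nets crossing a domain boundary at y = 2.
    nets = [[(0, 0), (0, 4)], [(1, 0), (1, 4)]]
    sites = [(0, 2), (5, 2)]
    print(assign_level_shifters(nets, sites))
```

Solving the assignment jointly matters when several nets prefer the same site: a greedy choice for one net can force a much more expensive site on another.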

Experimental Setup
- Designs
  - Four blocks from OpenCores website in 28nm LP: AES (11K), DES (17K), JPEG (42K), VGA (58K)
  - Dual-core M4 (113K) industrial design in 40nm
- Tools (for OpenCores designs)
  - Synthesis: Synopsys Design Compiler vH-2013.12-SP3
  - P&R: Cadence Innovus Implementation System v16.1
  - Power/timing analysis: Cadence Innovus Implementation System v16.1
- Power efficiency of voltage regulators
  - Efficiency reduces with smaller current
[Figure: voltage regulator power efficiency vs. current (mA)]

Battery Lifetime Improvement
- Stacked-domain designs achieve > 10% and 3X battery lifetime improvement in function and sleep modes, respectively
- Currents are well balanced in both modes
- The larger benefit in sleep mode is due to low regulator efficiency
[Figure: normalized battery lifetime in function mode and sleep mode]

Design  #LSs  Currents (func)  Currents (sleep)
AES     166   4.83/5.04        0.05/0.04
DES     135   8.83/8.87        0.04/0.03
JPEG    673   19.43/23.39      0.16/0.16
VGA     520   28.69/32.65      0.20/0.17
M4      477   2.74/3.92        0.75/0.77
Currents are (top/bottom) domain currents; unit: mA

Sensitivity to Level Shifter Cost
- Use a pessimistic level shifter model with ~3X power, area and delay increase
- Both delta current and total power increase due to power and delay overheads of level shifters
- Optimization minimizes #LSs → overheads are not large
[Figure: ΔI (mA in function mode, μA in sleep mode) and total power (mW) for function mode and sleep mode. Design: M4; technology: 40nm]

Sensitivity to Voltage Regulator Power Efficiency
- Better regulator efficiency → smaller battery lifetime benefits from stacked domain
[Figure: sensitivity plots for sleep mode and function mode; axis label: (mA)]

Conclusion
- First comprehensive framework for stacked-domain optimization
- Extend flow-based partitioning with layout, timing-path awareness, and multi-scenario balancing
- More than 10% and 3X battery lifetime improvement compared to conventional designs in function and sleep modes
Ongoing / Future works
- Predictive methodology to determine block size
- Optimization with > 2 domains and/or in 3DICs

Thank you!