Optimization Challenges in Transistor Sizing
Chandu Visweswariah, IBM Thomas J. Watson Research Center, Yorktown Heights, NY

Presentation transcript:

Optimization Challenges in Transistor Sizing
Chandu Visweswariah
IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Acknowledgments: The entire EinsTuner team at IBM, especially Andy Conn of the Applied Mathematics department at IBM Research

Two acts focusing on optimization
Act I: what we have done with existing nonlinear optimization techniques
– what is EinsTuner?
– how does it work?
– what is its impact?
Act II: what we would love to do and currently cannot
– numerical noise
– capacity
– robustness, scaling, weighting, degeneracy
– recursive minimax support
– mixed integer/continuous problems

Act I: What is EinsTuner?
A static, formal, transistor sizing tool geared towards custom high-performance digital circuits
– static: based on static transistor-level timing analysis, so it implicitly takes into account all paths through the logic
– custom: timing based on real time-domain simulation of all possible transitions through each clump of strongly-connected transistors
– formal: guaranteed optimal (“local” according to engineers, “global” according to mathematicians)
– transistor-level, therefore inherently flat
– see DAC ’99, ICCAD ’99, SIAM J. Opt. 9/99

Features of EinsTuner
Extensive (sweeping and fine-grained) grouping, ratio-ing and no-touch commands
25K transistor capacity (≈ 60K variables)
Allows delay, area, β ratio, input loading, slew and transistor width constraints
Allows minimization of delay, area or a linear combination thereof
Easy to use; full GUI support; good fit into (semi-custom and custom) methodologies
Timer benefits such as accuracy for custom circuits, pattern-matching, state analysis

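EinsTuner: formal static optimizer
(block diagram)
Components:
– embedded time-domain simulator: SPECS
– static transistor-level timer: EinsTLT
– nonlinear optimizer: LANCELOT
Data exchanged between the optimizer and the timer/simulator:
– transistor and wire sizes
– function and gradient values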

Components of EinsTuner
(flow diagram)
Read netlist; create timing graph (EinsTLT)
Formulate pruned optimization problem
Feed problem to nonlinear optimizer (LANCELOT)
Solve optimization problem, call simulator for delays/slews and gradients thereof
– fast simulation and incremental sensitivity computation (SPECS)
Obtain converged solution
Snap-to-grid; back-annotate; re-time

Static optimization formulation

Digression: minimax optimization

Remapped problem
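
The equations on the minimax and remapped-problem slides did not survive in the transcript. As a hedged stand-in, the textbook remapping of a minimax problem introduces one auxiliary variable; the symbols $f_i$, $x$ and $z$ are assumptions, not taken from the slides:

```latex
\min_{x} \; \max_{i} \; f_i(x)
\qquad\Longleftrightarrow\qquad
\min_{x,\,z} \; z
\quad \text{subject to} \quad z \ge f_i(x) \quad \text{for all } i .
```

The non-differentiable max is traded for smooth inequality constraints, which is the form a general nonlinear optimizer can accept.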

Springs and planks analogy

Algorithm demonstration: inv3

Algorithm demonstration: 1b ALU

Constraint generation (figure: timing-graph nodes i and j)
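
A minimal sketch of what block-based constraint generation could look like, assuming one arrival-time variable per timing-graph node and one inequality per edge from node i to node j. The function name, the `AT_`/`d_` naming and the single sink variable `z` are hypothetical; the real tool also handles slews and rising/falling transitions, which are omitted here.

```python
from typing import List, Tuple

Edge = Tuple[str, str]  # (from_node, to_node) in the timing graph


def generate_constraints(edges: List[Edge], sinks: List[str]) -> List[str]:
    """Return block-based timing constraints as human-readable strings.

    Assumed formulation: one arrival-time variable AT_<node> per node,
    one constraint per edge, and one constraint z >= AT_<sink> per sink.
    d_<i>_<j>(W) stands for the simulated delay of edge i -> j as a
    function of the transistor widths W.
    """
    constraints = [f"AT_{j} >= AT_{i} + d_{i}_{j}(W)" for i, j in edges]
    constraints += [f"z >= AT_{s}" for s in sinks]
    return constraints


for line in generate_constraints([("a", "c"), ("b", "c")], ["c"]):
    print(line)
```

Minimizing `z` subject to these inequalities is the remapped minimax form of "minimize the latest arrival time".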

Statement of the problem

LANCELOT
State-of-the-art general-purpose nonlinear optimizer
Trust-region method
Exploitation of group partial separability; augmented Lagrangian merit function
Handles general linear/nonlinear equalities/inequalities and objective functions
Simple bounds handled by projections
Several choices of preconditioners, exact/inexact BQP solution, etc.
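
For readers unfamiliar with the terms, here is the textbook form of an augmented Lagrangian merit function for equality constraints $c_i(x) = 0$ with simple bounds $l \le x \le u$. This is a sketch in assumed notation ($f$, $c_i$, $\lambda_i$, $\mu$), with constraint scaling factors omitted; it is not copied from the slides.

```latex
\Phi(x;\lambda,\mu) \;=\; f(x) \;+\; \sum_i \lambda_i\, c_i(x) \;+\; \frac{1}{2\mu} \sum_i c_i(x)^2 ,
\qquad
\bigl(P[x]\bigr)_j \;=\; \min\bigl(u_j,\ \max(l_j,\ x_j)\bigr),
\qquad
\lambda_i \;\leftarrow\; \lambda_i + \frac{c_i(x)}{\mu} .
```

The inner problem minimizes $\Phi$ over the box by a trust-region method, with the projection $P$ keeping iterates within the bounds; the multipliers $\lambda$ and the penalty parameter $\mu$ are updated in an outer loop.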

LANCELOT algorithms (figure panels: simple bounds; trust region)

Demonstration of degeneracy

Degeneracy!

Why do we need pruning?

Pruning: an example (figure: timing graph with nodes 1 to 4)

Block-based vs. path-based timing (figure panels: block-based; path-based)
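
To illustrate why the distinction matters for problem size, the sketch below counts edges (roughly, the number of block-based constraints) and source-to-sink paths (the number of path-based constraints) in a small synthetic graph. The `ladder` graph and all names are made up for illustration.

```python
from functools import lru_cache
from typing import Dict, List


def count_paths(graph: Dict[str, List[str]], source: str, sink: str) -> int:
    """Count distinct source-to-sink paths in a DAG by memoized recursion."""

    @lru_cache(maxsize=None)
    def paths_from(node: str) -> int:
        if node == sink:
            return 1
        return sum(paths_from(nxt) for nxt in graph.get(node, []))

    return paths_from(source)


def ladder(stages: int) -> Dict[str, List[str]]:
    """Build a DAG with `stages` stages, each holding two parallel nodes."""
    graph: Dict[str, List[str]] = {"src": ["a0", "b0"]}
    for k in range(stages):
        nxt = ["sink"] if k == stages - 1 else [f"a{k + 1}", f"b{k + 1}"]
        graph[f"a{k}"] = list(nxt)
        graph[f"b{k}"] = list(nxt)
    return graph


g = ladder(10)
num_edges = sum(len(successors) for successors in g.values())
num_paths = count_paths(g, "src", "sink")
print(num_edges, num_paths)  # 40 edges, 1024 paths
```

Edges grow linearly with the number of stages while paths double at every stage, which is why a block-based formulation stays tractable where explicit path enumeration does not.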

Detailed pruning example
(figure sequence: timing graph between Source and Sink; node labels are merged as pruning proceeds)
Start: Edges = 26, Nodes = 16 (+2)
Step 1: Edges = 26 → 20, Nodes = 16 → 10
Step 2: Edges = 20 → 17, Nodes = 10 → 7
Step 3: Edges = 17 → 16, Nodes = 7 → 6
Step 4: Edges = 16 → 15, Nodes = 6 → 5
Step 5: Edges = 15 → 14, Nodes = 5 → 4
Step 6: Edges = 14 → 13, Nodes = 4 → 3
Step 7: Edges = 13 → 13, Nodes = 3 → 2
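
The transcript does not state the rule used to decide which nodes to eliminate. The sketch below is one plausible greedy variant, under the assumption that a node is removed whenever doing so strictly reduces nodes plus edges: eliminating a node with p incoming and q outgoing edges replaces p + q edges by p × q composite edges. All names are hypothetical, and this is not claimed to be EinsTuner's actual criterion.

```python
from typing import List, Set, Tuple

# (tail, head, original nodes traversed by this possibly composite edge)
Edge = Tuple[str, str, Tuple[str, ...]]


def prune(edges: List[Edge], keep: Set[str]) -> List[Edge]:
    """Greedily eliminate nodes not in `keep` while nodes + edges decreases."""
    edges = list(edges)
    changed = True
    while changed:
        changed = False
        candidates = {n for t, h, _ in edges for n in (t, h)} - keep
        for v in sorted(candidates):
            ins = [e for e in edges if e[1] == v]
            outs = [e for e in edges if e[0] == v]
            p, q = len(ins), len(outs)
            # Removing v changes the edge count by p*q - (p + q) and the
            # node count by -1; accept only a strict overall decrease.
            if p * q < p + q + 1:
                rest = [e for e in edges if e[0] != v and e[1] != v]
                merged = [(a[0], b[1], a[2] + b[2][1:]) for a in ins for b in outs]
                edges = rest + merged
                changed = True
                break
    return edges


chain = [
    ("src", "n1", ("src", "n1")),
    ("n1", "n2", ("n1", "n2")),
    ("n2", "sink", ("n2", "sink")),
]
print(prune(chain, {"src", "sink"}))
```

Each composite edge remembers the original nodes it spans, so its delay can still be written as the sum of the underlying edge delays. Under this rule a node with two inputs and two outputs is eliminated with the edge count unchanged, as in the final step of the slide sequence (13 → 13 edges, 3 → 2 nodes), whereas a node with two inputs and three outputs is kept.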

Pruning vs. no pruning

No time to mention...
Simultaneous switching
Tuning of transmission-gate MUXes
Adjoint sensitivity computation
The role of pattern matching
Strength calculations/symmetrized stacks
State analysis
Detection of impossible problems
Latches
Uncertainty-aware tuning

EinsTuner impact
Better circuits
– 15%+ improvement on previously hand-tuned circuits!
Improved productivity in conjunction with semi-custom design flow (see Northrop et al., DAC ’01)
Better understanding of tradeoffs
– lifts designers’ thinking to a higher level
Currently being applied to the third generation of z-series (S/390) and PowerPC microprocessors to benefit from EinsTuner

Act II: If only...
(figure labels)
– What we are able to do
– What we could do with a robust, fast, high-capacity, noise-immune, degeneracy-immune, easy-to-use, non-finicky nonlinear optimizer
– Rich source of research problems

Open research problems
Immunity to numerical noise (figure: plot of voltage against time)

Numerical noise
Need a formal basis for optimization in the presence of inevitable numerical noise
Formal methods for measuring noise level
Tie all tolerances to a multiple of noise level
– tolerances for bounds
– tolerances for updating multipliers, etc.
– stopping criteria
Currently our optimization is a race against noise and the conclusion is not pleasing!
All optimization decisions noise-based

Numerical noise
Inject various types of noise (correlated and uncorrelated) with different amplitudes into the function and gradient evaluations of analytic optimization problems
Easy experimentation is thus possible to test theories of noise-immune optimization
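
A minimal sketch of such a noise-injection harness, assuming the 2-D Rosenbrock function as the analytic test problem. Here "correlated" is interpreted as an error that is a deterministic function of the evaluation point, and "uncorrelated" as independent random draws; that interpretation, and every name below, is an assumption rather than something stated on the slide.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
Evaluator = Callable[[Vector], Tuple[float, Vector]]


def rosenbrock(x: Vector) -> Tuple[float, Vector]:
    """Analytic test problem: value and exact gradient of 2-D Rosenbrock."""
    a, b = x
    f = (1.0 - a) ** 2 + 100.0 * (b - a * a) ** 2
    g = [-2.0 * (1.0 - a) - 400.0 * a * (b - a * a), 200.0 * (b - a * a)]
    return f, g


def with_noise(evaluate: Evaluator, amplitude: float,
               correlated: bool, seed: int = 0) -> Evaluator:
    """Wrap an evaluator so value and gradient carry errors bounded by amplitude."""
    rng = random.Random(seed)

    def noisy(x: Vector) -> Tuple[float, Vector]:
        f, g = evaluate(x)
        if correlated:
            # Deterministic in x: the same point always sees the same error.
            phase = 1.0e4 * sum((k + 1) * xi for k, xi in enumerate(x))
            errors = [amplitude * math.sin(phase + k) for k in range(len(g) + 1)]
        else:
            # Independent draw on every call, even at the same point.
            errors = [rng.uniform(-amplitude, amplitude) for _ in range(len(g) + 1)]
        return f + errors[0], [gi + ei for gi, ei in zip(g, errors[1:])]

    return noisy


noisy_rosenbrock = with_noise(rosenbrock, amplitude=1e-3, correlated=False)
print(noisy_rosenbrock([0.5, 0.5]))
```

Because the exact value and gradient are known, an optimizer can be run on the wrapped function at several amplitudes to observe at what noise level its tolerances and stopping criteria begin to misbehave.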

Capacity
Today, our biggest problem has about 30,000 transistors, 70,000 variables and 70,000 constraints
Runs for as long as a week!
We go to great lengths to reduce problem size
– in the tuner code
– in the pre-processor
– in our methodology

What higher capacity would permit
Tuning of larger macros
Early mode considerations
Manufacturability
– replicate constraints and variables at each process corner
– rich possibilities in choice of objective function
Explicit (circuit) noise constraints
Benefits of explicit grouping constraints in post-optimality analysis

Digression: noise constraints
Combine Harmony and EinsTuner
– essential for dynamic circuits (see TCAD 6/00)
– replace rules-of-thumb by actual noise constraints for static circuits
(figure: noise waveform v against time t, with noise margin NM_L, times t1, t2, ts and te marked, and the annotation “area = 0”)

Robustness
Scaling
– requires unit change in all variables to have roughly the same order of magnitude of impact
– need non-uniform trust-region; would subsume 2-step updating [Conn et al., SIAM J. Opt. 9/99]
– very difficult in general
Weighting
– essential in any large-scale optimizer
– need problem-size-dependent weights
Sensitivity to choices should be minimized
– finicky behavior not helpful in engineering application

Ease of use
Nonlinear optimizers should stop assuming they are the “master” program
There is no standard API for nonlinear optimization packages
Aspects such as error/message handling and memory management are important; FORTRAN does not help
Not easy to help the optimizer with domain-specific knowledge
– e.g., availability of information about Lagrange multipliers in recursive minimax problems

Mixed integer/continuous problems
Assignment of low threshold voltage devices
Swapping pin order in multi-input gates (combinatorial problem)
Working from a discrete gate library
Discrete choice of taper ratios
In general, the type of re-structuring that a logic synthesis program may consider
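
As a point of reference for the discrete-library item, the simplest workaround is to solve the continuous problem and then round each size to the nearest available value. The sketch below shows only that rounding step; the function name and the example sizes are hypothetical. Rounding carries no optimality or feasibility guarantee, which is part of why true mixed integer/continuous support is listed as an open problem.

```python
from bisect import bisect_left
from typing import List, Sequence


def snap_to_library(widths: Sequence[float], library: Sequence[float]) -> List[float]:
    """Round each continuous width to the nearest available library size."""
    sizes = sorted(library)
    snapped = []
    for w in widths:
        k = bisect_left(sizes, w)  # index of the first size >= w
        # Nearest size is either the one just below or the one at index k.
        candidates = sizes[max(k - 1, 0):k + 1]
        snapped.append(min(candidates, key=lambda s: abs(s - w)))
    return snapped


print(snap_to_library([0.9, 2.6, 7.0, 20.0], [1.0, 2.0, 4.0, 8.0]))
```

After snapping, the design would need to be re-timed, since the rounded sizes may violate constraints that the continuous solution satisfied.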

Recursive minimax support
Need a way to force “timing correctness”
– arrival times and slews at lowest feasible value
– one of them tight at every stage
– product of slacks = 0?
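
One way to read the "product of slacks" idea, written out in assumed notation: $AT_j$ is the arrival time at node $j$, $d_{ij}(W)$ is the delay of edge $i \to j$ as a function of the widths $W$, and $s_{ij}$ is the slack of the corresponding constraint. None of these symbols appear in the transcript.

```latex
s_{ij} \;=\; AT_j - AT_i - d_{ij}(W) \;\ge\; 0 \quad \text{for all fan-in edges } i \to j,
\qquad
\prod_{i} s_{ij} \;=\; 0
\quad\Longleftrightarrow\quad
AT_j \;=\; \max_{i}\,\bigl(AT_i + d_{ij}(W)\bigr).
```

With all slacks non-negative, a zero product forces at least one fan-in constraint to be tight at every node, so each arrival time sits at its lowest feasible value. The product introduces a complementarity-style equality that is itself hard for a nonlinear optimizer, which may be why the slide poses it as a question.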

Degeneracy and redundancy
Nonlinear optimizers are easy to mislead
Most engineering problems have plenty of inherent degeneracy and redundancy
Can optimizers be made less sensitive to degeneracy?
Example: … is degenerate because …

Practical considerations
Detection of impossible problems
– bound problems
– obviously impossible constraints
– not-so-obviously impossible constraints that need a little function evaluation to detect
Post-optimality analysis
– final values of Lagrange multipliers can be used to give the designer guidance on obtaining further improvement
– may need to re-solve for multipliers at end
– information needs to be collected and presented to designer in nice form
– formulation of problem may change to take advantage of this analysis
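
A Lagrange multiplier measures how sensitive the optimal objective is to relaxing its constraint, so a simple form of designer guidance is to list the constraints with the largest multipliers. The sketch below assumes the multipliers are already available as a name-to-value mapping; the constraint names and values are invented for illustration.

```python
from typing import Dict, List, Tuple


def rank_constraints(multipliers: Dict[str, float], top: int = 3,
                     tol: float = 1e-9) -> List[Tuple[str, float]]:
    """Return the `top` constraints with the largest multiplier magnitude.

    Constraints whose multiplier is (numerically) zero are inactive at the
    solution and are dropped, since relaxing them would not help.
    """
    active = {name: lam for name, lam in multipliers.items() if abs(lam) > tol}
    return sorted(active.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]


example = {"area": 0.0, "delay_out1": 2.5, "slew_n7": 0.4, "width_min_m3": 1.1}
print(rank_constraints(example, top=2))
```

In this made-up example the report would point the designer at the `delay_out1` requirement first, then at the minimum-width bound on `m3`.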

Conclusion (Act I)
Can robustly carry out static circuit tuning
Consistent and high-quality results
Improvement in designers’ productivity
Moves designers’ thinking to a higher level
Presently impacting the third generation of z-series (S/390) and PowerPC microprocessors

Conclusion (Act II)
Open problems in nonlinear optimization whose solution will have a real impact
– numerical noise
– capacity
– robustness, scaling, weighting
– ease of use, common API
– discrete variables
– support for recursive minimax problems
– degeneracy
– post-optimality analysis
– detection of impossible problems