Aggressive Crunching of Extracted RC Netlists. Vasant Rao, Jeff Soreff, Ravi Ledalla (IBM EDA, Fishkill, NY), Fred Yang (IBM EDA, Almaden, CA)

Presentation transcript:

1 Aggressive Crunching of Extracted RC Netlists Vasant Rao, Jeff Soreff, Ravi Ledalla (IBM EDA, Fishkill, NY), Fred Yang (IBM EDA, Almaden, CA)

2 Agenda
- Motivation for RC Crunching
- Internal Node Elimination (TICER)
- Resistor Short/Update (TICER+)
- Examples
- Results

3 Motivation for RC Crunching
Netlists generated by circuit extractors have far too many resistors, which slow down circuit simulation significantly:
- Size of the netlist is huge: large circuit matrices.
- Wide range of dynamic time constants, due to the wide range of resistor values, causes time-step control problems.

4 RC Crunching Goals
Crunch the extracted RC netlist down significantly:
- reduce size (number of nodes/resistors)
- preserve sparsity
- preserve total capacitance
- give the user a size vs. accuracy control knob: the size of the crunched network should vary inversely with the error the user is willing to tolerate. If the user does not care about accuracy, the crunched network should be a single node with no resistors.
- should have potential for complete crunching

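5 Internal Node Elimination (TICER)
Eliminate node N with capacitance C and conductances g1, g2, g3, g4 to its neighbors.
[Figure: node N with capacitance C connected to four neighbors through g1 to g4; after elimination the neighbors are connected pairwise through g12, g13, g14, g23, g24, g34 and carry capacitances C1, C2, C3, C4.]
Merge parallel resistors & capacitors.
TICER: B. N. Sheehan, ICCAD-1999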
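A minimal sketch of the elimination step, assuming the standard TICER quick-node rule from Sheehan's paper: every pair of neighbors i, j gets a new conductance g_i*g_j/G, where G is the sum of all g_i, and neighbor i receives the share C*g_i/G of the eliminated capacitance. The function name and the dictionary layout are illustrative and do not come from the slides.

```python
def eliminate_node(g, cap):
    """Eliminate one internal node N.

    g   : dict neighbor -> conductance between N and that neighbor
    cap : capacitance at N
    Returns (new_g, added_cap): new_g maps neighbor pairs to the conductance
    created between them, added_cap maps each neighbor to the capacitance
    it inherits from N.
    """
    total_g = sum(g.values())
    names = sorted(g)
    new_g = {
        (a, b): g[a] * g[b] / total_g
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    # Capacitance is split in proportion to conductance, so the total is preserved.
    added_cap = {a: cap * g[a] / total_g for a in names}
    return new_g, added_cap


new_g, added_cap = eliminate_node({"n1": 1.0, "n2": 1.0, "n3": 1.0, "n4": 1.0}, 4.0)
print(len(new_g), sum(added_cap.values()))
```

Any new conductance that lands between two neighbors that were already connected is then merged in parallel with the existing one, which is the "merge parallel resistors & capacitors" step on the slide.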

6 TICER Properties
- Eliminates only internal (not source/sink) nodes.
- Preserves Elmore delay.
- Handles coupling capacitors.
TICER eliminates internal nodes whose equilibrium time constant is below the user-defined threshold.
After elimination of a node of degree k:
- Node count reduces by 1.
- Resistors increase by fill-in count = (#new R's among neighbors) - (#old R's among neighbors) - (#deleted R's). Restrict this count to preserve sparsity.
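The two eligibility tests can be sketched as follows. This assumes the time constant of a node is its capacitance divided by the sum of its conductances (the usual TICER definition) and that a node of degree k creates k(k-1)/2 candidate resistors among its neighbors while deleting its own k resistors. The function names and the adjacency-set representation are hypothetical.

```python
def fill_in_count(adj, node):
    """Net change in resistor count if `node` is eliminated.

    adj: dict node -> set of resistive neighbors.
    """
    nbrs = sorted(adj[node])
    k = len(nbrs)
    new_r = k * (k - 1) // 2  # one new resistor per neighbor pair
    # Pairs that are already connected merge in parallel and add nothing.
    old_r = sum(1 for i, a in enumerate(nbrs) for b in nbrs[i + 1:] if b in adj[a])
    deleted_r = k  # the resistors that touched the eliminated node
    return new_r - old_r - deleted_r


def is_quick(cap, conductances, threshold):
    """True if the node time constant cap / sum(g) is below the threshold."""
    return cap / sum(conductances) < threshold


star3 = {"N": {"a", "b", "c"}, "a": {"N"}, "b": {"N"}, "c": {"N"}}
print(fill_in_count(star3, "N"))
```

Under these assumptions a node of degree 1 or 2 always has a negative fill-in count, a degree-3 node with unconnected neighbors has a count of 0, and a degree-4 node with unconnected neighbors has a count of 2.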

7 Resistor Short/Update (TICER+)
- TICER does not eliminate sources/sinks.
- The fill-in count restriction that preserves sparsity conflicts with the complete crunching goal.
TICER+ consists of:
- First run TICER with a threshold and a fill-in limit. Recommended fill-in limit = 0.
- Then short certain resistors and (possibly) update the values of neighboring resistors.
  - Work with Elmore delay (satisfies additive relations).
  - Limit accumulated delay error < threshold/10.

8 First consider RC-Tree:
[Figure: RC tree with Root; resistor R between nodes A and B; neighbors I, J, K attached through resistors R_I, R_J, R_K.]
Notation:
- Delay from Root to node X before shorting R
- Delay from Root to node X after shorting R
- Cumulative downstream capacitance at X
Additive Relations
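For reference, the additive relations of the Elmore delay on a tree can be written as below. The symbols are assumptions made for this sketch: d_X is the delay from Root to node X before shorting, C_X is the cumulative downstream capacitance at X (including the capacitance at X itself), and J is taken to hang off B through R_J.

```latex
% Assumed symbols: d_X = Elmore delay from Root to X, C_X = cumulative downstream capacitance at X.
\begin{align}
d_B &= d_A + R\,C_B \\
d_J &= d_B + R_J\,C_J
\end{align}
```

Each step down the tree adds the resistance just crossed times all the capacitance below it, which is why an error introduced at one node propagates unchanged to everything downstream.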

9 After shorting R between A and B:
[Figure: Root, merged node AB, and neighbors I, J, K; the resistors R_I, R_J, R_K are each shown with an added perturbation term.]

10 Perturb resistors to minimize error due to shorting resistor R: Optimization Problem
Optimal Solution: update ONLY the neighbors R_J of R that are connected to B.
Note:
- Cannot preserve Elmore delays at each sink.
- Delay error occurs at the merged node only.
- No error for sinks at A. Only error for sink at B.
- All perturbations are positive - good.
- No update needed if B is a leaf.
- Coupling capacitors handled.
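One update that is consistent with the notes above can be derived from the Elmore additive relations: raising each resistor R_J that hangs off B by R*C_B/C_J (C_B and C_J being the cumulative downstream capacitances before shorting) restores the delay at J and everything below it, leaves sinks on the A side untouched, and is always positive. This formula is an assumption derived here, not a quotation from the slides, and the tree layout (I on A; J and K on B), component values, and function names are illustrative only.

```python
def elmore(parent, res, cap, root):
    """Return (delay, down): Elmore delay from root and downstream capacitance per node."""
    children = {}
    for node, p in parent.items():
        children.setdefault(p, []).append(node)
    down = {}

    def total_cap(node):
        down[node] = cap[node] + sum(total_cap(c) for c in children.get(node, []))
        return down[node]

    total_cap(root)
    delay = {root: 0.0}

    def walk(node):
        for c in children.get(node, []):
            delay[c] = delay[node] + res[c] * down[c]
            walk(c)

    walk(root)
    return delay, down


def short_and_update(parent, res, cap, root, b):
    """Short the resistor between b and its parent; bump resistors hanging off b.

    Assumed update: R_J += R * C_b / C_J (needs C_J > 0 for each child J of b).
    """
    _, down = elmore(parent, res, cap, root)
    a, r = parent[b], res[b]
    new_parent = {n: (a if p == b else p) for n, p in parent.items() if n != b}
    new_res = {
        n: res[n] + (r * down[b] / down[n] if parent[n] == b else 0.0)
        for n in new_parent
    }
    new_cap = {n: c for n, c in cap.items() if n != b}
    new_cap[a] += cap[b]  # total capacitance is preserved
    return new_parent, new_res, new_cap


parent = {"A": "Root", "B": "A", "I": "A", "J": "B", "K": "B"}
res = {"A": 2.0, "B": 1.0, "I": 3.0, "J": 4.0, "K": 5.0}
cap = {"Root": 0.0, "A": 1.0, "B": 2.0, "I": 3.0, "J": 4.0, "K": 5.0}

before, down = elmore(parent, res, cap, "Root")
after, _ = elmore(*short_and_update(parent, res, cap, "Root", "B"), "Root")
print(before, after)
```

In this sketch the delays at A, I, J and K are the same before and after, and the only error is at the old node B, whose delay drops by R*C_B. If B is a leaf it has no children, so no resistor is updated.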

11 Overall TICER+ Crunching Algorithm
1. Run TICER with user-defined threshold and fill-in limit.
   a. First only internal nodes with degree 1 or 2.
   b. Then restrict to a fill-in count within the fill-in limit.
2. Find Minimum (Resistive) Spanning Tree.
3. Pick the leaf R with the smallest R*C product.
4. Short R and accumulate error at the merged node. No update is needed since R is a leaf.
5. Check if the total accumulated error is < threshold/10.
6. Repeat from step 3 until the above check fails.
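Step 2 can be sketched with Kruskal's algorithm using resistance as the edge weight. This is a generic illustration, not the authors' implementation; the function name is hypothetical, and calling the non-tree resistors "links" follows the wording used in the example slides.

```python
def min_resistance_spanning_tree(nodes, resistors):
    """Kruskal's algorithm with resistance as weight.

    resistors: list of (ohms, node_a, node_b).
    Returns (tree, links): resistors in the spanning tree and the remaining
    loop-closing resistors.
    """
    root_of = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while root_of[n] != n:
            root_of[n] = root_of[root_of[n]]
            n = root_of[n]
        return n

    tree, links = [], []
    for r in sorted(resistors):
        ra, rb = find(r[1]), find(r[2])
        if ra == rb:
            links.append(r)  # would close a loop
        else:
            root_of[ra] = rb
            tree.append(r)
    return tree, links


tree, links = min_resistance_spanning_tree(
    ["S", "A", "B"], [(1.0, "S", "A"), (2.0, "A", "B"), (5.0, "S", "B")]
)
print(tree, links)
```

For a connected network with n nodes the tree always holds n - 1 resistors, so the number of links equals the resistor count minus n - 1.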

12 Example 1
[Figure: RC tree with source S and sinks A, B, C, D, E, F, G, H, I; labels include 10fF.]
- 1 source S, 9 sinks A-I
- Sink cap = 10fF, internal pin cap = 1fF, all R's = 1 Ω
- User sets threshold = 1ps
- Initially delay error = 0 at all nodes
- RC tree after TICER with fill-in limit = 0: 14 nodes, 13 resistors
- Cannot eliminate any internal node

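13 After 1 short [Figure: nodes A, B, C, D, E, F, G, H, I, S; label: fs]

14 After 3 shorts [Figure: nodes ABC, D, E, F, G, H, I, S; label: fs]

15 After 9 shorts [Figure: nodes ABC, DEF, GHI, S; labels: 0, 10fs, 31fF]

16 After 10 shorts [Figure: nodes ABC, DEF, GHI, S; labels: 41fs, 10fs]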

17 After 12 shorts [Figure: nodes ABCDEFGHI, S; label: 41fs]
Final network: 2 nodes, 1 resistor, cap = 94fF.
Maximum delay error is 41fs, below the bound of threshold/10 = 100fs.
Further shorting would result in a delay error of 135fs (the accumulated 41fs plus 1 Ω * 94fF = 94fs), which exceeds the 100fs bound.
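The leaf-shorting loop can be replayed on Example 1 with a short sketch. Two things are assumed here rather than taken from the slides: the topology (S feeding one internal node M, three internal branch nodes X1 to X3, three sinks on each branch, which matches 14 nodes and 13 resistors), and the accumulation rule, in which the merged node keeps the larger of the parent's error and the leaf's error plus R*C. Resistances are in ohms and capacitances in fF, so products are in fs.

```python
def crunch_leaves(parent, res, cap, err_limit):
    """Greedily short leaf resistors until the error bound would be reached.

    parent: node -> parent node (root has no entry)
    res:    node -> resistance of the edge to its parent
    cap:    node -> capacitance at the node
    Returns (number_of_shorts, final_cap, final_err).
    """
    parent, res, cap = dict(parent), dict(res), dict(cap)
    err = {n: 0 for n in cap}
    shorts = 0
    while True:
        has_child = set(parent.values())
        leaves = [n for n in parent if n not in has_child]
        if not leaves:
            break
        # Leaf with the smallest R*C product (ties broken by name).
        leaf = min(leaves, key=lambda n: (res[n] * cap[n], n))
        p = parent[leaf]
        new_err = max(err[p], err[leaf] + res[leaf] * cap[leaf])
        if new_err >= err_limit:
            break
        cap[p] += cap.pop(leaf)  # merged node keeps all the capacitance
        err[p] = new_err
        del err[leaf], parent[leaf], res[leaf]
        shorts += 1
    return shorts, cap, err


parent = {"M": "S"}
cap = {"S": 0, "M": 1}
for i, sinks in enumerate(("ABC", "DEF", "GHI"), start=1):
    branch = f"X{i}"
    parent[branch] = "M"
    cap[branch] = 1
    for s in sinks:
        parent[s] = branch
        cap[s] = 10
res = {n: 1 for n in parent}

shorts, final_cap, final_err = crunch_leaves(parent, res, cap, err_limit=100)
print(shorts, final_cap, final_err)
```

Under these assumptions the loop performs 12 shorts and stops with two nodes, 94 fF on the merged node and a 41 fs accumulated error, because the next short would add 94 fs.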

18 Example
User sets: threshold = 8ps, fill-in limit = 0.
65 resistors, 64 nodes, 2 loops

resistors 58 nodes 0 loops

Done with internal nodes that have 2 or fewer resistive neighbors. Now work on internal nodes with 3 or more resistive neighbors. No loops! 30 resistors, 31 nodes, 0 loops

Loop Formed

resistors 23 nodes Internal Node Elimination (TICER) phase completed. Further elimination will increase resistor count (cause fill-ins)

resistors 23 nodes 8 links
Begin Resistor Short/Update Phase: find the Minimum Resistor Spanning Tree and select the Root.
[Figure labels: Root, 0.66ps]
Delay error bound: threshold/10 = 0.8ps

resistors 22 nodes 7 links [Figure labels: Root, 0.66ps, 0.62ps, 0.22ps, 0.67ps, 0.22ps, 0.73ps, 0.66ps]

resistors 9 nodes 3 links [Figure labels: Root, 0.66ps, 0.67ps, 0.66ps, ps]

resistors 9 nodes 3 links [Figure labels: Root, 0.66ps, 0.67ps, +0.13ps, 0.67ps, 0.66ps, +0.02ps, ps]

resistors 5 nodes 1 link [Figure labels: Root, 0.8ps, 0.68ps, 0.67ps, 0.73ps, 57]
Any further shorting will violate the 0.8ps delay error bound.
End of Shorting Phase: final RC network after crunching. Note that the resistor update formula was not used.

30 Results
- TICER+ is implemented in the Transistor-level Static Timing Analyser (EinsTLT) used by IBM in production.
  - EinsTLT uses a fast simulator (ACES).
- TICER+ performance is measured by run-time savings in EinsTLT.
- TICER+ accuracy is measured by the sink-to-sink stage-delay (d) difference:
  - computed by EinsTLT/ACES, NOT Elmore delay.
[Figure: RC network with stage delay d]

31 Threshold of TICER+ controls Run-Time vs Accuracy of EinsTLT
- Threshold = 0: no crunching
- Threshold = 1.0ns: complete crunching
[Plot: run time vs. accuracy, with the recommended thresholds marked]

32 Just TICER by itself is not good enough:
- Size saturates too soon at a fixed fill-in number.
- Increasing the fill-in number:
  - increases resistors significantly
  - reduces nodes slightly