Chop-SPICE: An Efficient SPICE Simulation Technique For Buffered RC Trees Myung-Chul Kim, Dong-Jin Lee and Igor L. Markov Dept. of EECS, University of.

Presentation transcript:

Chop-SPICE: An Efficient SPICE Simulation Technique For Buffered RC Trees
Myung-Chul Kim, Dong-Jin Lee and Igor L. Markov
Dept. of EECS, University of Michigan
TAU 2011, Myung-Chul Kim, University of Michigan

Fast SPICE Simulation: Motivation
■ IC timing closure, especially at advanced technology nodes, depends heavily on highly accurate timing simulations
  − Increasing impact of PVT variation
  − Rigorous clock skew/slew constraints
■ Circuit size and complexity are increasing rapidly
  − A scalable SPICE technique is critical

Key Features of Chop-SPICE
■ Developed as a compromise simulator (fast yet sufficiently accurate) for use by the Contango2 software in the ISPD 2010 contest
■ Simple and practical divide-and-conquer approach
■ Can capture PVT variation and spatial correlation
■ Flexible trade-off between runtime and solution quality
■ Adaptable to various SPICE simulators

ISPD10 Clock Tree Synthesis Contest
■ 45nm 2GHz CPU benchmarks from IBM and Intel
■ Objective: Minimize the overall capacitance of the clock network
  − Subject to constraints:
    – Monte-Carlo SPICE simulations with PVT variations
    – Local clock skew < 7.5 ps
    – Slew rate < 100 ps
    – Hard runtime limit per benchmark < 12 hours
■ Low-skew clock trees are especially unforgiving to timing-analysis inaccuracies

Prior Work
■ Ideal Timing Evaluator
  − Fast runtime without sacrificing accuracy
  − High fidelity, adaptability to various SPICE tools
[Figure: speed vs. accuracy chart positioning delay models (Elmore, D2M, LnD), simulation (SPICE, AWE), and the ideal timing evaluator]

Chop-SPICE Algorithm
■ Definition: Probing Points
  − Given an RC tree, probing points are defined as
    A. Input nodes of buffers
    B. Sink nodes
  − = Set of probing points
  − = Number of fanouts to probing points at node s_i
■ Example
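The probing-point definition can be illustrated with a minimal Python sketch. It is not the authors' code: the `Node` class, the `kind` labels ("wire", "buffer_input", "sink"), and the small example tree are hypothetical, introduced only to show how buffer input nodes and sink nodes would be collected from an RC tree.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical tree node; "kind" labels are assumptions for illustration.
    name: str
    kind: str = "wire"  # one of "wire", "buffer_input", "sink"
    children: list = field(default_factory=list)

def probing_points(root):
    """Collect probing points (buffer input nodes and sink nodes) in preorder."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.kind in ("buffer_input", "sink"):
            found.append(node.name)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return found

# Made-up example: root wire node s0 drives a buffer whose output feeds two sinks.
example_root = Node("s0", "wire", [
    Node("b1", "buffer_input", [Node("k1", "sink"), Node("k2", "sink")]),
])
example_points = probing_points(example_root)
```

For this made-up tree, `example_points` holds the buffer input `b1` and the two sinks `k1` and `k2`; the root wire node is not a probing point.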

Chop-SPICE Algorithm
■ Definition: Granularity
  − Maximum Granularity:
  − Minimum Granularity:
  − Granularity Range:
  − Target Granularity:
■ Target Granularity determines the minimum number of probing points to be included in sub-circuits

Chop-SPICE Flow
[Flowchart boxes: RC Tree instance; RC Tree traversal; Target granularity reached? (yes/no); Sub-circuit generation; Apply input slew stimuli; Invoke SPICE simulation; Delay and slew propagation; Delay and slew update; RC tree exhausted? (yes/no); End]

Sub-circuit Generation
■ Sub-circuits are always delimited by buffers
  − If a probing point is an input node of buffer(s), all fanout buffers are explicitly included in the current sub-circuit
  − Buffers at the boundary of a sub-circuit may also appear in another sub-circuit
■ Facilitates accurate reconstruction of circuit delay from sub-circuit simulation data
■ Can reduce AC sweep time for sub-circuits

Delay Propagation
■ Purpose: After probing-point delays are retrieved from SPICE, they are propagated in order to capture delay for probing points in subsequent sub-circuits.
■ Calculation of delay from the root node s_0 to node s_j
  − Find the sub-circuit containing s_j.
  − Identify the shortest tree path from s_0 to s_j, and the earliest node s_i in the sub-circuit that lies on this tree path (assume that the signal delay from s_0 to s_i was computed recursively).
  − The delay from s_i to s_j is obtained by SPICE simulation and added to the delay at s_i.
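The accumulation rule on this slide, delay(s_j) = delay(s_i) + SPICE-measured delay from s_i to s_j, can be sketched in a few lines of Python. This is an illustrative sketch only, not the Chop-SPICE implementation: the function name `propagate_delays`, the `(entry_node, local_delays)` data layout, and the delay values (in ps) are all assumptions, and the per-sub-circuit delays stand in for numbers that would come from SPICE.

```python
def propagate_delays(subcircuits, root="s0"):
    """Accumulate root-to-node delays across sub-circuits.

    subcircuits: list of (entry_node, {probing_point: local_delay}) pairs,
    ordered so that a sub-circuit appears after the one that computed the
    delay of its entry node. local_delay is the entry-to-point delay that
    a SPICE run of that sub-circuit would report (hypothetical values here).
    """
    delay = {root: 0.0}
    for entry, local in subcircuits:
        base = delay[entry]  # already known from an upstream sub-circuit
        for point, local_delay in local.items():
            delay[point] = base + local_delay
    return delay

# Made-up data: the root sub-circuit ends at buffer inputs b1 and b2,
# and each of those starts a downstream sub-circuit that ends at sinks.
example_subcircuits = [
    ("s0", {"b1": 12.0, "b2": 15.0}),
    ("b1", {"k1": 7.5}),
    ("b2", {"k2": 6.0, "k3": 9.0}),
]
example_delays = propagate_delays(example_subcircuits)
```

With these made-up numbers, the delay at sink `k1` is the root-to-`b1` delay plus the local `b1`-to-`k1` delay, i.e. 12.0 + 7.5 = 19.5 ps.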

Slew Propagation
■ Purpose: After probing-point slews are retrieved from SPICE, they are used in order to capture slew for probing points in subsequent sub-circuits.
■ Slew at a given node can be expressed as a function of the input slew of a sub-circuit.
  − Slew measured at the previous stage (up to the root node s_i in a given sub-circuit) should be accounted for when stimuli for the current sub-circuit are generated.
  − Slew at a node is directly calculated by SPICE simulation.

Empirical Results: ISPD10 Benchmarks
■ Experimental setup
  − Single-threaded runs on a 3.2GHz Intel Core i7 Quad CPU Q660 Linux workstation
  − Buffered RC networks generated by applying Contango2 to the ISPD’10 high-performance CNS contest benchmark suite
  − Open-source NgSPICE-2.2
■ Target granularity
  − Varies from (full-scale SPICE simulation) to in order to examine trade-offs

Empirical Results: Avg. Error

Empirical Results: Max. Error and Trade-off

Fidelity
■ Fidelity indicates whether Chop-SPICE is effective as a replacement for full-scale SPICE during optimization
  − On intermediate clock trees produced by Contango2, we use Chop-SPICE and full-scale SPICE to measure sink delays before and after optimization

Future work
■ Extension to general RC networks
  − An algorithm for computing signal delays in non-tree RC networks, by partitioning a given circuit into a spanning tree and non-tree links and invoking an RC-tree computation, is given in [6]
  − A recent study [16] reports 98% correlation to full SPICE runs.
■ Using parallelism
  − Two sub-circuits can be simulated in parallel if they do not lie on the same path to root.
  − The larger the RC tree, the more parallelism can be found.
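The parallelism condition (two sub-circuits can be simulated in parallel if they do not lie on the same path to root) amounts to an ancestor check. The Python sketch below is one possible reading, not something taken from the slides: the `parent` map between sub-circuits, the labels `C0` to `C3`, and both function names are hypothetical.

```python
def ancestors(parent, sub):
    """Return every sub-circuit on the path from `sub` up to the root."""
    found = set()
    while parent.get(sub) is not None:
        sub = parent[sub]
        found.add(sub)
    return found

def can_run_in_parallel(parent, a, b):
    """True if neither sub-circuit lies on the other's path to the root."""
    return (a != b
            and a not in ancestors(parent, b)
            and b not in ancestors(parent, a))

# Made-up hierarchy: C0 is the root sub-circuit, C1 and C2 hang off C0,
# and C3 is downstream of C1.
example_parent = {"C0": None, "C1": "C0", "C2": "C0", "C3": "C1"}
```

In this made-up hierarchy, `C1` and `C2` are independent and could be simulated concurrently, while `C3` must wait for `C1` because its input stimuli depend on results from the upstream sub-circuit.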

Conclusions
■ Accurate estimation of circuit delay is becoming more difficult at new technology nodes
  − Clock-skew estimation in CNS requires picosecond precision
■ Chop-SPICE partitions the original RC tree into sub-circuits, simulates each of them with SPICE, and reconstructs global results from the sub-circuit simulation data
■ Empirical validation shows that Chop-SPICE offers attractive trade-offs between accuracy and runtime
■ Chop-SPICE provides not only good accuracy, but also fidelity sufficient for use in external optimization algorithms
■ Can be applied to any SPICE simulator

Questions and Answers
Thank you! Time for Questions