Transient Analysis, CK Cheng, UC San Diego, Jan. 25, 2007.


Outline
–Research Directions
–Simulation test case results
–Overview of Simulation
–Commercial Package
–Alternating direction implicit (ADI) Method
–General Operator Splitting Method
–Distributed Computing
–Conclusions and Future Works

Research Directions
–Simulation: SPICE, STA
–Network on Chip: topology and wire styles, Power, and Clock Networks
–Data Path Components: adders, shifters, multipliers, division
–Packaging: passive distortion compensation

6x6 Bump Simulation Results
The Circuit:
–184K Capacitors, 17K Current Sources, 120K Inductors and 246K Resistors.
–306K Nodes
Accuracy:
–Waveform and measurement results match Fujitsu's with less than 0.002% error.
Runtime / Memory Comparison:
                   CPU_Time   Memory   Computer Used
UCSD               678s       600.2M   Pentium 4 3.2G, Linux
Fujitsu Log File   1845s      771M     unknown

6x6 Bump Simulation Results
Measurement results and waveform (red curve is the UCSD result)
Measurements: Min_pwr_l_est_, Min_, Min_
Error (UCSD vs. Fujitsu log file): 0.002%, 0.004%, 0.005%

703KR Simulation Results
The Circuit:
–514K Capacitors, 76K Current Sources, 370K Inductors and 703K Resistors.
–1.3M Nodes
Accuracy:
–Measurement results match Fujitsu's with less than 0.02% error.
Runtime / Memory Comparison:
                   CPU_Time          Memory   Computer Used
UCSD               2575s (0.7h)      1.7G     Pentium 4 3.2G, Linux
Fujitsu Log File   864561s (240h)    2.28G    unknown

703KR Simulation Results
Measurement results and waveform (UCSD results only; the Fujitsu waveform is not available for comparison)
Measurements: Min_, Min_, Min_
Error (UCSD vs. Fujitsu log file): 0.015%, 0.02%, 0.026%

Further Speed-ups
Reduce iteration count by 50% for pure linear circuits (like 6x6 bump and 703KR)
–2x speed up
More effective time step control
–DVDT, breakpoint, truncation error x speed up
Use Multigrid solver
– x speed up for medium circuits (6x6 bump)
–2x – 10x speed up for large circuits (703KR)
Parallel simulation
–4 or more processors on Linux cluster
–32 to hundreds of processors on supercomputer.
Overall speed-up
–6x - 60x speed up without parallel simulation
–12x x speed up with parallel simulation

Performance and capacity prediction
Cases 10x-100x larger than 703KR.
Circuit size                   Preferred Solver       CPU Time        Memory
Small - Medium (0.3M nodes)    LU Decomposition       11 minutes      600M
Medium - Large (1.3M nodes)    Multigrid              43 minutes      1.7G
Huge (10–100 M nodes)          Multigrid + Parallel   5 – 100 hours   15G - 200G

Overview of Simulation
Our research
–Fast speed with SPICE accuracy
–Nonlinear devices
–Efficient matrix solvers
–Effective integration methods
–Time step controls according to different integration methods
–Distributed computing
Flowchart (box labels): Load Circuit, Device Evaluation, LU Decomposition, N-R Converge? (Yes / No), Next Time Point, Time Step Control, Integration Approximation, Linearization
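The simulation flow named on this slide can be illustrated with a minimal sketch. It assumes a hypothetical one-node circuit (a current source driving a resistor, a capacitor and a diode to ground), backward Euler as the integration approximation, and a fixed time step; the function name and all component values are illustrative and not taken from the slides. With a single unknown, the LU decomposition step reduces to a scalar division.

```python
import math

def transient(i_src=1e-3, r=1e3, c=1e-9, i_sat=1e-14, v_t=0.025,
              h=1e-8, steps=500, tol=1e-12, max_iter=50):
    """Backward Euler + Newton-Raphson on C dv/dt + v/R + Id(v) = I."""
    v = 0.0
    waveform = [v]
    for _ in range(steps):                 # next time point
        v_prev = v
        v_new = v_prev                     # initial N-R guess
        for _ in range(max_iter):          # N-R loop
            # Device evaluation: diode current and its conductance
            e = math.exp(v_new / v_t)
            i_d = i_sat * (e - 1.0)
            g_d = i_sat / v_t * e
            # Integration approximation (backward Euler) and linearization
            f = c * (v_new - v_prev) / h + v_new / r + i_d - i_src
            df = c / h + 1.0 / r + g_d
            # Linear solve (scalar stand-in for LU decomposition)
            dv = -f / df
            dv = max(-0.1, min(0.1, dv))   # simple update limiting
            v_new += dv
            if abs(dv) < tol:              # N-R converged?
                break
        v = v_new
        waveform.append(v)
    return waveform

if __name__ == "__main__":
    print(transient()[-1])
```

A production simulator would replace the fixed step `h` with a step chosen by time step control and the scalar division with a sparse matrix solve.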

Overview of Simulation Matrix Solver LU Decomposition Iterative Approach Integration Time Step Control ADI Nonlinear Devices Two Stage Newton Raphson Distributed Computing Commercial Implementation

Overview of Simulation Integration Time Step Control ADI (two-way partitioning) Operator Splitting (multi-way) Distributed Computing MPI Partitioning Three Ph.D. Students

Commercial Package: Fastrack Design
–Founded in January 2001
–Headquartered in San Jose
–Privately funded, cash-flow positive
–Two Business Units: Design Services, Technology Products

Analog Designs
Design              # Elements / Sim. Len   HSpice       mSPICE   SPEEDUP FACTOR
LVDS                us                      80h          26h      3.1X
Oscillator          2221 ms                 13,706s      2,670s   5.1X
Biasing Circuit     ns                      427s         82s      5.2X
PLL                 us                      67d          12d      5.6X
PLL (post-layout)   300K / 40us             290d (est)   16d      18.1X

Digital Blocks
Design Name   Devices                      Runtime                        Speedup Factor
              MOS     R        C           mSPICE   Traditional Spice
ALU           10.1k   12.7k    7.5k        6.9m     7m                    1.0X
CONTROL       69k     83.7k    52.5k       1.5h     9.5h                  6.3X
YN_BLK        205K    242.8k   203.9k      3.5h     > 2d                  >13.7X
THP           437k    499.3k   313.5k      5.0h     COULD NOT RUN         ∞
VCON          936k    753k     561k        15.0h    COULD NOT RUN         ∞

Memory Blocks
Design               #Tr    #R      #C     # Vectors / Sim. Length, mSPICE Run Time
BRAM (pre)           220K                  hours
SRAM (pre) 8Kx8 SP   410K   0       0      27 hours
eRAM (post) 256x16   72K    28K     427K   48ns, 8 hours
BRAM (post)          220K   1320K   870K   218 hours
100% accurate Spice simulation

mSPICE-Parallel
Industry's first practical parallel Spice simulation solution
–Increases capacity further
–Dramatically improves throughput
Uses Matrix Level Partitioning
–No loss of accuracy
–Client-Server configuration
–Minimal memory requirement for client nodes

Client-Server Configuration
–Server distributes sub-matrices to clients
–Clients communicate partial solutions
–Minimal memory requirements for clients

Experimental Results
Design                   Total Elements   Sim. Length   Runtime
                                                        1-proc   2-proc          4-proc
ASIC                     1.2M             8ns           12.2h    7.0h (1.7X)     5.1h (2.4X)
38IO SSO                 1.4M             30ns          3.0h     2.0h (1.5X)     1.4h (2.2X)
Signal-power             2.1M             1.2us         13d      7d18h (1.7X)    5d12h (2.4X)
4096x8 RAM (extracted)   2.3M             10ns          32h      18.5h (1.7X)    13.4h (2.4X)
120IO SSO                3.5M             30ns          6.2h     4.1h (1.5X)     3.1h (2.0X)

ADI: Previous Works
–1999, Namiki and Ito: the alternating direction implicit (ADI) method is used to simulate a 2D TE wave.
–2001, Zheng et al.: extended to 3D problems.
–2001 & 2003, Lee and Chen: ADI is applied to power grids modeled as transmission lines.
The alternation is among different geometric directions, so the simulated geometric structure is constrained.

Alternating Direction Implicit (ADI)
ADI Integration Method
–Two-way partition of the circuit
–One partition is used for each backward integration
–Unconditionally stable (A-stable: independent of time step size)
–Time step size is chosen according to local truncation error.

Alternating Direction Implicit (ADI) ADI method formulation Circuit partition algorithm Local truncation error estimation Stability discussion Experimental results

SPICE Formulation
Equations for RLC circuits, where
–C: capacitance matrix
–L: inductance matrix
–R: resistance matrix
–G: conductance matrix
–E: incidence matrix
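One common way to write RLC circuit equations with these five matrices is shown below. This is a sketch under stated assumptions, not the slide's own equation: $V$ (node voltages), $I$ (inductor branch currents), the source vector $U$ and the sign conventions are all assumed here.

```latex
\begin{aligned}
C\,\frac{dV(t)}{dt} &= -G\,V(t) - E\,I(t) + U(t),\\
L\,\frac{dI(t)}{dt} &= E^{T}\,V(t) - R\,I(t).
\end{aligned}
```

In this form the incidence matrix $E$ couples the two equations with opposite signs, which is what keeps the coupled system passive when $C$, $L$, $G$ and $R$ are positive semidefinite.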

ADI Formulation
Transient simulation
–Split the resistor and inductor branches into two parts: G = G1 + G2, E = E1 + E2, R = R1 + R2
–Alternate backward and forward integration on each partition

ADI Formulation (Cont.)
Equations of ADI method
–the size of the left-hand-side matrix remains unchanged
–the number of non-zero elements is decreased
–direct solving methods can be efficient
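The alternation between backward and forward integration can be illustrated with a generic Peaceman-Rachford step. This is a minimal sketch, assuming a linear system x' = -(A1 + A2) x + b with unit capacitances; it is not the slides' exact formulation, and the helper names, the 3-node resistor ladder and its split into A1 and A2 are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def adi_step(a1, a2, x, b, h):
    """One step of size h: implicit in A1 then implicit in A2."""
    n = len(x)

    def lhs(a):        # I + (h/2) a
        return [[(1.0 if i == j else 0.0) + 0.5 * h * a[i][j]
                 for j in range(n)] for i in range(n)]

    def rhs(a, v):     # (I - (h/2) a) v + (h/2) b
        return [v[i] - 0.5 * h * sum(a[i][j] * v[j] for j in range(n))
                + 0.5 * h * b[i] for i in range(n)]

    x_half = solve(lhs(a1), rhs(a2, x))      # backward on A1, forward on A2
    return solve(lhs(a2), rhs(a1, x_half))   # backward on A2, forward on A1

# Hypothetical ladder: node0-gnd and node0-node1 resistors in A1,
# node1-node2 resistor in A2 (all 1 ohm), 1 A injected at node 0.
A1 = [[2.0, -1.0, 0.0], [-1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
A2 = [[0.0, 0.0, 0.0], [0.0, 1.0, -1.0], [0.0, -1.0, 1.0]]
B = [1.0, 0.0, 0.0]

def run(steps=1000, h=0.5):
    x = [0.0, 0.0, 0.0]
    for _ in range(steps):
        x = adi_step(A1, A2, x, B, h)
    return x

if __name__ == "__main__":
    print(run())
```

Each half step only factors I + (h/2) A1 or I + (h/2) A2, so the matrix keeps its dimension while carrying fewer non-zero entries than the full system matrix.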

Experiments of non-zero fill-ins
A small ASIC Design Spice matrix:
–Dimension: 10,286
–The number of non-zero elements: 46,655
–The number of non-zero fill-ins: 90,960
A large I/O Design Spice matrix:
–Dimension: 615,436
–The number of non-zero elements: 2,126,246
         Sub-matrix1                             Sub-matrix2                             Total
         # non-zero elements   # non-zero fill-ins   # non-zero elements   # non-zero fill-ins   # non-zero fill-ins
Case 1   38,572                2,618                 42,020                10,040                12,658
Case 2   1,176,208             12,421,534            950,038               14,772,068            27,193,602

Local Truncation Error (LTE)
Time step control using LTE
–In circuit transient analysis, the next time step can be estimated from the local truncation error at the present time point.
–LTE is defined as the difference between the calculated solution and the exact solution.
–To ensure consistency, the local truncation error should not exceed the error tolerance; thus the time step can be estimated using
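A widely used rule for this kind of step estimate is sketched below. It assumes the LTE of a method of order p scales as h^(p+1), so the step is rescaled by (tol / LTE)^(1/(p+1)). The function name, the safety factor and the growth cap are hypothetical choices, and this is not necessarily the formula used on the slide.

```python
def next_time_step(h, lte, tol, order, safety=0.9, max_growth=2.0):
    """Propose the next step size from the current step h and its LTE.

    Assumes LTE ~ K * h**(order + 1), so scaling h by
    (tol / lte) ** (1 / (order + 1)) brings the LTE to about tol.
    """
    if lte <= 0.0:
        # No measurable error: grow by the maximum allowed factor.
        return h * max_growth
    factor = safety * (tol / lte) ** (1.0 / (order + 1))
    # Cap the growth so one optimistic estimate cannot inflate the step.
    return h * min(max_growth, factor)

if __name__ == "__main__":
    # Second-order method, LTE eight times the tolerance: step is halved.
    print(next_time_step(1.0, 8e-3, 1e-3, order=2, safety=1.0))
```

When the LTE exceeds the tolerance the returned step shrinks, which a simulator would typically combine with rejecting and repeating the current step.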

Local Truncation Error (Cont.) LTE of ADI method (1) equations let,, and then

Local Truncation Error (Cont.) LTE of ADI method (2) Estimate exact solution: we characterize the input as a simple ramp over the interval $(t_n, t_{n+1})$; the exact analytic solution with time step $\Delta t_n$:

Local Truncation Error (Cont.) LTE of ADI method (3) Estimate ADI solution


Local Truncation Error (Cont.) LTE of ADI method (4) LTE estimation

Local Truncation Error (Cont.) LTE of ADI method (5) Time step control


Stability Discussion
–Stability is concerned with whether the accumulated error grows or decays as time evolves through a series of time steps.
–For one-step integration approximations, the error is accumulated by a factor of
–If the final steady-state error vector is smaller than the initial one, then the integration method is stable.
–In the ADI integration method: it can be proved to be unconditionally stable.
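The idea of a per-step error factor can be made concrete on the standard scalar test equation $\dot x = \lambda x$ with step $h$. The following are textbook amplification factors for three one-step methods, given as background and not as the slide's own expression; the symbol $\rho$ is an assumed name.

```latex
\begin{aligned}
e_{n+1} &= \rho(h\lambda)\, e_n, \qquad e_n = \rho(h\lambda)^{n}\, e_0,\\
\rho_{\mathrm{FE}} &= 1 + h\lambda, \qquad
\rho_{\mathrm{BE}} = \frac{1}{1 - h\lambda}, \qquad
\rho_{\mathrm{TR}} = \frac{1 + h\lambda/2}{1 - h\lambda/2}.
\end{aligned}
```

For $\operatorname{Re}\lambda < 0$, backward Euler and the trapezoidal rule satisfy $|\rho| < 1$ for every $h > 0$, while forward Euler requires $|1 + h\lambda| \le 1$, which bounds the step size.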

Experimental Results
                          Circuit1   Circuit2   Circuit3   1k-cell
#Nodes                    10,000     40,000     90,000     10,200
#Transistors              0          0          0          6,500
Period                    10ns
SPICE3  CPU time (sec)
        #steps
ADI     CPU time (sec)
        #steps
Speedup                   2.7x       4.1x       11.1x      -

Voltage drop of Circuit3 (power mesh with sinks)

Signal in 1k_cell (ASIC design)

General Operator Splitting
General operator splitting method
–Multiple-way partitions
–Each partition is considered separately in each time step of the simulation
–No geometric constraints
–Local truncation error is used to dynamically control time step size

General Operator Splitting Fundamental theory Operator splitting formulation Local truncation error estimation Stability discussion Experimental results

Fundamental theory
–In circuit transient simulation, the integration approximation is actually an approximation of the exponential operator.
–The exponential operators can be approximated to any order using a general scheme of fractal decomposition.
–The decomposition of exponential operators corresponds to the circuit multi-way partition.
–Result: a new integration approximation in transient simulation

Fundamental theory
Approximation of exponential operator
–General circuit equation and solution
–If we characterize the input as a simple ramp over the interval $(t_n, t_{n+1})$, the exact analytic solution with time step $\Delta t_n$
–Exponential operator approximation: Forward Euler, Backward Euler, Trapezoidal
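The link between integration formulas and the exponential operator can be written out for a linear system. The block below is a sketch with assumed notation ($\dot x = A x + B u$, step $h = \Delta t_n$); the three approximations are the standard ones implied by each method and are not copied from the slide.

```latex
\begin{aligned}
x(t_n + h) &= e^{Ah}\,x(t_n) + \int_0^{h} e^{A(h-s)}\,B\,u(t_n + s)\,ds,\\[4pt]
\text{Forward Euler:}\quad  e^{Ah} &\approx I + hA,\\
\text{Backward Euler:}\quad e^{Ah} &\approx (I - hA)^{-1},\\
\text{Trapezoidal:}\quad    e^{Ah} &\approx \left(I - \tfrac{h}{2}A\right)^{-1}\left(I + \tfrac{h}{2}A\right).
\end{aligned}
```

Expanding each right-hand side in powers of $h$ shows that the Euler forms match $e^{Ah}$ through the $h$ term and the trapezoidal form through the $h^2$ term.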

Fundamental theory
Decomposition of exponential operators (Masuo Suzuki, 1991, Physics)
–Function
–First order:
–Second order:
–Third order:
–(2m-1)th and (2m)th order:
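For reference, the best-known low-order decompositions and Suzuki's recursive (fractal) construction of even-order schemes are sketched below for two operators $A$ and $B$. The notation $S_{2m}$ and $p_m$ is assumed here, and the slide's own expressions, including its third-order form, may be written differently.

```latex
\begin{aligned}
\text{First order:}\quad  e^{h(A+B)} &= e^{hA}\,e^{hB} + O(h^{2}),\\
\text{Second order:}\quad e^{h(A+B)} &= S_2(h) + O(h^{3}),
\qquad S_2(h) = e^{\frac{h}{2}A}\,e^{hB}\,e^{\frac{h}{2}A},\\
\text{Order } 2m:\quad S_{2m}(h) &= S_{2m-2}(p_m h)^{2}\;
S_{2m-2}\big((1 - 4p_m)h\big)\;S_{2m-2}(p_m h)^{2},
\qquad p_m = \frac{1}{4 - 4^{1/(2m-1)}}.
\end{aligned}
```

With more than two operators the same symmetric pattern applies: sweep through the partitions with half steps and then back in reverse order.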

Fundamental theory Decomposition of exponential operators

General Operator Splitting Formulation
Transient simulation:
–Apply the second order approximation
–In each time step, every partition is calculated separately and trapezoidal integration is used for every partition
–The size of the left-hand-side matrix may be changed
–The number of non-zero elements is definitely decreased
–Can be easily extended to multi-way partitions
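The combination of a second-order decomposition with trapezoidal sub-steps can be checked numerically. The following is a minimal sketch, assuming a two-way split of a 2x2 linear system x' = -(A1 + A2) x; the matrices and function names are hypothetical, and the symmetric half/full/half ordering is one standard second-order choice rather than the slides' confirmed scheme. The observed order is estimated from three runs with successively halved steps.

```python
import math

def solve2(m, r):
    """Solve a 2x2 linear system m y = r by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(r[0] * m[1][1] - m[0][1] * r[1]) / det,
            (m[0][0] * r[1] - m[1][0] * r[0]) / det]

def trap_substep(a, x, tau):
    """Advance x' = -a x by tau with the trapezoidal rule."""
    half = 0.5 * tau
    lhs = [[(1.0 if i == j else 0.0) + half * a[i][j] for j in range(2)]
           for i in range(2)]
    rhs = [x[i] - half * (a[i][0] * x[0] + a[i][1] * x[1]) for i in range(2)]
    return solve2(lhs, rhs)

def split_step(a1, a2, x, h):
    """Symmetric split: A1 for h/2, A2 for h, A1 for h/2."""
    x = trap_substep(a1, x, 0.5 * h)
    x = trap_substep(a2, x, h)
    return trap_substep(a1, x, 0.5 * h)

def simulate(a1, a2, x0, h, t_end):
    x = list(x0)
    for _ in range(round(t_end / h)):
        x = split_step(a1, a2, x, h)
    return x

def observed_order(a1, a2, x0, h, t_end):
    """Estimate the convergence order from steps h, h/2 and h/4."""
    xs = [simulate(a1, a2, x0, h / 2 ** k, t_end) for k in range(3)]
    d1 = max(abs(p - q) for p, q in zip(xs[0], xs[1]))
    d2 = max(abs(p - q) for p, q in zip(xs[1], xs[2]))
    return math.log2(d1 / d2)

# Two non-commuting partitions of a hypothetical 2-node network.
A1 = [[2.0, -1.0], [-1.0, 1.0]]
A2 = [[1.0, 0.0], [0.0, 3.0]]

if __name__ == "__main__":
    print(observed_order(A1, A2, [1.0, 0.0], 0.05, 1.0))
```

Each sub-step solves only with I + (tau/2) A1 or I + (tau/2) A2, so every partition is factored separately, which is the property the slide relies on.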

General Operator Splitting Formulation Equations

Local Truncation Error (Cont.) LTE of general operator splitting method Estimate solution


Local Truncation Error (Cont.) LTE of general operator splitting method LTE estimation


Stability Discussion
–The trapezoidal integration method is unconditionally stable for stable systems.
–In our operator splitting method, the trapezoidal method is used for all the sub-systems, so the method is still unconditionally stable.

Experimental Results
                          Circuit1   Circuit2   Circuit3
#Nodes                    10,000     40,000     90,000
#Transistors              0          0          0
Period                    10ns
SPICE3  CPU time (sec): ,061.1
        #steps
GOS     CPU time (sec)
        #steps: 102
Comparison                2.1x       2x         1.1x

Voltage drop of Circuit3 (power mesh with sinks)

Conclusions
–We investigate alternating direction implicit and general operator splitting integration methods for transistor-level circuit transient simulation.
–In both methods, the circuit is divided into several sub-circuits, so the direct matrix solver remains efficient because the matrix is simplified.
–Both methods are second-order accurate and unconditionally stable.
–Overhead: circuit partitioning; each time step consists of many sub-steps, and each sub-step is an N-R iteration process.
–Better for circuits with a large linear network

Distributed Computing
Distributed Processors
–Cluster
–Supercomputer
–Multi-Core Processors (Intel Dual/Quad-Core, IBM Cell etc.)
Standard
–MPI
–Partitioning
–Matrix Solver
Capabilities
–Speed-up ( )
–Memory Capacity ( )

Future Works
ADI method
–More experiments
General operator splitting method
–Design and implement multi-way circuit partition algorithm
–Implement multi-way general operator splitting program
–Derive LTE for general multi-way situation
–More experiments
Distributed Computing
–MPI Standard
–Distributed Partitioning, Matrix Solver