Transient Analysis of Power System Chung-Kuan Cheng January 27, 2006 Computer Science & Engineering Department University of California, San Diego
Outline Statement of the Problem Status of Simulators Solver Engine: Multigrid Review Integration Method Frequency Domain Analysis Experimental Results Conclusion
Statement of Problem
Huge RLC linear system with multiple sources
Wide spread of natural frequencies (kHz to GHz)
Many corners and many modes of operation
Packaging and transmission lines
Nonlinear devices
Status of the Simulator Matrix Solvers: Multigrid, Multigraph Integration Method: Operator Splitting Frequency Domain Analysis
Matrix Solvers
Direct method (LU, KLU): high complexity
Basic iterative methods: slow convergence
Conjugate gradient
Multigrid method: MultiGrid, MultiGraph
Multigrid Review
Error components: high-frequency error (more oscillatory between neighboring nodes) and low-frequency error (smooth between neighboring nodes).
Basic iterative methods efficiently reduce only the high-frequency error.
Basic idea of multigrid: convert hard-to-damp low-frequency error into easy-to-damp high-frequency error.
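The smoothing behavior described on this slide can be checked numerically. The sketch below is an illustration only: it assumes a 1D Laplacian as the system matrix (not one of the power networks in this talk) and uses hypothetical helper names. It applies a few Gauss-Seidel sweeps to a smooth sine-mode error and to an oscillatory one, then compares how much of each survives.

```python
import numpy as np

def gauss_seidel_sweep(A, x, b):
    """One in-place Gauss-Seidel sweep for A x = b."""
    for i in range(len(b)):
        sigma = A[i, :] @ x - A[i, i] * x[i]
        x[i] = (b[i] - sigma) / A[i, i]
    return x

def error_after_sweeps(mode, n=63, sweeps=5):
    """Fraction of a pure sine-mode error left after a few sweeps (b = 0)."""
    # Assumed model problem: 1D Laplacian, whose exact solution for b = 0 is 0,
    # so the iterate x is itself the error.
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    j = np.arange(1, n + 1)
    x = np.sin(mode * np.pi * j / (n + 1))
    start = np.linalg.norm(x)
    b = np.zeros(n)
    for _ in range(sweeps):
        gauss_seidel_sweep(A, x, b)
    return np.linalg.norm(x) / start

low = error_after_sweeps(mode=1)    # smooth (low-frequency) error
high = error_after_sweeps(mode=48)  # oscillatory (high-frequency) error
print(f"smooth mode keeps {low:.3f} of its error, oscillatory mode keeps {high:.3f}")
```

The oscillatory mode is damped quickly while the smooth mode barely changes, which is the gap that the coarse levels of multigrid are meant to close.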
Multigrid: A Hierarchy of Problems
[Figure: three-level hierarchy. Finest level A0 · X0 = b0, intermediate level A1 · X1 = b1, coarsest level A2 · X2 = b2. Smoothing is applied on the finer levels, restriction moves down to a coarser level, interpolation moves back up, and Gauss elimination solves the coarsest level.]
Hierarchically, all error components are smoothed efficiently.
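The hierarchy on this slide can be sketched as a recursive V-cycle. This is a minimal illustration under stated assumptions, not the solver used in this work: it uses a 1D Laplacian, geometric coarsening with linear interpolation, full-weighting restriction, a Galerkin coarse operator, and Gauss-Seidel smoothing. All function names are hypothetical.

```python
import numpy as np

def laplacian(n):
    """Assumed model problem: 1D Laplacian on n interior nodes."""
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def interpolation(nc):
    """Linear interpolation from nc coarse nodes to 2*nc + 1 fine nodes."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j, j] = 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] = 0.5
    return P

def smooth(A, x, b, sweeps=2):
    """Gauss-Seidel smoothing, updating x in place."""
    for _ in range(sweeps):
        for i in range(len(b)):
            x[i] += (b[i] - A[i] @ x) / A[i, i]
    return x

def v_cycle(A, x, b):
    n = len(b)
    if n <= 3:
        return np.linalg.solve(A, b)        # coarsest level: direct solve
    x = smooth(A, x, b)                     # pre-smoothing
    P = interpolation((n - 1) // 2)
    R = 0.5 * P.T                           # full-weighting restriction
    r_coarse = R @ (b - A @ x)              # restrict the residual
    A_coarse = R @ A @ P                    # Galerkin coarse operator
    e_coarse = v_cycle(A_coarse, np.zeros(len(r_coarse)), r_coarse)
    x = x + P @ e_coarse                    # interpolate the correction
    return smooth(A, x, b)                  # post-smoothing

n = 63                                      # 63 -> 31 -> 15 -> 7 -> 3 nodes
A = laplacian(n)
b = np.ones(n)
x = np.zeros(n)
residuals = []
for _ in range(8):
    x = v_cycle(A, x, b)
    residuals.append(np.linalg.norm(b - A @ x))
print(residuals)
```

An algebraic multigrid method would build the interpolation operator from the matrix entries instead of from the grid geometry assumed here.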
Geometric vs Algebraic
Geometric multigrid method: requires a regular grid structure.
Algebraic multigrid method: coarsening relies on the matrix, with no requirement of a regular grid structure.
Coarsening: coloring scheme.
Error smoothing operator: Gauss-Seidel.
Interpolation: built for smooth error, where the residue is small but the error decreases very slowly. In practice, we use only the coarse nodes on the right-hand side of the interpolation formula to approximate the error correction of a fine node.
Convergence of Multigrid Method
For an RLKC network, apply the trapezoidal rule to the system equation.
The LHS matrix is not symmetric positive definite (S.P.D.), but the system can be converted so that the LHS matrix of the first equation is S.P.D.
The same holds for backward Euler (B.E.) and forward Euler (F.E.).
$L^{-1}$ is called the K, susceptance, or reluctance matrix.
Experimental Results (1)
IBM test case: board / packaging / chip power network with fully coupled packaging inductance.
60k elements, 5000 nodes.
SPICE failed. Our tool: less than 10 minutes.
[Figure: chip, board, and power supply]
Experimental Results (2)
Power/clock network case from OEA Tech.: 30k nodes, 1000 transistor devices.
HSPICE/SPICE ran for more than 2 days on our 400M Solaris machine (Sun Blade 100).
Our run time: 1.5 hours.
Integration Method:Operator Splitting ADI: Two way partitions. Working on time steps. Operator Splitting: Multiway Partitioning. Research + Development.
Operator Splitting - A Simple Example
The basic idea of operator splitting can be explained with a simple initial value problem (IVP) of an ordinary differential equation (ODE), $du/dt = L u$, where $L$ is a linear or nonlinear operator that can be written as a linear sum of $m$ subfunctions of $u$: $L u = L_1 u + L_2 u + \cdots + L_m u$.
Suppose $U_1, U_2, \ldots, U_m$ are updating operators on $u$ from time step $n$ to $n+1$ for each of the subfunctions. The operator splitting method then has the form $u^{n+1/m} = U_1(u^n, \Delta t)$, $u^{n+2/m} = U_2(u^{n+1/m}, \Delta t)$, ..., $u^{n+1} = U_m(u^{n+(m-1)/m}, \Delta t)$.
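A small numerical sketch of this idea, under assumptions that are not taken from the slides: a linear ODE u' = (A1 + A2) u with two hypothetical 2x2 symmetric matrices that do not commute, and a backward Euler step as the updating operator for each part. The split solution is compared with the exact solution obtained from an eigendecomposition.

```python
import numpy as np

# Hypothetical sub-operators: symmetric, negative definite, non-commuting.
A1 = np.array([[-2.0, 1.0], [1.0, -1.0]])
A2 = np.array([[-1.0, 0.0], [0.0, -3.0]])

def split_solve(u0, t_end, steps):
    """Sequential splitting: apply the update for A1, then the update for A2."""
    h = t_end / steps
    I = np.eye(2)
    U1 = np.linalg.inv(I - h * A1)   # backward Euler update for the A1 part
    U2 = np.linalg.inv(I - h * A2)   # backward Euler update for the A2 part
    u = u0.copy()
    for _ in range(steps):
        u = U2 @ (U1 @ u)
    return u

def exact_solve(u0, t_end):
    """Exact solution of u' = (A1 + A2) u using the symmetric eigendecomposition."""
    lam, V = np.linalg.eigh(A1 + A2)
    return V @ (np.exp(lam * t_end) * (V.T @ u0))

u0 = np.array([1.0, 0.0])
reference = exact_solve(u0, 1.0)
errors = {steps: np.linalg.norm(split_solve(u0, 1.0, steps) - reference)
          for steps in (50, 100, 200)}
print(errors)
```

The error shrinks as the step count grows, which is the consistency property one expects from a first-order splitting of this kind.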
General Operator Splitting on Circuit Simulation
We generalize operator splitting to graph-based modeling, with no geometry or locality constraints.
Convergence is guaranteed:
A-stable: stability is independent of the time step size.
Consistent: in terms of local truncation error, at each time point.
SPICE Formulation - Forward Euler
Circuit equation for RLC circuits, in forward Euler formulation.
The splitting formulation is derived from the backward Euler formulation of the circuit equation.
SPICE Formulation - Backward Euler
Circuit equation for RLC circuits, in backward Euler formulation.
The splitting formulation is derived from the backward Euler formulation of the circuit equation.
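To make the contrast between the two formulations concrete, here is a minimal sketch on a single-node RC circuit, C dv/dt = I - G v. The circuit, its element values, and the function names are assumptions for illustration and are not taken from the slides. The time step is deliberately set to three times the RC time constant.

```python
def forward_euler_step(v, h, C, G, I):
    """Explicit update: C (v_new - v) / h = I - G v."""
    return v + (h / C) * (I - G * v)

def backward_euler_step(v, h, C, G, I):
    """Implicit update: C (v_new - v) / h = I - G v_new."""
    return (C / h * v + I) / (C / h + G)

# Assumed values: 1 pF, 1 kOhm, 1 mA source, so tau = 1 ns and the steady state is 1 V.
C, G, I = 1e-12, 1e-3, 1e-3
h = 3e-9                      # time step of 3 * tau
v_fe = v_be = 0.0
for _ in range(20):
    v_fe = forward_euler_step(v_fe, h, C, G, I)
    v_be = backward_euler_step(v_be, h, C, G, I)
print(v_fe, v_be)
```

With this step size the forward Euler iterate oscillates and grows, while the backward Euler iterate settles at the steady-state voltage.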
Splitting Formulation
Split the resistor branches of the circuit into two partitions.
Each partition keeps a full copy of the capacitors and inductors.
Splitting Formulation General Operator Splitting Iteration: Alternate Backward and forward integration on partitions
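One way to picture the alternation is a two-partition scheme on an RC mesh, sketched below. This is an assumption-laden illustration, not the formulation from the slides: it uses a Peaceman-Rachford (ADI-style) update, a hypothetical 6 x 6 resistor mesh whose horizontal resistors form partition 1 and whose vertical resistors form partition 2, unit node capacitances, and a small conductance to ground in each partition so that every node has a DC path.

```python
import numpy as np

def grid_conductances(n, g=1.0, g_gnd=0.05):
    """Return (G1, G2): horizontal and vertical resistor partitions of an n x n mesh."""
    N = n * n
    G1 = np.zeros((N, N))
    G2 = np.zeros((N, N))

    def stamp(G, a, b):
        G[a, a] += g
        G[b, b] += g
        G[a, b] -= g
        G[b, a] -= g

    for r in range(n):
        for c in range(n):
            k = r * n + c
            if c + 1 < n:
                stamp(G1, k, k + 1)      # horizontal resistor
            if r + 1 < n:
                stamp(G2, k, k + n)      # vertical resistor
            G1[k, k] += g_gnd            # DC path to ground in partition 1
            G2[k, k] += g_gnd            # DC path to ground in partition 2
    return G1, G2

def split_step(v, h, C, G1, G2, I):
    """One time step of size h for C dv/dt + (G1 + G2) v = I."""
    a = 2.0 / h
    # First half step: backward (implicit) on partition 1, forward on partition 2.
    v_half = np.linalg.solve(a * C + G1, (a * C - G2) @ v + I)
    # Second half step: backward on partition 2, forward on partition 1.
    return np.linalg.solve(a * C + G2, (a * C - G1) @ v_half + I)

n = 6
G1, G2 = grid_conductances(n)
C = np.eye(n * n)
I = np.zeros(n * n)
I[0] = 1.0                               # constant current injected at one corner
v = np.zeros(n * n)
for _ in range(400):
    v = split_step(v, 0.5, C, G1, G2, I)
v_dc = np.linalg.solve(G1 + G2, I)       # reference steady state of the full network
print(np.max(np.abs(v - v_dc)))
```

Each half step only factors a matrix built from one partition. In this mesh every partition is a set of disjoint chains, so a sparse LU factorization of it would produce no fill-ins; the dense solves here are used only to keep the sketch short.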
Circuit Splitting: Splitting Algorithm
Objective: minimize the overall number of nonzero fill-ins while guaranteeing a DC path for every node.
Hints: a tree structure generates no nonzero fill-ins in LU factorization; high-degree nodes generate more nonzero fill-ins during LU factorization.
Linear circuits (circuits with transistors are discussed later): bipartition the neighbors of each node.
Basic idea: in the LU decomposition process, nonzero fill-ins are introduced among the neighbors of the pivot, so reducing the number of neighbors of every node reduces the number of nonzero fill-ins.
Avoid loops and keep the structure as close to a tree as possible.
Check the DC path and reassign partitions if necessary.
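The fill-in argument can be illustrated with a symbolic elimination on the circuit graph. The sketch below is not the splitting algorithm itself: the function name, the 3 x 3 mesh, and the row/column partitioning are assumptions chosen for illustration. It counts the edges created among the neighbors of each pivot, first for the full mesh and then for each of the two loop-free partitions.

```python
def count_fill_ins(edges, order):
    """Count fill-in edges created when nodes are eliminated in the given order."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    fill = 0
    for pivot in order:
        nbrs = list(adj.pop(pivot))
        for u in nbrs:
            adj[u].discard(pivot)
        # Eliminating the pivot couples all of its remaining neighbors pairwise.
        for i, u in enumerate(nbrs):
            for w in nbrs[i + 1:]:
                if w not in adj[u]:
                    adj[u].add(w)
                    adj[w].add(u)
                    fill += 1
    return fill

n = 3  # hypothetical 3 x 3 resistor mesh, nodes numbered row by row
horizontal = [(r * n + c, r * n + c + 1) for r in range(n) for c in range(n - 1)]
vertical = [(r * n + c, (r + 1) * n + c) for r in range(n - 1) for c in range(n)]
order = list(range(n * n))

mesh_fill = count_fill_ins(horizontal + vertical, order)
row_fill = count_fill_ins(horizontal, order)   # partition 1: chains along rows
col_fill = count_fill_ins(vertical, order)     # partition 2: chains along columns
print(mesh_fill, row_fill, col_fill)
```

The full mesh produces fill-ins, while each chain partition is a forest and produces none when its nodes are eliminated from one end of each chain.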
Experimental Results-1
Power Network & Gate Sinks

Examples                  Circuit1   Circuit2   Circuit3   Circuit4
# Nodes                   11,203     41,321     92,360     160,657
# Transistors             74         513        1,108      2,130
Simulation period         10ns
SPICE3 (sec)              602.44     8268.92    39612.32   N/A
Operator Splitting (sec)  74.64      305.38     681.18     1356.21
Speedup                   8.1x       27.1x      58.2x

We test a number of RLC power networks with sizes ranging from 11k nodes to 160k nodes. Various transistor gates draw current from the power networks, which are approximately mesh structures. The CPU runtime and speedup are given in the table: a speedup of one to two orders of magnitude over SPICE3 is obtained. We observe linear runtime for the proposed method, because with a mesh-structured power network the two partitions generate almost no nonzero fill-ins during LU decomposition. The transient waveform of Circuit3 is given in the figure.
[Figure: Voltage drop of Circuit3]
Experimental Results-2
RLC power/clock network case: 29,110 nodes, 720 transistor devices.
SPICE3 runtime: 12015 sec. Our runtime: 649.5 sec. Speedup: 18.5x.
The power and clock network case contains an RLC power/ground network and a two-level H-tree clock. Figure IV.6 shows the voltage drop at one node of the power network. A transient simulation of 10ns is completed in 649.5 seconds, which is 18.5 times faster than SPICE3, as shown in Table IV.1.
Experimental Results-3
Two ASIC designs with 1K and 10K cells
Bottlenecks: nonzero fill-ins and device evaluation time

           # of nodes   # of transistors   SPICE3 runtime (s)   Our runtime (s)   Speedup
1k cell    10,200       6,500              2121                 415.9             5.1x
10k cell   123,600      69,000             44293                3954.7            11.2x

Two ASIC designs, with 1K and 10K cells, are tested to demonstrate the ability of the proposed approach to handle transistor-dominated nonlinear circuits. The 1K cell circuit has 10,200 nodes and 6,500 transistors. The 10K cell circuit has 123,600 nodes and 69,000 transistors. We assume that ideal power and ground supplies are provided in these RC examples. The proposed approach takes 415.9 seconds for the 1K cell circuit and 3954.7 seconds for the 10K cell circuit to finish 20ns transient simulations. The speedup over SPICE3 is 5.1x and 11.2x for these two examples (Table IV.1). We observe an accurate waveform match for both examples. Figure IV.8 shows the transient waveform of one gate output in the 1K cell design.
Large Power Ground Network
600,000 nodes, irregular RC network.
10ns transient simulation: 4083 sec.
This example contains a huge RC power network (0.6 million nodes) with an irregular structure (some nodes have thousands of neighbors). The switching activities that draw current from the power network are modeled as piecewise linear current waveforms. Berkeley SPICE3 fails to execute because of memory size and computation time limits. The operator splitting approach finishes the 10ns transient analysis in 4083 seconds. Figure IV.7 illustrates the voltage drop of a node on the power network.
Frequency Domain Analysis Natural Frequency Extraction Complex Matrix Solver Fast Fourier Transformation
Conclusion Transient of Power Analysis: Post Doc. (Sep 05-Feb 06) Nonlinear Network: Fastrack Release Operator Splitting: Rui Shi Packaging: Vincent Peng Frequency Domain Analysis: R. Wang
References
Efficient Transistor Level Simulation Using Two-Stage Newton-Raphson and Multigrid Method, C.K. Cheng and Zhengyong Zhu, filed by UCSD, SD2005-013.
Circuit Splitting in Analysis of Circuits at Transistor Level, C.K. Cheng, R. Shi, and Z. Zhu, filed by UCSD, SD2005-129-PCT, June 7, 2005.
Z. Zhu, B. Yao, and C.K. Cheng, "Power Network Analysis Using an Adaptive Algebraic Multigrid Approach," ACM/IEEE Design Automation Conference, pp. 105-108, June 2003.
Z. Zhu, K. Rouz, M. Borah, C.K. Cheng, and E.S. Kuh, "Efficient Transient Simulation for Transistor-Level Analysis," Asia and South Pacific Design Automation Conf., pp. 240-243, 2005.