TBS: Fast Analysis of Structured Power Grid by Triangularization Based Structure Preserving Model Order Reduction Hao Yu, Yiyu Shi and Lei He Electrical.

Presentation transcript:

TBS: Fast Analysis of Structured Power Grid by Triangularization Based Structure Preserving Model Order Reduction
Hao Yu, Yiyu Shi and Lei He
Electrical Engineering Dept., UCLA
Partially supported by NSF and a UC-MICRO fund sponsored by Analog Devices, Intel and Mindspeed
http//:eda.ee.ucla.edu

2 New Challenges in Integrity Verification
- Integrity verification checks for transient V/T-violations in linear power/signal/thermal networks
  - Large-scale
    - millions of nodes and ports
  - Often structured
    - e.g., locally regular and globally irregular P/G network [Singh-Sapatnekar:TCAD’05]
- A fast yet accurate linear simulator is necessary to perform large-scale transient verification
  - Linear-network macromodeling is one effective approach
How can structure information be used to build accurate and efficient macromodels?

3 Existing Structured Macromodeling
- Hierarchical node-elimination (HNE) [Zhao-Panda-Sapatnekar-Blaauw:DAC’00]
  - Builds the macromodel by internal node elimination with source mapping
  - Analyzes the macromodel in a hierarchical (two-level) fashion
  - Requires sparsification by linear programming (LP) because of the dense fill-in
- SPRIM [Freund:ICCAD’04] and BSMOR [Yu-He-Tan:BMAS’05]
  - Leverage the block structure in the state matrix
  - Build the macromodel by structure-preserved moment matching
- HiPRIME [Cao-Lee-Chen:DAC’02], a hierarchical extension of PRIMA [Odabasioglu-Celik-Pileggi:TCAD’98]
  - Builds the macromodel by hierarchical orthonormalization
  - Loses the hierarchy because of the final flat projection
We propose a new structure-preserved moment matching, with 20x less waveform error and 50x speedup

4 Outline
- Review macromodeling by moment matching
- Our Approach: TBS method
- Experimental Results
- Conclusions

5 Macromodeling by Moment Matching (I)
- Electric systems can be described in MNA (modified nodal analysis)
- The solution (x) of MNA is contained in a block Krylov subspace
- Grimme’s Projection Theorem
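The equations on this slide did not survive in the transcript. For orientation, the block below writes out the standard textbook forms that the three bullets usually refer to. The notation (expansion point s_0, hatted reduced matrices, the symbols A and R) is an assumption chosen to agree with the state matrices (G, C, B, L) named on the next slide, and sign conventions vary between references; it is not a reconstruction of the slide's exact formulas.

```latex
% MNA description in the Laplace domain (standard form, notation assumed)
(G + sC)\,x(s) = B\,u(s), \qquad y(s) = L^{T} x(s)

% Block Krylov subspace about an expansion point s_0
A = (G + s_0 C)^{-1} C, \qquad R = (G + s_0 C)^{-1} B
\mathcal{K}_q(A, R) = \operatorname{span}\{R,\ AR,\ \dots,\ A^{q-1}R\}

% The first q moment vectors of x(s) about s_0 lie in K_q(A, R).
% Projection: if K_q(A, R) \subseteq \operatorname{span}(V), the reduced model
\hat{G} = V^{T} G V,\quad \hat{C} = V^{T} C V,\quad \hat{B} = V^{T} B,\quad \hat{L} = V^{T} L
% matches the first q block moments of H(s) = L^{T}(G + sC)^{-1}B about s_0.
```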

6 Macromodeling by Moment Matching (II)
a) To remove linear dependency in the low-dimensional projection matrix V, block-Arnoldi orthonormalization is applied
b) To preserve passivity, a congruence transformation is used to project the state matrices (G, C, B, L) respectively
c) To handle a large number of inputs, as in a P/G network, SIMO (single-input-multi-output) reduction can be assumed [Feldmann-Liu:ICCAD’04]
   - Replace the input port matrix B by a common input vector J
   - All poles are matched w.r.t. one superposed input
   - Matched moments/poles (q) are independent of the input number (p)
V is flat and destroys the structure of the state matrices
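Steps (a) and (b) can be sketched in a few lines of NumPy. This is a minimal PRIMA-style illustration under stated assumptions, not the authors' implementation: the function names (`block_arnoldi`, `congruence_reduce`, `moments`) are hypothetical, the expansion point defaults to s0 = 0, dense solves stand in for sparse factorizations, and no deflation of dependent columns is attempted.

```python
import numpy as np

def block_arnoldi(G, C, B, q_blocks, s0=0.0):
    """Orthonormal basis of span{R, A R, ..., A^(q-1) R},
    with A = (G + s0*C)^-1 C and R = (G + s0*C)^-1 B (no deflation)."""
    K = G + s0 * C
    V0, _ = np.linalg.qr(np.linalg.solve(K, B))
    blocks = [V0]
    for _ in range(q_blocks - 1):
        W = np.linalg.solve(K, C @ blocks[-1])
        # Two Gram-Schmidt passes against all previous blocks for stability.
        for _pass in range(2):
            for Vj in blocks:
                W -= Vj @ (Vj.T @ W)
        Q, _ = np.linalg.qr(W)
        blocks.append(Q)
    return np.hstack(blocks)

def congruence_reduce(G, C, B, L, V):
    """Project each state matrix with the same V (congruence transformation)."""
    return V.T @ G @ V, V.T @ C @ V, V.T @ B, V.T @ L

def moments(G, C, B, L, k):
    """First k moments of H(s) = L^T (G + sC)^-1 B about s = 0."""
    X = np.linalg.solve(G, B)
    out = []
    for _ in range(k):
        out.append(L.T @ X)
        X = -np.linalg.solve(G, C @ X)
    return out
```

Because the same V multiplies from both sides, a symmetric positive definite G or C stays symmetric positive definite after reduction, which is the usual argument for passivity preservation. The resulting V is a single flat matrix, which is the limitation the slide points out.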

7 Structure-preserved Moment Matching
- SPRIM [Freund:ICCAD’04] leverages the 2 x 2 block structure in MNA
  - Splits V into a 2 x 2 block diagonal form
  - Preserves the structure of reciprocity (symmetry between input and output), and hence achieves a higher accuracy than PRIMA
- BSMOR [Yu-He-Tan:BMAS’05] partitions the state matrices into more blocks
  - Splits V into an m x m block diagonal form
  - Preserves the block structure and sparsity, and hence achieves better efficiency than SPRIM
- Limitations of SPRIM and BSMOR
  - Moment/pole matching is not localized
  - Reduction does not preserve the structure of latency
  - Model does not leverage redundancy
  - Inefficient and inaccurate for P/G grid macromodeling
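The "split V into a block diagonal form" idea shared by SPRIM and BSMOR can be sketched as follows. This is an illustration under assumptions: the helper name `split_block_diagonal`, the SVD-based orthonormalization of each row slice, and the tolerance `tol` are choices made here, not details taken from either paper.

```python
import numpy as np

def split_block_diagonal(V, sizes, tol=1e-10):
    """Cut a flat projection matrix V (N x k) into row slices of the given
    sizes, orthonormalize each slice, and place the results on a block
    diagonal. The span of the result contains the span of V."""
    parts, start = [], 0
    for n in sizes:
        U, s, _ = np.linalg.svd(V[start:start + n, :], full_matrices=False)
        parts.append(U[:, s > tol * s[0]])  # drop numerically dependent columns
        start += n
    Vs = np.zeros((V.shape[0], sum(p.shape[1] for p in parts)))
    r = c = 0
    for p in parts:
        Vs[r:r + p.shape[0], c:c + p.shape[1]] = p
        r += p.shape[0]
        c += p.shape[1]
    return Vs
```

Since block (i, j) of the projected matrix is V_i^T G_ij V_j, any zero block of G stays zero after projection, which is how the block structure and sparsity survive the reduction. The price is a larger reduced model: with m slices of full column rank the column count grows from k to m*k.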

8 Outline
- Review macromodeling by moment matching
- Our Approach: TBS method
  - Triangular Block Structured moment matching
- Experimental Results
- Conclusions

9 From Layout to Structured Model
- Build a structured state matrix by partitioning the layout
  - Stamp basic blocks diagonally
  - Stamp interconnection blocks off-diagonally
- A number of interconnected basic blocks can be used to represent both homogeneous and heterogeneous circuits
[Figure: layout regions with wire widths w1 and w2, conductances g1, g2 and interconnect conductance gx, and the resulting structured conductance matrix, where g3 = 2g1 + gx and g4 = 2g2 + gx]
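The stamping rule can be illustrated with a toy example. The sketch below is an assumption-laden stand-in for the figure: each basic block is modeled as a ring of nodes (so every node has two neighbors of the same conductance), and one interconnect conductance gx joins a node of one block to a node of the other. The names `ring_block` and `stamp_structured` are hypothetical. With these choices the connected nodes get diagonal entries 2*g1 + gx and 2*g2 + gx, the same relations the slide gives for g3 and g4.

```python
import numpy as np

def ring_block(n, g):
    """Conductance matrix of a ring of n >= 3 nodes with edge conductance g."""
    G = np.zeros((n, n))
    for i in range(n):
        j = (i + 1) % n
        G[i, i] += g
        G[j, j] += g
        G[i, j] -= g
        G[j, i] -= g
    return G

def stamp_structured(blocks, links):
    """Stamp basic blocks on the diagonal and interconnections off-diagonally.
    links: tuples (block_a, node_a, block_b, node_b, gx)."""
    sizes = [b.shape[0] for b in blocks]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    G = np.zeros((offsets[-1], offsets[-1]))
    for k, blk in enumerate(blocks):
        s, e = offsets[k], offsets[k + 1]
        G[s:e, s:e] += blk                      # basic block on the diagonal
    for ka, ia, kb, ib, gx in links:
        a, b = offsets[ka] + ia, offsets[kb] + ib
        G[a, a] += gx                           # interconnect loads both nodes
        G[b, b] += gx
        G[a, b] -= gx                           # and couples them off-diagonally
        G[b, a] -= gx
    return G, offsets
```

For example, `stamp_structured([ring_block(4, 1.0), ring_block(4, 2.0)], [(0, 3, 1, 0, 0.5)])` yields an 8 x 8 matrix whose only nonzero in the upper-right block is the single -gx coupling entry.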

10 Properties of Interconnected Basic Blocks
- Structure of latency: the spatial distribution of time constants
  - Each basic block has a time constant
- Redundancy: different basic blocks can share the same or a similar time constant
Due to redundancy, the basic block representation is not compact

11 TBS Flow
(Basic Blocks) -> Dominant-pole Clustering -> (Compact Blocks) -> Triangularization -> (Triangular Blocks) -> Block Diagonal Projection -> (Reduced Blocks) -> Two-level Relaxation Analysis -> (Block Integrity)
Dominant-pole based clustering removes redundancy

12 Clustering Procedure
- Compress basic blocks into compact blocks
  1. Calculate the q-dominant pole-set (mode) for each basic block and [equation not captured in the transcript]
  2. Cluster basic blocks if the mode-distance is small enough
- The cluster number is determined by the nature of the network structure
  - There is no need to cluster a homogeneous circuit, but TBS still applies
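The two steps can be sketched as below. The slide's definition of the mode-distance is not recoverable from the transcript, so the relative Euclidean distance between sorted pole sets used here is an assumption, as are the greedy assignment rule, the threshold `tol`, and the function names. The q dominant poles are taken as the q eigenvalues of -C^-1 G closest to the origin, i.e. the slowest time constants.

```python
import numpy as np

def dominant_poles(G, C, q):
    """q poles of det(G + sC) = 0 closest to the origin, in sorted order."""
    poles = np.linalg.eigvals(-np.linalg.solve(C, G))
    nearest = np.argsort(np.abs(poles))[:q]
    return np.sort_complex(poles[nearest])

def mode_distance(p1, p2):
    """Hypothetical mode-distance: relative distance between two pole sets."""
    return np.linalg.norm(p1 - p2) / max(np.linalg.norm(p1), np.linalg.norm(p2))

def cluster_blocks(blocks, q, tol):
    """Greedy clustering: a block joins the first cluster whose representative
    mode is within tol, otherwise it starts a new cluster.
    blocks: list of (G, C) pairs, one per basic block."""
    clusters = []                               # (representative mode, members)
    for i, (G, C) in enumerate(blocks):
        mode = dominant_poles(G, C, q)
        for rep, members in clusters:
            if mode_distance(rep, mode) < tol:
                members.append(i)
                break
        else:
            clusters.append((mode, [i]))
    return [members for _, members in clusters]
```

Blocks that land in the same cluster share a mode and can be represented by one compact block, which is the redundancy removal the slide describes.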

13 Advantages of Clustering
- Redundant poles are removed
  - Hence redundant columns in the projection matrix are also removed, i.e., the effective rank of the projection matrix is improved
- Structure of latency is leveraged
  - Each compact block can be solved with a different time step
- A complete modal decomposition is achieved
  - Each compact block has a unique pole-set or mode, and the resulting system is block-wise stiff
System poles are still determined by both diagonal and off-diagonal blocks, which is not efficient

14 TBS Flow
(Basic Blocks) -> Dominant-pole Clustering -> (Compact Blocks) -> Triangularization -> (Triangular Blocks) -> Block Diagonal Projection -> (Reduced Blocks) -> Two-level Relaxation Analysis -> (Block Integrity)
Triangularization can localize system poles to diagonal blocks, which is the key contribution of this work

15 Triangularization Procedure
1. Stack a replica-block diagonally
2. Move the original lower-triangular parts to the new upper-triangular parts
- This procedure is implemented by a block matrix data structure without increasing memory usage

16 Advantages of Triangularization
- System poles are determined only by the compact blocks on the diagonal
  - Compact blocks are almost decoupled from each other
- A triangular system has a factorization cost coming only from the diagonal blocks
  - There is no need to factorize the entire matrix
- Block duplication results in an equivalent solution
  - Simpler than the existing permutation based triangularization procedure [Kim Davis: KLU]
Due to the replica block, the overall cost of factorization is the same as the original
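The first claim rests on a general linear-algebra fact that is easy to check numerically: when G and C are both block upper-triangular, the poles of the system (roots of det(G + sC) = 0) are exactly the union of the poles of the diagonal block pairs, whatever the off-diagonal coupling is. The sketch below only demonstrates that fact on a two-block example; it does not implement the replica-based triangularization itself, and the function names are hypothetical.

```python
import numpy as np

def poles(G, C):
    """Poles of the system G x + C dx/dt = B u, i.e. eigenvalues of -C^-1 G."""
    return np.sort_complex(np.linalg.eigvals(-np.linalg.solve(C, G)))

def block_upper(G1, G2, X):
    """Assemble [[G1, X], [0, G2]] from two diagonal blocks and a coupling X."""
    zero = np.zeros((G2.shape[0], G1.shape[1]))
    return np.block([[G1, X], [zero, G2]])

def union_of_block_poles(pairs):
    """Poles collected block by block from (G_ii, C_ii) pairs."""
    return np.sort_complex(np.concatenate([poles(G, C) for G, C in pairs]))
```

Comparing `poles(block_upper(G1, G2, X), C)` with `union_of_block_poles([(G1, C1), (G2, C2)])` for a block-diagonal C gives the same values for any coupling X, so each diagonal block can be analyzed and reduced on its own.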

17 TBS Flow
(Basic Blocks) -> Dominant-pole Clustering -> (Compact Blocks) -> Triangularization -> (Triangular Blocks) -> Block Diagonal Projection -> (Reduced Blocks) -> Two-level Relaxation Analysis -> (Block Integrity)
Block diagonal projection can reduce the system size and the cost of the factorization

18 Block Diagonal Projection Procedure
1. Split a flat projection matrix V into a structured (block diagonal) one, with the rank increased by a factor of the cluster number
2. Reduce the state matrices block by block respectively
- The reduced system preserves the upper-triangular structure

19 Advantages of Block Diagonal Projection
- System moments and poles are matched locally
  - Each compact block is reduced locally to match q poles
  - In total, mq poles are matched for m unique compact blocks (poles from the replica are duplicate poles)
- The reduced model preserves the block triangular structure and the structure of latency
  - Each reduced block can be factorized independently
  - Each reduced block can have a different time constant
- More matched poles improve accuracy
  - Using a low-order reduction for each compact block locally can achieve a high-order accuracy for the overall system
The reduced model can be efficiently solved by a block backward-substitution or a two-level analysis with relaxation
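Block backward-substitution on a block upper-triangular system can be sketched as follows. The storage layout (a dictionary keyed by block indices, with absent keys meaning zero blocks) and the function name are assumptions made for this illustration; a production solver would factorize each diagonal block once and reuse the factors.

```python
import numpy as np

def block_back_substitution(A, b, m):
    """Solve a block upper-triangular system A x = b.
    A: dict mapping (i, j), j >= i, to the block at that position
       (missing keys are treated as zero blocks).
    b: list of m right-hand-side vectors, one per block row.
    Only the diagonal blocks A[(i, i)] are ever solved with."""
    x = [None] * m
    for i in range(m - 1, -1, -1):              # last block row first
        rhs = np.array(b[i], dtype=float)
        for j in range(i + 1, m):
            blk = A.get((i, j))
            if blk is not None:
                rhs -= blk @ x[j]               # move known unknowns to the right
        x[i] = np.linalg.solve(A[(i, i)], rhs)
    return x
```

The cost is dominated by m small solves with the diagonal blocks, which is the reason the slide can say that each reduced block is factorized independently.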

20 TBS Flow
(Basic Blocks) -> Dominant-pole Clustering -> (Compact Blocks) -> Triangularization -> (Triangular Blocks) -> Block Diagonal Projection -> (Reduced Blocks) -> Two-level Relaxation Analysis -> (Block Integrity)
Two-level relaxation can further reduce simulation cost

21 Two-level Relaxation Solver
- The time-domain iteration of a triangular system always converges [White: Book’87]
- Two-level representation and analysis
- Each reduced diagonal block can be factorized independently, and solved with a different time step during backward-Euler (BE) integration
  - In contrast, the previous pole-residue solution
    - eigen-decomposes the entire reduced matrix (dense, with no structure)
    - cannot exploit the structure of latency
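The convergence claim can be illustrated for a single backward-Euler step. Discretizing G x + C dx/dt = u with step h gives (G + C/h) x_new = (C/h) x_prev + u. If G and C are block upper-triangular, a Jacobi-style relaxation over the blocks reaches the exact solution in at most m sweeps for m blocks, because its iteration matrix is strictly block upper-triangular and therefore nilpotent. The sketch below is an assumption-based illustration (single common time step, dense solves, hypothetical function name) and does not reproduce the multi-rate, two-level scheme of the slide.

```python
import numpy as np

def be_step_relaxation(G, C, x_prev, u, h, offsets, sweeps):
    """One backward-Euler step of G x + C dx/dt = u solved by block Jacobi
    relaxation. G and C are assumed block upper-triangular with block
    boundaries given by offsets = [0, n1, n1 + n2, ..., N]."""
    M = G + C / h
    rhs = C @ x_prev / h + u
    m = len(offsets) - 1
    x = np.array(x_prev, dtype=float)
    for _ in range(sweeps):
        x_new = x.copy()
        for i in range(m):
            s, e = offsets[i], offsets[i + 1]
            # Coupling only to blocks right of the diagonal, using the old iterate.
            r = rhs[s:e] - M[s:e, e:] @ x[e:]
            x_new[s:e] = np.linalg.solve(M[s:e, s:e], r)
        x = x_new
    return x
```

After k sweeps the last k blocks are already exact, so `sweeps = len(offsets) - 1` reproduces the direct solution of the step.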

22 Outline
- Review macromodeling by moment matching
- Our Approach: TBS method
- Experimental Results
- Conclusions

23 Experiment Settings
- Large-scale homogeneous and heterogeneous P/G grids (RC-mesh) with millions of nodes
- For the heterogeneous case, each block has a different wire-pitch/width and block-size, and hence a different time constant
- The reduction algorithm assumes SIMO reduction for a large number of inputs, but also supports the general MIMO reduction
- TBS is compared to BSMOR [Yu-He-Tan:BMAS’05], HiPRIME [Cao-Lee-Chen:DAC’02], and HNE [Zhao-Panda-Sapatnekar-Blaauw:DAC’00]

24 Triangular Block Structure Preservation
- Nonzero (nz) pattern of conductance matrices
  - (a) original system
  - (b) triangular system
  - (c) reduced system by TBS

25 m x q Pole Matching
- Pole matching (m0=32, m=4, q=8): TBS matches 32 poles exactly; BSMOR matches 8 poles exactly and 24 poles approximately; HiPRIME (a partitioned PRIMA) matches only 8 poles
- Waveforms in time domain: improved accuracy with more matched poles

26 Study Waveform-error Scalability
ckt    Node (N)  Port (p)  Order (q)  HNE    HiPRIME   BSMOR     TBS
ckt1   1K        ?         ?          ?e-6   9.09e-6   4.87e-6   5.03e-7
ckt2   10K       ?         ?          ?e-5   2.31e-5   7.93e-6   1.84e-6
ckt3   100K      ?         ?          ?e-2   6.82e-4   1.91e-4   3.02e-5
ckt4   1M        ?         ?          ?e-2   9.67e-3   4.23e-3   1.27e-4
ckt5   7.68M     ?         ?          ?      ?.93e-2   5.10e-2   3.01e-3
ckt6   7.68M     6.14M     300        NA     NA        NA        5.04e-3
(? marks digits that are missing from the transcript)
- HiPRIME, BSMOR and TBS use the same order (moments) to generate the macromodel
- The macromodel obtained by HNE has a similar size and sparsity as TBS
1. TBS reduces waveform-error by 38X compared to HNE, as the truncation used in HNE leads to large error
2. TBS reduces waveform-error by 33X compared to HiPRIME, as more poles are matched
3. TBS reduces waveform-error by 17X compared to BSMOR, as more poles are exactly matched

27 Study Runtime Scalability
       HNE                          HiPRIME                  BSMOR                     TBS
ckt    build         simulation     build     simulation     build     simulation      build     simulation
ckt1   0.44s         0.08s          0.15s     1.02s          0.12s     0.08s           0.09s     0.08s
ckt2   2.19s         1.24s          0.54s     1min:42s       0.63s     1.18s           0.11s     1.02s
ckt3   1min:17s      1min:51s       5.76s     2hr:48min:20s  1min:2s   1min:38s        1.62s     1min:32s
ckt4   34min:58s     21min:32s      47.3s     ~1day          4min:54s  11min:42s       20.7s     11min:23s
ckt5   4hr:43min:18s 1day:5hr:11min 2min:42s  ~5day          1hr:45min 1day:1hr:36min  2min:8s   1day:18min
ckt6   NA            NA             NA        NA             NA        NA              6min:16s  1day:1hr:29min
- All methods generate macromodels with similar accuracy
- Runtime includes macromodel-building/simulation time
1. TBS (and HiPRIME) is 133X faster to build than HNE, as no LP-truncation is needed to preserve sparsity
2. TBS (and HiPRIME) is 54X faster to build than BSMOR, as the orthonormalization is performed locally
3. TBS (and BSMOR/HNE) is 109X faster to simulate than HiPRIME, as their macromodels have hierarchy

28 Conclusions
- TBS enables localized moment matching, and matches more poles than PRIMA
- TBS is stable, and is passive for MIMO reduction
- TBS is applicable to both homogeneous and heterogeneous designs
- TBS achieves over 20x less waveform error and 50x speedup compared to HNE, HiPRIME, and BSMOR (an improved version of SPRIM)
- The TBS approach has been extended to
  - Handle inductance and its inverse element [Yu-Shi-He:ICCAD’06]
  - Optimize simultaneous power and thermal integrity in 3D integration [Yu-Ho-He:ICCAD’06]
More details can be found in DAC Ph.D. forum 2006