Circuit Simulation via Matrix Exponential Method
Speaker: Shih-Hung Weng
Adviser: Chung-Kuan Cheng
Date: 05/31/2013

Foundation of Design Flow
[Figure: design flow stages (Logic Synthesis, Placement, Routing, Timing Analysis, ...) above an abstraction layer (lookup table, characterization), with Circuit Simulation as the foundation]

Emerging Demands
Full system verification and analysis
– scalability and performance
[Figure: voltage vs. time waveform; labels: on-chip power grid, low frequency]

Publications (1/3)
Circuit Simulation with Matrix Exponential Method:
1. S.-H. Weng, H. Zhuang and C.K. Cheng, “Adaptive Time Stepping for Power Grid Simulation using Matrix Exponential Method”, submitted to IEEE ICCAD
2. S.-H. Weng, Q. Chen and C.K. Cheng, “Circuit Simulation using Matrix Exponential Method for Stiffness Handling and Parallel Processing”, IEEE ICCAD, Nov
3. Q. Chen, W. Schoenmaker, S.-H. Weng, C.K. Cheng, G.-H. Chen, L.-J. Jiang and N. Wong, “A Fast Time-Domain EM-TCAD Coupled Simulation Framework via Matrix Exponential,” IEEE ICCAD, Nov (Best Paper Award Candidate)
4. Y. Li, Q. Cheng, S.-H. Weng, C.K. Cheng and N. Wong, “Globally Stable, Highly Parallelizable Fast Transient Circuit Simulation via Faber Series”, IEEE NewCAS May
5. S.-H. Weng, Q. Chen and C.K. Cheng, “Time-Domain Analysis of Large-Scale Circuits by Matrix Exponential Method with Adaptive Control”, IEEE Trans. on CAD, Jul
6. Q. Chen, S.-H. Weng and C.K. Cheng, “A Practical Regularization Technique for Modified Nodal Analysis in Large-Scale Time-Domain Circuit Simulation”, IEEE Trans. on CAD, Jun
7. S.-H. Weng, Q. Chen and C.K. Cheng, “Circuit Simulation by Matrix Exponential Method,” IEEE ASIC Conference, Oct
8. S.-H. Weng, P. Du and C.K. Cheng, “A Fast and Stable Explicit Integration Method by Matrix Exponential Operator for Large Scale Circuit Simulation”, IEEE ISCAS, May

Publications (2/3)
Clock Gating Synthesis:
9. S.-H. Weng, Y.-M. Kuo and S.-C. Chang, “Timing Optimization in Sequential Circuit by Exploiting Clock-Gating Logic,” ACM Trans. on DAES, April
10. Y.-M. Kuo, S.-H. Weng, and S.-C. Chang, “A Novel Sequential Circuit Optimization with Clock Gating Logic,” IEEE ICCAD, Nov
High-speed Interconnect:
11. G. Sun, S.-H. Weng, C.K. Cheng, B. Lin and L. Zeng, “An On-Chip Global Broadcast Network Design with Equalized Transmission Lines in the 1024-Core Era”, IEEE SLIP Jun
12. S.-H. Weng, Y. Zhang, J. F. Buckwalter and C.K. Cheng, “Energy Efficiency Optimization through Co-Design of the Transmitter and Receiver in High-Speed On-Chip Interconnects”, accepted by IEEE Trans. on VLSI
Placement and Routing:
13. C.K. Cheng, P. Du, A.B. Kahng and S.-H. Weng, “Low-Power Gated Bus Synthesis for 3D IC via Rectilinear Shortest-path Steiner Graph,” IEEE ISPD, Mar.
14. P. Du, W. Zhao, S.-H. Weng, C.K. Cheng and R.L. Graham, “Character Design and Stamp Algorithms for Character Projection Electron-Beam Lithography,” IEEE ASPDAC, Feb.

Publications (3/3)
Power Grid Analysis:
15. X. Hu, P. Du, S.-H. Weng and C.K. Cheng, “Worst-Case Noise Prediction With Non-zero Current Transition Times for Power Grid Planning,” accepted by IEEE Trans. on VLSI
16. C.-C. Chou, H.-H. Chuang, T.-L. Wu, S.-H. Weng, and C.K. Cheng, “Eye Prediction of Digital Driver with Power Distribution Network Noise,” IEEE EPEPS, Nov (Best Student Paper Award)
17. P. Du, S.-H. Weng, X. Hu and C.K. Cheng, “Power Grid Sizing via Convex Programming,” IEEE ASIC Conference, Oct
18. P. Du, X. Hu, S.-H. Weng, A. Shayan, X. Chen, A. E. Engin and C.K. Cheng, “Worst-Case Noise Prediction with Non-zero Current Transition Times for Early Power Distribution System Verification,” IEEE ISQED, Mar
19. S.-H. Weng, Y.-M. Kuo, S.-C. Chang, and M. Marek-Sadowska, “Timing Analysis Considering IR Drop Waveforms in Power Gating Designs,” IEEE ICCD, Oct

Circuit Formulation
Formulated as a system of DAEs [Ho et al. '75]
Terms annotated on the equation:
– C: capacitance & inductance
– G: resistance & incidence
– x: branch currents & nodal voltages
– u: input sources
– derivative of charges in nonlinear devices
– currents of nonlinear devices
Nonlinear devices are linearized by compact model (BSIM, PSP, etc.)

Circuit Formulation
Formulated as a system of DAEs [Ho et al. '75]
After linearization: $C\,\dot{x}(t) = -G\,x(t) + u(t)$
Solve for x(t) with an implicit or explicit numerical method

Numerical Integration (1/2)
Forward Euler (1st-order explicit)
– sparse matrix-vector product
– stability issue for stiff circuits: unstable result
Backward Euler (1st-order implicit)
– solving a linear system
– performance & scalability issues
[Figure: forward Euler vs. backward Euler waveforms]
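
The stability contrast above can be reproduced on a toy problem. The sketch below is only illustrative: the 2x2 system, its time constants (1 s and 1 ms), and the step size are assumptions chosen so that the step exceeds the forward Euler stability limit of the fast mode; none of these values come from the slides.

```python
import numpy as np

# Hypothetical stiff test system x' = A x with time constants 1 s and 1 ms.
A = np.diag([-1.0, -1000.0])
x0 = np.array([1.0, 1.0])
h = 0.01        # larger than the forward Euler limit 2/1000 of the fast mode
steps = 100

I = np.eye(2)
x_fe = x0.copy()
x_be = x0.copy()
for _ in range(steps):
    # Forward Euler: one matrix-vector product per step.
    x_fe = x_fe + h * (A @ x_fe)
    # Backward Euler: one linear solve with (I - hA) per step.
    x_be = np.linalg.solve(I - h * A, x_be)

# Closed-form solution of the diagonal system at t = steps * h.
x_exact = np.exp(np.diag(A) * h * steps) * x0

print("forward Euler :", x_fe)
print("backward Euler:", x_be)
print("exact         :", x_exact)
```

With this step size the fast component of the forward Euler result grows without bound, while backward Euler stays bounded at the price of a linear solve in every step.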

Numerical Integration (2/2)
Speed by circuit type and stiffness (High / Mild / Low; a cell spanning several columns is listed once):
Methods | Linear | Nonlinear
Forward Euler | slow, fast | slow, fast
Backward Euler | medium | slow
Trapezoidal | > Backward Euler (all cases)
and beyond? | fast (all cases)

Methods | Computation | Scalability | Error | Stability | Step size
Forward Euler | x=Av | high | O(h^2) | low | tiny
Backward Euler | Ax=b | low | O(h^2) | A-stable | medium
Trapezoidal | Ax=b | low | O(h^3) | A-stable | > Backward Euler
and beyond? | simple | high | O(h^n) | high | large

Performance = # steps x computation per step (circuit dependent)
– tiny step size: more #steps
– one Ax=b with fixed step size in C/h+G, lots of Ax=b otherwise
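
Matrix Exponential Method (1/2)
Analytical solution of $C\,\dot{x}(t) = -G\,x(t) + u(t)$
– Let $A = -C^{-1}G$, $b = C^{-1}u$ (C can be regularized [TCAD '12])
$$x(t+h) = e^{Ah}x(t) + \int_0^h e^{A(h-\tau)}\, b(t+\tau)\, d\tau$$
Let input be piecewise linear
$$x(t+h) = e^{Ah}x(t) + (e^{Ah} - I)A^{-1}b(t) + (e^{Ah} - (Ah + I))A^{-2}\,\frac{b(t+h) - b(t)}{h}$$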
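
Matrix Exponential Method (2/2)
One-exponential formulation [Al-Mohy&Higham '11]
– reduce three matrix exponentials to one
$$x(t+h) = \begin{bmatrix} I_n & 0 \end{bmatrix} e^{\tilde{A}h} \begin{bmatrix} x(t) \\ e_2 \end{bmatrix}$$
where
$$\tilde{A} = \begin{bmatrix} A & W \\ 0 & J \end{bmatrix},\quad W = \begin{bmatrix} \dfrac{b(t+h) - b(t)}{h} & b(t) \end{bmatrix},\quad J = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\quad e_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$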
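
A numerical check of the one-exponential idea, sketched under stated assumptions: the input is linear over the step, A is invertible, and the augmented matrix stacks A with the two columns [slope, b(t)] and a 2x2 nilpotent block, following the general construction of Al-Mohy and Higham. The helper `expm_small`, the function names, and the 2x2 test data are hypothetical and only serve the comparison.

```python
import numpy as np

def expm_small(M, terms=20):
    """Dense e^M by scaling-and-squaring with a truncated Taylor series."""
    M = np.asarray(M, dtype=float)
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = M / 2.0 ** s
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k
        result = result + term
    for _ in range(s):
        result = result @ result
    return result

def step_three_exponentials(A, x0, b0, b1, h):
    """PWL step written with three e^{Ah} terms (A must be invertible)."""
    n = A.shape[0]
    E = expm_small(A * h)
    I = np.eye(n)
    A_inv = np.linalg.inv(A)
    slope = (b1 - b0) / h
    return (E @ x0
            + (E - I) @ A_inv @ b0
            + (E - (A * h + I)) @ A_inv @ A_inv @ slope)

def step_one_exponential(A, x0, b0, b1, h):
    """Same step from a single exponential of an augmented (n+2) matrix."""
    n = A.shape[0]
    slope = (b1 - b0) / h
    A_aug = np.zeros((n + 2, n + 2))
    A_aug[:n, :n] = A
    A_aug[:n, n] = slope        # first column of W
    A_aug[:n, n + 1] = b0       # second column of W
    A_aug[n, n + 1] = 1.0       # J = [[0, 1], [0, 0]]
    z0 = np.concatenate([x0, [0.0, 1.0]])   # [x(t); e_2]
    return (expm_small(A_aug * h) @ z0)[:n]

# Hypothetical 2x2 example.
A = np.array([[-2.0, 1.0], [1.0, -3.0]])
x0 = np.array([1.0, 0.0])
b0 = np.array([0.5, 0.0])       # b(t)
b1 = np.array([1.5, 0.2])       # b(t + h)
h = 0.5
x_three = step_three_exponentials(A, x0, b0, b1, h)
x_one = step_one_exponential(A, x0, b0, b1, h)
print(x_three, x_one)
```

Besides saving two exponentials, the augmented form needs no inverse of A, which is why only the three-term version requires A to be invertible.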

Advantages
Accuracy: analytical solution
– approximating $e^{Ah}$ as $(I + Ah)$ gives forward Euler
– approximating $e^{Ah}$ as $(I - Ah)^{-1}$ gives backward Euler
Stability: A-stable for passive circuits
[Figure: comparison against a reference solution]
How to compute $e^{A}v$?
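
To make the accuracy point concrete, the sketch below measures how far the two Euler step operators are from the exact $e^{Ah}$ for a single step. The diagonal 2x2 matrix is a hypothetical choice, used only because its exponential is known in closed form.

```python
import numpy as np

# Hypothetical test matrix; diagonal so that e^{Ah} is available exactly.
lam = np.array([-1.0, -3.0])
A = np.diag(lam)
I = np.eye(2)

def one_step_errors(h):
    """2-norm distance of each Euler step operator from e^{Ah}."""
    exact = np.diag(np.exp(lam * h))        # e^{Ah}
    forward = I + A * h                     # forward Euler operator
    backward = np.linalg.inv(I - A * h)     # backward Euler operator
    return (np.linalg.norm(forward - exact, 2),
            np.linalg.norm(backward - exact, 2))

fe_coarse, be_coarse = one_step_errors(0.01)
fe_fine, be_fine = one_step_errors(0.005)
fe_ratio = fe_coarse / fe_fine
be_ratio = be_coarse / be_fine
print("forward Euler error ratio :", fe_ratio)
print("backward Euler error ratio:", be_ratio)
```

Halving h cuts both one-step errors by roughly a factor of four, which is the O(h^2) local error of the two first-order methods; the matrix exponential itself carries no such truncation error.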

Computation on Matrix Exponential
19 dubious ways [van Loan03]
Categories based on:
– Series Method
– Rational Approximation
– Decomposition
– Splitting
– Quadrature Rule
– Krylov Subspace
[Figure labels: $e^{A}$ vs. $e^{A}v$; small vs. large; spec(A); regular basis and rational basis]

Krylov Subspace Approximation (1/2)
Krylov subspace $K_m(A, v) = \mathrm{span}\{v, Av, A^2v, \ldots, A^{m-1}v\}$
– orthogonalized by Arnoldi process (sparse matrix-vector multiplication): efficiency
– approximate $e^{Ah}v$ by $e^{H_m h}$: $e^{Ah}v \approx \|v\|_2\, V_m e^{H_m h} e_1$ (scaling invariant)
– a posteriori error estimation [Saad92] (fast error estimation): adaptivity
m is about 10~100
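
A minimal sketch of the Krylov approximation, assuming a dense NumPy matrix in place of a sparse circuit matrix. The helper `expm_small`, the tridiagonal test matrix, and the values of h and m are hypothetical; the Arnoldi loop uses modified Gram-Schmidt, and A is touched only through matrix-vector products.

```python
import numpy as np

def expm_small(M, terms=20):
    """Dense e^M by scaling-and-squaring with a truncated Taylor series."""
    M = np.asarray(M, dtype=float)
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = M / 2.0 ** s
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k
        result = result + term
    for _ in range(s):
        result = result @ result
    return result

def arnoldi(matvec, v, m):
    """Orthonormal basis V_m of K_m(A, v) and the Hessenberg matrix H_m."""
    n = v.size
    beta = np.linalg.norm(v)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / beta
    for j in range(m):
        w = matvec(V[:, j])               # the only use of A
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:           # invariant subspace reached
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m], H[:m, :m], beta

def expv_krylov(matvec, v, h, m):
    """Approximate e^{Ah} v as ||v|| V_m e^{H_m h} e_1."""
    V, H, beta = arnoldi(matvec, v, m)
    return beta * (V @ expm_small(h * H)[:, 0])

# Hypothetical test problem: 1-D diffusion-like tridiagonal matrix.
n = 50
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
v = np.linspace(1.0, 2.0, n)
h = 0.5
x_krylov = expv_krylov(A.dot, v, h, m=20)
x_dense = expm_small(A * h) @ v
print("max abs difference:", np.max(np.abs(x_krylov - x_dense)))
```

Only the m-by-m matrix $H_m$ is exponentiated, so the dense exponential costs little compared with the m sparse matrix-vector products.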

Krylov Subspace Approximation (2/2)
Stiffness affects step size and dimension
– Arnoldi process captures extreme and clustered eigenvalues
– error bound [Saad92]
– shrink h or increase m for capturing critical eigenvalues
– remedied by restarted scheme and scaling effect [ICCAD '12]
[Figure: highly stiff spectrum in the complex plane (axes Re{hλ}, Im{hλ}) from −λ_max to −λ_min; regions captured by the Arnoldi process with a small m vs. the critical part for $e^{Ah}$]

Rational Krylov Subspace Approximation (1/2)
Rational basis $(I - \gamma A)^{-1}$
– $K((I - \gamma A)^{-1}, v) = \{v, (I - \gamma A)^{-1}v, \ldots, (I - \gamma A)^{-m}v\}$
Arnoldi process:
    for j = 1, 2, ..., m
        solve (I − γA) w = v_j
        for i = 1, 2, ..., j
            H_{i,j} = w^T v_i
            w = w − H_{i,j} v_i
        end
        H_{j+1,j} = ‖w‖_2
        v_{j+1} = w / H_{j+1,j}
    end
– the solve is performed as (C + γG) w = C v_j: avoids regularization of C, one LU for a linear circuit
– the subspace for A uses w = A v_j instead
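
Rational Krylov Subspace Approximation (1/2), continued
Rational basis $(I - \gamma A)^{-1}$
– $K((I - \gamma A)^{-1}, v) = \{v, (I - \gamma A)^{-1}v, \ldots, (I - \gamma A)^{-m}v\}$
Approximation of $e^{Ah}v$: $e^{Ah}v \approx \|v\|_2\, V_m\, e^{\frac{h}{\gamma}(I - H_m^{-1})} e_1$
A posteriori error estimation [van den Eshof 06]: adaptivity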

Rational Krylov Subspace Approximation (2/2)
Spectral transformation
– similar to preconditioning
– relax stiffness constraint
– enable large step size with a smaller dimension m
[Figure: spectrum in the complex plane (axes Re{hλ}, Im{hλ}). Transforming the spectrum (−λ_max … −λ_min) by $(I - \gamma A)^{-1}$ gives λ'_min … λ'_max within a unit circle, with a small gap, captured by the Arnoldi process (small m is acceptable; critical part for $e^{A}$ determined by γ). Projecting back to A by $\frac{1}{\gamma}(I - H^{-1})$ gives −λ''_max … −λ''_min; applying large h to $\frac{1}{\gamma}(I - H^{-1})$ gives −hλ''_max … −hλ''_min.]

Rational Krylov Subspace Approximation (2/2), continued
[Figure: fix γ, sweep m and h; annotation: small step size]

Rational Krylov Subspace Approximation (2/2), continued
[Figure: fix h, sweep m and γ; annotation: large error]

Wrap Up
Speed by circuit type and stiffness (High / Mild / Low; a cell spanning several columns is listed once):
Methods | Linear | Nonlinear
Forward Euler | slow, fast | slow, fast
Backward Euler | medium | slow
Trapezoidal | > Backward Euler (all cases)
Krylov Approx | slow, fast | slow, medium
Rational Krylov | fast | slow

Methods | Computation | Scalability | Error | Stability | Step size
Forward Euler | x=Av | high | O(h^2) | low | tiny
Backward Euler | Ax=b | low | O(h^2) | A-stable | medium
Trapezoidal | Ax=b | low | O(h^3) | A-stable | > Backward Euler
Krylov Approx | x=Av | high | O(h^n) | high | medium
Rational Krylov | Ax=b | low | O(h^n) | high | large

Parallelism in Krylov Subspace
Arnoldi process
– sparse matrix-vector multiplication [Bell&Garland '09]
Exponential of a small matrix [Higham '05]
– dense matrix-by-matrix operation
[Figure: work distributed over thread 1, thread 2, …, thread n-1, thread n]

Input Grouping
Constant slope within a step
– tiny steps due to maintaining constant slope
[Figure: input 1 and input 2 vs. time; their combined breakpoints t1 … t15 each end a step]

Input Grouping
Constant slope within a step
[Figure: group 1 and group 2 vs. time; each group has its own breakpoints t1 … t8 and is simulated on its own thread (thread 1, thread 2)]
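
One possible way to form such groups is sketched below, assuming each input is a piecewise-linear source given as (time, value) pairs and that the circuit is linear, so the responses of the groups can be simulated separately and summed. The function names and the three sources are hypothetical.

```python
from collections import defaultdict

def group_by_breakpoints(sources):
    """Group PWL sources whose breakpoint times coincide.

    sources maps a source name to a list of (time, value) pairs.
    """
    groups = defaultdict(list)
    for name, points in sources.items():
        times = tuple(t for t, _ in points)
        groups[times].append(name)
    return dict(groups)

def merged_breakpoints(sources):
    """Breakpoints that one simulation of all inputs together must honor."""
    return sorted({t for points in sources.values() for t, _ in points})

# Hypothetical inputs (times in ns, values in mA).
sources = {
    "i1": [(0, 0.0), (2, 1.0), (4, 1.0), (6, 0.0)],
    "i2": [(0, 0.0), (3, 2.0), (5, 0.0)],
    "i3": [(0, 0.0), (2, 0.5), (4, 0.5), (6, 0.0)],
}
groups = group_by_breakpoints(sources)
all_points = merged_breakpoints(sources)
print(groups)
print(all_points)
```

Here the ungrouped simulation has to stop at six breakpoints, while no single group has more than four, so each group can take longer steps with a constant input slope.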

Settings of Experiments
Environment
– Implemented in Matlab
– Intel i7 2.67GHz with 4GB memory
Benchmarks
– Nonlinear and large-scale circuits
– Power distribution networks
– IBM power grid testcases [Nassif 08]

Design | Category | # R | # C | # Trans. | Size | Stiffness
D1 | 16-bit adder | … | … | … | … | …x10^3
D2 | ALU | 13.6K | 4.3K | 6502 | 10K | 5.4x10^6
D3 | IO | 1.26M | 34.6K | … | 630K | 1.6x10^6
D4 | Power grid | 10.4M | 8.6M | 0 | 12M | 2.6x10^5
Stiffness: from the generalized eigenvalues of (G, C)

Settings of Experiments: power distribution networks
Design | Area (mm^2) | # R | # C | # L | Size | Stiffness
P… | … | …K | 15K | … | 45.7K | 8.7x10^9
P… | … | …K | 228K | … | 688K | 8.3x10^9
P… | … | …M | 0.97M | … | 2.90M | 1.0x10^10
P… | … | …M | 2.47M | … | 7.40M | 1.0x10^10
RC tanks for PCB and package

Settings of Experiments: IBM power grid testcases [Nassif 08]
Design | # R | # C | # L | # I | # V | Size | Stiffness
ibmpg2t | 245K | 36K | 330 | 36K | 330 | 164K | 3.5x10^12
ibmpg3t | 1.60M | 201K | 955 | 201K | 955 | 1M | 3.4x10^11
ibmpg4t | 1.83M | 265K | 962 | 266K | 962 | 1.2M | 2.5x10^11
ibmpg5t | 1.55M | 473K | 277 | 473K | 539K | 2.1M | 4.7x10^11
ibmpg6t | 2.41M | 761K | 281 | 761K | 836K | 3.2M | 3.8x10^11

Nonlinear and Large-scale Circuits
Matrix exponential method (MEXP)
– Krylov subspace approximation
– restarted scheme and parallel SpMV on GPU
Trapezoidal method (TRAP)
– same adaptive scheme as MEXP

Design | Size | time | m | TRAP | MEXP-Krylov | speedup
D1 | … | …ps | … | …s | 408.7s | 1.64X
D2 | 10K | 100ps | 30 | 3,085.91s | 982.14s | 3.14X
D3 | 630K | 100ps | 30 | 8,053.45s | 535.92s | 15.05X
D4 | 12M | 1ns | 20 | fails | 629.56s | n/a
Annotation: parallel SpMV

Power Distribution Networks
Simulate long time span (1μs) for step response
One LU factorization
– cost amortized over the forward/backward substitutions
MEXP with rational basis adaptively scales h/γ
TRAP uses predetermined step size

Design | TRAP (h = 10ps): LU(s) | Total | MEXP – Rational (γ = …): LU(s) | Total | Speedup
P… | … | …m | … | …m | 15.73X
P… | … | …h | … | …m | 16.96X
P… | … | …h | … | …h | 17.91X
P… | … | …h | … | …h | 18.08X
Annotation: adaptive & large step size

Power Distribution Networks
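
IBM Testcases
Widely adopted benchmarks
Many input current sources
Same settings for MEXP with rational basis and TRAP

Design | TRAP (h = 10ps): LU(s) | Total(s) | MEXP – Rational (γ = …): LU(s) | Total(s) | Speedup
ibmpg2t | … | … | … | … | …X
ibmpg3t | … | … | … | … | …X
ibmpg4t | … | … | … | … | …X
ibmpg5t | … | … | … | … | …X
ibmpg6t | … | … | … | … | …X
Annotation: ill alignment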

IBM Testcases
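
IBM Testcases
Applying simple grouping
– each group of inputs has the same pivot points
– 6X speedup on average

Design | TRAP (h = 10ps): LU(s) | Total (s) | MEXP – Rational (γ = …): # Group | LU (s) | Total (s) | Speedup
ibmpg2t | … | … | … | … | … | …X
ibmpg3t | … | … | … | … | … | …X
ibmpg4t | … | … | … | … | … | …X
ibmpg5t | … | … | … | … | … | …X
ibmpg6t | … | … | … | … | … | …X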

Conclusions
Emerging challenges in circuit simulation
– scalability and performance
Matrix exponential method
– accuracy, adaptivity and stability
– regular and rational Krylov subspace approximation
Effectiveness of matrix exponential method
– simulate a large-scale circuit with 12M nodes
– nonlinear circuits: 6.61X speedup on average
– impulse response for PDNs: 15X speedup
– IBM testcases: 6X speedup using input grouping

Future Work
Variant basis in Krylov subspace
– inverted, extended basis
Model Order Reduction and matrix exponential method
– both exploit Krylov subspaces
– apply well-developed MOR techniques to MEXP
Hybrid simulation via matrix exponential
– handle thermal, mechanical phenomena with FEM

Where are we?
Trade-off between stability and performance
[Figure: methods placed by computational effort (low to high) and stability (low to high): Forward Euler, Backward Euler, Trapezoidal Method (SPICE), Matrix Exponential Method [Weng et al. '11], SILCA [Li & Shi, '03], ACES [Devgan & Rohrer, '97], Telescopic [Dong & Li, '10], Waveform Relaxation [E. Lelarasmee et al., '82], Domain Decomposition [K. Sun et al., '07], LIM [J. E. Schutt-Aine, '01]]
Tailored for circuit simulation:
– adaptive step control
– scaling effect
– nonlinear device
– parallelization
ETD in numerical community: [Saad '92] [Ban et al. '11] [Aluffi-Pentini et al. '03] [Hochbruck et al. '97]

Adaptive Step Control
Typical circuit behavior
[Figure labels: larger h, smaller h, error budget]

Adaptive Step Size Strategy
Adjustment of step size
– Krylov subspace approximation only requires scaling $H_m$ ($\alpha A \to \alpha H_m$) and re-calculating $e^{\alpha H_m}$
– backward Euler: $(C/h + G)$ changes, so the linear system must be solved again
Strategy:
– maximize step size within a given error budget $Err_{total}$
– errors come from the Krylov subspace method and linearization
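
The reuse of one Krylov basis across step sizes can be sketched as follows. The candidate step sizes, the error budget, the test matrix, and all helper names are hypothetical. The error estimate is a residual-style quantity, $\|v\|\,|h_{m+1,m}\, e_m^T e^{\alpha H_m} e_1|$, assumed here for illustration because the estimator used on the slides is not shown.

```python
import numpy as np

def expm_small(M, terms=20):
    """Dense e^M by scaling-and-squaring with a truncated Taylor series."""
    M = np.asarray(M, dtype=float)
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = M / 2.0 ** s
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k
        result = result + term
    for _ in range(s):
        result = result @ result
    return result

class CountingOperator:
    """Wraps A so the number of matrix-vector products can be inspected."""
    def __init__(self, A):
        self.A = A
        self.calls = 0
    def __call__(self, x):
        self.calls += 1
        return self.A @ x

def arnoldi(matvec, v, m):
    """Return V_m, H_m, the next subdiagonal entry h_{m+1,m}, and ||v||."""
    n = v.size
    beta = np.linalg.norm(v)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / beta
    for j in range(m):
        w = matvec(V[:, j])
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:           # invariant subspace reached
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m], H[:m, :m], H[m, m - 1], beta

def largest_step_within_budget(V, H, h_next, beta, candidates, budget):
    """Try step sizes from largest to smallest, reusing the same V and H."""
    m = H.shape[0]
    for alpha in sorted(candidates, reverse=True):
        col = expm_small(alpha * H)[:, 0]           # only an m-by-m exponential
        estimate = beta * abs(h_next * col[m - 1])  # assumed residual-style estimate
        if estimate <= budget:
            return alpha, beta * (V @ col), estimate
    return None

# Hypothetical test problem.
n = 50
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
v = np.linspace(1.0, 2.0, n)
op = CountingOperator(A)
V, H, h_next, beta = arnoldi(op, v, m=15)
candidates = [2.0, 1.0, 0.5, 0.25]
accepted = largest_step_within_budget(V, H, h_next, beta, candidates, budget=1e-6)
print("matrix-vector products:", op.calls)
print("accepted step:", None if accepted is None else accepted[0])
```

However many candidate steps are tried, A is applied only during the single Arnoldi pass; a backward Euler step with a new h would instead require a new factorization of C/h + G.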

Nonlinear Formulation
Decouple nonlinear and linear components
[Equations for MEXP and BE not recovered; annotations: constant during Newton's iteration; calculate Jacobian matrix; approximate $e^{A}F$]
J(F) in MEXP has fewer non-zeros

Only Inverted
Rational basis $A^{-1}$
– $K(A^{-1}, v) = \{v, A^{-1}v, \ldots, A^{-m}v\}$
– requires larger m and smaller h
[Figure: spectrum in the complex plane (axes Re{hλ}, Im{hλ}); labels: after shifted-and-inverted, only inverted, smaller spectrum, −1/λ_min]

Different γ
– needs large m

Different γ

Spectral Transformation – h = 10p
Small RC mesh, 100 by 100
Different h for Krylov subspace
Different γ for rational Krylov subspace

Spectral Transformation – h = 10f

Spectral Transformation – γ = 10f

Spectral Transformation – γ = 1p

Spectral Transformation – γ = 100p

Sweep γ for Large Range

Sweep γ for Large Range

Difference Between Inverted and Rational

Fixed γ = 1p, sweep time step h

Fixed γ = 1n, sweep time step h

Fixed γ = 1u, sweep time step h

Fixed γ = 1m, sweep time step h

Fixed γ = 1, sweep time step h

Fixed γ = 1k, sweep time step h

Fixed γ = 1M, sweep time step h