Published by Megan Terry. Modified over 9 years ago.
1
Circuit Simulation via Matrix Exponential Method
Speaker: Shih-Hung Weng
Adviser: Chung-Kuan Cheng
Date: 05/31/2013
2
Foundation of Design Flow
[Figure: design-flow stages (Logic Synthesis, Placement, Timing Analysis, Routing, ...) rest on an abstraction layer of lookup tables, which are characterized by circuit simulation]
3
Emerging Demands
Full system verification and analysis
– scalability and performance
[Figure: voltage vs. time waveform; labels: on-chip power grid, low frequency]
4
Publications (1/3)
Circuit Simulation with Matrix Exponential Method:
1. S.-H. Weng, H. Zhuang and C.K. Cheng, “Adaptive Time Stepping for Power Grid Simulation using Matrix Exponential Method”, submitted to IEEE ICCAD 2013
2. S.-H. Weng, Q. Chen and C.K. Cheng, “Circuit Simulation using Matrix Exponential Method for Stiffness Handling and Parallel Processing”, IEEE ICCAD, Nov. 2012
3. Q. Chen, W. Schoenmaker, S.-H. Weng, C.K. Cheng, G.-H. Chen, L.-J. Jiang and N. Wong, “A Fast Time-Domain EM-TCAD Coupled Simulation Framework via Matrix Exponential”, IEEE ICCAD, Nov. 2012 (Best Paper Award Candidate)
4. Y. Li, Q. Cheng, S.-H. Weng, C.K. Cheng and N. Wong, “Globally Stable, Highly Parallelizable Fast Transient Circuit Simulation via Faber Series”, IEEE NewCAS, May 2012
5. S.-H. Weng, Q. Chen and C.K. Cheng, “Time-Domain Analysis of Large-Scale Circuits by Matrix Exponential Method with Adaptive Control”, IEEE Trans. on CAD, Jul. 2012
6. Q. Chen, S.-H. Weng and C.K. Cheng, “A Practical Regularization Technique for Modified Nodal Analysis in Large-Scale Time-Domain Circuit Simulation”, IEEE Trans. on CAD, Jun. 2012
7. S.-H. Weng, Q. Chen and C.K. Cheng, “Circuit Simulation by Matrix Exponential Method”, IEEE ASIC Conference, Oct. 2011
8. S.-H. Weng, P. Du and C.K. Cheng, “A Fast and Stable Explicit Integration Method by Matrix Exponential Operator for Large Scale Circuit Simulation”, IEEE ISCAS, May 2011
5
Publications (2/3)
Clock Gating Synthesis:
9. S.-H. Weng, Y.-M. Kuo and S.-C. Chang, “Timing Optimization in Sequential Circuit by Exploiting Clock-Gating Logic”, ACM Trans. on DAES, April 2012
10. Y.-M. Kuo, S.-H. Weng and S.-C. Chang, “A Novel Sequential Circuit Optimization with Clock Gating Logic”, IEEE ICCAD, Nov. 2008
High-speed Interconnect:
11. G. Sun, S.-H. Weng, C.K. Cheng, B. Lin and L. Zeng, “An On-Chip Global Broadcast Network Design with Equalized Transmission Lines in the 1024-Core Era”, IEEE SLIP, Jun. 2012
12. S.-H. Weng, Y. Zhang, J. F. Buckwalter and C.K. Cheng, “Energy Efficiency Optimization through Co-Design of the Transmitter and Receiver in High-Speed On-Chip Interconnects”, accepted by IEEE Trans. on VLSI
Placement and Routing:
13. C.K. Cheng, P. Du, A.B. Kahng and S.-H. Weng, “Low-Power Gated Bus Synthesis for 3D IC via Rectilinear Shortest-path Steiner Graph”, IEEE ISPD, Mar. 2012
14. P. Du, W. Zhao, S.-H. Weng, C.K. Cheng and R.L. Graham, “Character Design and Stamp Algorithms for Character Projection Electron-Beam Lithography”, IEEE ASPDAC, Feb. 2012
6
Publications (3/3)
Power Grid Analysis:
15. X. Hu, P. Du, S.-H. Weng and C.K. Cheng, “Worst-Case Noise Prediction With Non-zero Current Transition Times for Power Grid Planning”, accepted by IEEE Trans. on VLSI
16. C.-C. Chou, H.-H. Chuang, T.-L. Wu, S.-H. Weng and C.K. Cheng, “Eye Prediction of Digital Driver with Power Distribution Network Noise”, IEEE EPEPS, Nov. 2012 (Best Student Paper Award)
17. P. Du, S.-H. Weng, X. Hu and C.K. Cheng, “Power Grid Sizing via Convex Programming”, IEEE ASIC Conference, Oct. 2011
18. P. Du, X. Hu, S.-H. Weng, A. Shayan, X. Chen, A. E. Engin and C.K. Cheng, “Worst-Case Noise Prediction with Non-zero Current Transition Times for Early Power Distribution System Verification”, IEEE ISQED, Mar. 2010
19. S.-H. Weng, Y.-M. Kuo, S.-C. Chang and M. Marek-Sadowska, “Timing Analysis Considering IR Drop Waveforms in Power Gating Designs”, IEEE ICCD, Oct. 2008
7
Outline
Numerical Integration in Circuit Simulation
Matrix Exponential Method
– Krylov Subspace Approximation
– Rational Krylov Subspace Approximation
– Parallelism
Experimental Results
Conclusions
8
Circuit Formulation
Formulated as a system of DAEs [Ho et al. '75]
[Equation not preserved; its terms were annotated as follows]
– resistance & incidence
– capacitance & inductance
– branch currents & nodal voltages
– derivative of charges in nonlinear devices
– input sources
– currents of nonlinear devices
Nonlinear devices are linearized by compact model (BSIM, PSP, etc.)
9
Circuit Formulation
Formulated as a system of DAEs [Ho et al. '75]
[Equation not preserved: the system after linearization]
Solve for x(t) with an implicit or explicit numerical method
10
Numerical Integration (1/2)
Forward Euler (1st-order explicit)
– sparse matrix-vector product
– stability issue for stiff circuits: unstable result
Backward Euler (1st-order implicit)
– solving a linear system
– performance & scalability issues
[Figure: forward Euler vs. backward Euler waveforms]
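The stability contrast on this slide can be illustrated with a scalar stiff test equation. The following is a minimal sketch, not taken from the talk: the equation x'(t) = -lam * x(t), the value of lam, and the step sizes are all hypothetical choices made for the demonstration.

```python
# Illustrative only: scalar stiff test equation x'(t) = -lam * x(t), x(0) = 1.
# lam, h, and the step count are hypothetical values chosen for the demo.

def forward_euler(lam, h, steps, x0=1.0):
    """Explicit update x_{k+1} = x_k + h * (-lam * x_k); one multiply per step."""
    x = x0
    for _ in range(steps):
        x = x + h * (-lam * x)
    return x

def backward_euler(lam, h, steps, x0=1.0):
    """Implicit update (1 + h*lam) * x_{k+1} = x_k; a (here scalar) solve per step."""
    x = x0
    for _ in range(steps):
        x = x / (1.0 + h * lam)
    return x

lam = 1.0e6   # fast mode with a time constant of 1 us
h = 1.0e-5    # step size 10x larger than that time constant
fe = forward_euler(lam, h, 20)
be = backward_euler(lam, h, 20)
print(f"forward Euler : {fe:.3e}")   # blows up because |1 - h*lam| = 9 > 1
print(f"backward Euler: {be:.3e}")   # decays because 1 / (1 + h*lam) = 1/11 < 1

# Forward Euler becomes stable only once h*lam < 2, i.e. with a tiny step.
fe_small = forward_euler(lam, 1.0e-7, 20)
print(f"forward Euler, h = 0.1 us: {fe_small:.3e}")
```

The exact solution decays monotonically, so only the backward Euler result and the small-step forward Euler result are qualitatively right; the price is a linear solve per step in the first case and many more steps in the second.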
11
Numerical Integration (2/2)
Performance = # steps x computation per step

Speed by circuit type (Linear / Nonlinear) and stiffness (High / Mild / Low). Merged cells were lost in extraction, so each row lists its entries in the order they appeared:
Forward Euler     slow, fast, slow, fast
Backward Euler    medium, slow
Trapezoidal       > Backward Euler
and beyond?       fast

Methods          Computation  Scalability  Error   Stability  Step size
Forward Euler    x=Av         high         O(h^2)  low        tiny
Backward Euler   Ax=b         low          O(h^2)  A-stable   medium
Trapezoidal      Ax=b         low          O(h^3)  A-stable   > Backward Euler
and beyond?      simple       high         O(h^n)  high       large

Slide annotations: lots of Ax=b; one Ax=b with fixed step size in C/h+G; circuit dependent; more # steps
13
Matrix Exponential Method (1/2)
Analytical solution of the linearized system [equation not preserved]
– Let A = -C^{-1}G, b = C^{-1}u (C can be regularized [TCAD '12])
Let the input be piecewise linear [equation not preserved]
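For reference, here is a sketch of the standard closed form behind this slide, assuming the linearized system is x'(t) = A x(t) + b(t) with A and b as defined above and A invertible. The slide's own equations were not preserved, so the notation, including the slope symbol s, is illustrative rather than the author's.

```latex
x(t+h) = e^{Ah}x(t) + \int_0^h e^{A(h-\tau)}\, b(t+\tau)\, d\tau

% piecewise-linear input on [t, t+h]:
% b(t+\tau) = b(t) + \tau s, \qquad s = \frac{b(t+h) - b(t)}{h}
x(t+h) = e^{Ah}x(t) + \left(e^{Ah} - I\right)A^{-1}b(t) + \left(e^{Ah} - I - Ah\right)A^{-2}s
```

Written this way, e^{Ah} appears in three separate terms, each multiplying a different vector.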
14
Matrix Exponential Method (2/2)
One-exponential formulation [Al-Mohy & Higham '11]
– reduces three matrix exponentials to one
[Equations not preserved]
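One common way to fold a piecewise-linear input into a single exponential is to augment the state with two auxiliary variables that generate the ramp. The sketch below assumes that construction; it is not necessarily the exact formula shown on the slide. The function names and the small test system are hypothetical, and the result is checked against the three-term closed form, which needs A to be invertible.

```python
import numpy as np
from scipy.linalg import expm

def step_one_exponential(A, x, b0, b1, h):
    """Advance x' = A x + b(t) over one step h, with b linear from b0 to b1.

    Uses a single exponential of an augmented (n+2) x (n+2) matrix.
    """
    n = A.shape[0]
    slope = (b1 - b0) / h
    A_aug = np.zeros((n + 2, n + 2))
    A_aug[:n, :n] = A
    A_aug[:n, n] = slope       # multiplies z1(tau) = tau
    A_aug[:n, n + 1] = b0      # multiplies z2(tau) = 1
    A_aug[n, n + 1] = 1.0      # z1' = z2, z2' = 0  =>  z(tau) = [tau, 1]
    y0 = np.concatenate([x, [0.0, 1.0]])
    return (expm(h * A_aug) @ y0)[:n]

def step_three_terms(A, x, b0, b1, h):
    """Reference: closed form with explicit A^{-1} and A^{-2} terms."""
    n = A.shape[0]
    I = np.eye(n)
    E = expm(h * A)
    slope = (b1 - b0) / h
    A_inv = np.linalg.inv(A)
    return E @ x + (E - I) @ A_inv @ b0 + (E - I - h * A) @ A_inv @ A_inv @ slope

# Hypothetical small stable system and input values.
A = np.array([[-2.0, 1.0], [1.0, -3.0]])
x0 = np.array([1.0, 0.0])
b0 = np.array([0.0, 1.0])
b1 = np.array([0.5, 2.0])
h = 0.1

x_one = step_one_exponential(A, x0, b0, b1, h)
x_ref = step_three_terms(A, x0, b0, b1, h)
print("max difference:", np.max(np.abs(x_one - x_ref)))
```

A side benefit of the augmented form is that it never inverts A, so it still works when A is singular.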
15
Advantages
Accuracy: analytical solution
– approximating e^{Ah} as (I + Ah) gives forward Euler
– approximating e^{Ah} as (I - Ah)^{-1} gives backward Euler
Stability: A-stable for passive circuits
[Figure: waveforms compared with a reference solution]
How to compute e^{A}v?
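The link to the Euler methods can be seen from the series expansions. This is a standard worked step, written for the homogeneous system x' = Ax, with the geometric series valid for ||Ah|| < 1:

```latex
e^{Ah} = I + Ah + \tfrac{1}{2}(Ah)^2 + \cdots
(I - Ah)^{-1} = I + Ah + (Ah)^2 + \cdots \qquad (\|Ah\| < 1)

x_{k+1} = (I + Ah)\,x_k \quad \text{(forward Euler)}, \qquad
x_{k+1} = (I - Ah)^{-1}x_k \quad \text{(backward Euler)}
```

Both approximations agree with e^{Ah} through the first-order term and differ from it at O(h^2), which matches the local error listed for the two Euler methods in the comparison table.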
16
Computation on Matrix Exponential
19 dubious ways [van Loan03]
Categories: Series Method, Rational Approximation, Decomposition, Splitting, Quadrature Rule, Krylov Subspace
[Table layout not preserved; remaining labels: e^{A} vs. e^{A}v, small vs. large, spec(A); Krylov subspace: regular basis and rational basis]
18
Krylov Subspace Approximation (1/2)
Krylov subspace K(A, v) = span{v, Av, A^2 v, ..., A^{m-1} v}
– orthogonalized by the Arnoldi process (sparse matrix-vector multiplication)
– approximate e^{Ah}v by e^{H_m h} [equation not preserved]; m is about 10~100
– a posteriori error estimation [Saad92] [equation not preserved]
Slide annotations: efficiency, fast error estimation, scaling invariant, adaptivity
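The Arnoldi-based approximation can be sketched as follows. This is a minimal dense NumPy illustration of the usual formula e^{Ah}v ≈ β V_m e^{H_m h} e_1 with β = ||v||, not the author's implementation. The function names and the test matrix, a small tridiagonal stand-in for an RC ladder, are assumptions.

```python
import numpy as np
from scipy.linalg import expm

def arnoldi(A, v, m):
    """Return an orthonormal basis V of K_m(A, v), the small matrix H = V^T A V, and ||v||."""
    n = v.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = A @ V[:, j]                  # the only use of A: a matrix-vector product
        for i in range(j + 1):           # modified Gram-Schmidt orthogonalization
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:          # breakdown: the subspace is already invariant
            return V[:, : j + 1], H[: j + 1, : j + 1], beta
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m], H[:m, :m], beta

def krylov_expm_v(A, v, h, m):
    """Approximate e^{Ah} v by beta * V_m * e^{H_m h} * e_1."""
    V, H, beta = arnoldi(A, v, m)
    e1 = np.zeros(H.shape[0])
    e1[0] = 1.0
    return beta * (V @ (expm(h * H) @ e1))

# Hypothetical test problem: tridiagonal A, mildly stiff for this h.
n = 200
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
rng = np.random.default_rng(0)
v = rng.standard_normal(n)
h = 0.5

approx = krylov_expm_v(A, v, h, m=30)
exact = expm(h * A) @ v                  # dense reference, feasible only for small n
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative error with m = 30: {rel_err:.2e}")
```

Only the 30 x 30 matrix H_m is exponentiated; the large matrix is touched solely through matrix-vector products, which is what makes the approach attractive for sparse circuit matrices.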
19
Krylov Subspace Approximation (2/2)
Stiffness affects step size and dimension
– Arnoldi process captures extreme and clustered eigenvalues
– error bound [Saad92] [equation not preserved]
– shrink h or increase m to capture the critical eigenvalues
– remedied by restarted scheme and scaling effect [ICCAD '12]
[Figure: spectrum of hA in the complex plane (axes Re{hλ}, Im{hλ}) for a highly stiff circuit, with marks at -λ_max and -λ_min; labels: regions captured by the Arnoldi process with a small m, critical part for e^{Ah}]
21
Rational Krylov Subspace Approximation (1/2)
Rational basis (I - γA)^{-1}
– K((I - γA)^{-1}, v) = span{v, (I - γA)^{-1}v, ..., (I - γA)^{-m}v}
Arnoldi process:
  for j = 1, 2, ..., m
    solve (I - γA)w = v_j
    for i = 1, 2, ..., j
      H_{i,j} = w^T v_i
      w = w - H_{i,j} v_i
    end
    H_{j+1,j} = ||w||_2
    v_{j+1} = w / H_{j+1,j}
  end
Notes:
– the solve is carried out as (C + γG)w = Cv_j, which avoids regularization of C
– one LU for a linear circuit
– for the subspace of A, the solve is replaced by w = Av_j
22
Rational Krylov Subspace Approximation (1/2)
Rational basis (I - γA)^{-1}
– K((I - γA)^{-1}, v) = span{v, (I - γA)^{-1}v, ..., (I - γA)^{-m}v}
Approximation of e^{Ah}v [equation not preserved]
A posteriori error estimation [van den Eshof 06] [equation not preserved]
– adaptivity
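The rational-basis variant can be sketched in the same style. The sketch assumes the shift-and-invert mapping in which H_m approximates (I - γA)^{-1}, so that A is represented in the subspace by (I - H_m^{-1})/γ. The function name, the test matrices (unit capacitances and a scaled tridiagonal G), and the choice γ = h/10 are assumptions made for the demo.

```python
import numpy as np
from scipy.linalg import expm, lu_factor, lu_solve

def rational_krylov_expm_v(C, G, v, h, gamma, m):
    """Approximate e^{Ah} v, A = -C^{-1} G, in the Krylov subspace of (I - gamma*A)^{-1}."""
    n = v.shape[0]
    lu = lu_factor(C + gamma * G)        # one LU, reused for every basis vector
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    k = m
    for j in range(m):
        w = lu_solve(lu, C @ V[:, j])    # (C + gamma*G) w = C v_j
        for i in range(j + 1):           # modified Gram-Schmidt orthogonalization
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:          # breakdown: the subspace is already invariant
            k = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    H_m = H[:k, :k]
    # H_m approximates (I - gamma*A)^{-1}; map back to a small stand-in for A.
    A_small = (np.eye(k) - np.linalg.inv(H_m)) / gamma
    e1 = np.zeros(k)
    e1[0] = 1.0
    return beta * (V[:, :k] @ (expm(h * A_small) @ e1))

# Hypothetical stiff test problem: eigenvalues of h*A span roughly -400 to -0.1.
n = 100
G = 1.0e4 * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
C = np.eye(n)                            # unit capacitances, chosen for simplicity
rng = np.random.default_rng(0)
v = rng.standard_normal(n)
h = 1.0e-2

approx = rational_krylov_expm_v(C, G, v, h, gamma=h / 10.0, m=30)
exact = expm(-h * np.linalg.solve(C, G)) @ v
err = np.linalg.norm(approx - exact) / np.linalg.norm(v)
print(f"error relative to ||v|| with m = 30: {err:.2e}")
```

Each basis vector costs a pair of triangular solves instead of a matrix-vector product, and C is never inverted, which is why the slide notes that regularization of C is avoided.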
23
Rational Krylov Subspace Approximation (2/2)
Spectral transformation
– similar to preconditioning
– relax stiffness constraint
– enable large step size with a smaller dimension
[Figure: spectrum in the complex plane (axes Re{hλ}, Im{hλ}). Labels: transforming the spectrum by (I - γA)^{-1}; transformed eigenvalues λ'_min and λ'_max within a unit circle, separated by a small gap determined by γ; captured by the Arnoldi process; critical part for e^{A}; projecting back to A by (1/γ)(I - H^{-1}); applying a large h to (1/γ)(I - H^{-1}), with marks at -hλ''_max and -hλ''_min; a small m is acceptable]
24
Rational Krylov Subspace Approximation (2/2)
Spectral transformation
– similar to preconditioning
– relax stiffness constraint
– enable large step size with a smaller dimension
[Figure: fix γ, sweep m and h; annotation: small step size]
25
Rational Krylov Subspace Approximation (2/2)
Spectral transformation
– similar to preconditioning
– relax stiffness constraint
– enable large step size with a smaller dimension
[Figure: fix h, sweep m and γ; annotations: γ = 10^{-12}, large error]
26
Wrap Up

Speed by circuit type (Linear / Nonlinear) and stiffness (High / Mild / Low). Merged cells were lost in extraction, so each row lists its entries in the order they appeared:
Forward Euler     slow, fast, slow, fast
Backward Euler    medium, slow
Trapezoidal       > Backward Euler
Krylov Approx     slow, fast, slow, medium
Rational Krylov   fast, slow

Methods          Computation  Scalability  Error   Stability  Step size
Forward Euler    x=Av         high         O(h^2)  low        tiny
Backward Euler   Ax=b         low          O(h^2)  A-stable   medium
Trapezoidal      Ax=b         low          O(h^3)  A-stable   > Backward Euler
Krylov Approx    x=Av         high         O(h^n)  high       medium
Rational Krylov  Ax=b         low          O(h^n)  high       large
28
Parallelism in Krylov Subspace
Arnoldi process
– sparse matrix-vector multiplication [Bell & Garland '09]
Exponential of a small matrix [Higham '05]
– dense matrix-by-matrix operations
[Figure: work distributed across thread 1, thread 2, ..., thread n-1, thread n]
29
Input Grouping
Constant slope within a step
[Figure: two piecewise-linear inputs (input 1, input 2) over time, with combined breakpoints t1 to t15; annotation: tiny steps due to maintaining constant slope]
30
Input Grouping
Constant slope within a step
[Figure: inputs split into group 1 and group 2, each with its own breakpoints t1 to t8, assigned to thread 1 and thread 2]
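The grouping idea can be sketched in a few lines. The sketch assumes each source is described by the times at which its slope changes, and that sources are grouped when those times coincide exactly. The source names and breakpoint values are hypothetical. For a linear circuit, the per-group responses could then be simulated independently and summed by superposition.

```python
from collections import defaultdict

def group_inputs_by_breakpoints(sources):
    """Map each distinct breakpoint tuple to the names of the sources that share it."""
    groups = defaultdict(list)
    for name, breakpoints in sources.items():
        groups[tuple(sorted(breakpoints))].append(name)
    return dict(groups)

# Hypothetical piecewise-linear sources: name -> times (in ps) where the slope changes.
sources = {
    "i1": [0, 10, 20, 50],
    "i2": [0, 15, 30, 45],
    "i3": [0, 10, 20, 50],
    "i4": [0, 15, 30, 45],
}

# Simulating all sources together forces a step boundary at every breakpoint of any source.
all_points = sorted({t for bps in sources.values() for t in bps})
groups = group_inputs_by_breakpoints(sources)

print("breakpoints without grouping:", len(all_points))
for bps, names in groups.items():
    print(names, "->", len(bps), "breakpoints")
```

Here the ungrouped run has to stop at 7 distinct times, while each of the two groups only stops at 4, so every group can take longer steps and the groups can run on separate threads.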
32
Settings of Experiments
Environment
– Implemented in Matlab
– Intel i7 2.67GHz with 4GB memory
Benchmarks
– Nonlinear and large-scale circuits
– Power distribution networks
– IBM power grid testcases [Nassif 08]

Nonlinear and large-scale circuits:
Design  Category     # R    # C    # Trans.  Size  Stiffness
D1      16bit adder  (# R, # C and # Trans. fused in extraction: 72334448)  579  1.1x10^3
D2      ALU          13.6K  4.3K   6502      10K   5.4x10^6
D3      IO           1.26M  34.6K  1461      630K  1.6x10^6
D4      Power grid   10.4M  8.6M   0         12M   2.6x10^5
Stiffness annotation: generalized eigenvalues of (G, C)
33
Settings of Experiments
Benchmarks: power distribution networks (RC tanks for PCB and package)

Design  Area (mm^2)  # R    # C    # L  Size   Stiffness
P1      0.35^2       23K    15K         45.7K  8.7x10^9
P2      1.40^2       348K   228K        688K   8.3x10^9
P3      2.80^2       1.46M  0.97M       2.90M  1.0x10^10
P4      5.00^2       3.75M  2.47M       7.40M  1.0x10^10
The # L entries were empty in the extracted text.
34
Settings of Experiments
Benchmarks: IBM power grid testcases [Nassif 08]

Design   # R    # C   # L  # I   # V   Size  Stiffness
ibmpg2t  245K   36K   330  36K   330   164K  3.5x10^12
ibmpg3t  1.60M  201K  955  201K  955   1M    3.4x10^11
ibmpg4t  1.83M  265K  962  266K  962   1.2M  2.5x10^11
ibmpg5t  1.55M  473K  277  473K  539K  2.1M  4.7x10^11
ibmpg6t  2.41M  761K  281  761K  836K  3.2M  3.8x10^11
35
Nonlinear and Large-scale Circuits
Matrix exponential method (MEXP)
– Krylov subspace approximation
– Restarted scheme and parallel SpMV on GPU
Trapezoidal method (TRAP)
– same adaptive scheme as MEXP

Design  Size  time   m   TRAP       MEXP-Krylov  speedup
D1      579   100ps  20  671.4s     408.7s       1.64X
D2      10K   100ps  30  3,085.91s  982.14s      3.14X
D3      630K  100ps  30  8,053.45s  535.92s      15.05X
D4      12M   1ns    20  fails      629.56       n/a
Slide annotation: Parallel SpMV
36
Power Distribution Networks
Simulate long time span (1μs) for step response
One LU factorization
– averaged by forward/backward substitutions
MEXP with rational basis adaptively scales h/γ
TRAP uses predetermined step size

          TRAP (h = 10ps)     MEXP – Rational (γ = 10^-10)
Design    LU (s)   Total      LU (s)   Total      Speedup
P1        0.67     44.85m     0.68     2.86m      15.73X
P2        15.60    15.43h     15.48    54.57m     16.96X
P3        91.60    76.92h     93.28    4.30h      17.91X
P4        293.81   203.64h    298.83   11.26h     18.08X
Slide annotation: adaptive & large step size
37
Power Distribution Networks
38
IBM Testcases
Widely adopted benchmarks
Many input current sources
Same MEXP with rational basis and TRAP

          TRAP (h = 10ps)      MEXP – Rational (γ = 10^-10)
Design    LU (s)   Total (s)   LU (s)   Total (s)   Speedup
ibmpg2t   1.31     48.19       1.29     41.81       1.15X
ibmpg3t   18.05    493.97      18.41    413.90      1.19X
ibmpg4t   30.32    675.78      31.01    229.13      2.95X
ibmpg5t   16.16    657.13      16.48    649.97      1.01X
ibmpg6t   23.99    965.53      34.60    915.62      1.05X
Slide annotation: ill alignment
39
IBM Testcases
40
IBM Testcases
Applying simple grouping
– each group of inputs has the same pivot points
– 6X speedup on average

          TRAP (h = 10ps)               MEXP – Rational (γ = 10^-10)
Design    LU (s)   Total (s)   # Group  LU (s)   Total (s)   Speedup
ibmpg2t   1.31     48.19       25       1.29     7.93        6.77X
ibmpg3t   18.05    493.97      25       18.41    86.24       6.08X
ibmpg4t   30.32    675.78      4        31.01    124.16      5.73X
ibmpg5t   16.16    657.13      25       16.48    111.97      5.44X
ibmpg6t   23.99    965.53      25       34.60    166.34      5.80X
41
Conclusions
Emerging challenges in circuit simulation
– scalability and performance
Matrix exponential method
– accuracy, adaptivity and stability
– regular and rational Krylov subspace approximation
Effectiveness of matrix exponential method
– Simulate a large-scale circuit with 12M nodes
– Nonlinear circuits: 6.61X speedup on average
– Impulse response for PDNs: 15X speedup
– IBM testcases: 6X speedup using input grouping
42
Future Work
Variant bases in Krylov subspace
– inverted, extended basis
Model Order Reduction (MOR) and matrix exponential method
– both exploit Krylov subspaces
– apply well-developed MOR techniques to MEXP
Hybrid simulation via matrix exponential
– handle thermal and mechanical phenomena with FEM
43
Thank you!
44
Where are we?
Trade-off between stability and performance
[Figure: methods placed on axes of computational effort (low to high) vs. stability (low to high): Forward Euler, Backward Euler, Trapezoidal Method (SPICE), ACES [Devgan & Rohrer '97], SILCA [Li & Shi '03], Telescopic [Dong & Li '10], Waveform Relaxation [Lelarasmee et al. '82], Domain Decomposition [K. Sun et al. '07], LIM [J. E. Schutt-Aine '01], Matrix Exponential Method [Weng et al. '11]]
ETD in numerical community: [Saad '92] [Ban et al. '11] [Aluffi-Pentini et al. '03] [Hochbruck et al. '97]
Tailored for circuit simulation:
– Adaptive step control
– Scaling effect
– Nonlinear device
– Parallelization
45
Adaptive Step Control
Typical circuit behavior
[Figure: waveform annotated with regions of larger h and smaller h under an error budget]
46
Adaptive Step Size Strategy
Adjustment of step size
– Krylov subspace approximation only requires scaling H_m (αA → αH_m) and re-calculating e^{H_m}
– backward Euler: (C/h + G) changes, so the linear system must be solved again
Strategy:
– maximize step size within a given error budget Err_total
– errors come from the Krylov subspace method and linearization
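The contrast can be written out in symbols. This is a sketch under the usual Arnoldi notation (V_m, H_m, β = ||v||_2, first unit vector e_1), for a step changed from h to αh and a linear system Cx' = -Gx + u; the exact expressions on the slide were not preserved.

```latex
% Krylov: V_m and H_m are built once; a new step only rescales the small exponent
e^{A(\alpha h)}v \approx \beta\, V_m\, e^{\alpha h H_m} e_1, \qquad \beta = \|v\|_2

% backward Euler: the system matrix itself depends on the step size
\left(\frac{C}{\alpha h} + G\right)x_{k+1} = \frac{C}{\alpha h}\,x_k + u_{k+1}
```

In the first case only an m x m exponential is recomputed, while in the second the large matrix changes with α and has to be factorized again.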
47
Nonlinear Formulation
Decouple nonlinear and linear components
MEXP: [equation not preserved]
BE: [equation not preserved]
Slide annotations: constant during Newton's iteration; calculate Jacobian matrix; J(F) in MEXP has fewer non-zeros; approximate e^{A}F
48
Only Inverted
Rational basis A^{-1}
– K(A^{-1}, v) = span{v, A^{-1}v, ..., A^{-m}v}
– requires larger m and smaller h
[Figure: spectra in the complex plane (axes Re{hλ}, Im{hλ}) after shift-and-invert vs. only inverted; labels: smaller spectrum, -1/λ_min]
49
Different γ
[Figure annotation: needs large m]
50
Different γ
51
Spectral Transformation – h = 10p
Small RC mesh, 100 by 100
Different h for Krylov subspace
Different γ for rational Krylov subspace
52
Spectral Transformation – h = 10f
Small RC mesh, 100 by 100
Different h for Krylov subspace
Different γ for rational Krylov subspace
53
Spectral Transformation – γ = 10f
Small RC mesh, 100 by 100
Different h for Krylov subspace
Different γ for rational Krylov subspace
54
Spectral Transformation – γ = 1p
Small RC mesh, 100 by 100
Different h for Krylov subspace
Different γ for rational Krylov subspace
55
Spectral Transformation – γ = 100p
Small RC mesh, 100 by 100
Different h for Krylov subspace
Different γ for rational Krylov subspace
56
Sweep γ for Large Range
57
Sweep γ for Large Range
58
Difference Between Inverted and Rational
59
Fixed γ = 1p, sweep time step h
60
Fixed γ = 1n, sweep time step h
61
Fixed γ = 1u, sweep time step h
62
Fixed γ = 1m, sweep time step h
63
Fixed γ = 1, sweep time step h
64
Fixed γ = 1k, sweep time step h
65
Fixed γ = 1M, sweep time step h