Published by Douglas Lewis. Modified over 9 years ago.
1
Transient Analysis
CK Cheng
UC San Diego
Jan. 25, 2007
2
Outline
– Research Directions
– Simulation test case results
– Overview of Simulation
– Commercial Package
– Alternating direction implicit (ADI) Method
– General Operator Splitting Method
– Distributed Computing
– Conclusions and Future Works
3
Research Directions
– Simulation: SPICE, STA
– Network on Chip: topology and wire styles, Power, and Clock Networks
– Data Path Components: adders, shifters, multipliers, division
– Packaging: passive distortion compensation
4
6x6 Bump Simulation Results
The Circuit:
– 184K Capacitors, 17K Current Sources, 120K Inductors and 246K Resistors
– 306K Nodes
Accuracy:
– Waveform and measurement results match Fujitsu's with less than 0.002% error.
Runtime / Memory Comparison:
Source | CPU Time | Memory | Computer Used
UCSD | 678s | 600.2M | Pentium 4 3.2G, Linux
Fujitsu Log File | 1845s | 771M | unknown
5
6x6 Bump Simulation Results
Measurement results and waveform
Measurement | Min_pwr_l_est_10000954 | Min_18269323 | Min_33085875
UCSD | 0.9980790 | 0.9967357 | 0.9934251
Fujitsu Log File | 0.9980620 | 0.9966940 | 0.9933790
Error | 0.002% | 0.004% | 0.005%
(Red curve is UCSD result)
6
703KR Simulation Results
The Circuit:
– 514K Capacitors, 76K Current Sources, 370K Inductors and 703K Resistors
– 1.3M Nodes
Accuracy:
– Measurement results match Fujitsu's with less than 0.02% error.
Runtime / Memory Comparison:
Source | CPU Time | Memory | Computer Used
UCSD | 2575s (0.7h) | 1.7G | Pentium 4 3.2G, Linux
Fujitsu Log File | 864561s (240h) | 2.28G | unknown
7
703KR Simulation Results
Measurement results and waveform
Measurement | Min_33096003 | Min_33096004 | Min_33097557
UCSD | 0.9400988 | 0.9421157 | 0.9370827
Fujitsu Log File | 0.9399610 | 0.9419260 | 0.9368400
Error | 0.015% | 0.02% | 0.026%
(UCSD results only. Fujitsu waveform is not available for comparison.)
8
Further Speed-ups
Reduce iteration count by 50% for pure linear circuits (like 6x6 bump and 703KR)
– 2x speed up
More effective time step control
– DVDT, breakpoint, truncation error
– 1.5x - 3x speed up
Use Multigrid solver
– 1.5x - 2x speed up for medium circuits (6x6 bump)
– 2x - 10x speed up for large circuits (703KR)
Parallel simulation
– 4 or more processors on Linux cluster
– 32 to hundreds of processors on supercomputer
Overall speed-up
– 6x - 60x speed up without parallel simulation
– 12x - 1000x speed up with parallel simulation
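The overall serial figure can be checked by multiplying the per-technique ranges. The sketch below assumes the factors compose multiplicatively and uses the large-circuit multigrid range, which is the combination that reproduces the quoted 6x to 60x; the dictionary keys and the helper name are illustrative, not from the slides.

```python
from math import prod

# Per-technique speed-up ranges (low, high) as quoted on the slide.
# Assumption: the large-circuit multigrid range (2x - 10x) is the one used.
factors = {
    "iteration_count": (2.0, 2.0),
    "time_step_control": (1.5, 3.0),
    "multigrid_large_circuit": (2.0, 10.0),
}

def combined_range(ranges):
    """Multiply the low ends and the high ends of independent speed-ups."""
    ranges = list(ranges)
    lows = [lo for lo, _ in ranges]
    highs = [hi for _, hi in ranges]
    return prod(lows), prod(highs)

low, high = combined_range(factors.values())
print(f"serial speed-up: {low:g}x to {high:g}x")
```

With the medium-circuit multigrid range (1.5x - 2x) the same product gives a lower bound of 4.5x, so the slide's range appears to be stated for the large case.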
9
Performance and capacity prediction
Cases 10x-100x larger than 703KR.
Circuit size | Preferred Solver | CPU Time | Memory
Small - Medium, 0.3M nodes | LU Decomposition | 11 minutes | 600M
Medium - Large, 1.3M nodes | Multigrid | 43 minutes | 1.7G
Huge, 10-100M nodes | Multigrid + Parallel | 5 - 100 hours | 15G - 200G
10
Overview of Simulation
Our research
– Fast speed with SPICE accuracy
– Nonlinear devices
– Efficient matrix solvers
– Effective integration methods
– Time step controls according to different integration methods
– Distributed computing
[Flowchart boxes: Load Circuit, Integration Approximation, Device Evaluation, Linearization, LU Decomposition, N-R Converge? (Yes / No), Time Step Control, Next Time Point]
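The boxes of the flowchart can be sketched as one loop. This is a minimal illustration only: the one-node circuit (current source, capacitor, resistor and diode to ground), all component values, the fixed step size and the use of backward Euler are assumptions made here, and the "LU decomposition" degenerates to a scalar division.

```python
import math

# Hypothetical one-node test circuit (not from the slides):
# current source I_SRC into a node with C, R and a diode to ground.
C, R, I_SRC = 1e-9, 1e3, 1e-3
I_S, V_T = 1e-14, 0.02585   # diode saturation current, thermal voltage

def diode(v):
    """Device evaluation: diode current and its small-signal conductance."""
    e = math.exp(v / V_T)
    return I_S * (e - 1.0), I_S * e / V_T

def transient(t_stop, h, v0=0.0, tol=1e-12, max_iter=50):
    """Fixed-step backward Euler with a Newton-Raphson loop per time point."""
    v, t, wave = v0, 0.0, [v0]
    while t < t_stop - 0.5 * h:
        v_prev, v_new = v, v              # initial N-R guess: previous point
        for _ in range(max_iter):
            i_d, g_d = diode(v_new)       # device evaluation
            # Integration approximation (backward Euler) gives the residual.
            f = C * (v_new - v_prev) / h + v_new / R + i_d - I_SRC
            jac = C / h + 1.0 / R + g_d   # linearization
            dv = -f / jac                 # linear solve (scalar here)
            v_new += dv
            if abs(dv) < tol:             # N-R converged?
                break
        else:
            raise RuntimeError("Newton-Raphson did not converge")
        v, t = v_new, t + h               # next time point
        wave.append(v)
    return wave

wave = transient(t_stop=10e-6, h=1e-8)
print(f"final node voltage: {wave[-1]:.4f} V after {len(wave) - 1} steps")
```

A production simulator would replace the scalar division with a sparse LU solve and choose h adaptively; both are omitted to keep the loop structure visible.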
11
Overview of Simulation
Matrix Solver
– LU Decomposition
– Iterative Approach
Integration
– Time Step Control
– ADI
Nonlinear Devices
– Two Stage Newton Raphson
Distributed Computing
Commercial Implementation
12
Overview of Simulation
Integration
– Time Step Control
– ADI (two-way partitioning)
– Operator Splitting (multi-way)
Distributed Computing
– MPI
– Partitioning
Three Ph.D. Students
13
Commercial Package: Fastrack Design
– Founded in January 2001
– Headquartered in San Jose
– Privately funded, cash-flow positive
Two Business Units
– Design Services
– Technology Products
14
Analog Designs
Design | # Elements | Sim. Len | HSpice | mSPICE | Speedup Factor
LVDS | 13490 | 20us | 80h | 26h | 3.1X
Oscillator | 222 | 1 ms | 13,706s | 2,670s | 5.1X
Biasing Circuit | 49197 | 200ns | 427s | 82s | 5.2X
PLL | 16050 | 40us | 67d | 12d | 5.6X
PLL (post-layout) | 300K | 40us | 290d (est) | 16d | 18.1X
15
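Digital Blocks
Design Name | Devices: MOS | Devices: R | Devices: C | Runtime: mSPICE | Runtime: Traditional Spice | Speedup Factor
ALU | 10.1k | 12.7k | 7.5k | 6.9m | 7m | 1.0X
CONTROL | 69k | 83.7k | 52.5k | 1.5h | 9.5h | 6.3X
YN_BLK | 205K | 242.8k | 203.9k | 3.5h | > 2d | >13.7X
THP | 437k | 499.3k | 313.5k | 5.0h | COULD NOT RUN | ∞
VCON | 936k | 753k | 561k | 15.0h | COULD NOT RUN | ∞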
16
Memory Blocks
Design | #Tr | #R | #C | # Vectors / Sim. Length | mSPICE Run Time
BRAM (pre) | 220K | 0 | 500 | 2 | 2.5 hours
SRAM (pre) 8Kx8 SP | 410K | 0 | 0 | 2 | 7 hours
eRAM (post) 256x16 | 72K | 28K | 427K | 48ns | 8 hours
BRAM (post) | 220K | 1320K | 870K | 2 | 18 hours
100% accurate Spice simulation
17
mSPICE-Parallel
Industry's first practical parallel Spice simulation solution
– Increases capacity further
– Dramatically improves throughput
Uses Matrix Level Partitioning
– No loss of accuracy
– Client-Server configuration
– Minimal memory requirement for client nodes
18
Client-Server Configuration
– Server distributes sub-matrices to clients
– Clients communicate partial solutions
– Minimal memory requirements for clients
[Figure: sparse 0/1 matrix illustration]
19
Experimental Results
Design | Total Elements | Sim. Length | Runtime 1-proc | Runtime 2-proc | Runtime 4-proc
ASIC | 1.2M | 8ns | 12.2h | 7.0h (1.7X) | 5.1h (2.4X)
38IO SSO | 1.4M | 30ns | 3.0h | 2.0h (1.5X) | 1.4h (2.2X)
Signal-power | 2.1M | 1.2us | 13d | 7d18h (1.7X) | 5d12h (2.4X)
4096x8 RAM (extracted) | 2.3M | 10ns | 32h | 18.5h (1.7X) | 13.4h (2.4X)
120IO SSO | 3.5M | 30ns | 6.2h | 4.1h (1.5X) | 3.1h (2.0X)
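One way to read the speed-up figures in these results is as parallel efficiency, that is, speed-up divided by processor count. The short sketch below only restates the quoted speed-ups; the efficiency metric and the function name are additions here, not something the slides report.

```python
# (design, 2-processor speed-up, 4-processor speed-up) as quoted in the results.
results = [
    ("ASIC", 1.7, 2.4),
    ("38IO SSO", 1.5, 2.2),
    ("Signal-power", 1.7, 2.4),
    ("4096x8 RAM (extracted)", 1.7, 2.4),
    ("120IO SSO", 1.5, 2.0),
]

def efficiency(speedup, processors):
    """Parallel efficiency: fraction of ideal linear speed-up achieved."""
    return speedup / processors

for name, s2, s4 in results:
    print(f"{name:24s} 2-proc: {efficiency(s2, 2):.0%}   4-proc: {efficiency(s4, 4):.0%}")
```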
20
ADI: Previous Works
1999, Namiki and Ito
– The alternating direction implicit (ADI) method is used to simulate a 2D TE wave.
2001, Zheng et al.
– Extended to 3D problems
2001 & 2003, Lee and Chen
– ADI is applied to power grids modeled as transmission lines
In these works the alternation is among different geometric directions, so the simulated geometric structure is constrained.
21
Alternating Direction Implicit (ADI)
ADI Integration Method
– Two-way partition of the circuit
– One partition is used for each backward integration
– Unconditionally stable (A-stable: independent of time step size)
– Time step size is chosen according to local truncation error
22
Alternating Direction Implicit (ADI)
– ADI method formulation
– Circuit partition algorithm
– Local truncation error estimation
– Stability discussion
– Experimental results
23
SPICE Formulation
Equations for RLC circuits, where
– C: capacitance matrix
– L: inductance matrix
– R: resistance matrix
– G: conductance matrix
– E: incidence matrix
24
ADI Formulation
Transient simulation
– Split the resistor and inductor branches into two parts:
  G = G1 + G2
  E = E1 + E2
  R = R1 + R2
– Alternate backward and forward integration on each partition
25
ADI Formulation (Cont.)
Equations of ADI method
– The size of the left-hand-side matrix remains unchanged
– The number of non-zero elements is decreased
– Direct solving methods can be efficient
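The ADI equations themselves did not survive in this transcript. As an illustration of the idea only, the sketch below applies the classical two-half-step (Peaceman-Rachford) scheme to an RC-only network C dv/dt + (G1 + G2) v = u: each half step treats one partition implicitly (backward) and the other explicitly (forward). The two-node network, its values, the omission of inductors and all helper names are assumptions made here, not the authors' formulation.

```python
def solve2(m, b):
    """Solve the 2x2 system m x = b by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(b[0] * m[1][1] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - m[1][0] * b[0]) / det]

def matvec(m, x):
    return [m[0][0] * x[0] + m[0][1] * x[1],
            m[1][0] * x[0] + m[1][1] * x[1]]

def combine(alpha, a, sign, b):
    """Return alpha*a + sign*b for 2x2 matrices."""
    return [[alpha * a[i][j] + sign * b[i][j] for j in range(2)] for i in range(2)]

# Hypothetical two-node RC network in normalized units.
C  = [[1.0, 0.0], [0.0, 1.0]]      # node capacitances
G1 = [[1.0, 0.0], [0.0, 0.5]]      # partition 1: resistors to ground
G2 = [[2.0, -2.0], [-2.0, 2.0]]    # partition 2: resistor between the nodes
u  = [1.0, 0.0]                    # constant current into node 1

def adi_step(v, h):
    a = 2.0 / h
    # First half step: G1 implicit (backward), G2 explicit (forward).
    rhs = [r + s for r, s in zip(matvec(combine(a, C, -1.0, G2), v), u)]
    v_half = solve2(combine(a, C, 1.0, G1), rhs)
    # Second half step: roles swapped.
    rhs = [r + s for r, s in zip(matvec(combine(a, C, -1.0, G1), v_half), u)]
    return solve2(combine(a, C, 1.0, G2), rhs)

v = [0.0, 0.0]
for _ in range(400):
    v = adi_step(v, 0.1)
print(v)   # approaches the DC solution of (G1 + G2) v = u
```

Each left-hand-side matrix contains only one partition's conductances, which is where the reduction in non-zero elements and fill-ins comes from; the matrix dimension is unchanged.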
26
Experiments of non-zero fill-ins
A small ASIC design Spice matrix:
– Dimension: 10,286
– Number of non-zero elements: 46,655
– Number of non-zero fill-ins: 90,960
A large I/O design Spice matrix:
– Dimension: 615,436
– Number of non-zero elements: 2,126,246
Case | Sub-matrix 1 # non-zero elements | Sub-matrix 1 # non-zero fill-ins | Sub-matrix 2 # non-zero elements | Sub-matrix 2 # non-zero fill-ins | Total # non-zero fill-ins
Case 1 | 38,572 | 2,618 | 42,020 | 10,040 | 12,658
Case 2 | 1,176,208 | 12,421,534 | 950,038 | 14,772,068 | 27,193,602
27
Local Truncation Error (LTE)
Time step control using LTE
– In circuit transient analysis, the next time step can be estimated from the local truncation error at the present time point
– LTE is defined as the difference between the calculated solution and the exact solution
– To ensure consistency, the local truncation error should not exceed the error tolerance, so the time step is estimated from that tolerance
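The step-size formula is not preserved in this transcript. A common textbook rule, shown here as an assumption rather than the authors' formula, scales the step by (tol / LTE)^(1/(p+1)) for a method of order p. The safety factor and the growth and shrink limits are illustrative defaults.

```python
def next_time_step(h, lte, tol, order=2, safety=0.9, grow_max=2.0, shrink_min=0.1):
    """Propose the next step from the current step h and its LTE estimate.

    Assumes LTE is proportional to h**(order + 1), so the step that would
    just meet `tol` is h * (tol / lte) ** (1 / (order + 1)).
    """
    if lte <= 0.0:
        return h * grow_max                       # no measurable error: grow
    scale = safety * (tol / lte) ** (1.0 / (order + 1))
    return h * min(grow_max, max(shrink_min, scale))

# LTE is 8x the tolerance with a second-order method: the step is halved.
print(next_time_step(1.0, lte=8e-3, tol=1e-3, safety=1.0))
```

Clamping the growth and shrink ratios keeps one bad error estimate from changing the step by orders of magnitude.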
28
Local Truncation Error (Cont.) LTE of ADI method (1) Equations [equations not preserved in the transcript]
29
Local Truncation Error (Cont.) LTE of ADI method (2) Estimate exact solution: we characterize the input as a simple ramp over the interval (t_n, t_{n+1}), which gives the exact analytic solution with time step t_n:
30
Local Truncation Error (Cont.) LTE of ADI method (3) Estimate ADI solution
31
Local Truncation Error (Cont.) LTE of ADI method (3) Estimate ADI solution
32
Local Truncation Error (Cont.) LTE of ADI method (4) LTE estimation
33
Local Truncation Error (Cont.) LTE of ADI method (5) Time step control
34
Local Truncation Error (Cont.) LTE of ADI method (5) Time step control
35
Stability Discussion
Stability is concerned with whether the accumulated error grows or decays as time evolves through a series of time steps.
For one-step integration approximations, the error is multiplied by a fixed factor at each step.
If the final steady-state error vector is smaller than the initial one, the integration method is stable.
In the ADI integration method:
– It can be proved to be unconditionally stable
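The amplification factor shown on the slide is not preserved. As a hedged illustration, and not the authors' proof, consider a scalar test problem c dv/dt = -(g_1 + g_2) v stepped with the two-half-step (Peaceman-Rachford) form of ADI, writing a = 2c/h. The symbols c, g_1, g_2, a and \rho are introduced here for the example.

```latex
% Scalar test problem: c \dot v = -(g_1 + g_2) v, with a = 2c/h
(a + g_1)\, v_{n+1/2} = (a - g_2)\, v_n, \qquad
(a + g_2)\, v_{n+1} = (a - g_1)\, v_{n+1/2}
\;\Longrightarrow\;
v_{n+1} = \rho\, v_n, \quad
\rho = \frac{(a - g_1)(a - g_2)}{(a + g_1)(a + g_2)}, \quad
|\rho| < 1 \ \text{for all } h > 0 \text{ when } g_1, g_2 > 0 .
```

Because |a - g| < a + g for any positive a and g, the error shrinks at every step regardless of the step size, which is what unconditional stability means in this scalar setting.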
36
Experimental Results
Quantity | Circuit1 | Circuit2 | Circuit3 | 1k-cell
#Nodes | 10,000 | 40,000 | 90,000 | 10,200
#Transistors | 0 | 0 | 0 | 6,500
Period | 10ns
SPICE3 CPU time (sec) | 77.8 | 485.3 | 3,061.1 | 181.6
SPICE3 #steps | 115, 114, 193
ADI CPU time (sec) | 28.6 | 117.8 | 275.2 | 523.3
ADI #steps | 102, 949
Speedup | 2.7x | 4.1x | 11.1x | -
37
Voltage drop of Circuit3 (power mesh with sinks)
38
Signal in 1k_cell (ASIC design)
39
General Operator Splitting
General operator splitting method
– Multi-way partitions
– Each partition is considered separately in each time step of the simulation
– No geometric constraints
– Local truncation error is used to dynamically control time step size
40
General Operator Splitting
– Fundamental theory
– Operator splitting formulation
– Local truncation error estimation
– Stability discussion
– Experimental results
41
Fundamental theory
– In circuit transient simulation, the integration approximation is actually the approximation of the exponential operator
– The exponential operators can be approximated to any order using a general scheme of fractal decomposition
– The decomposition of exponential operators corresponds to the multi-way partition of the circuit
– This gives a new integration approximation for transient simulation
42
Fundamental theory
Approximation of exponential operator
– General circuit equation and solution
– If we characterize the input as a simple ramp over the interval (t_n, t_{n+1}), we obtain the exact analytic solution with time step t_n
– Exponential operator approximation: Forward Euler, Backward Euler, Trapezoidal
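The three approximations named above did not survive extraction. In their standard scalar forms for z = a*h they are 1 + z (Forward Euler), 1/(1 - z) (Backward Euler) and (1 + z/2)/(1 - z/2) (Trapezoidal). The sketch below, with illustrative function names and test values, compares each against exp(z) and shows how the one-step error shrinks when the step is halved.

```python
import math

def forward_euler(z):
    return 1.0 + z

def backward_euler(z):
    return 1.0 / (1.0 - z)

def trapezoidal(z):
    return (1.0 + z / 2) / (1.0 - z / 2)

def local_error(approx, z):
    """One-step error of an approximation to exp(z)."""
    return abs(approx(z) - math.exp(z))

def halving_ratio(approx, z):
    """Error at step z divided by error at step z/2."""
    return local_error(approx, z) / local_error(approx, z / 2)

z = -0.1   # e.g. a = -1 with h = 0.1 (illustrative values)
for rule in (forward_euler, backward_euler, trapezoidal):
    print(f"{rule.__name__:15s} error={local_error(rule, z):.2e} "
          f"ratio={halving_ratio(rule, z):.2f}")
```

A ratio near 4 corresponds to a one-step error of order h^2 (first-order methods) and a ratio near 8 to order h^3 (the second-order trapezoidal rule).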
43
Fundamental theory
Decomposition of exponential operators (Masuo Suzuki, 1991, Physics)
– Function
– First order:
– Second order:
– Third order:
– (2m-1)th and (2m)th order:
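The decomposition formulas are missing from this transcript. The widely used low-order forms are exp(x(A+B)) ≈ exp(xA) exp(xB) at first order and exp(xA/2) exp(xB) exp(xA/2) at second order. The check below uses two hypothetical non-commuting 2x2 matrices chosen so that every exponential has a closed form; the matrices and helper names are assumptions for illustration.

```python
import math

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_A(x):
    """exp(xA) for A = [[0, 1], [0, 0]]; A is nilpotent, so this is I + xA."""
    return [[1.0, x], [0.0, 1.0]]

def exp_B(x):
    """exp(xB) for B = [[0, 0], [1, 0]]; also nilpotent."""
    return [[1.0, 0.0], [x, 1.0]]

def exp_sum(x):
    """Exact exp(x(A+B)); (A+B)^2 = I gives cosh(x) I + sinh(x) (A+B)."""
    return [[math.cosh(x), math.sinh(x)], [math.sinh(x), math.cosh(x)]]

def first_order(x):
    return matmul(exp_A(x), exp_B(x))

def second_order(x):
    return matmul(matmul(exp_A(x / 2), exp_B(x)), exp_A(x / 2))

def max_error(approx, x):
    exact, m = exp_sum(x), approx(x)
    return max(abs(m[i][j] - exact[i][j]) for i in range(2) for j in range(2))

for name, f in (("first order", first_order), ("second order", second_order)):
    ratio = max_error(f, 0.1) / max_error(f, 0.05)
    print(f"{name:12s} error(0.1)={max_error(f, 0.1):.2e} halving ratio={ratio:.2f}")
```

Here A and B stand in for the operators of two circuit partitions; halving x cuts the first-order error by about 4 and the second-order error by about 8.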
44
Fundamental theory Decomposition of exponential operators
45
General Operator Splitting Formulation
Transient simulation:
– Apply the second-order approximation
– In each time step, every partition is calculated separately, and trapezoidal integration is used for every partition
– The size of the left-hand-side matrix may change
– The number of non-zero elements is definitely decreased
– Can be easily extended to multi-way partitions
46
General Operator Splitting Formulation Equations
47
Local Truncation Error (Cont.) LTE of general operator splitting method Estimate solution
48
Local Truncation Error (Cont.) LTE of general operator splitting method Estimate solution
49
Local Truncation Error (Cont.) LTE of general operator splitting method LTE estimation
50
Local Truncation Error (Cont.) LTE of general operator splitting method LTE estimation
51
Local Truncation Error (Cont.) LTE of general operator splitting method LTE estimation
52
Stability Discussion
– The trapezoidal integration method is unconditionally stable for a stable system.
– In our operator splitting method, the trapezoidal method is used for all the sub-systems, so the method is still unconditionally stable.
53
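Experimental Results
Quantity | Circuit1 | Circuit2 | Circuit3
#Nodes | 10,000 | 40,000 | 90,000
#Transistors | 0 | 0 | 0
Period | 10ns
SPICE3 CPU time (sec) | 77.8 | 485.3 | 3,061.1
SPICE3 #steps | 115, 114
GOS CPU time (sec) | 164.7 | 1011.6 | 3435.9
GOS #steps | 102
Comparison (GOS time / SPICE3 time) | 2.1x | 2x | 1.1x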
54
Voltage drop of Circuit3 (power mesh with sinks)
55
Conclusions
– We investigate alternating direction implicit and general operator splitting integration methods for transistor-level circuit transient simulation.
– In both methods, the circuit is divided into several sub-circuits, so a direct matrix solver remains efficient because each matrix is simpler.
– Both methods are second-order accurate and unconditionally stable.
Overhead:
– Circuit partition
– Each time step consists of many sub-steps, and each sub-step is an N-R iteration process
Better for circuits with large linear networks
56
Distributed Computing
Distributed Processors
– Cluster
– Supercomputer
– Multi-Core Processors (Intel Dual/Quad-Core, IBM Cell, etc.)
Standard
– MPI
– Partitioning
– Matrix Solver
Capabilities
– Speed-up (10-100+)
– Memory Capacity (10-100+)
57
Future Works
ADI method
– More experiments
General operator splitting method
– Design and implement multi-way circuit partition algorithm
– Implement multi-way general operator splitting program
– Derive LTE for general multi-way situation
– More experiments
Distributed Computing
– MPI Standard
– Distributed Partitioning, Matrix Solver