
1 Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.
Linear Solver Challenges in Large-Scale Circuit Simulation
Copper Mountain 2010, April 6th, 2010
Erik G. Boman, David M. Day, Eric R. Keiter, and Heidi K. Thornquist
Sandia National Laboratories

2 Outline
Circuit Design Process
–Transistor-level simulation
–Xyce
Simulation Challenges
–Network Connectivity
–Load Balancing / Partitioning
–Efficient Parallel Linear Solvers
Linear Solver Strategies
–Results

3 Circuit Design Process
Highly complex
–Requires different tools for verifying different aspects of the circuit
Cannot afford many circuit re-spins
–Expense of redesign
–Time to market
Accurate / efficient / robust tools
–Challenging for 45nm technology

4 Xyce
–Analog circuit simulator (SPICE compatible)
–Large scale (N>1e7) “flat” circuit simulation
  solves set of coupled DAEs simultaneously
–Distributed memory parallel
–Advanced solution techniques
  Homotopy
  Multi-level Formulation
  Multi-time Partial Differential Equation (MPDE)
  Parallel Iterative Matrix Solvers / Preconditioners
–2008 R&D100 Award

5 Parallel Circuit Simulation Challenges
Analog simulation models network(s) of devices coupled via Kirchhoff’s current and voltage laws
Network Connectivity
–Hierarchical structure rather than spatial topology
–Densely connected nodes: O(n)
Badly Scaled DAEs
–Compact models designed by engineers, not numerical analysts!
–DCOP matrices are often ill-conditioned
Non-Symmetric
–Not elliptic and/or globally SPD
Load Balancing / Partitioning
–Balancing cost of loading Jacobian values unrelated to matrix partitioning for solves
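To make the "densely connected nodes" point concrete, here is a minimal sketch of how Kirchhoff's current law turns a device list into matrix entries. It covers linear resistors only; the function name `stamp_resistors`, the ground-is-node-0 convention, and the example network are assumptions made for illustration and are not taken from Xyce.

```python
def stamp_resistors(resistors):
    """Assemble a nodal conductance matrix from (node_a, node_b, ohms) tuples.

    Node 0 is treated as ground and dropped from the matrix.
    Returns a dict of rows: G[i][j] = conductance entry.
    """
    G = {}

    def add(i, j, value):
        if i == 0 or j == 0:  # ground is not an unknown
            return
        row = G.setdefault(i, {})
        row[j] = row.get(j, 0.0) + value

    for a, b, ohms in resistors:
        g = 1.0 / ohms
        # Kirchhoff's current law: current g*(v_a - v_b) leaves a and enters b
        add(a, a, g)
        add(b, b, g)
        add(a, b, -g)
        add(b, a, -g)
    return G


# Hypothetical example: node 1 is a supply rail feeding nodes 2..6,
# and each of those nodes has a load resistor to ground.
rail = [(1, k, 1000.0) for k in range(2, 7)]
loads = [(k, 0, 1000.0) for k in range(2, 7)]
G = stamp_resistors(rail + loads)
nnz_per_row = {i: len(row) for i, row in sorted(G.items())}
```

In this toy network the rail row couples to every other unknown, while each load row has only two entries. That is the same pattern, at small scale, as one dense row sitting in an otherwise sparse matrix.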

6 Parallel Circuit Simulation Structure (Transient Simulation)
Simulation challenges create problems for linear solver
–Direct solvers more robust
–Iterative solvers have potential for better scalability
Iterative solvers have previously been declared unusable for circuit simulation
–Black box methods do not work!
–Need to address these challenges in creation of preconditioner

7 Network Connectivity (Singleton Removal)
Connectivity:
–Most nodes very low connectivity -> sparse matrix
–Power node generates very dense row (~0.9*N)
–Bus lines and clock paths generate order of magnitude increases in bandwidth
Row Singleton: pre-process
Column Singleton: post-process
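Row singleton removal can be sketched as follows: a row with a single nonzero fixes one unknown outright, so that unknown is solved before the main solve and substituted into the remaining equations. The data layout (a dict keyed by `(row, col)`) and the name `remove_row_singletons` are assumptions for illustration, and the sketch assumes a nonsingular system. Column singletons are the mirror case (an unknown that appears in only one equation is recovered after the main solve) and are not shown.

```python
def remove_row_singletons(entries, rhs):
    """Solve and eliminate unknowns that are fixed by single-entry rows.

    entries: dict {(row, col): value} holding the sparse matrix
    rhs:     dict {row: value} holding the right-hand side
    Returns (solved, reduced_entries, reduced_rhs).
    """
    entries = dict(entries)
    rhs = dict(rhs)
    solved = {}
    while True:
        cols_in_row = {}
        for (i, j) in entries:
            cols_in_row.setdefault(i, []).append(j)
        singles = [(i, cols[0]) for i, cols in sorted(cols_in_row.items())
                   if len(cols) == 1]
        if not singles:
            break
        i, j = singles[0]
        x_j = rhs[i] / entries[(i, j)]
        solved[j] = x_j
        # Move the known x_j to the right-hand side and drop column j.
        for (r, c), value in list(entries.items()):
            if c == j:
                if r != i:
                    rhs[r] -= value * x_j
                del entries[(r, c)]
        del rhs[i]  # row i is fully resolved
    return solved, entries, rhs


# Hypothetical example: equation 0 pins unknown 0 to 5 (like an ideal source).
A = {(0, 0): 1.0,
     (1, 0): -1.0, (1, 1): 2.0, (1, 2): -1.0,
     (2, 1): -1.0, (2, 2): 2.0}
b = {0: 5.0, 1: 0.0, 2: 0.0}
solved, A_reduced, b_reduced = remove_row_singletons(A, b)
```

Here the 3x3 system shrinks to a 2x2 system in unknowns 1 and 2, with the known value folded into the right-hand side of equation 1.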

8 Network Connectivity (Hierarchical Structure)
Heterogeneous Matrix Structure
Some circuits exhibit unidirectionality:
–Common in CMOS Memory circuits
–Not present in circuits with feedback (e.g. PLLs)
–Block Triangular Form (BTF) via Dulmage-Mendelsohn permutation
BTF benefits both direct and preconditioned iterative methods
Used by Tim Davis’s KLU in Trilinos/AMESOS (The “Clark Kent” of Direct Solvers)
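For a square, structurally nonsingular matrix, the block triangular form can be found in two steps: permute to a zero-free diagonal, then find the strongly connected components of the matrix graph. The sketch below covers only the second step, using Tarjan's algorithm, and assumes the diagonal is already zero-free. The function name `btf_blocks` and the two test patterns are illustrative assumptions, not KLU's interface.

```python
def btf_blocks(pattern, n):
    """Return the diagonal blocks of a block lower triangular ordering.

    pattern: set of (i, j) nonzero positions, zero-free diagonal assumed
    n:       matrix dimension
    Each block is a strongly connected component of the directed graph
    with an edge i -> j for every off-diagonal nonzero a_ij.
    """
    adj = [[] for _ in range(n)]
    for i, j in sorted(pattern):
        if i != j:
            adj[i].append(j)

    index, low = {}, {}
    stack, on_stack, blocks = [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in adj[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            block = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                block.append(w)
                if w == v:
                    break
            blocks.append(sorted(block))

    for v in range(n):
        if v not in index:
            visit(v)
    return blocks


# Unidirectional chain: stage i depends only on stage i-1.
chain = {(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3)}
chain_blocks = btf_blocks(chain, 4)

# Same chain with one feedback entry from the last stage to the first.
feedback_blocks = btf_blocks(chain | {(0, 3)}, 4)
```

The chain splits into four 1x1 blocks, while a single feedback entry merges everything into one irreducible block. That is the small-scale version of the difference between a unidirectional circuit and one with feedback.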

9 Network Connectivity (Parasitics/PLLs)
Other circuits do not exhibit unidirectionality:
–Common in phase-locked loops (PLLs)
–Common in post-layout circuits
  circuits with parasitics
  important for design verification
  often much larger than original circuit
Dulmage-Mendelsohn permutation results in large irreducible block

10 Load Balancing / Partitioning
Balancing Jacobian loads with matrix partitioning for iterative solvers
–Use different partitioning for Jacobian loads and solves
–Simple distribution of devices across processors
Matrix partitioning more challenging:
–Graph
  Assumes symmetric structure
  Robust software available (ParMETIS, etc.)
–Hypergraph
  Works on rectangular, non-symmetric matrices
  Newer algorithms (Zoltan, etc.)
  More expensive to compute
  More accurately measures communication volume
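The difference between the two models can be seen on a tiny non-symmetric pattern. The sketch below assumes a row-wise distribution for y = A x in which the part owning row i also owns x_i. It compares the graph edge cut with the communication volume counted by a column-net hypergraph model. The function names and the example pattern are assumptions for illustration and do not use the ParMETIS or Zoltan APIs.

```python
def edge_cut(pattern, part):
    """Graph model: cut edges of the symmetrized graph of the pattern."""
    edges = {frozenset((i, j)) for i, j in pattern if i != j}
    return sum(1 for e in edges if len({part[v] for v in e}) == 2)


def comm_volume(pattern, part):
    """Column-net hypergraph model: words of x sent during y = A x.

    x_j lives on part[j] and must reach every other part that owns a row
    with a nonzero in column j, once per receiving part.
    """
    parts_needing = {}
    for i, j in pattern:
        parts_needing.setdefault(j, set()).add(part[i])
    return sum(len(parts | {part[j]}) - 1
               for j, parts in parts_needing.items())


# Hypothetical 6x6 pattern: diagonal plus a dense, non-symmetric column 0.
n = 6
pattern = {(i, i) for i in range(n)} | {(i, 0) for i in range(1, n)}
part = [0, 0, 0, 1, 1, 1]  # rows 0-2 on part 0, rows 3-5 on part 1

cut = edge_cut(pattern, part)
volume = comm_volume(pattern, part)
```

Three edges cross the partition, but only one value (x_0) actually has to be sent, because the receiving part needs it once regardless of how many of its rows use it.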

11 Linear Solver Strategies ~ Iterative and Direct ~
Strategy 1: (DD)
Singleton Removal -> Graph Row Partitioning -> Local Block Ordering -> ILUT -> GMRES
Strategy 2: (BTF)
Singleton Removal -> Global Block Ordering -> Hypergraph Block Partitioning -> Block Jacobi -> GMRES
Direct Solver Strategies:
–KLU (serial), SuperLU-Dist (parallel)

12 Linear Solver Strategies ~ Strategy 1 ~
Assertion: Black box methods won’t work!
Strategy 1: Singleton Removal -> Partitioning -> Block Ordering -> ILUT -> GMRES
Strategy    Precond             N     Total Cuts  Condition #  GMRES Iters  Solve Time
Black Box   ILUT                1220  ~1000       3.00E+05     500          4.7
Strategy 1  SR+Zoltan+AMD+ILUT  1054  68          1.00E+04     127          0.43
[Figure panels: “Black Box”, “Strategy”]

13 Assertion: Solver strategy is problem dependent!
- Ex: 100K transistor IC problem
“Strategy 1” Solver Performance
Strategy     Method              Residual          GMRES Iters  Solve Time
1 (4 procs)  SR+Zoltan+AMD+ILUT  3.425e-01 (FAIL)  500          302.573

14 Linear Solver Strategies ~ Strategy 2 ~
Strategy  Method                 Residual          GMRES Iters  Solve Time
1         SR+Zoltan+AMD+ILUT     3.425e-01 (FAIL)  500          302.573
2         SR+BTF+Hypergraph+KLU  3.473e-10         3            0.139
[Figure panels: Original, BTF+Hypergraph (4 procs)]
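One way to see why a block triangular ordering pairs well with a block Jacobi preconditioner: if the matrix is block lower triangular with p diagonal blocks and each block is solved exactly, the preconditioned matrix is the identity plus a nilpotent part, so the iteration terminates in at most p steps. The sketch below demonstrates this with plain preconditioned Richardson iteration standing in for GMRES, a deliberate simplification. The dense Gaussian elimination stands in for a sparse direct solver such as KLU. The matrix and all function names are assumptions for illustration.

```python
def solve_dense(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


def block_jacobi_apply(A, blocks, r):
    """Apply the block Jacobi preconditioner: solve each diagonal block."""
    z = [0.0] * len(r)
    for blk in blocks:
        sub = [[A[i][j] for j in blk] for i in blk]
        for i, v in zip(blk, solve_dense(sub, [r[i] for i in blk])):
            z[i] = v
    return z


def richardson(A, b, blocks, iters):
    """Preconditioned Richardson; returns x and the max-norm residual history."""
    n = len(b)
    x = [0.0] * n
    history = []
    for _ in range(iters + 1):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        history.append(max(abs(v) for v in r))
        if len(history) > iters:
            break
        z = block_jacobi_apply(A, blocks, r)
        x = [xi + zi for xi, zi in zip(x, z)]
    return x, history


# Hypothetical block lower triangular matrix with three diagonal blocks.
A = [[4.0, 1.0, 0.0, 0.0, 0.0],
     [2.0, 3.0, 0.0, 0.0, 0.0],
     [1.0, 0.0, 5.0, 0.0, 0.0],
     [0.0, 1.0, 2.0, 4.0, 1.0],
     [1.0, 0.0, 0.0, 1.0, 3.0]]
blocks = [[0, 1], [2], [3, 4]]
b = [1.0, 2.0, 3.0, 4.0, 5.0]
x, history = richardson(A, b, blocks, 3)
```

With three diagonal blocks the residual is still nonzero after two updates and drops to rounding level after the third. In a parallel setting the preconditioner blocks come from the partitioner rather than exactly from the BTF blocks, so this is an idealized picture of the low iteration count in the table above.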

15 Test Circuits
Columns: Circuit, N, Capacitors, MOSFETs, Resistors, Voltage Sources, Diodes
ckt1 6888389322248117575291761
ckt2 434749161408610542766761249986
ckt3 1162475255269085760791370
ckt4 637612082361173251947560
ckt5 4685021548188160210
ckt6 32632156138800230
ckt7 2518707109702640
ckt8 177881427474540150
ckt9 1562275071017311057290
ckt10 1021746042431230

16 Results - 4 Cores
Circuit  Task   KLU (serial)  SLUD  DD    BTF  Speedup (KLU/BTF)
ckt3     Setup  131           56    F2    57   2.3x
ckt3     Load   741           568   F2    562  1.3x
ckt3     Solve  6699          2230  F2    255  26.2x
ckt3     Total  7983          2903  F2    923  8.6x
ckt4     Setup  552           58    F2    F1   -
ckt4     Load   153           44    F2    F1   -
ckt4     Solve  106           157   F2    F1   -
ckt4     Total  840           274   F2    F1   -
ckt10    Setup  3             6     1     F1   -
ckt10    Load   606           300   339   F1   -
ckt10    Solve  323           3049  2460  F1   -
ckt10    Total  989           3381  2827  F1   -
F1 = BTF large irreducible block
F2 = Newton convergence failure

17 Results - 16 Cores
Circuit  Task   KLU (serial)  SLUD  DD    BTF  Speedup (KLU/BTF)
ckt1     Setup  2396          F3    207   199  12.0x
ckt1     Load   2063          F3    194   180  11.4x
ckt1     Solve  1674          F3    3573  310  5.4x
ckt1     Total  6308          F3    4001  717  8.8x
ckt3     Setup  131           29    F2    29   4.5x
ckt3     Load   741           181   F2    175  4.2x
ckt3     Solve  6699          1271  F2    84   79.8x
ckt3     Total  7983          1470  F2    306  26.1x
ckt4     Setup  552           32    F2    F1   -
ckt4     Load   153           21    F2    F1   -
ckt4     Solve  106           133   F2    F1   -
ckt4     Total  840           192   F2    F1   -
F1 = BTF large irreducible block
F2 = Newton convergence failure
F3 = Out of memory

18 Preconditioning Directions: Multilevel ILU
ckt10: CircuitSim90 - Voter circuit
Needs more efficient preconditioner
[ILU++, Mayer, J.]

19 Conclusions
Iterative linear solvers can enable scalable circuit simulation
–Dependent upon choosing correct preconditioning strategy
BTF preconditioning strategy has been successful
–Great for CMOS memory circuits (ckt3)
–Performs better than standard strategy on Xyce 680k ASIC (ckt1)
But it is still not a silver bullet …
–Circuits with feedback (PLLs) are more challenging (ckt4)
Multilevel techniques are a positive research direction
–Can help to more efficiently precondition circuit with large irreducible blocks (ckt10)

