Parallel Performance of Hierarchical Multipole Algorithms for Inductance Extraction. Ananth Grama, Purdue University; Vivek Sarin, Texas A&M University; Hemant Mahawar, Texas A&M University. Acknowledgements: National Science Foundation.

Outline: Inductance Extraction; Underlying Linear System; The Solenoidal Basis Method; Hierarchical Algorithms; Parallel Formulations; Experimental Results.

Inductance Extraction. Inductance is the property of an electric circuit to oppose a change in its current: an electromotive force (emf) is induced. Self inductance applies to a single conductor; mutual inductance arises between conductors. Inductance extraction: signal delays in circuits depend on the parasitic R, L, and C; at high frequencies, signal delays are dominated by parasitic inductance, so accurate estimation of the inductive coupling between circuit components is needed. Credits: oea.com.
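
For reference, the textbook relation behind these statements (generic notation, not taken from the slides): the voltage developed across conductor i by changing currents is

\[
v_i(t) \;=\; \sum_{j=1}^{s} L_{ij}\,\frac{d\,i_j(t)}{dt},
\]

where \(L_{ii}\) is the self inductance of conductor \(i\) and \(L_{ij}\) (\(i \neq j\)) is the mutual inductance between conductors \(i\) and \(j\).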

Inductance Extraction (continued). For a set of s conductors, compute the s x s impedance matrix Z, whose entries are the self and mutual impedances among the conductors. Each conductor is discretized using a uniform two-dimensional mesh for accurate impedance calculation.

Constraints: current density at a point; voltage drop across filaments in terms of filament current and voltage; Kirchhoff's current law at the nodes; potential differences expressed in terms of node voltages; and an inductance matrix that is a function of 1/r.
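
These bullets correspond to the standard filament formulation; a hedged sketch in generic notation, not copied from the talk:

\[
V_f = (R + j\omega L)\,I_f \quad\text{(voltage drop across filaments)},
\qquad
B\,I_f = I_s \quad\text{(Kirchhoff's current law at the nodes)},
\]
\[
V_f = B^{T}\Phi \quad\text{(filament drops from node potentials)},
\qquad
L_{kl} = \frac{\mu_0}{4\pi\,a_k a_l}\int_{V_k}\!\int_{V_l}\frac{d\mathbf{l}_k\cdot d\mathbf{l}_l}{r}\;da_k\,da_l .
\]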

Linear System. System matrix characteristics: R is diagonal, B is sparse, L is dense. Solution method: iterative methods (GMRES); the dense matrix-vector product with L is computed with hierarchical methods in a matrix-free approach. Challenge: effective preconditioning in the absence of an explicit system matrix.
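
A plausible way to write the system these characteristics describe, consistent with the filament relations sketched above (generic notation, not necessarily the talk's exact formulation):

\[
\begin{bmatrix} R + j\omega L & -B^{T} \\ B & 0 \end{bmatrix}
\begin{bmatrix} I_f \\ \Phi \end{bmatrix}
=
\begin{bmatrix} f \\ g \end{bmatrix},
\]

with R diagonal (filament resistances), L dense (the 1/r inductive couplings), B sparse (mesh incidence), and the right-hand side carrying the source excitation.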

Solenoidal Basis Method. Start from the linear system with a modified right-hand side. A solenoidal basis automatically satisfies the conservation law (Kirchhoff's current law), and mesh currents form a basis for the filament currents. With the solenoidal basis matrix P, any current of the form I = Px obeys Kirchhoff's law, and the problem is recast as a reduced system.
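
A hedged sketch of the reduction, in generic notation: if the columns of P span the null space of B (the mesh currents), then

\[
B\,P = 0 \;\Rightarrow\; B\,(P x) = 0 \ \text{for every } x,
\qquad
P^{T}(R + j\omega L)\,P\,x \;=\; P^{T}\tilde f,
\]

where \(\tilde f\) is the right-hand side after shifting by a particular solution that accounts for the sources, i.e., the "modified RHS" mentioned on the slide.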

Problem Size. Number of unknowns for the ground plane problem:

Mesh      Potential nodes  Current filaments  Linear system  Solenoidal functions
33x33     1,089            2,112              3,201          1,024
65x65     4,225            8,320              12,545         4,096
129x129   16,641           33,024             49,665         16,384
257x257   66,049           131,584            197,633        65,536
513x513   263,169          525,312            788,481        262,144
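
The counts above follow directly from an n x n grid: n^2 potential nodes, 2n(n-1) current filaments (horizontal plus vertical edges), their sum as the size of the full linear system, and (n-1)^2 solenoidal (mesh-current) functions, one per cell. A short sanity check, assuming exactly this discretization:

def problem_sizes(n):
    # Unknown counts for an n x n uniform mesh, as described on the slide.
    potentials = n * n               # one potential per mesh node
    filaments = 2 * n * (n - 1)      # horizontal + vertical edge filaments
    system = potentials + filaments  # unknowns in the full linear system
    solenoidal = (n - 1) ** 2        # one mesh (loop) current per cell
    return potentials, filaments, system, solenoidal

for n in (33, 65, 129, 257, 513):
    print(n, problem_sizes(n))
# 33  -> (1089, 2112, 3201, 1024)
# 513 -> (263169, 525312, 788481, 262144)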

Hierarchical Methods. A matrix-vector product with an n x n matrix costs O(n^2). Hierarchical methods provide faster, matrix-free matrix-vector products: Appel's algorithm and the Barnes-Hut method use particle-cluster interactions, O(n lg n); the Fast Multipole Method uses cluster-cluster interactions, O(n). Both rely on hierarchical refinement of the underlying domain (a quad-tree in 2-D, an oct-tree in 3-D) and on decaying 1/r kernel functions, and they compute an approximate matrix-vector product at the cost of some accuracy.
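
For reference, the O(n^2) dense product that these hierarchical methods approximate is the pairwise sum over a decaying 1/r kernel; a minimal NumPy sketch (names are illustrative):

import numpy as np

def direct_potentials(points, charges):
    # Dense O(n^2) evaluation of phi_i = sum over j != i of q_j / |x_i - x_j|.
    n = len(points)
    phi = np.zeros(n)
    for i in range(n):
        r = np.linalg.norm(points - points[i], axis=1)
        r[i] = np.inf                # exclude the self-interaction
        phi[i] = np.sum(charges / r)
    return phi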

Hierarchical Methods (continued). Fast Multipole Method (FMM): the domain is divided recursively into 8 sub-domains. An up-traversal computes multipole coefficients that give the effect of all points inside a node at a far-away point; a down-traversal computes local coefficients that accumulate the effect of all far-away points at points inside a node. Nearby points interact directly. Computational complexity: O((d+1)^4 N), where d is the multipole degree.

Hierarchical Methods (continued). Hierarchical Multipole Method (HMM): an augmented Barnes-Hut method, or a variant of FMM. The up-traversal is the same as in FMM. For each particle, a multipole acceptance criterion (MAC), defined as the ratio of the particle's distance from the center of a box to the dimension of the box, determines whether the box's multipole coefficients can be used to account for all of its (far-away) points or whether the traversal must descend further. Nearby points interact directly. Computational complexity: O((d+1)^2 N lg N).
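
A minimal, monopole-only (d = 0) sketch of the MAC-driven traversal described above, using a 2-D quad-tree in place of the oct-tree; this illustrates the acceptance test, not the ParIS implementation:

import numpy as np

class Node:
    # Quad-tree node holding aggregate (degree-0 multipole) data for its points.
    def __init__(self, center, size, idx, points, charges, leaf_size=8):
        self.center, self.size, self.idx = center, size, idx
        self.mass = charges[idx].sum()
        self.com = (points[idx] * charges[idx, None]).sum(axis=0) / self.mass
        self.children = []
        if len(idx) > leaf_size:
            px, py = points[idx, 0], points[idx, 1]
            for right in (False, True):
                for up in (False, True):
                    sel = idx[((px >= center[0]) == right) & ((py >= center[1]) == up)]
                    if len(sel):
                        off = np.array([size / 4 if right else -size / 4,
                                        size / 4 if up else -size / 4])
                        self.children.append(Node(center + off, size / 2, sel,
                                                  points, charges, leaf_size))

def potential(p, node, points, charges, alpha=2.0):
    # Approximate the sum of q_j / |p - x_j| over the node's points.
    d = np.linalg.norm(p - node.com)
    if d > 0 and d / node.size > alpha:      # MAC: box is far enough away,
        return node.mass / d                 # so use its aggregate coefficients
    if not node.children:                    # nearby leaf: direct interactions
        r = np.linalg.norm(points[node.idx] - p, axis=1)
        keep = r > 0                         # skip the self-interaction
        return np.sum(charges[node.idx][keep] / r[keep])
    return sum(potential(p, c, points, charges, alpha) for c in node.children)

rng = np.random.default_rng(0)
pts, q = rng.random((2000, 2)), rng.random(2000)
root = Node(np.array([0.5, 0.5]), 1.0, np.arange(2000), pts, q)
approx = potential(pts[0], root, pts, q, alpha=2.0)

With this convention a larger alpha forces more direct interactions, which matches the talk's observation that increasing the MAC raises both cost and accuracy.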

ParIS: Parallel Solver. Application: inductance extraction. The reduced system is solved with a preconditioned iterative method (GMRES), which requires dense matrix-vector products with the preconditioner and the coefficient matrix. These dense matrix-vector products dominate the computational cost of the algorithm; hierarchical methods are used to compute the potentials, i.e., the inductive effect on the filaments. The vector inner products have negligible computation and communication cost.
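
Since the coefficient matrix is never formed, the iterative solver only needs a matrix-vector product callback. A generic SciPy sketch of this matrix-free structure (the mat-vec below is a placeholder operator, not the hierarchical inductance evaluation):

import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

n = 1024
rng = np.random.default_rng(0)
b = rng.random(n)

def matvec(x):
    # Placeholder for the hierarchical (FMM/HMM) evaluation of the dense
    # reduced operator; ParIS would compute this without forming the matrix.
    return 2.0 * x + 0.01 * np.roll(x, 1)

A = LinearOperator((n, n), matvec=matvec)
x, info = gmres(A, b)     # GMRES only ever calls matvec; a preconditioner
                          # could be supplied the same way via the M argument
print(info)               # 0 indicates convergence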

Parallelization Scheme. Two-tier parallelization: each conductor owns its filaments and an associated oct-tree; conductors are distributed across MPI processes, and the work within a conductor is parallelized with OpenMP threads. The tree is pruned to obtain sub-trees, and computation at the top few levels of the tree remains sequential within an OpenMP process.
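
A rough Python analogue of this two-tier scheme, with mpi4py ranks standing in for the MPI processes and a thread pool standing in for the OpenMP level; the conductor ids and per-conductor work are hypothetical placeholders:

from concurrent.futures import ThreadPoolExecutor
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

conductors = list(range(16))          # hypothetical conductor ids
local = conductors[rank::nprocs]      # distribute conductors across MPI ranks

def process_conductor(cid):
    # Placeholder for building the conductor's oct-tree and evaluating
    # its contribution to the matrix-vector product.
    return cid

with ThreadPoolExecutor() as pool:    # intra-conductor parallelism (OpenMP analogue)
    results = list(pool.map(process_conductor, local))

all_results = comm.gather(results, root=0)   # combine contributions across conductors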

Experiments. The experiments use an interconnect cross-over problem: conductors 2 cm long and 2 mm wide, with 0.3 mm spacing between conductors within a layer and 3 mm spacing across layers, and a non-uniform distribution of conductors; FMM and HMM are compared. Parallel platform: a Beowulf cluster at Texas A&M University with 128 dual-processor nodes of 64-bit AMD Opterons at 1.4 GHz, Gigabit Ethernet, and LAM/MPI with GNU compilers on SuSE Linux.

Cross-Over Interconnects.

Parameters: d is the multipole degree, α is the multipole acceptance criterion, and s is the number of particles per leaf node in the tree. Since d and α influence the accuracy of the matrix-vector product, impedance errors are kept comparable across runs, within 1% of a reference value computed by FMM with d = 8. Scaled efficiency is E = BOPS / p, where BOPS is the average number of base operations per second and p is the number of processors used.
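
The scaled-efficiency metric defined above is straightforward to compute from an instrumented run; a small helper that directly implements E = BOPS / p:

def scaled_efficiency(base_operations, elapsed_seconds, p):
    # E = BOPS / p, where BOPS is the average number of base operations
    # performed per second and p is the number of processors used.
    bops = base_operations / elapsed_seconds
    return bops / p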

Experimental Results. Effect of multipole degree (d) for different choices of s (plots: FMM code, HMM code).

Experimental Results. Effect of multipole degree (d) for different choices of s: execution times (in seconds) of the FMM and HMM codes at s = 2, 8, 32, and 128.

Experimental Results. Effect of the MAC on HMM for different choices of s and d (plots: varying s, varying d).

Experimental Results (continued). Effect of the MAC (α) on HMM for different choices of s and d: execution times (in seconds) as α varies, for several values of d and of s.

Experimental Results. Effect of multipole degree (d) on the HMM code on p processors, for two choices of s (plots: s = 8, s = 32).

Experimental Results. Effect of multipole degree (d) on the HMM code on p processors, for s = 8 and s = 32: execution times (in seconds) for p = 1, 2, 4, and 8 processors.

Experimental Results. Effect of multipole degree (d) on the FMM code on p processors, for two choices of s (plots: s = 8, s = 32).

Experimental Results. Effect of multipole degree (d) on the FMM code on p processors, for s = 8 and s = 32: execution times (in seconds) for p = 1, 2, 4, and 8 processors.

Experimental Results (continued). Parallel efficiency of the extraction codes for different choices of d (plots: FMM code, HMM code).

Experimental Results. Parallel efficiency of the extraction codes for different choices of d: FMM and HMM codes on p = 1, 2, 4, and 8 processors.

Experimental Results (continued). Ratio of the execution time of the FMM code to that of the HMM code on p processors for different choices of d (plots: s = 8, s = 32).

Experimental Results. Ratio of the execution time of the FMM code to that of the HMM code for different choices of d, with s = 8 and s = 32, on p = 1, 2, 4, and 8 processors.

Concluding Remarks. FMM execution time is O((d+1)^4 N); HMM execution time is O((d+1)^2 N lg N). For HMM, increasing the MAC (α) increases both the execution time and the accuracy of the matrix-vector product. FMM achieves higher parallel efficiency for large d, while HMM outperforms FMM in execution time when the number of particles per leaf node (s) is small. The parallel implementation, ParIS, is scalable and achieves high parallel efficiency.
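
The two cost estimates above can be compared directly; ignoring constants, the FMM-to-HMM operation ratio is (d+1)^2 / lg N, which is consistent with FMM paying off for large d and HMM winning for modest d. A tiny illustration of the stated complexities (not measured data):

import math

def fmm_ops(d, n):
    return (d + 1) ** 4 * n                    # leading term of the FMM cost

def hmm_ops(d, n):
    return (d + 1) ** 2 * n * math.log2(n)     # leading term of the HMM cost

n = 10 ** 6
for d in (1, 2, 4, 8):
    print(d, fmm_ops(d, n) / hmm_ops(d, n))    # > 1 means HMM needs fewer operations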

Thank You!