Improving Coarsening and Interpolation for Algebraic Multigrid
Jeff Butler, Hans De Sterck
Department of Applied Mathematics, University of Waterloo
(in collaboration with Ulrike Meier Yang, LLNL)
Copper Mountain 2006

Outline
- Introduction: Algebraic Multigrid (AMG)
- Modification 1: PMIS Greedy vs. PMIS
- Modification 2: PMIS restricted to finer grid levels
- Modification 3: FF and FF1 interpolation

Introduction
Solve A u = f:
- A, f from PDE discretization; A sparse
- parallel, large problems: 10^9 degrees of freedom
- unstructured grid problems
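
Not part of the talk: a minimal sketch, assuming the open-source pyamg package (which implements the RS, CLJP, and PMIS coarsenings discussed here), of setting up and solving such a system with classical AMG; the grid size and tolerance are arbitrary choices.

```python
import numpy as np
import pyamg

A = pyamg.gallery.poisson((200, 200), format='csr')   # 2D Poisson test matrix, 40,000 dof
f = np.random.rand(A.shape[0])

ml = pyamg.ruge_stuben_solver(A)          # setup phase: coarsen, build P, R, coarse operators
u = ml.solve(f, tol=1e-8, accel='gmres')  # solve phase: AMG-preconditioned GMRES
print(ml)                                 # prints the level hierarchy and operator complexity
print('residual:', np.linalg.norm(f - A @ u))
```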

AMG Structure
Setup phase, on each level m:
- select coarse-grid points (coarsen)
- define the interpolation operator P^(m)
- define the restriction and coarse-grid operators:
  R^(m) = (P^(m))^T,  A^(m+1) = (P^(m))^T A^(m) P^(m)
Solve phase: cycle through the levels, smoothing on each level and correcting with the error computed on the next coarser level (sketched below).
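
To make these formulas concrete, here is a minimal two-grid sketch with SciPy. The interpolation operator P is taken as given (constructing it is exactly the coarsening/interpolation question of the rest of the talk), and the weighted-Jacobi smoother with weight 2/3 and two sweeps is my own choice, not something specified on the slide.

```python
import scipy.sparse.linalg as spla

def coarse_operator(A, P):
    """Galerkin coarse-grid operator: R = P^T, A_coarse = P^T A P."""
    R = P.T.tocsr()
    return R, (R @ A @ P).tocsr()

def jacobi(A, u, f, omega=2.0 / 3.0, sweeps=2):
    """Weighted-Jacobi smoother (an arbitrary choice of smoother for this sketch)."""
    Dinv = 1.0 / A.diagonal()
    for _ in range(sweeps):
        u = u + omega * Dinv * (f - A @ u)
    return u

def two_grid_cycle(A, P, u, f):
    """One two-grid correction cycle: the two-level version of the AMG solve phase."""
    R, Ac = coarse_operator(A, P)
    u = jacobi(A, u, f)                    # pre-smooth
    r_c = R @ (f - A @ u)                  # restrict the residual
    e_c = spla.spsolve(Ac.tocsc(), r_c)    # solve the coarse-grid problem
    u = u + P @ e_c                        # interpolate the correction and update
    return jacobi(A, u, f)                 # post-smooth
```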

AMG Complexity & Scalability
Goal: scalable algorithm
- O(n) operations per V-cycle
- convergence factor independent of n
Operator complexity C_op = [sum over all levels m of nnz(A^(m))] / nnz(A^(0)): a measure of memory use, of work in the solve phase, and of work required in the coarsening and interpolation process.
E.g. 3D geometric multigrid: C_op = 1 + 1/8 + 1/64 + ... ≈ 1.14.
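
A tiny sketch of that measure as code, assuming the standard definition above; the helper name is my own.

```python
def operator_complexity(operators):
    """operators: list of sparse per-level matrices [A0, A1, ...], finest first.
    Returns sum of nonzeros over all levels divided by nonzeros on the finest level.
    Can be applied, e.g., to [lvl.A for lvl in ml.levels] of a pyamg hierarchy."""
    return sum(A.nnz for A in operators) / operators[0].nnz
```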

Classical AMG Coarsening (Ruge, Stueben)
(H1) All F-F connections require connection to a common C-point (good nearest-neighbor interpolation)
(H2) Maximal independent set:
- independent: no two C-points are connected
- maximal: if one more C-point is added, independence is lost
Enforce H1 rigorously, with H2 as a guide (change F-points to C-points); more C-points means higher complexity. (A check of H1 is sketched below.)
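
Heuristic H1 is easy to state as code. Below is a small sketch, under the simplifying assumption of a symmetric strength-of-connection matrix, that lists the strong F-F pairs without a common C-point for a given C/F splitting; all names are my own.

```python
import numpy as np

def violates_h1(S, is_c):
    """S: symmetric boolean strength-of-connection matrix (CSR);
    is_c: boolean mask of C-points.
    Returns the strongly connected F-F pairs that share no C-point."""
    S = S.tocsr()
    bad = []
    for i in np.flatnonzero(~is_c):
        nbrs_i = S.indices[S.indptr[i]:S.indptr[i + 1]]
        c_of_i = set(nbrs_i[is_c[nbrs_i]])
        # strong F-neighbors of i, counting each pair once
        for j in nbrs_i[(~is_c[nbrs_i]) & (nbrs_i > i)]:
            nbrs_j = S.indices[S.indptr[j]:S.indptr[j + 1]]
            if not (c_of_i & set(nbrs_j[is_c[nbrs_j]])):
                bad.append((i, j))
    return bad
```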

AMG Coarsenings
1. Ruge-Stueben (RS)
   - two passes: the second pass ensures that strong F-F pairs have a common C-point
   - disadvantage: highly sequential
2. CLJP
   - based on parallel independent set algorithms developed by Luby and later by Jones & Plassmann
   - ensures that strong F-F pairs have a common C-point
3. PMIS (De Sterck, Yang)
   - Parallel Modified Independent Set (after Luby)
   - does not enforce heuristic H1 (strong F-F pairs may lack a common C-point)

PMIS Coarsening Animation*
Repeated passes: select C-points with maximal local measure, make their neighbors F-points, and remove the corresponding edges.
[Grid animation frames: Selection Step 1, Remove and Update Step 1, Selection Step 2, Remove and Update Step 2, Final Coarsening]
* Grid animation courtesy of Ulrike Yang, LLNL
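
A rough sketch of the loop the animation illustrates, under the simplifying assumption of a symmetric strength matrix; the measure (number of strong connections plus a random tie-breaker) follows the description on the appendix slide at the end, and the handling of points with no strong connections is glossed over. All names are my own.

```python
import numpy as np

UNASSIGNED, FPT, CPT = 0, 1, 2

def pmis_coarsen(S, seed=0):
    """S: symmetric boolean strength matrix (CSR), S[i, j] = 1 if i and j are
    strongly connected. Returns an array of C/F flags (CPT / FPT)."""
    n = S.shape[0]
    S = S.tocsr()
    rng = np.random.default_rng(seed)
    # measure = number of strong connections + random tie-breaker in (0, 1)
    measure = np.asarray(S.sum(axis=1)).ravel() + rng.random(n)
    state = np.full(n, UNASSIGNED)
    while (state == UNASSIGNED).any():
        # selection step: unassigned points whose measure beats all of their
        # unassigned strong neighbors become C-points (an independent set)
        new_c = []
        for i in np.flatnonzero(state == UNASSIGNED):
            nbrs = S.indices[S.indptr[i]:S.indptr[i + 1]]
            nbrs = nbrs[state[nbrs] == UNASSIGNED]
            if (measure[i] > measure[nbrs]).all():
                new_c.append(i)
        state[new_c] = CPT
        # remove-and-update step: unassigned strong neighbors of the new
        # C-points become F-points
        for i in new_c:
            nbrs = S.indices[S.indptr[i]:S.indptr[i + 1]]
            state[nbrs[state[nbrs] == UNASSIGNED]] = FPT
    # (points with no strong connections end up as C-points in this sketch;
    # production PMIS treats them specially)
    return state
```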

PMIS Summary (De Sterck, Yang & Heys, 2006)
- PMIS coarsening worked well for many problems (with GMRES)
- for some problems, convergence degraded compared to RS
  - decreased interpolation accuracy due to an insufficient number of C-points
- one solution: add C-points (RS, CLJP)
- other possibilities:
  - modify PMIS to (hopefully) improve convergence (PMIS Greedy)
  - combine PMIS coarsening with RS/CLJP coarsening
  - use distance-two C-points for long-range interpolation where F-F connections lack a common C-point (FF interpolation)

Modification 1: PMIS Greedy vs. PMIS
PMIS Greedy:
- same procedure as in PMIS
- in each pass, increase the measure of an unassigned point if it is strongly connected to a newly assigned F-point (sketched below)
- helps to remove some randomness in the grid structure
- interpolation has a chance to be more accurate
[Figure: C/F grids for the 5-pt Laplacian under RS, PMIS, and PMIS Greedy; legend marks the finest, second finest, and third finest grids]
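
A sketch of the one step PMIS Greedy changes relative to the PMIS pass above: after new F-points are assigned, the measures of their still-unassigned strong neighbors are increased so that they are more likely to be picked as C-points in a later pass. The helper name and the size of the increment are my own assumptions.

```python
def greedy_measure_update(S, state, measure, new_f, UNASSIGNED=0, bump=1.0):
    """S: symmetric boolean strength matrix (CSR); state: C/F/unassigned flags;
    measure: current PMIS measures; new_f: indices turned into F-points in this
    pass. Call right after F-points are assigned in a PMIS-style pass."""
    S = S.tocsr()
    for j in new_f:
        nbrs = S.indices[S.indptr[j]:S.indptr[j + 1]]
        nbrs = nbrs[state[nbrs] == UNASSIGNED]
        measure[nbrs] += bump   # strong neighbors of fresh F-points gain priority
    return measure
```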

Results for an "Easy" Problem
27-pt Laplacian, 1 processor
[Tables: C_op, # iterations, t_setup (s), t_solve (s), t_total (s) for RS, PMIS, and PMIS Greedy, with AMG as a standalone solver and as a preconditioner for GMRES(5)]

Results for a More Difficult Problem
3D elliptic PDE with jumps in the coefficient a: (a u_x)_x + (a u_y)_y + (a u_z)_z = 1
1 processor, 80^3 dof, AMG-GMRES(5)
[Table: C_op, # iterations, t_setup (s), t_solve (s), t_total (s) for RS, PMIS, and PMIS Greedy; RS: C_op = 21.54, ran out of memory]

PMIS vs. PMIS Greedy: Summary
- better convergence and timing for some problems, but the improvement is not major
- for other problems, PMIS Greedy is not much better (and sometimes even worse) than PMIS with respect to convergence and timing
- slightly higher operator complexities for PMIS Greedy
- PMIS Greedy will require more communication in parallel than PMIS

Modification 2: Restrict PMIS to Finer Grid Levels
Perform PMIS on the first g grid levels and RS on all remaining levels.
Recall: 3D geometric multigrid has C_op ≈ 1.14, so the finest levels dominate the operator complexity.
Advantages:
- PMIS reduces operator complexity (compared to RS) where it makes the biggest difference (the finer levels)
- RS produces more "structured" grids, leading to better interpolation on the coarser levels, where the impact on operator complexity is reduced
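
The per-level RS/PMIS switch itself is not something a stock solver exposes, but the trade-off it targets can be seen by contrasting all-RS and all-PMIS hierarchies. A rough sketch, assuming pyamg's ruge_stuben_solver accepts a CF option naming the splitting; the grid size is arbitrary.

```python
import pyamg

A = pyamg.gallery.poisson((40, 40, 40), format='csr')   # 3D 7-pt Laplacian, 64,000 dof
for cf in ('RS', 'PMIS'):
    ml = pyamg.ruge_stuben_solver(A, CF=cf)              # same setup, different splitting
    print(cf, 'operator complexity:', ml.operator_complexity())
```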

Results: 27-pt Laplacian Problem
1 processor
[Tables: C_op, # iterations, t_setup (s), t_solve (s), t_total (s) for PMIS (all levels) and PMIS (first 2 levels), with AMG as a standalone solver and as a preconditioner for GMRES(5)]

Results: 3D elliptic PDE with jumps in a, (a u_x)_x + (a u_y)_y + (a u_z)_z = 1
AMG, 1 processor:
[Table: C_op, # iterations, t_setup (s), t_solve (s), t_total (s) for PMIS (all levels), PMIS (first level only), and PMIS (first 2 levels); PMIS (all levels): C_op = 2.44, >> 200 iterations, slow to converge]
AMG-GMRES(5), 1 processor, 80^3 dof:
[Table: same columns for PMIS (all levels), PMIS (first level only), and PMIS (first 3 levels)]
* C_op(RS) = 21.54

PMIS on Finer Levels Only: Summary
- improvement in convergence properties for a variety of problems (although some improvements are only small)
- how many levels of PMIS? problem dependent
- trade-off between memory (increased operator complexity) and speed of convergence/execution
- a similar approach is possible in the parallel case: CLJP has performance characteristics similar to RS

Modification 3: Modified FF Interpolation
FF interpolation* (De Sterck, Yang: Copper Mountain 2005): when a strong F-F connection is encountered, do not add a C-point, but extend interpolation to distance-two C-points.
No C-points are added, but the interpolation stencils become larger (and therefore operator complexity increases somewhat).
* closely related to Stueben's standard interpolation

FF1 Interpolation
Modified FF interpolation (FF1): to reduce operator complexity, include only one distance-two C-point when a strong F-F connection is encountered (stencil selection sketched below).
Setup time and complexity are reduced.
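
A sketch of how the set of interpolation points of an F-point grows under FF and FF1, as I read these two slides: distance-two C-points are pulled in only for strong F-F connections without a common C-point, and FF1 keeps just one of them. Which one FF1 keeps is my own simplification, interpolation weights are omitted, and a symmetric strength matrix is assumed.

```python
def interpolation_set(S, is_c, i, mode='classical'):
    """S: symmetric boolean strength matrix (CSR); is_c: C-point mask;
    i: an F-point. Returns the set of C-points that i interpolates from."""
    S = S.tocsr()
    nbrs = S.indices[S.indptr[i]:S.indptr[i + 1]]
    interp = set(nbrs[is_c[nbrs]])            # distance-one strong C-neighbors
    if mode == 'classical':
        return interp
    for j in nbrs[~is_c[nbrs]]:               # strong F-F connections of i
        nbrs_j = S.indices[S.indptr[j]:S.indptr[j + 1]]
        c_of_j = nbrs_j[is_c[nbrs_j]]
        if interp & set(c_of_j):
            continue                          # i and j already share a C-point
        if mode == 'FF':
            interp |= set(c_of_j)             # FF: all of j's C-points (distance two)
        elif mode == 'FF1' and len(c_of_j):
            interp.add(int(c_of_j[0]))        # FF1: only one distance-two C-point
    return interp
```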

Results: 7-pt Laplacian Problem
PMIS coarsening, 1 processor
[Tables: C_op, # iterations, t_setup (s), t_solve (s), t_total (s) for regular, FF, and FF1 interpolation, with AMG as a standalone solver and as a preconditioner for GMRES(5)]

Results: 3D elliptic PDE with jumps in a, (a u_x)_x + (a u_y)_y + (a u_z)_z = 1
AMG, 1 processor:
[Table: C_op, # iterations, t_setup (s), t_solve (s), t_total (s) for regular, FF, and FF1 interpolation; regular: C_op = 2.44, >> 200 iterations, slow to converge]
AMG-GMRES(5), 1 processor, 80^3 dof:
[Table: same columns for regular, FF, and FF1 interpolation]

Modified FF Interpolation (FF1): Summary
- PMIS with FF or FF1 is scalable (comparable to RS with classical interpolation, but with lower complexity; no need for GMRES)
- FF1 reduces setup time compared to FF
- FF1 keeps the iteration count as low as FF
- FF1 reduces operator complexity compared to FF
- PMIS with FF1 seems a good parallel alternative to RS with a second pass and classical interpolation

Thanks…
- Ulrike Meier Yang and the Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
- travel support
- University of Waterloo Graduate Studies

Algebraic Multigrid (AMG)
- multiple levels (coarsen to some minimum)
- iterative
- AMG is suitable for unstructured grids

PMIS Coarsening
Weighted independent set algorithm:
- points i that strongly influence many equations (large measure) are good candidates for C-points
- add a random number between 0 and 1 to each measure to break ties
- pick C-points with maximal measures (as in CLJP), then make all their neighbors fine points (as in RS)
- proceed until all points have been assigned as fine or coarse

Conclusions
PMIS vs. PMIS Greedy coarsening:
- PMIS Greedy does not provide a considerable advantage
- any advantage will be degraded in a parallel implementation
Restricting PMIS to finer grid levels:
- considerable improvement over uniform RS and PMIS coarsening is possible
- scalability is possible
- trade-off between memory use and convergence speed
FF1 interpolation (with PMIS coarsening):
- performance comparable to FF (both better than regular interpolation)
- scalability is possible
- better suited to parallel implementation than FF