Partial Differential Equations and Iterative Solvers

1 Partial Differential Equations and Iterative Solvers
Spring Semester 2005
Geoffrey Fox, Community Grids Laboratory, Indiana University
505 N Morton, Suite 224, Bloomington IN

2 Abstract of Partial Differential Equations and Iterative Solvers
This introduces the three fundamental types of PDE's (elliptic, parabolic and hyperbolic) and studies the numerical solution of elliptic equations. The sparse matrix formulation is used, and the iterative approaches Jacobi, Gauss-Seidel and SOR are defined. Parallel computing is discussed for Gauss-Seidel, and multigrid methods are discussed at a simple level.

3 Why Study PDE's?
Most things are described at a fundamental level by interactions between particles:
quarks, nucleons and mesons in nuclear physics; atoms, molecules and electrons in classical physics.
But as we discussed earlier, this is not the optimal way to think when there are very many particles, such as the roughly 6 × 10²³ per gram mole (per gram for hydrogen). We must average!
PDE's describe physical systems which behave as continuous systems at a macroscopic level, in a fluid fashion. Examples:
airflow over an airplane wing
radar waves reflecting from airport traffic
blood circulation in the human body
simulating global climate to predict changes in the ozone layer

4 What is a PDE?
A partial differential equation is "partial" as it involves quantities ("fields") differentiated with respect to more than one variable. "Fields" are functions of position (x,y,z) and time t.
The best studied PDE's involve differentiation with respect to x, y, z and time, such as the wave equation
∂²Φ/∂t² = c² ∂²Φ/∂x² in one dimension, or
∂²Φ/∂t² = c² (∂²Φ/∂x² + ∂²Φ/∂y² + ∂²Φ/∂z²) in three dimensions.
Note ∇² = ∂²/∂x² + ∂²/∂y² + ∂²/∂z².

5 Boundary Conditions I
Cauchy boundary conditions: Φ and ∂Φ/∂n are given on a curve C, where n is the normal direction to the curve. This is appropriate for hyperbolic equations.
[Figure: the region determined by the given boundary conditions on the open curve C.]

6 Boundary Conditions II
Φ and ∂Φ/∂n cannot both be given on a closed curve, because the solutions obtained by integrating up from X to Y through Z1 and down through Z2 would then clash.
Dirichlet boundary conditions: Φ given on a closed curve C.
Neumann boundary conditions: ∂Φ/∂n given on a closed curve C.
[Figure: closed curve C through the points X and Y, with Z1 on the upper arc and Z2 on the lower arc.]

7 Boundary Conditions III
The general second-order equation is of the form:
a ∂²Φ/∂x² + b ∂²Φ/∂x∂y + c ∂²Φ/∂y² = f(x,y,t)
b² > 4ac: hyperbolic equations
b² = 4ac: parabolic equations
b² < 4ac: elliptic equations

Equation Type | Boundary Curve | Boundary Conditions  | Example
Hyperbolic    | Open           | Cauchy               | Wave equation
Parabolic     | Open           | Dirichlet or Neumann | Diffusion equation
Elliptic      | Closed         | Dirichlet            | Poisson equation
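As a quick check of this classification, write t for y and c₀ for the wave speed:
Wave equation ∂²Φ/∂t² = c₀² ∂²Φ/∂x²: a = c₀², b = 0, c = -1, so b² - 4ac = 4c₀² > 0 (hyperbolic).
Diffusion equation ∂Φ/∂t = κ ∂²Φ/∂x²: a = κ, b = 0, c = 0, so b² - 4ac = 0 (parabolic).
Laplace's equation ∂²Φ/∂x² + ∂²Φ/∂y² = 0: a = c = 1, b = 0, so b² - 4ac = -4 < 0 (elliptic).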

11 Solving Laplace’s or Poisson’s Equation
As always, we must convert the continuous equations to a discrete form by setting up a mesh of points; this is the finite difference method. h is the step size in the picture; there are Nx grid points in the x direction and Ny in the y direction, illustrated here by Nx = Ny = 14.

12 Potential in a Vacuum Filled Rectangular Box
So imagine the world's simplest problem: find the electrostatic potential inside a box whose sides are held at a given potential. Set up a 16 by 16 grid on which the potential is defined and which must satisfy Laplace's equation ∂²Φ/∂x² + ∂²Φ/∂y² = 0.

13 Basic Numerical Algorithm
Using standard "central differencing" techniques, one can approximate
∇²Φ ≈ (Φ_Left + Φ_Right + Φ_Up + Φ_Down - 4 Φ_Middle) / h²
[Figure: five-point stencil showing the points Left, Right, Up and Down around Middle.]
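A minimal sketch of this update as one Jacobi sweep in Python/NumPy (illustrative, not from the original course material), assuming a square grid whose edge values hold the fixed boundary potential:

```python
import numpy as np

def jacobi_sweep(phi):
    """One Jacobi sweep of the five-point stencil for Laplace's equation.

    phi is a 2-D array whose edge values hold the fixed (Dirichlet)
    boundary potential; only the interior points are updated.
    """
    new = phi.copy()
    # Set (phi_L + phi_R + phi_U + phi_D - 4*phi_M)/h^2 = 0 and solve for phi_M
    new[1:-1, 1:-1] = 0.25 * (phi[:-2, 1:-1] + phi[2:, 1:-1] +
                              phi[1:-1, :-2] + phi[1:-1, 2:])
    return new

# 16 by 16 grid as in the example: three sides at potential 0, one side at 1
phi = np.zeros((16, 16))
phi[:, -1] = 1.0
for _ in range(1000):
    phi = jacobi_sweep(phi)
```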

14 Setup for simple 16 by 16 Grid
A 14 by 14 internal grid with the typical local operator (stencil) characteristic of the differenced equation.

19 Iterative Methods for Solving Sparse Matrices
There are many iterative methods which can be applied to solve any matrix equation, but they are particularly effective for sparse matrices as they directly exploit the "zero structure". Here we look at the three simplest, or stationary, methods, so called because the iteration equation is the same at each iteration.
The Jacobi method is based on solving for every variable locally with respect to the other variables; one iteration of the method corresponds to solving for every variable once. The resulting method is easy to understand and implement, but convergence is slow.
The Gauss-Seidel method is like the Jacobi method, except that it uses updated values as soon as they are available. In general it converges faster than the Jacobi method, though still relatively slowly.
Successive Overrelaxation (SOR) can be derived from the Gauss-Seidel method by introducing an extrapolation parameter ω. For the optimal choice of ω, SOR converges faster than Gauss-Seidel by an order of magnitude.

21 Formalism for Iterative Methods
The generic iteration strategy is to split the matrix A = M - N so that it is much easier to invert M than A, and in particular so that inverting M can exploit the zero structure of A.
Ax = b implies Mx = Nx + b, so write
M x(k) = N x(k-1) + b   (*)
All the iteration methods have this form for different choices of M and N for given A and b.
We must ensure that iteration (*) converges, and we naturally prefer that it converges fast!
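A minimal sketch of this splitting iteration in Python/NumPy (illustrative; here M is handled by a dense solver, whereas in practice M is chosen diagonal or triangular so the solve is cheap):

```python
import numpy as np

def split_iterate(M, N, b, x0, iters=1000, tol=1e-6):
    """Iterate M x(k) = N x(k-1) + b until the residual of A x = b is small."""
    A = M - N
    x = x0.copy()
    for _ in range(iters):
        x = np.linalg.solve(M, N @ x + b)   # "invert" M; cheap if M is diagonal/triangular
        if np.linalg.norm(b - A @ x) < tol * np.linalg.norm(b):
            break
    return x
```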

25 Convergence of Jacobi in One Dimension
One can analyze the convergence of iterative methods and show that different patterns of discrepancy (current guess minus exact solution) lead to different convergence properties. If one has N grid points per dimension, convergence is governed by the largest eigenvalue of the Jacobi iteration matrix, which for this model problem is
λmax = cos(π/N) ≈ 1 - π²/(2N²)
Thus if one requires an error of 10⁻⁶, the number of iterations is given by a dreadfully large number K, where
(λmax)^K ≈ 10⁻⁶, i.e. K ≈ 12 ln(10) N²/π² ≈ 2.8 N²

26 What is Easy/Hard for Jacobi?
Slowest convergence is for a discrepancy (residual) that varies slowly with grid point.
[Figure: a slowly varying residual plotted against grid position.]

27 Information Moves Slowly
For this smooth discrepancy to be "noticed" by the iterative method, points a long way apart need to affect each other. This takes N steps to happen, as information moves slowly (one grid point per iteration) in Jacobi. Then, after the information arrives, one still needs to actually correct the value. Thus convergence takes of order N² iterations.
On the other hand, if the residual varies rapidly with position, it will be corrected quickly, as only a few iterations are needed for the information to be communicated.
Thus the rapidly varying residual corresponds to the lowest eigenvalue.

28 This type of error converges fastest
[Figure: a rapidly oscillating residual plotted against grid position. This type of error converges fastest.]

29 Direct Solution Method for Ax=b
Consider Ax = b solved "directly", i.e. by Gaussian elimination, where one successively removes x1 from equations 2 to n, x2 from equations 3 to n, and so on. This has a memory requirement of order n² and a computational complexity of order n³.
This is modified somewhat when you consider matrices A with a lot of zeroes and try hard to exploit these zeros, i.e. avoid doing calculations which are irrelevant because they add or multiply by zero.
Of course this cannot be done by testing whether a matrix element is zero, as modern computers do floating point arithmetic faster than (or at least as fast as) the test! Rather, one arranges the loops so as not to include the zero elements, if possible!
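A minimal dense Gaussian elimination sketch (illustrative; no pivoting and no exploitation of zero structure), showing where the order n³ work comes from:

```python
import numpy as np

def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination: O(n^2) memory, O(n^3) work."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = len(b)
    # Forward elimination: remove x_k from equations k+1 .. n-1
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    # Back substitution
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x
```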

30 Note on HW4
From Geoffrey: I sent everybody email with their HW4 grades and a link to my answers. Some were having Tomcat problems; email me if this is still a problem.

31 [Figure: a band matrix of size n with bandwidth B; the labels n³ and nB² indicate the operation counts for the full versus the banded solve.]

35 Gauss Seidel Structure
A x = b with A = D - L - U, where D is diagonal, L is lower triangular and U is upper triangular; A itself is HARD to invert.
Jacobi is: D x(k) = b + L x(k-1) + U x(k-1)
Gauss-Seidel is: (D - L) x(k) = b + U x(k-1)
where it is trivial to invert lower triangular matrices such as D - L.
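A sketch of both splittings with NumPy/SciPy (illustrative only; for a PDE one never forms these matrices explicitly, and the function names below are not from the slides):

```python
import numpy as np
from scipy.linalg import solve_triangular

def jacobi_step(A, b, x):
    """One Jacobi step: D x_new = b + (L + U) x, with A = D - L - U."""
    D = np.diag(A)
    return (b - (A @ x - D * x)) / D            # (L + U) x = D x - A x

def gauss_seidel_step(A, b, x):
    """One Gauss-Seidel step: (D - L) x_new = b + U x."""
    DL = np.tril(A)                             # D - L is the lower triangle of A
    U = DL - A                                  # so U = (D - L) - A
    return solve_triangular(DL, b + U @ x, lower=True)
```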

36 What can we do for Parallel Gauss Seidel?
[Figure: the five-point stencil marking which neighbours are already at iteration k and which are still at k-1 during an update.]
We must exploit the special structure of the matrix; Gauss-Seidel for "general matrices" cannot be parallelized.

37 16 by 16 Wavefront Parallel Gauss Seidel
This is the best approach to parallelizing Gauss-Seidel for the natural ordering along rows: first in x, then y (and then z in 3D).
Use 4 processors with a cyclic row decomposition, so processor 1 owns every fourth row starting with the first, processor 2 the next, and so on.
Consider points on lines parallel to the diagonal; there are 31 such lines (phases). Execute the phases consecutively, starting at the bottom left.
All points in a given phase can be updated in parallel, as each one only needs updated (iteration k) points from the previous phase and iteration (k-1) points from the next phase.
There is load imbalance, but at worst the processors differ by 1 in load, and the effect goes away for large systems.
This example has a speedup of 3.4 on 4 processors: 8 phases cost 1 unit of time, 8 phases 2 units, 8 phases 3 units and 7 phases 4 units, so the 256 point updates take 76 parallel time units and 256/76 ≈ 3.4. A serial sketch of the wavefront ordering follows.
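This is a serial, illustrative sketch of the wavefront ordering (not the parallel implementation itself); points on the same anti-diagonal are mutually independent, so a parallel version could update a whole phase at once:

```python
import numpy as np

def wavefront_gauss_seidel_sweep(phi):
    """One Gauss-Seidel sweep over the interior of a square grid, ordered by wavefronts.

    Phase d visits the interior points with i + j == d; each such point only
    needs already-updated points from phase d-1 and old points from phase d+1.
    """
    n = phi.shape[0]
    for d in range(2, 2 * n - 3):                     # wavefront phases
        for i in range(max(1, d - n + 2), min(n - 1, d)):
            j = d - i
            phi[i, j] = 0.25 * (phi[i - 1, j] + phi[i + 1, j] +
                                phi[i, j - 1] + phi[i, j + 1])
    return phi
```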

38 Wavefront
Suppose the conventional Gauss-Seidel ordering of the update: along rows, starting at the bottom left. All the points needed by a general point P at iteration k lie below the dotted line parallel to the diagonal through P.
[Figure: 16 by 16 grid with the dotted wavefront line through P, iteration number k or k-1 marked on either side of it; the phase number runs from 1 to 31.]

39 Red Black Parallel Gauss Seidel I
The pipeline (wavefront) method has high communication costs (in fact one would use a "block cyclic" decomposition to preserve locality) and is complex to implement well.
Thus instead we note that we can get new versions of Gauss-Seidel by reordering the update order; this could (in principle) make for a better or worse method (or, more likely, we won't know whether it is better or worse!).
There is a natural reordering which is typically as good if not better, and for which the parallelism is "trivial".
This ONLY works for the nearest neighbor stencil, but there are versions of red-black for similar stencils.

40 Red Black
[Figure: red-black checkerboard on a G by G grid with K = (G²/2) + 1; the first 4 and middle 4 labels of the Gauss-Seidel update order are shown.]

41 Red Black Parallel Gauss Seidel II
In red-black, you color every alternate point red and every other point black; this gives the checkerboard pattern shown on the previous foil.
Now relabel the update order so that one first updates all the red points, then all the black points.
Updating red points to iteration k only requires black points at iteration k-1.
Updating black points to iteration k requires NO black points, just red points at iteration k.
So we can divide the parallel update into two phases, as sketched below:
Phase I: update all red points
Phase II: update all black points
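A vectorized, serial NumPy sketch of the two-phase sweep (illustrative); within each phase every point reads only points of the other color, so the whole phase could be done in parallel:

```python
import numpy as np

def red_black_sweep(phi):
    """One red-black Gauss-Seidel sweep for Laplace's equation.

    Points with (i + j) even are "red", (i + j) odd are "black".
    """
    n, m = phi.shape
    i, j = np.meshgrid(np.arange(n), np.arange(m), indexing="ij")
    interior = (i > 0) & (i < n - 1) & (j > 0) & (j < m - 1)
    for color in (0, 1):                     # Phase I: red, Phase II: black
        mask = interior & ((i + j) % 2 == color)
        avg = np.zeros_like(phi)
        avg[1:-1, 1:-1] = 0.25 * (phi[:-2, 1:-1] + phi[2:, 1:-1] +
                                  phi[1:-1, :-2] + phi[1:-1, 2:])
        phi[mask] = avg[mask]                # update only this color's points
    return phi
```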

42 Parallel Red Black
[Figure: the red-black grid divided among processors 1 to 4, with labels K, K+1, K+2, K+3.]

43 Red Black Parallel Gauss Seidel III
Do a normal block decomposition.
Parallel Phase I: update all red points
Communicate the black points at iteration k-1 to the halo in each processor, then compute the red points to iteration k.
Parallel Phase II: update all black points
Communicate the red points at iteration k to the halo in each processor, then compute the black points to iteration k.
This has a similar efficiency analysis to Jacobi, except that it is a little more sensitive to latency: the same amount of communication, but twice as many messages.
In electrical power system and similar simulations, one gets irregular sparse matrices and there is no way to get such a clean parallel algorithm; in fact it is not clear whether there is even a sequential Gauss-Seidel ordering that converges.

44 Red Black Parallel Gauss Seidel IV
Fortunately, very irregular problems like power systems tend not to be huge: hundreds or tens of thousands of points, not billions.
Thus one can use highly optimized "full matrix exploiting zero structure" algorithms.
In the PDE case, one can show for some elliptic problems that red-black ordering can be better than the original ordering.
For some hyperbolic equations used in computational fluid dynamics this is not so clear, and SOR methods are used.
If one has more complex differencing schemes, e.g. fourth order differencing, then red-black does not work, but a scheme with more colors (and more update phases) can be devised.
The real difficulty is irregular graphs, not complex stencils.

45 Successive Over Relaxation I
Jacobi and Gauss-Seidel give a formula for x(k) in terms of x(k-1); call this xbasic(k).
Overrelaxation forms xSOR(k) = ω xbasic(k) + (1 - ω) x(k-1)
Typically only 0 < ω < 2 is sensible; ω < 1 is relaxation and 1 < ω < 2 is overrelaxation.
It is "over" because if ω > 1 you go further in the direction of the new estimate than calculated; in "relaxation" you conservatively average the prediction with the old value.
Unfortunately, the best value of ω cannot be calculated except in simple cases, and so SOR is not so reliable for general problems.
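A sketch of SOR applied to the Gauss-Seidel stencil update for Laplace's equation (illustrative; the value of omega below is just a plausible guess and would normally be tuned or taken from the formula on the next foil):

```python
def sor_sweep(phi, omega=1.7):
    """One SOR sweep on a 2-D grid: move a fraction omega of the way from the
    old value to the plain Gauss-Seidel value (omega = 1 is Gauss-Seidel)."""
    n, m = phi.shape
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            gs = 0.25 * (phi[i - 1, j] + phi[i + 1, j] +
                         phi[i, j - 1] + phi[i, j + 1])
            phi[i, j] = (1.0 - omega) * phi[i, j] + omega * gs
    return phi
```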

46 Successive Over Relaxation II
You can show that for Jacobi the best choice is ω = 1, i.e. the relaxation strategy is not very useful.
For Gauss-Seidel, in the simple one dimensional case with N grid points, one can show that there is an optimal choice ω = 2 (1 - π/N), i.e. almost equal to 2.
Then the number of iterations needed by optimal SOR to get a given error can be shown to be proportional to N, not N² as in Jacobi or Gauss-Seidel. So the iteration counts scale as:
Jacobi: N²/π²
Gauss-Seidel: N²/π²
SOR on Gauss-Seidel: N/π

47 Preconditioning
One can change A and b by preconditioning:
M1⁻¹ A (M2⁻¹ M2) x = M1⁻¹ b
is the same equation as before for any choice of matrices M1 and M2.
Define Anew = M1⁻¹ A M2⁻¹, xnew = M2 x, bnew = M1⁻¹ b. Then Anew xnew = bnew has the same form as above, and we can apply any of the methods that we used on A x = b.
We choose M1 and M2 to make our standard methods perform better; all these choices are designed to accelerate the convergence of the iterative methods.
There are specialized preconditioning ideas, and perhaps better general approaches such as multigrid and incomplete LU (ILU) decomposition.
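As one concrete illustration (the slide does not prescribe a particular preconditioner), a simple diagonal (Jacobi) preconditioner takes M1 = diag(A) and M2 = I:

```python
import numpy as np

def diagonal_precondition(A, b):
    """Left-precondition A x = b with M1 = diag(A) and M2 = I.

    Returns (Anew, bnew) with Anew = M1^-1 A and bnew = M1^-1 b; the solution
    x is unchanged, but Anew is often better conditioned, so the standard
    iterative methods converge faster.
    """
    d = np.diag(A)
    return A / d[:, None], b / d
```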

48 Multigrid Methods
We remarked that the key problem with iterative methods is that they get the detail (short wavelength) structure correct, but convergence is controlled by the coarse (long wavelength) structure.
Then in simple methods one needs of order N² iterations to get good results.
Ironically, one goes to large N to get the detail; if all you wanted was the coarse structure, a smaller mesh would be fine.
The basic idea in multigrid is key in many areas of science: solve a problem at multiple scales.
We get the coarse structure from small N and the fine detail from large N.
A good qualitative idea, but how do we implement it?
Some material taken from a tutorial by Ulrich Ruede.

49 Gauss Seidel is Slow I
Take Laplace's equation in the unit square with initial guess Φ = 0 and boundary conditions that are zero except on one side.
For an N = 31 by 31 grid it takes around 1000 (of order N²) iterations to get a reasonable answer.
[Figures: the boundary conditions and the exact solution.]

50 Gauss Seidel is Slow II
[Figure panels: the approximate solution after 1, 10, 100 and 1000 iterations.]

51 Multigrid Philosophically
Suppose we have a finest level M(1) with N by N points (in 2D). Then the k'th coarsest approximation M(k) to this has N/2^k by N/2^k points.
One way to think about multigrid is that M(k+1) can form a preconditioner to M(k), so that one can replace the natural matrix A(k) by A⁻¹(k+1) A(k).
A⁻¹(k+1) A(k) is a nicer matrix than A(k), and iterative solvers converge faster, as the long wavelength terms have been removed.
The basic issue is that A(k) and A(k+1) are of different sizes, so we need prolongation and restriction operators to map solutions between levels k and k+1.
We apply this idea recursively.

52 Multigrid Hierarchy
[Figure: a stack of grids from finest to coarsest; one relaxes on each level, restricts down to the coarser grids, then interpolates back up, relaxing again at each level.]

53 Basic Multigrid Ideas
In the picture, "relax" is the application of a standard iteration scheme to "solve" for the short wavelength part of the solution at a given level, i.e. using Jacobi, Gauss-Seidel or conjugate gradient.
Interpolation takes a solution at a coarser grid and interpolates it to find a solution at half the grid spacing.
Restriction takes the solution at a given grid and averages it to find the solution at the coarser grid.
[Figure: restriction from level k down to k+1 and interpolation from k+1 back up to k.]

54 Multigrid Algorithm: procedure MG(level, A, u, f)
if level = coarsest then
    solve the coarsest grid equation by another method "exactly"
else
    smooth A(level) u = f   (m1 times)
    compute the residual r = f - A(level) u
    restrict F = R r   (R is the restriction operator)
    call MG(level + 1, A(level+1), V, F)   (mc times)
    interpolate v = P V   (P interpolates the coarse solution back to this level)
    correct unew = u + v
    smooth A(level) unew = f   (m2 times) and set u = unew
endif
endprocedure

Here A(level) uexact = f with uexact = u + v, so the correction v satisfies A(level) v = r = f - A(level) u.
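A self-contained 1-D sketch of this V-cycle (mc = 1) for -u'' = f with zero boundary values, assuming weighted Jacobi smoothing, full-weighting restriction and linear interpolation; the code and names are illustrative, not from the course:

```python
import numpy as np

def smooth(u, f, h, sweeps=3, omega=2.0 / 3.0):
    """Weighted-Jacobi smoothing for -u'' = f on a grid with spacing h."""
    for _ in range(sweeps):
        u[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def restrict(r):
    """Full weighting onto the grid with twice the spacing."""
    return np.concatenate(([0.0], 0.25 * (r[1:-2:2] + 2 * r[2:-1:2] + r[3::2]), [0.0]))

def interpolate(e):
    """Linear interpolation back to the grid with half the spacing."""
    fine = np.zeros(2 * len(e) - 1)
    fine[::2] = e
    fine[1::2] = 0.5 * (e[:-1] + e[1:])
    return fine

def v_cycle(u, f, h):
    """One multigrid V-cycle for -u'' = f with zero Dirichlet boundaries."""
    if len(u) <= 3:                              # coarsest grid: solve "exactly"
        u[1] = 0.5 * h * h * f[1]
        return u
    u = smooth(u, f, h)                          # pre-smoothing (m1 sweeps)
    rc = restrict(residual(u, f, h))             # restrict the residual
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)   # coarse-grid correction
    u += interpolate(ec)                         # interpolate and correct
    return smooth(u, f, h)                       # post-smoothing (m2 sweeps)

# Example use: 2^7 + 1 points on [0, 1], f = 1 everywhere
n = 2**7 + 1
u, f = np.zeros(n), np.ones(n)
for _ in range(10):
    u = v_cycle(u, f, 1.0 / (n - 1))
```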

55 Multigrid Cycles
The parameter mc determines the exact strategy for how one iterates through the different meshes; mc = 1 gives the V cycle and mc = 2 the W cycle. One ends up with a full multigrid cycle as shown.
[Figure: the V cycle, W cycle and full multigrid cycle traced between the finest and coarsest grids.]

