MultiGrid
Project Goal
- Parallelize Multigrid and make it faster than the serial version, with a similar level of accuracy
- Using SharedArray with 4 processes
Motivation – Gauss-Seidel
- Solving a PDE with a finite difference discretization
- Gauss-Seidel iteration: uses nearby points to update each grid value
- Solving Ax = b, using [update formula missing from the extracted text]
- It kills high-frequency errors really fast, but is really slow afterwards
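The fast/slow behaviour can be checked numerically. The sketch below is in Python rather than the project's Julia, and assumes a 1D Poisson model problem (-u'' = b with zero boundary values), which the slides do not specify. The function names are illustrative only. With a zero right-hand side the exact solution is zero, so the iterate itself is the error.

```python
import math

def gauss_seidel_sweep(x, b, h):
    """One in-place Gauss-Seidel sweep for -u'' = b on a uniform grid, u = 0 at both ends."""
    n = len(x)
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0       # already updated in this sweep
        right = x[i + 1] if i < n - 1 else 0.0  # still the old value
        x[i] = 0.5 * (h * h * b[i] + left + right)

def remaining_error(mode, sweeps, n=63):
    """Start from a pure sine error of the given mode; return its max size after smoothing."""
    h = 1.0 / (n + 1)
    x = [math.sin(mode * math.pi * (i + 1) * h) for i in range(n)]
    b = [0.0] * n  # zero right-hand side: exact solution is 0, so x is the error
    for _ in range(sweeps):
        gauss_seidel_sweep(x, b, h)
    return max(abs(v) for v in x)

low_freq = remaining_error(mode=1, sweeps=10)    # smooth error
high_freq = remaining_error(mode=48, sweeps=10)  # oscillatory error
print(f"low-frequency error left:  {low_freq:.3f}")
print(f"high-frequency error left: {high_freq:.3f}")
```

After the same number of sweeps, the oscillatory error should be much smaller than the smooth one, which is the motivation for moving the smooth part to a coarser grid.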
1D Error [Figure: FAST when killing high-frequency error, SLOW when killing low-frequency error]
Multigrid Hierarchy of Grids
Multigrid – Going Down (To coarsest grid)
- Send the residual (an estimate of the error) recursively to coarser grids
- Residual: r = b – Ax = Au – Ax = A(u – x), where u is the true solution of Au = b
- Solve the residual equation Ax’ = r to obtain the error correction term x’; then u = x + x’ (x’ corrects x when going up)
- Grid transfer on the way down: restriction function
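A small worked example of the residual equation may help. This is a Python sketch, not the project's code; the 3-point matrix tridiag(-1, 2, -1) and the helper names `apply_A` and `solve_A` are assumptions chosen for illustration. It shows that solving Ax’ = r exactly and adding x’ to a rough guess x recovers u.

```python
def apply_A(x):
    """Multiply by A = tridiag(-1, 2, -1), i.e. the 1D Laplacian stencil."""
    n = len(x)
    out = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * x[i] - left - right)
    return out

def solve_A(rhs):
    """Solve A y = rhs exactly with the Thomas algorithm for tridiag(-1, 2, -1)."""
    n = len(rhs)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = -1.0 / 2.0
    d[0] = rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    y = [0.0] * n
    y[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        y[i] = d[i] - c[i] * y[i + 1]
    return y

u_true = [1.0, 2.0, 3.0]
b = apply_A(u_true)                               # right-hand side of Au = b
x = [0.5, 1.0, 2.0]                               # rough approximation
r = [bi - ai for bi, ai in zip(b, apply_A(x))]    # residual r = b - Ax
x_corr = solve_A(r)                               # error correction term x'
corrected = [xi + ci for xi, ci in zip(x, x_corr)]
print(corrected)
```

In multigrid the residual equation is not solved exactly on the fine grid; it is restricted to a coarser grid, where it is cheaper to handle.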
Multigrid – Going Up (To original grid)
- At the bottom level (coarsest grid), we can solve the residual equation exactly (there is only one point in the coarsest grid!)
- Send the error correction term recursively up to finer grids, until we reach the initial grid
- Grid transfer on the way up: prolongation (linear interpolation)
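Going down and going up together form one cycle. The following is a minimal 1D sketch in Python under stated assumptions, not the project's Julia implementation: it assumes the model problem -u'' = b with zero boundary values, grids with 2^L - 1 interior points, two Gauss-Seidel sweeps before and after the coarse correction, and a full-weighting restriction (the slides only confirm linear interpolation for prolongation). All function names are illustrative.

```python
import math

def smooth(x, b, h, sweeps):
    """In-place Gauss-Seidel sweeps for -u'' = b with zero boundary values."""
    n = len(x)
    for _ in range(sweeps):
        for i in range(n):
            left = x[i - 1] if i > 0 else 0.0
            right = x[i + 1] if i < n - 1 else 0.0
            x[i] = 0.5 * (h * h * b[i] + left + right)

def residual(x, b, h):
    """r = b - Ax for A = tridiag(-1, 2, -1) / h^2."""
    n = len(x)
    r = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        r.append(b[i] - (2.0 * x[i] - left - right) / (h * h))
    return r

def restrict(r):
    """Full weighting (assumed): coarse point j sits at fine index 2j + 1."""
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range((len(r) - 1) // 2)]

def prolong(e, n_fine):
    """Linear interpolation of a coarse correction onto the fine grid."""
    fine = [0.0] * n_fine
    for j, v in enumerate(e):
        fine[2 * j + 1] += v
        fine[2 * j] += 0.5 * v
        fine[2 * j + 2] += 0.5 * v
    return fine

def v_cycle(x, b, h):
    n = len(x)
    if n == 1:
        x[0] = 0.5 * h * h * b[0]      # coarsest grid: one point, solved exactly
        return
    smooth(x, b, h, 2)                  # kill high-frequency error
    coarse_r = restrict(residual(x, b, h))
    coarse_e = [0.0] * len(coarse_r)
    v_cycle(coarse_e, coarse_r, 2.0 * h)  # going down, then back up
    for i, c in enumerate(prolong(coarse_e, n)):
        x[i] += c                       # apply the error correction
    smooth(x, b, h, 2)

n = 63                                  # 2^6 - 1 interior points
h = 1.0 / (n + 1)
grid = [(i + 1) * h for i in range(n)]
b = [math.pi ** 2 * math.sin(math.pi * t) for t in grid]  # exact u = sin(pi t)
x = [0.0] * n
for _ in range(10):
    v_cycle(x, b, h)
max_err = max(abs(xi - math.sin(math.pi * t)) for xi, t in zip(x, grid))
print(f"max error after 10 V-cycles: {max_err:.2e}")
```

The remaining error after a few cycles should be dominated by the finite difference discretization itself rather than by the iteration.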
1D Multigrid - Initialization [Figure: 1D error on the original grid. FAST: killing high-frequency error]
1D Error, Level 1 [Figure: points we're using vs. points we're not using. FAST: 1 point, solved exactly]
1D Error, Level 2 [Figure: points we're using vs. points we're not using. FAST: high-frequency error due to the center point]
1D Error, Level 3 [Figure: points we're using vs. points we're not using. FAST: high-frequency errors]
1D Error, Level 4 (Original) [Figure: points we're using vs. points we're not using. FAST: high-frequency errors]
Multigrid
- O(N) computational cost, where N = number of grid points
- Known as one of the fastest solvers
- Existing Julia packages for Multigrid: Multigrid.jl, AMG.jl
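The O(N) claim follows from a geometric series. As a rough sketch for a V-cycle, assume each level costs about c operations per grid point (smoothing plus grid transfer), and that each coarser grid has about 2^{-d} times as many points in d dimensions. The symbols c, d and L (number of levels) are introduced here for illustration and do not appear in the slides.

```latex
W_{\text{V-cycle}} \;\approx\; c\,N \sum_{k=0}^{L-1} 2^{-dk}
\;<\; \frac{c\,N}{1 - 2^{-d}},
\qquad \text{so } W < 2\,c\,N \text{ for } d = 1
\text{ and } W < \tfrac{4}{3}\,c\,N \text{ for } d = 2 .
```

The total work is therefore a small constant multiple of the work on the finest grid alone.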
Parallelizing 1 - Gauss-Seidel [Figure: grid split among WORKER 1, WORKER 2, WORKER 3, WORKER 4]
Parallelizing 1 - Gauss-Seidel [Code listing: function GS_mix!]
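The body of GS_mix! is not available in this text, so the following is only a guess at the general idea, written in Python with hypothetical names (`sweep_block`, `block_gs_sweep`). It assumes that "mix" means ordinary Gauss-Seidel inside each worker's region, combined with values from the previous sweep at the region edges. Python threads are used only to show the decomposition; they do not reproduce the speedup of Julia processes sharing a SharedArray.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def sweep_block(x_old, b, h, lo, hi):
    """Gauss-Seidel sweep over indices lo..hi-1; values outside the block are stale."""
    n = len(x_old)
    block = x_old[lo:hi]                 # private copy owned by this worker
    for i in range(lo, hi):
        if i == 0:
            left = 0.0                   # domain boundary
        elif i == lo:
            left = x_old[i - 1]          # neighbour's value from the previous sweep
        else:
            left = block[i - 1 - lo]     # already updated in this sweep
        if i == n - 1:
            right = 0.0                  # domain boundary
        elif i == hi - 1:
            right = x_old[i + 1]         # neighbour's value from the previous sweep
        else:
            right = block[i + 1 - lo]    # not yet updated in this sweep
        block[i - lo] = 0.5 * (h * h * b[i] + left + right)
    return block

def block_gs_sweep(x, b, h, pool, workers=4):
    """One sweep with the domain divided into `workers` contiguous regions."""
    n = len(x)
    bounds = [n * w // workers for w in range(workers + 1)]
    snapshot = list(x)                   # read-only view shared by all workers
    futures = [pool.submit(sweep_block, snapshot, b, h, bounds[w], bounds[w + 1])
               for w in range(workers)]
    for w, fut in enumerate(futures):
        x[bounds[w]:bounds[w + 1]] = fut.result()

n = 31
h = 1.0 / (n + 1)
grid = [(i + 1) * h for i in range(n)]
b = [math.pi ** 2 * math.sin(math.pi * t) for t in grid]  # exact u = sin(pi t)
x = [0.0] * n
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(2000):
        block_gs_sweep(x, b, h, pool)
max_err = max(abs(xi - math.sin(math.pi * t)) for xi, t in zip(x, grid))
print(f"max error: {max_err:.2e}")
```

Because each worker only reads the shared snapshot and writes its own block, the regions can be updated independently within a sweep.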
Parallelizing 1 - Gauss-Seidel
- Parallel GS is almost 2 times faster than the serial version
- The speedup grows as N gets larger
- The result is even more accurate!
Parallelizing 2 – Restriction/Prolongation
- Parallelizing the Restriction and Prolongation functions
- Restriction stencil: [stencil missing from the extracted text]
- Prolongation: linear interpolation
Parallelizing 2 – Restriction/Prolongation
- Again, divide the domain into 4 regions
- Each worker restricts residuals / interpolates on its own region
- The parallelized version beats the normal version only for grids of about level 12 (~17,000,000 points) or larger
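Both grid transfers read one array and write disjoint entries of another, so they split cleanly by region. The sketch below is Python with illustrative names, not the project's code. It assumes a 1D grid and a full-weighting restriction stencil (1/4, 1/2, 1/4), since the stencil on the slide is not preserved. Prolongation is the linear interpolation named in the slides, written so that each fine point is computed independently.

```python
from concurrent.futures import ThreadPoolExecutor

def restrict_range(fine, lo, hi):
    """Full weighting (assumed) for coarse indices lo..hi-1."""
    return [0.25 * fine[2 * j] + 0.5 * fine[2 * j + 1] + 0.25 * fine[2 * j + 2]
            for j in range(lo, hi)]

def prolong_range(coarse, lo, hi):
    """Linear interpolation for fine indices lo..hi-1 (values outside the grid are 0)."""
    nc = len(coarse)
    out = []
    for i in range(lo, hi):
        if i % 2 == 1:
            out.append(coarse[i // 2])            # fine point on top of a coarse point
        else:
            left = coarse[i // 2 - 1] if i >= 2 else 0.0
            right = coarse[i // 2] if i // 2 < nc else 0.0
            out.append(0.5 * (left + right))      # midpoint between two coarse points
    return out

def split(n, parts):
    """Contiguous index ranges covering 0..n-1."""
    return [(n * p // parts, n * (p + 1) // parts) for p in range(parts)]

def map_regions(func, data, n_out, pool, parts=4):
    """Run func on each region in the pool and stitch the pieces together in order."""
    futures = [pool.submit(func, data, lo, hi) for lo, hi in split(n_out, parts)]
    result = []
    for fut in futures:
        result.extend(fut.result())
    return result

fine = [float(i) for i in range(15)]              # 15 fine points -> 7 coarse points
with ThreadPoolExecutor(max_workers=4) as pool:
    coarse = map_regions(restrict_range, fine, 7, pool)
    back = map_regions(prolong_range, coarse, 15, pool)
print(coarse)
print(back)
```

The per-call overhead of distributing such cheap operations is one plausible reason a parallel version only pays off on very large grids.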
V-Cycles and W-Cycles
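The difference between the two cycle types is how many times the coarser level is visited per visit of the finer one. As a small Python sketch, the hypothetical helper `cycle_levels` lists the order in which grid levels are visited, with level 1 as the coarsest grid. The parameter `gamma` is the usual cycle index (1 for a V-cycle, 2 for a W-cycle); the name is an assumption, not taken from the slides.

```python
def cycle_levels(level, gamma):
    """Order of grid levels visited in one cycle; level 1 is the coarsest grid."""
    if level == 1:
        return [1]                       # coarsest grid: solved exactly, one visit
    visited = [level]                    # pre-smoothing on this level
    for _ in range(gamma):
        sub = cycle_levels(level - 1, gamma)
        if visited[-1] == sub[0]:
            sub = sub[1:]                # back-to-back coarse cycles share an endpoint
        visited += sub
    visited.append(level)                # correction and post-smoothing on this level
    return visited

print("V-cycle:", cycle_levels(4, gamma=1))
print("W-cycle:", cycle_levels(3, gamma=2))
```

A W-cycle spends more time on the coarse grids, which costs more per cycle but usually reduces the error more per cycle.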
Parallelizing - Around 1.5 times faster
                      Parallel                      Normal
V-cycle, Level 10     2.4 sec, error rate 0.016     3.6 sec, error rate 0.026
W-cycle, Level 12     192 sec, error rate 6.9E-5    306 sec, error rate 0.0001
Error rate = norm(error)/norm(solution)
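The error rate defined above is a relative error. A minimal Python sketch follows, assuming the Euclidean (2-)norm, which is what Julia's `norm` computes by default for vectors. The function name and sample vectors are made up for illustration.

```python
import math

def error_rate(computed, exact):
    """Relative error: norm(computed - exact) / norm(exact), using the 2-norm."""
    err = math.sqrt(sum((c - e) ** 2 for c, e in zip(computed, exact)))
    return err / math.sqrt(sum(e * e for e in exact))

rate = error_rate([3.0, 4.5], [3.0, 4.0])   # error norm 0.5, solution norm 5.0
print(rate)
```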