1 A CONDENSATION-BASED LOW COMMUNICATION LINEAR SYSTEMS SOLVER UTILIZING CRAMER'S RULE
Ken Habgood, Itamar Arel
Department of Electrical Engineering & Computer Science, The University of Tennessee
Gabriel Cramer (1704-1752)

2 Outline
EECS Department / University of Tennessee, http://mil.engr.utk.edu
- Motivation & problem statement
- Algorithm review
- Numerical accuracy & stability
- Parallel Implementation
- Communication Results
Source: http://tridane.faculty.asu.edu

3 Introduction
- Mainstream approach: Gaussian Elimination
  - e.g. LU decomposition
- Looking for an efficient parallel solver with lower communication overhead
- Targeting an unpopular approach: Cramer’s Rule
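For reference, Cramer's Rule expresses each unknown of Ax = b as a ratio of determinants, x_i = det(A_i) / det(A), where A_i is A with column i replaced by b. A minimal Python sketch of the textbook rule follows. The function name `cramer_solve` is hypothetical, and `numpy.linalg.det` stands in for the determinant step, so this illustrates the rule itself and not the condensation-based solver presented in these slides.

```python
import numpy as np

def cramer_solve(A, b):
    """Textbook Cramer's Rule (illustration only, hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[0]
    det_A = np.linalg.det(A)
    if det_A == 0.0:
        raise ValueError("matrix is singular; Cramer's Rule does not apply")
    x = np.empty(n)
    for i in range(n):
        A_i = A.copy()
        A_i[:, i] = b                      # replace column i with the right-hand side
        x[i] = np.linalg.det(A_i) / det_A  # x_i = det(A_i) / det(A)
    return x

# Small usage example: 2x + y = 3, x + 3y = 5
x = cramer_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
```

Computing n + 1 separate determinants this way is far more expensive than a single factorization, which is why the naive form of the rule is rarely used for large systems.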

4 LU Communication Pattern
- Communication for distributed LU decomposition
  Block layout:
    L00 U00 | U01 | U02
    L10     | A11 | A12
    L20     | A21 | A22
- Three sequential steps
  1. Top left computes and sends
  2. Row and column leads compute and send
  3. Remaining processors factorize their blocks
- One-to-one communication
- Idle time while the leads are processing
Source: http://www.caam.rice.edu/~timwar/MA471F03/
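The three sequential steps on this slide match the data dependencies of one step of a right-looking blocked LU factorization. The serial Python sketch below walks through that dependency chain on a single machine. The names `lu_nopivot` and `block_lu_step` are hypothetical, pivoting is omitted for brevity (so the example matrix is made diagonally dominant), and no actual message passing is shown.

```python
import numpy as np

def lu_nopivot(A):
    """Doolittle LU without pivoting (assumes nonzero pivots)."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    for k in range(n - 1):
        L[k + 1:, k] = U[k + 1:, k] / U[k, k]
        U[k + 1:, k:] -= np.outer(L[k + 1:, k], U[k, k:])
    return L, np.triu(U)

def block_lu_step(A, nb):
    """One blocked step; nb is the size of the top-left block."""
    A = np.asarray(A, dtype=float)
    A00, A01 = A[:nb, :nb], A[:nb, nb:]
    A10, A11 = A[nb:, :nb], A[nb:, nb:]
    # Step 1: the top-left owner factors its block, then sends L00 and U00.
    L00, U00 = lu_nopivot(A00)
    # Step 2: the row lead solves L00 @ U01 = A01,
    #         the column lead solves L10 @ U00 = A10.
    U01 = np.linalg.solve(L00, A01)
    L10 = np.linalg.solve(U00.T, A10.T).T
    # Step 3: the remaining blocks are updated once L10 and U01 arrive.
    S = A11 - L10 @ U01
    return L00, U00, U01, L10, S

rng = np.random.default_rng(0)
A = rng.random((6, 6)) + 6.0 * np.eye(6)   # diagonally dominant test matrix
L00, U00, U01, L10, S = block_lu_step(A, nb=2)
```

Each step needs the output of the previous one, which is the source of the idle time noted on the slide.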

6 Proposed Algorithm Flow

7 Matrix “Mirroring”
- Mirroring example
- Applying Chio’s condensation yields:
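As background for this slide, Chio's condensation reduces an n x n determinant to an (n-1) x (n-1) one: with pivot a11 != 0, det(A) = det(B) / a11^(n-2), where b_ij = a11 * a_ij - a_i1 * a_1j. The Python sketch below applies that identity repeatedly to compute a determinant. The function name `chio_det` is hypothetical, the row swap is a simple pivoting choice assumed here, and the mirroring step from the slide is not reproduced.

```python
import numpy as np

def chio_det(A):
    """Determinant via repeated Chio condensation (illustrative sketch)."""
    M = np.array(A, dtype=float)
    n = M.shape[0]
    sign = 1.0
    divisor = 1.0          # accumulates pivot**(n-2) from every condensation step
    while n > 1:
        # Bring the largest first-column entry to the pivot position.
        p = int(np.argmax(np.abs(M[:, 0])))
        if M[p, 0] == 0.0:
            return 0.0     # whole first column is zero, so the determinant is zero
        if p != 0:
            M[[0, p]] = M[[p, 0]]
            sign = -sign   # a row swap flips the sign of the determinant
        pivot = M[0, 0]
        # b_ij = a11 * a_ij - a_i1 * a_1j for the trailing (n-1) x (n-1) block
        B = pivot * M[1:, 1:] - np.outer(M[1:, 0], M[0, 1:])
        divisor *= pivot ** (n - 2)
        M = B
        n -= 1
    return sign * M[0, 0] / divisor

A = [[2.0, 1.0, 3.0],
     [0.0, 4.0, 1.0],
     [5.0, 2.0, 6.0]]
d = chio_det(A)   # -11.0 for this matrix
```

For large matrices the accumulated divisor can overflow, so a practical version would rescale at each step; that refinement is left out of this sketch.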

9 Accuracy and Numerical Stability
- Backward error estimation
  - Theoretical estimate of rounding error
- The E matrix depends on two items
  - The largest element in A or b
  - The growth factor of the algorithm
- Same growth factor as LU decomposition with partial pivoting
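To make the growth factor concrete, the Python sketch below measures it for Gaussian elimination with partial pivoting, defined here as the largest magnitude seen in the working matrix during elimination divided by the largest magnitude in the original matrix. The function name `growth_factor_gepp` is hypothetical, and the code instruments plain elimination with partial pivoting, not the condensation algorithm from these slides.

```python
import numpy as np

def growth_factor_gepp(A):
    """Growth factor of Gaussian elimination with partial pivoting (sketch)."""
    W = np.array(A, dtype=float)
    n = W.shape[0]
    max_initial = np.abs(W).max()
    max_seen = max_initial
    for k in range(n - 1):
        # Partial pivoting: pick the largest entry in column k at or below row k.
        p = k + int(np.argmax(np.abs(W[k:, k])))
        if W[p, k] == 0.0:
            continue                      # nothing to eliminate in this column
        if p != k:
            W[[k, p]] = W[[p, k]]
        m = W[k + 1:, k] / W[k, k]        # multipliers, all |m| <= 1
        W[k + 1:, k:] -= np.outer(m, W[k, k:])
        max_seen = max(max_seen, np.abs(W).max())
    return max_seen / max_initial

# Classic worst case: 1 on the diagonal, -1 below it, 1 in the last column.
n = 5
W = np.eye(n) - np.tril(np.ones((n, n)), -1)
W[:, -1] = 1.0
rho = growth_factor_gepp(W)   # 2**(n-1) = 16 for this matrix
```

The worst case grows as 2^(n-1), but matrices that trigger it are rare in practice, which is why partial pivoting is generally regarded as stable.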

10 Forward Error Comparisons
Matrix Size | κ(A)     | Max Matlab | Max GSL  | Avg Matlab | Avg GSL
------------|----------|------------|----------|------------|---------
1000 x 1000 | 506930   | 2.39E-09   | 1.93E-10 | 1.03E-10   | 5.38E-12
2000 x 2000 | 790345   | 4.52E-09   | 5.36E-09 | 1.01E-10   | 7.27E-12
3000 x 3000 | 1540152  | 1.95E-08   | 1.84E-08 | 1.12E-10   | 2.09E-11
4000 x 4000 | 12760599 | 4.81E-08   | 5.62E-08 | 1.43E-10   | 7.91E-11
5000 x 5000 | 765786   | 2.92E-08   | 4.39E-08 | 1.18E-10   | 3.46E-11
6000 x 6000 | 1499430  | 8.67E-08   | 8.70E-08 | 1.37E-10   | 6.04E-11
7000 x 7000 | 3488010  | 9.92E-08   | 8.95E-08 | 1.27E-10   | 5.15E-11
8000 x 8000 | 8154020  | 9.09E-08   | 9.43E-08 | 1.86E-10   | 7.85E-11

11 Forward Error - Residual
Matrix Size | κ(A)     | Max Residual | Avg Residual
------------|----------|--------------|-------------
1000 x 1000 | 506930   | 3.14E-08     | 4.46E-09
2000 x 2000 | 790345   | 6.72E-09     | 9.48E-10
3000 x 3000 | 1540152  | 2.79E-08     | 3.28E-09
4000 x 4000 | 12760599 | 1.06E-05     | 1.34E-06
5000 x 5000 | 765786   | 2.00E-08     | 2.65E-09
6000 x 6000 | 1499430  | 2.95E-08     | 3.86E-09
7000 x 7000 | 3488010  | 1.99E-08     | 2.44E-09
8000 x 8000 | 8154020  | 1.94E-08     | 2.32E-09
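The slides do not say exactly how the maximum and average figures were computed. One plausible reading, assumed here, is the elementwise absolute difference against a reference solution for the forward error, and the elementwise absolute value of A x - b for the residual. The Python sketch below computes both under that assumption; the function name `error_metrics` is hypothetical.

```python
import numpy as np

def error_metrics(A, b, x_hat, x_ref):
    """Max/avg forward error vs. a reference solution, and max/avg residual.

    Assumes elementwise absolute differences; the slides do not state
    which definition was actually used.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    x_ref = np.asarray(x_ref, dtype=float)
    forward = np.abs(x_hat - x_ref)      # distance from the reference solver's answer
    residual = np.abs(A @ x_hat - b)     # how well x_hat satisfies the system
    return {
        "max_forward": forward.max(),
        "avg_forward": forward.mean(),
        "max_residual": residual.max(),
        "avg_residual": residual.mean(),
    }

metrics = error_metrics(
    A=[[2.0, 0.0], [0.0, 4.0]],
    b=[2.0, 4.0],
    x_hat=[1.0, 1.5],    # deliberately perturbed solution
    x_ref=[1.0, 1.0],    # exact solution of this small system
)
```

The forward error needs a trusted reference solution, whereas the residual can be evaluated from A, b, and the computed solution alone.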

12 MATLAB Matrix Gallery
Special Matrix | Description                                         | Avg Matlab | Residual Matlab | Residual
---------------|-----------------------------------------------------|------------|-----------------|----------
clement        | Tridiagonal matrix with zero diagonal entries       | 1.40E-05   | 7.43E+133       | 7.85E+144
lehmer         | Symmetric positive definite matrix                  | 2.49E-06   | 7.20E-09        | 3.89E-06
circul         | Circulant matrix                                    | 3.23E-08   | 1.53E-13        | 1.04E-09
chebspec       | Chebyshev spectral differentiation matrix           | 9.12E-02   | 3.74E+04        | 2.0E-01
lesp           | Tridiagonal matrix with real, sensitive eigenvalues | 9.56E-11   | 5.11E-16        | 7.30E-10
minij          | Symmetric positive definite matrix                  | 5.14E-10   | 1.71E-08        | 6.59E-06
orthog         | Orthogonal and nearly orthogonal matrices           | 1.03E-07   | 1.09E-14        | 2.80E-08
randjorth      | Random J-orthogonal matrix                          | 1.55E-04   | 1.68E-00        | 1.13E-04

14 Serial Performance
- Results support the theoretical ~2.5x complexity ratio

15 Algorithm Processing Flow

16 Overview of Parallel Implementation

17 Parallel Implementation (cont’d)

18 Communication Complexity
- Two phases of parallel communication
  - Parallel Chio’s
  - Gather Columns
- Overall Bandwidth
N: original matrix size, P: number of processors, F: gather columns size

19 Communication Overhead

20 Where’s the Breakeven Point?
- Point at which communication “dead time” matches the computational workload
- Assuming d_C = 0.05 and N = 1000, the breakeven point would be P ≈ 142 processors

21 Closing Thoughts …
- Proposed O(N^3) Cramer’s Rule method
- Significantly lower communications overhead
  - Many more “broadcasts” than “unicasts”
  - Communication is a function of problem size, not of the number of processors
- Next steps …
  - Optimize parallel implementation
  - Sparse matrix version

22 Thank you

