Programming Massively Parallel Graphics Multiprocessors using CUDA
Final Project
Amirhassan Asgari Kamiabad
Electric Power System
The largest infrastructure built by man
Prone to many kinds of disturbance: blackouts
Transient stability analysis: the dynamic behavior of the power system in the few seconds following a disturbance
Analogy (Electric Power Network vs. String and Mass Network):
Voltage, Current, Power angle ... correspond to Force, Speed, Acceleration ...
Transient Stability Computation
Differential and algebraic equations (DAE)
Discretize the differential part and use Backward Euler
Newton's method solves the nonlinear system
Each Newton step requires solving a linear system
(Flowchart: the transient stability computation loop)
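As a rough illustration (not from the slides; the symbols below are generic placeholders), the chain of steps is:

    \dot{x} = f(x, y), \qquad 0 = g(x, y)                                     % power system DAE
    x_{n+1} = x_n + h\, f(x_{n+1}, y_{n+1})                                   % backward Euler step
    J\!\left(z^{(k)}\right)\Delta z = -F\!\left(z^{(k)}\right), \qquad z^{(k+1)} = z^{(k)} + \Delta z   % Newton iteration

where z collects the discretized state and algebraic variables and J is the Jacobian matrix described on the next slide.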
Jacobian Matrix Properties
Large and sparse: 10k x 10k with about 50k non-zeros
Stiff: eigenvalues are widespread
Stored in compressed sparse row (CSR) format: data, indices and pointer vectors
(Figure: CSR layout showing the data, indices and pointer vectors and the column positions)
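A minimal sketch of the CSR layout, using a hypothetical 4x4 matrix (the array names mirror the data/indices/pointer vectors above; the values are illustrative only):

    /* Example matrix (zeros are not stored):
         [ 10   0   0   2 ]
         [  0   5   0   0 ]
         [  0   0   7   1 ]
         [  3   0   0   9 ]                                                     */
    float data[]   = {10.0f, 2.0f, 5.0f, 7.0f, 1.0f, 3.0f, 9.0f};  /* non-zero values, row by row       */
    int   indice[] = {0, 3, 1, 2, 3, 0, 3};                        /* column index of each value        */
    int   ptr[]    = {0, 2, 3, 5, 7};                              /* row r occupies ptr[r]..ptr[r+1]-1 */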
Direct Method: LU Factorization
LU factorization: Ax = (LU)x = L(Ux) = b
Forward substitution: Ly = b
Backward substitution: Ux = y
The number of active threads shrinks as the run proceeds
Backward and forward substitution are inherently serial algorithms
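A minimal sketch of forward substitution on a dense lower-triangular L (illustrative host code, not the project's implementation); y[i] depends on every previously computed y[j], which is the serial dependency noted above:

    /* Solve L y = b for a dense n x n lower-triangular L (row-major, non-zero diagonal). */
    void forward_substitution(const float *L, const float *b, float *y, int n)
    {
        for (int i = 0; i < n; i++) {            /* rows must be processed in order      */
            float sum = b[i];
            for (int j = 0; j < i; j++)          /* uses every previously computed y[j]  */
                sum -= L[i * n + j] * y[j];
            y[i] = sum / L[i * n + i];
        }
    }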
Indirect Method: Conjugate Gradient
Iterative method that uses only matrix-vector multiplication and vector addition
Converges in at most N iterations (in exact arithmetic) if the matrix is symmetric positive definite
Can suffer from a slow rate of convergence
Preconditioning transforms the system into an equivalent one with the same solution that is easier to solve
Preconditioning improves robustness and efficiency
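For reference, a minimal host-side sketch of unpreconditioned CG (illustrative only; matvec() and dot() are hypothetical helpers standing in for the GPU kernels described later):

    #include <stdlib.h>
    #include <math.h>

    void  matvec(float *y, const float *x);              /* y = A * x (hypothetical helper)   */
    float dot(const float *a, const float *b, int n);    /* inner product (hypothetical)      */

    void cg(int n, float *x, const float *b, int max_iter, float tol)
    {
        float *r  = (float *)malloc(n * sizeof(float));
        float *p  = (float *)malloc(n * sizeof(float));
        float *Ap = (float *)malloc(n * sizeof(float));
        matvec(Ap, x);                                    /* r = b - A x, p = r                */
        for (int i = 0; i < n; i++) { r[i] = b[i] - Ap[i]; p[i] = r[i]; }
        float rho = dot(r, r, n);
        for (int k = 0; k < max_iter && sqrtf(rho) > tol; k++) {
            matvec(Ap, p);
            float alpha = rho / dot(p, Ap, n);            /* step length along p               */
            for (int i = 0; i < n; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
            float rho_new = dot(r, r, n);
            float beta = rho_new / rho;                   /* coefficient for the new direction */
            for (int i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
            rho = rho_new;
        }
        free(r); free(p); free(Ap);
    }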
Chebyshev Preconditioner
Iterative method that estimates the inverse of the matrix
Main GPU kernels:
Sparse matrix-matrix multiply
Dense-to-sparse conversion
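As background (a sketch, not taken from the slides): a Chebyshev polynomial preconditioner approximates the inverse by a low-degree matrix polynomial built from an estimate [lambda_min, lambda_max] of the eigenvalue spectrum,

    M^{-1} = p_m(A) = \sum_{j=0}^{m} c_j A^{j} \approx A^{-1}, \qquad
    p_m = \arg\min_{\deg p \le m}\; \max_{\lambda \in [\lambda_{\min},\,\lambda_{\max}]} \bigl|\, 1 - \lambda\, p(\lambda) \,\bigr|

whose minimizer is expressed through Chebyshev polynomials shifted to [lambda_min, lambda_max]. Forming the powers of A explicitly is what calls for the sparse matrix-matrix multiply kernel, and the intermediate products first appear in dense form, which motivates the dense-to-sparse conversion kernel.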
Sparse Matrix-Matrix Multiplication
One thread per row: multiply each element in the row
    for (int jj = row_start; jj < row_end; jj++)
Search the second operand for the corresponding element:
    while (kk < col_end && not_found)
If found, perform the multiplication:
    if (indice[jj] == vec1_indice[kk]) sum += data[jj] * vec1_data[kk];
(Figure: indices, data and pointer vectors for thread Id = 2)
Sparse Matrix-Matrix Multiplication Code
    const int row = blockDim.x * blockIdx.x + threadIdx.x;   // one thread per matrix row
    if (row < num_rows) {
        float sum = 0;
        int dense_flag = 0;                  // flag for the dense-to-sparse pass (use not shown here)
        int row_start = ptr[row];            // CSR bounds of this row of the first operand
        int row_end   = ptr[row + 1];
        int col_start = vec1_ptr[col_num];   // CSR bounds of column col_num of the second operand
        int col_end   = vec1_ptr[col_num + 1];
        for (int jj = row_start; jj < row_end; jj++) {
            int kk = col_start;
            int not_found = 1;
            while (kk < col_end && not_found) {              // search for a matching index
                if (indice[jj] == vec1_indice[kk]) {
                    sum += data[jj] * vec1_data[kk];
                    not_found = 0;
                }
                kk++;
            }
        }
        dense_data[row] = sum;               // (reconstructed) store the result in dense form
    }
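For completeness, a hypothetical host-side launch of this kernel (the kernel name spmm_kernel and the 256-thread block size are assumptions, not from the slides):

    int threads = 256;                                   // threads per block (assumed)
    int blocks  = (num_rows + threads - 1) / threads;    // one thread per matrix row
    spmm_kernel<<<blocks, threads>>>(ptr, indice, data,
                                     vec1_ptr, vec1_indice, vec1_data,
                                     col_num, num_rows, dense_data);
    cudaDeviceSynchronize();                             // wait for this column's result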
Dense to Sparse Conversion
Save the result of the multiplication in dense format
If a thread holds a zero it writes 0 to the flag vector, otherwise it writes 1
Perform a scan (prefix sum) over the flag vector to build the pointer vector, then scatter:
    if (dense_ptr[tid] - dense_ptr[tid - 1]) {           // this thread holds a kept element
        data[dense_ptr[tid] - 1 + data_num]   = dense_data[tid];
        indice[dense_ptr[tid] - 1 + data_num] = tid;
    }
(Figure: compaction of a dense vector A B C D E into sparse form)
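A minimal sketch of the two device steps around the scan (kernel names and signatures are illustrative; the scan itself could come from a library such as Thrust or a custom kernel):

    // Step 1: mark non-zero entries of the dense result with a 0/1 flag.
    __global__ void mark_nonzeros(const float *dense_data, int *flags, int n)
    {
        int tid = blockDim.x * blockIdx.x + threadIdx.x;
        if (tid < n)
            flags[tid] = (dense_data[tid] != 0.0f) ? 1 : 0;
    }

    // Step 2: after an inclusive scan of flags into dense_ptr, scatter the surviving
    // elements into the CSR data/indice arrays, offset by the data_num entries already stored.
    __global__ void compact_nonzeros(const float *dense_data, const int *dense_ptr,
                                     float *data, int *indice, int data_num, int n)
    {
        int tid = blockDim.x * blockIdx.x + threadIdx.x;
        if (tid < n && dense_data[tid] != 0.0f)
        {
            int pos = dense_ptr[tid] - 1 + data_num;   // inclusive scan gives this element's rank
            data[pos]   = dense_data[tid];
            indice[pos] = tid;                         // column index of the kept element
        }
    }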
Conjugate Gradient Algorithm
Main GPU kernels:
Vector norm
Vector inner product (a, b)
Vector updates (see the sketch below)
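A minimal sketch of the vector-update kernel, an axpy-style operation y = y + alpha * x (the name and signature are illustrative, not the project's code):

    // y[i] += alpha * x[i]  --  vector update used inside each CG iteration
    __global__ void update_vector(float *y, const float *x, float alpha, int n)
    {
        int tid = blockDim.x * blockIdx.x + threadIdx.x;
        if (tid < n)
            y[tid] += alpha * x[tid];
    }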
Vector Norm and Inner Product
Vector norm adds the squares of all elements and outputs the square root of the result:
Load each element and multiply it by itself
Perform a scan (reduction) and compute the square root
Inner product multiplies corresponding elements of the two vectors and adds up the results:
Load both vectors and multiply corresponding elements
Perform a scan (reduction) and report the result
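A minimal sketch of the inner-product kernel using a shared-memory tree reduction (a standard pattern shown as an assumption about how such a kernel can look; launch with 256 threads per block, then sum block_sums on the host or in a second pass). The norm kernel follows the same pattern with b replaced by a, taking the square root of the final sum:

    // Partial inner product: each block reduces its slice of a.b into block_sums[blockIdx.x].
    __global__ void inner_product_partial(const float *a, const float *b,
                                          float *block_sums, int n)
    {
        __shared__ float cache[256];                        // one slot per thread in the block
        int tid = blockDim.x * blockIdx.x + threadIdx.x;
        cache[threadIdx.x] = (tid < n) ? a[tid] * b[tid] : 0.0f;
        __syncthreads();
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {   // tree reduction
            if (threadIdx.x < stride)
                cache[threadIdx.x] += cache[threadIdx.x + stride];
            __syncthreads();
        }
        if (threadIdx.x == 0)
            block_sums[blockIdx.x] = cache[0];              // partial sum for this block
    }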
Kernel Optimizations
Optimizations on the sparse matrix-matrix multiply:
Keep both matrices in sparse format
Copy the operand vector into shared memory (see the sketch below)
Optimized search
Perform the addition in the same kernel to avoid extra trips to global memory
Challenges:
Linked lists and sparse formats
The matrix size varies from kernel to kernel
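A minimal sketch of the shared-memory idea above, shown for a CSR matrix times a dense column vector and assuming the vector is small enough to fit in shared memory (names and the dynamic shared-memory launch are assumptions, not the project's kernel):

    // out = A * vec, with the dense operand vector staged in shared memory.
    __global__ void csr_times_dense_vec(const int *ptr, const int *indice, const float *data,
                                        const float *vec, float *out,
                                        int num_rows, int vec_len)
    {
        extern __shared__ float vec_s[];                  // sized to vec_len at launch
        for (int i = threadIdx.x; i < vec_len; i += blockDim.x)
            vec_s[i] = vec[i];                            // cooperative copy into shared memory
        __syncthreads();

        int row = blockDim.x * blockIdx.x + threadIdx.x;
        if (row < num_rows) {
            float sum = 0.0f;
            for (int jj = ptr[row]; jj < ptr[row + 1]; jj++)
                sum += data[jj] * vec_s[indice[jj]];      // gathers served from shared memory
            out[row] = sum;
        }
    }

Launched as csr_times_dense_vec<<<blocks, threads, vec_len * sizeof(float)>>>(...); this only pays off while vec_len * sizeof(float) fits in the shared memory available per block.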
Kernel Evaluation
(Chart: execution time of the main kernels)
Preconditioner Efficiency
Methodology:
1. Effect on the eigenvalue spectrum
2. Effect on the number of conjugate gradient iterations
(Figure: preconditioning effect on the IEEE-57 test system)
CG Efficiency
Matrix Name   Matrix Size   Total Nonzeros   Condition Number   Total Time Per Iteration (ms)
E05R
E20R
E30R
(Figure: structural plots of the matrices; source: Matrix Market)
Discussion
Challenges, problems and ideas:
Working with all matrices in Matrix Market format
New kernel ideas: SpMMul, Dens2Sp
Future work and updates:
Other iterative methods: GMRES, BiCG