Programming Massively Parallel Graphics Multiprocessors using CUDA: Final Project. Amirhassan Asgari Kamiabad (996620802)


Electric Power System
- The largest infrastructure made by man; prone to many kinds of disturbances, e.g., blackouts
- Transient stability analysis: the dynamic behavior of the power system in the few seconds following a disturbance
- Analogy: the electric power network behaves like a string-and-mass network; voltage, current, and power angle correspond to force, speed, and acceleration

Transient Stability Computation
- The model is a set of differential and algebraic equations
- Discretize the differential part and apply the backward Euler method
- Newton's method solves the resulting nonlinear system
- Each Newton iteration requires solving a linear system (see the sketch below)
(Figure: transient stability computation flow)
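A sketch of this pipeline in standard notation (the symbols x, y, h, F, and J are assumed for illustration, not taken from the slides). The model couples differential and algebraic parts,

$$\dot{x} = f(x, y), \qquad 0 = g(x, y),$$

and backward Euler with step h turns each time step into a nonlinear system

$$F(x_{n+1}, y_{n+1}) = \begin{pmatrix} x_{n+1} - x_n - h\, f(x_{n+1}, y_{n+1}) \\ g(x_{n+1}, y_{n+1}) \end{pmatrix} = 0,$$

which Newton's method attacks by repeatedly solving the sparse linear system $J\,\Delta z = -F(z^{(k)})$ and updating $z^{(k+1)} = z^{(k)} + \Delta z$, where $J$ is the Jacobian of $F$.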

Jacobian Matrix Properties
- Large and sparse: roughly 10k x 10k with 50k non-zeros
- Stiff: the eigenvalues are widespread
- Stored in compressed sparse row (CSR) format, using data, indices, and pointer vectors
(Figure: CSR layout showing the data, indices, and pointer vectors against the matrix columns)
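A minimal sketch of this storage scheme (the struct and field names are assumed, chosen to match the three vectors named on the slide):

/* CSR storage: for row r, the non-zero values live in
 * data[ptr[r] .. ptr[r+1]-1], with their column numbers in
 * indice[ptr[r] .. ptr[r+1]-1]. ptr has num_rows+1 entries. */
struct CsrMatrix {
    int    num_rows;
    float *data;    /* non-zero values              */
    int   *indice;  /* column index of each value   */
    int   *ptr;     /* row-start offsets into data  */
};

/* Example: the 3x3 matrix
 *   [10  0  0]
 *   [ 0 20 30]
 *   [ 0  0 40]
 * is stored as data = {10, 20, 30, 40}, indice = {0, 1, 2, 2},
 * ptr = {0, 1, 3, 4}. */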

Direct: LU Factorization
- LU factorization: Ax = (LU)x = L(Ux) = b
- Forward substitution: Ly = b
- Backward substitution: Ux = y
- The number of active threads shrinks as the factorization proceeds
- Forward and backward substitution are inherently serial algorithms (see the sketch below)
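A minimal sketch of why substitution resists parallelism (dense storage assumed for clarity; the project's matrices are sparse):

/* Forward substitution Ly = b: y[i] depends on every earlier y[j],
 * so the outer loop is a serial chain of dependencies. */
for (int i = 0; i < n; i++) {
    float s = b[i];
    for (int j = 0; j < i; j++)
        s -= L[i * n + j] * y[j];   /* uses results of prior iterations */
    y[i] = s / L[i * n + i];
}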

Indirect: Conjugate Gradient
- An iterative method that uses only matrix-vector multiplication and vector addition
- Converges in at most N iterations (in exact arithmetic) if the matrix is symmetric positive definite
- Can suffer from a slow rate of convergence
- Preconditioning transforms the system into an equivalent one with the same solution that is easier to solve, improving robustness and efficiency
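In standard notation (assumed here, not shown on the slides), a preconditioner $M \approx A$ replaces the original system with the equivalent one

$$M^{-1} A x = M^{-1} b,$$

which has the same solution $x$; when the eigenvalues of $M^{-1} A$ are well clustered, conjugate gradient converges in far fewer iterations.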

Chebyshev Preconditioner
- An iterative method that estimates the inverse of the matrix
- Main GPU kernels: sparse matrix-matrix multiply, dense-to-sparse conversion
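A sketch of the underlying idea, in the standard polynomial-preconditioning form (the degree k and coefficients are assumptions, not taken from the slides): the preconditioner approximates the inverse by a low-degree matrix polynomial,

$$M^{-1} = p_k(A) = c_0 I + c_1 A + \cdots + c_k A^k,$$

with coefficients derived from Chebyshev polynomials over an interval enclosing the spectrum. Forming $p_k(A)$ explicitly is what drives the two kernels named above: repeated sparse matrix-matrix products build the powers of $A$, and each product is compacted back into CSR by the dense-to-sparse kernel.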

Sparse Matrix-Matrix Multiplication
One thread per row:
- Walk the non-zeros of the row: for (int jj = row_start; jj < row_end; jj++)
- Search the sparse column for a matching index: while (kk < col_end && not_found)
- If found, perform the multiplication: if (indice[jj] == vec1_indice[kk]) sum += data[jj] * vec1_data[kk];
(Figure: CSR indices, data, and pointer vectors, traced for thread id 2)

Sparse Matrix-Matrix Multiplication Code

// Reconstructed kernel wrapper: the slides show only the body; the
// signature, the corrected while condition, and the final dense store
// are assumed. One thread dot-products one CSR row against sparse
// column col_num of vec1.
__global__ void spmm_column(int num_rows, int col_num,
                            const int *ptr, const int *indice,
                            const float *data, const int *vec1_ptr,
                            const int *vec1_indice, const float *vec1_data,
                            float *dense_result)
{
    const int row = blockDim.x * blockIdx.x + threadIdx.x;
    if (row < num_rows) {
        float sum = 0.0f;
        int row_start = ptr[row];            // this row's non-zero extent
        int row_end   = ptr[row + 1];
        int col_start = vec1_ptr[col_num];   // the column's non-zero extent
        int col_end   = vec1_ptr[col_num + 1];
        for (int jj = row_start; jj < row_end; jj++) {
            int kk = col_start;
            int not_found = 1;
            // Linear search for a matching index in the sparse column.
            while (kk < col_end && not_found) {
                if (indice[jj] == vec1_indice[kk]) {
                    sum += data[jj] * vec1_data[kk];
                    not_found = 0;
                }
                kk++;
            }
        }
        dense_result[row] = sum;             // dense result, compacted later
    }
}
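A hypothetical launch, one thread per row (grid sizing assumed):

int threads = 256;
int blocks  = (num_rows + threads - 1) / threads;
spmm_column<<<blocks, threads>>>(num_rows, col_num, ptr, indice, data,
                                 vec1_ptr, vec1_indice, vec1_data,
                                 dense_result);

Because each row performs a linear search over the column's non-zeros, keeping vec1_indice sorted would let the search stop early once the candidate index passes indice[jj].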

Dense to Sparse Conversion
- Save the result of the multiplication in dense format
- Each thread writes a flag: 0 if its element is zero, 1 otherwise
- Perform a scan (prefix sum) over the flag vector to build the pointer vector
- Threads holding non-zeros then compact their values into the sparse arrays:

if (dense_ptr[tid] - dense_ptr[tid - 1]) {   /* thread 0 needs a guard */
    data[dense_ptr[tid] - 1 + data_num]   = dense_data[tid];
    indice[dense_ptr[tid] - 1 + data_num] = tid;
}

(Figure: dense elements A..E compacted into sparse form)
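A small worked example (values assumed for illustration): if dense_data = {0, 7, 0, 3, 5}, the flags are {0, 1, 0, 1, 1} and an inclusive scan yields dense_ptr = {0, 1, 1, 2, 3}. Threads 1, 3, and 4 see dense_ptr[tid] - dense_ptr[tid-1] = 1 and write data = {7, 3, 5} and indice = {1, 3, 4}, offset by data_num, the count of values already stored.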

Conjugate Gradient Algorithm
Main GPU kernels:
- Vector norm
- Vector inner product (a, b)
- Vector updates
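For reference, the standard (unpreconditioned) CG recurrences, which the slides do not reproduce, show where each kernel is used:

$$\alpha_k = \frac{r_k^\top r_k}{p_k^\top A p_k}, \quad x_{k+1} = x_k + \alpha_k p_k, \quad r_{k+1} = r_k - \alpha_k A p_k, \quad \beta_k = \frac{r_{k+1}^\top r_{k+1}}{r_k^\top r_k}, \quad p_{k+1} = r_{k+1} + \beta_k p_k.$$

The inner-product kernel feeds $\alpha_k$ and $\beta_k$, the vector-update kernel computes $x_{k+1}$, $r_{k+1}$, and $p_{k+1}$, and the norm of $r_{k+1}$ provides the convergence test.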

Vector Norm and Inner Product
- Vector norm: sum the squares of all elements and output the square root of the result
  - Load each element and multiply it by itself
  - Perform a scan (reduction) and compute the square root
- Inner product: multiply corresponding elements of the two vectors and add up the results (sketched below)
  - Load both vectors and multiply corresponding elements
  - Perform a scan (reduction) and report the result
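A minimal sketch of such a reduction kernel (not the project's code; the names and the power-of-two block size are assumptions):

__global__ void inner_product_partial(const float *a, const float *b,
                                      float *block_sums, int n)
{
    extern __shared__ float cache[];   // blockDim.x floats, set at launch
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Each thread multiplies corresponding elements (grid-stride loop).
    float sum = 0.0f;
    for (int i = tid; i < n; i += gridDim.x * blockDim.x)
        sum += a[i] * b[i];
    cache[threadIdx.x] = sum;
    __syncthreads();

    // Tree reduction within the block (blockDim.x must be a power of two).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            cache[threadIdx.x] += cache[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        block_sums[blockIdx.x] = cache[0];
}
// The host (or a second pass) adds the per-block partial sums.
// For the norm, call with b == a and take sqrtf() of the total.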

Kernel Optimization
Optimizations on the sparse matrix-matrix multiply:
- Load both matrices in sparse format
- Copy the vector into shared memory (sketched below)
- Optimized search
- Implement the addition in the same kernel to avoid a round trip through global memory
Challenges:
- Linked lists and sparse formats
- The matrix size varies from kernel to kernel
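A minimal sketch of the shared-memory staging step, as a fragment inside the multiply kernel shown earlier (MAX_COL_NNZ, col_nnz, and col_start are assumed names; a real kernel must bound the column's non-zero count):

#define MAX_COL_NNZ 1024                 /* assumed compile-time bound */

__shared__ int   s_idx[MAX_COL_NNZ];
__shared__ float s_val[MAX_COL_NNZ];

/* All threads of the block cooperatively stage the sparse column once;
 * afterwards every row's search hits fast on-chip memory instead of
 * global memory. */
for (int k = threadIdx.x; k < col_nnz; k += blockDim.x) {
    s_idx[k] = vec1_indice[col_start + k];
    s_val[k] = vec1_data[col_start + k];
}
__syncthreads();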

Kernel Evaluation
(Chart: execution time of the main kernels)

Preconditioner Efficiency
Methodology:
1. Effect on the eigenvalue spectrum
2. Effect on the number of conjugate gradient iterations
(Figure: preconditioning effect on the IEEE-57 test system)

CG Efficiency
(Table: matrix size, total non-zeros, condition number, and total time per iteration (ms) for the E05R, E20R, and E30R test matrices)
(Figure: structural plots of the matrices; source: Matrix Market)

Discussion
Challenges, problems, and ideas:
- Working with arbitrary matrices in Matrix Market format
- New kernel ideas: SpMMul, Dens2Sp
Future work and updates:
- Other iterative methods: GMRES, BiCG