Numerical Algorithms Chapter 11

In this chapter, we study a selection of important numerical problems:
• Matrix multiplication
• The solution of a general system of linear equations by direct means
• The solution of sparse systems of linear equations and partial differential equations by iteration
Some of the techniques for these problems were introduced in earlier chapters to describe specific programming techniques. Now we develop the techniques in much more detail and introduce other programming techniques.

1. Matrices - A Review

1.1 Matrix Addition
Involves adding corresponding elements of each matrix to form the result matrix. Given the elements of A as a_i,j and the elements of B as b_i,j, each element of C is computed as
c_i,j = a_i,j + b_i,j    (0 ≤ i < n, 0 ≤ j < m)
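
As a concrete illustration (a minimal C sketch, not from the slides; the function name matrix_add is an assumption), the addition can be coded directly:

void matrix_add(int n, int m, double a[n][m], double b[n][m], double c[n][m])
{
  for (int i = 0; i < n; i++)          /* for each row */
    for (int j = 0; j < m; j++)        /* for each column */
      c[i][j] = a[i][j] + b[i][j];     /* c_i,j = a_i,j + b_i,j */
}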

1.2 Matrix Multiplication
Multiplication of two matrices, A and B, produces the matrix C whose elements c_i,j (0 ≤ i < n, 0 ≤ j < m) are computed as follows:
c_i,j = Σ (k = 0 to l−1) a_i,k · b_k,j
where A is an n × l matrix and B is an l × m matrix.

1.3 Matrix-Vector Multiplication
Matrix-vector multiplication follows directly from the definition of matrix-matrix multiplication by making B an n × 1 matrix (a vector). The result is an n × 1 matrix (a vector).
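
A minimal C sketch of matrix-vector multiplication (illustrative only; the name matvec and the use of an n × n matrix are assumptions):

void matvec(int n, double a[n][n], double b[n], double c[n])
{
  for (int i = 0; i < n; i++) {
    c[i] = 0.0;
    for (int k = 0; k < n; k++)
      c[i] = c[i] + a[i][k] * b[k];    /* inner product of row i of A with b */
  }
}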

1.4 Relationship of Matrices to Linear Equations
A system of linear equations can be written in matrix form as
Ax = b
where matrix A holds the coefficients (the a constants), x is the vector of unknowns, and b is the vector of constants.
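
For example (a small illustration, not taken from the slides), the pair of equations
2x0 + 3x1 = 8
 x0 −  x1 = 1
can be written as Ax = b with
A = | 2  3 |    x = | x0 |    b = | 8 |
    | 1 −1 |        | x1 |        | 1 |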

2. Implementing Matrix Multiplication
Sequential Code
Assume throughout that the matrices are square (n × n matrices). The sequential code to compute A × B could simply be

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++) {
    c[i][j] = 0;
    for (k = 0; k < n; k++)
      c[i][j] = c[i][j] + a[i][k] * b[k][j];
  }

This algorithm requires n³ multiplications and n³ additions, leading to a sequential time complexity of O(n³).

2.1 Algorithm
Parallel Code
With n × n matrices we can obtain:
• Time complexity of O(n²) with n processors.
• Time complexity of O(n) with n² processors, where one element of A and B is assigned to each processor. Both are cost optimal [since O(n³) = n × O(n²) = n² × O(n)].
• Time complexity of O(log n) with n³ processors, by parallelizing the inner loop. Not cost optimal [since O(n³) ≠ n³ × O(log n)].
O(log n) is the lower bound for parallel matrix multiplication.

2.1 Algorithm
Partitioning into Submatrices
Suppose the matrix is divided into s² submatrices. Each submatrix has n/s × n/s elements. Using the notation Ap,q for the submatrix in submatrix row p and submatrix column q:

for (p = 0; p < s; p++)
  for (q = 0; q < s; q++) {
    Cp,q = 0;                     /* clear elements of submatrix */
    for (r = 0; r < s; r++)       /* submatrix multiplication and */
      Cp,q = Cp,q + Ap,r * Br,q;  /* add to accumulating submatrix */
  }

The line Cp,q = Cp,q + Ap,r * Br,q; means: multiply submatrix Ap,r by Br,q using matrix multiplication and add the product to submatrix Cp,q using matrix addition.

2.1 Algorithm
This is known as block matrix multiplication.
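
The following C sketch makes the submatrix indexing explicit (a hypothetical helper, assuming n is divisible by s and plain n × n arrays of doubles):

void block_matmul(int n, int s, double a[n][n], double b[n][n], double c[n][n])
{
  int blk = n / s;                                   /* submatrix size n/s */
  for (int p = 0; p < s; p++)
    for (int q = 0; q < s; q++) {
      for (int i = 0; i < blk; i++)                  /* clear submatrix Cp,q */
        for (int j = 0; j < blk; j++)
          c[p*blk + i][q*blk + j] = 0.0;
      for (int r = 0; r < s; r++)                    /* accumulate Ap,r * Br,q */
        for (int i = 0; i < blk; i++)
          for (int k = 0; k < blk; k++)
            for (int j = 0; j < blk; j++)
              c[p*blk + i][q*blk + j] += a[p*blk + i][r*blk + k] * b[r*blk + k][q*blk + j];
    }
}

The same arithmetic is performed as in the ordinary algorithm; only the order of the operations changes, which is what makes the block form convenient for assigning submatrices to processors.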

2.2 Direct Implementation
One processor is used to compute each element of C, so n² processors are needed. Each processor needs one row of elements of A and one column of elements of B. Some of the same elements are sent to more than one processor. Submatrices can also be used instead of single elements.
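
A hedged MPI sketch of this direct scheme, with rank 0 acting as master and ranks 1 to n² each computing one element of C (the matrix order N, the message tags, and the test data are illustrative assumptions):

#include <mpi.h>

#define N 4                                    /* assumed matrix order */

int main(int argc, char *argv[])
{
  int rank;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);        /* run with N*N + 1 processes */

  if (rank == 0) {                             /* master */
    double a[N][N], b[N][N], c[N][N], col[N];
    for (int i = 0; i < N; i++)                /* assumed test data */
      for (int j = 0; j < N; j++) { a[i][j] = i + j; b[i][j] = i - j; }
    for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++) {
        for (int k = 0; k < N; k++) col[k] = b[k][j];
        MPI_Send(a[i], N, MPI_DOUBLE, i*N + j + 1, 0, MPI_COMM_WORLD);  /* row i of A */
        MPI_Send(col,  N, MPI_DOUBLE, i*N + j + 1, 1, MPI_COMM_WORLD);  /* column j of B */
      }
    for (int i = 0; i < N; i++)                /* collect the n*n results */
      for (int j = 0; j < N; j++)
        MPI_Recv(&c[i][j], 1, MPI_DOUBLE, i*N + j + 1, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  } else if (rank <= N*N) {                    /* slave computing one element */
    double row[N], col[N], cij = 0.0;
    MPI_Recv(row, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Recv(col, N, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    for (int k = 0; k < N; k++)
      cij += row[k] * col[k];                  /* n multiplications and n additions */
    MPI_Send(&cij, 1, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD);
  }
  MPI_Finalize();
  return 0;
}

This mirrors the communication pattern analysed below: n² messages carrying a row and a column out, and n² single-element results back.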

2.2 Direct Implementation
Analysis (assuming n × n matrices of elements, not submatrices)

Communication
With separate messages to each of the n² slave processors:
t_comm = n²(t_startup + 2n·t_data) + n²(t_startup + t_data) = n²(2·t_startup + (2n + 1)·t_data)
A broadcast along a single bus would yield
t_comm = (t_startup + n²·t_data) + n²(t_startup + t_data)
The dominant time is now in returning the results.

Computation
Each slave performs, in parallel, n multiplications and n additions; i.e.
t_comp = 2n

2.2 Direct Implementation
Performance Improvement
Using a tree construction, n numbers can be added in log n steps using n processors, giving a computational time complexity of O(log n) with n³ processors.
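
A small sketch of the pairing pattern (written sequentially here; each pass over stride could be executed in parallel, and the helper name is an assumption):

double tree_sum(int n, double x[n])               /* n assumed to be a power of 2 */
{
  for (int stride = 1; stride < n; stride *= 2)   /* log2(n) passes */
    for (int i = 0; i + stride < n; i += 2*stride)
      x[i] = x[i] + x[i + stride];                /* pairwise additions */
  return x[0];                                    /* the sum */
}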

2.3 Recursive Implementation
Apply the same algorithm to each submatrix recursively.

Recursive Algorithm

mat_mult(App, Bpp, s)
{
  if (s == 1)                       /* if submatrix has one element */
    C = A * B;                      /* multiply elements */
  else {                            /* else continue to make recursive calls */
    s = s/2;                        /* the number of elements in each row/column */
    P0 = mat_mult(App, Bpp, s);
    P1 = mat_mult(Apq, Bqp, s);
    P2 = mat_mult(App, Bpq, s);
    P3 = mat_mult(Apq, Bqq, s);
    P4 = mat_mult(Aqp, Bpp, s);
    P5 = mat_mult(Aqq, Bqp, s);
    P6 = mat_mult(Aqp, Bpq, s);
    P7 = mat_mult(Aqq, Bqq, s);
    Cpp = P0 + P1;                  /* add submatrix products */
    Cpq = P2 + P3;                  /* to form submatrices of final matrix */
    Cqp = P4 + P5;
    Cqq = P6 + P7;
  }
  return (C);                       /* return final matrix */
}
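
A concrete (sequential) C rendering of this recursion for plain arrays, assuming n is a power of two and that c has been cleared to zero before the first call; the index-pair arguments marking the top-left corners of the current submatrices are illustrative assumptions:

void rec_matmul(int n, int size,
                double a[n][n], int ar, int ac,
                double b[n][n], int br, int bc,
                double c[n][n], int cr, int cc)
{
  if (size == 1) {                                /* single-element submatrices */
    c[cr][cc] += a[ar][ac] * b[br][bc];
    return;
  }
  int h = size / 2;                               /* split each matrix into quadrants */
  rec_matmul(n, h, a, ar,   ac,   b, br,   bc,   c, cr,   cc);    /* Cpp += App*Bpp */
  rec_matmul(n, h, a, ar,   ac+h, b, br+h, bc,   c, cr,   cc);    /* Cpp += Apq*Bqp */
  rec_matmul(n, h, a, ar,   ac,   b, br,   bc+h, c, cr,   cc+h);  /* Cpq += App*Bpq */
  rec_matmul(n, h, a, ar,   ac+h, b, br+h, bc+h, c, cr,   cc+h);  /* Cpq += Apq*Bqq */
  rec_matmul(n, h, a, ar+h, ac,   b, br,   bc,   c, cr+h, cc);    /* Cqp += Aqp*Bpp */
  rec_matmul(n, h, a, ar+h, ac+h, b, br+h, bc,   c, cr+h, cc);    /* Cqp += Aqq*Bqp */
  rec_matmul(n, h, a, ar+h, ac,   b, br,   bc+h, c, cr+h, cc+h);  /* Cqq += Aqp*Bpq */
  rec_matmul(n, h, a, ar+h, ac+h, b, br+h, bc+h, c, cr+h, cc+h);  /* Cqq += Aqq*Bqq */
}

The initial call is rec_matmul(n, n, a, 0, 0, b, 0, 0, c, 0, 0). The eight independent submatrix multiplications at each level are what a parallel version can assign to separate processes.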

2.4 Mesh Implementation: Cannon's Algorithm
Movement of the elements proceeds in four steps: Step 1 – get data to the nodes; Step 2 – alignment; Step 3 – compute; Step 4 – shift.

2.4 Mesh Implementation: Cannon's Algorithm
Uses a mesh of processors with wraparound connections (a torus) to shift the A elements (or submatrices) left and the B elements (or submatrices) up.
1. Initially, processor P_i,j has elements a_i,j and b_i,j (0 ≤ i < n, 0 ≤ j < n).
2. Elements are moved from their initial position to an "aligned" position. The complete ith row of A is shifted i places left and the complete jth column of B is shifted j places upward. This has the effect of placing the element a_i,j+i and the element b_i+j,j in processor P_i,j. These elements are a pair of those required in the accumulation of c_i,j.
3. Each processor, P_i,j, multiplies its elements.
4. The ith row of A is shifted one place left, and the jth column of B is shifted one place upward. This has the effect of bringing together the adjacent elements of A and B, which will also be required in the accumulation.
5. Each processor, P_i,j, multiplies the elements brought to it and adds the result to the accumulating sum.
6. Steps 4 and 5 are repeated until the final result is obtained (n − 1 shifts with n rows and n columns of elements).
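
A hedged MPI sketch of the element-per-process version on an n × n torus (run with n² processes; the initial values of a and b and the variable names are illustrative assumptions):

#include <mpi.h>
#include <math.h>

int main(int argc, char *argv[])
{
  int rank, nprocs, coords[2], src, dst;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  int n = (int)(sqrt((double)nprocs) + 0.5);        /* n x n mesh */

  int dims[2] = {n, n}, periods[2] = {1, 1};        /* wraparound connections (torus) */
  MPI_Comm grid;
  MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &grid);
  MPI_Comm_rank(grid, &rank);
  MPI_Cart_coords(grid, rank, 2, coords);
  int i = coords[0], j = coords[1];                 /* this process is P_i,j */

  double a = i + j, b = i - j, c = 0.0;             /* assumed initial a_i,j and b_i,j */

  /* Step 2: alignment - shift row i of A left by i, column j of B up by j */
  if (i > 0) {
    MPI_Cart_shift(grid, 1, -i, &src, &dst);
    MPI_Sendrecv_replace(&a, 1, MPI_DOUBLE, dst, 0, src, 0, grid, MPI_STATUS_IGNORE);
  }
  if (j > 0) {
    MPI_Cart_shift(grid, 0, -j, &src, &dst);
    MPI_Sendrecv_replace(&b, 1, MPI_DOUBLE, dst, 0, src, 0, grid, MPI_STATUS_IGNORE);
  }

  int left, right, up, down;
  MPI_Cart_shift(grid, 1, -1, &right, &left);       /* neighbours in the row */
  MPI_Cart_shift(grid, 0, -1, &down, &up);          /* neighbours in the column */

  /* Steps 3-6: multiply-accumulate, then shift A one place left and B one place up */
  for (int step = 0; step < n; step++) {
    c += a * b;
    MPI_Sendrecv_replace(&a, 1, MPI_DOUBLE, left, 0, right, 0, grid, MPI_STATUS_IGNORE);
    MPI_Sendrecv_replace(&b, 1, MPI_DOUBLE, up,   0, down,  0, grid, MPI_STATUS_IGNORE);
  }
  /* c now holds element c_i,j of the product (the last shift is redundant but harmless) */

  MPI_Finalize();
  return 0;
}

It would be compiled with an MPI C compiler (linking the math library) and launched with n² processes, e.g. mpirun -np 16 for n = 4.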

5. Summary
This chapter discussed in detail numerical topics introduced in earlier chapters:
• Different parallel implementations of matrix multiplication (direct, recursive, mesh)
• Solving a system of linear equations using Gaussian elimination, and its parallel implementation
• Solving partial differential equations using Jacobi iteration, and its relationship to systems of linear equations
• Faster-convergence methods (Gauss-Seidel relaxation, red-black ordering, higher-order difference methods, overrelaxation, and the multigrid method)