
MATRIX MULTIPLICATION – 4th week
Khoa Công Nghệ Thông Tin – Đại Học Bách Khoa Tp.HCM

-2- MATRIX MULTIPLICATION – 4th week
References
Sequential matrix multiplication
Algorithms for processor arrays
 – Matrix multiplication on the 2-D mesh SIMD model
 – Matrix multiplication on the hypercube SIMD model
Matrix multiplication on UMA multiprocessors
Matrix multiplication on multicomputers

-3- REFERENCES
Parallel Computing: Theory and Practice – Michael J. Quinn, McGraw-Hill, Chapter 7

-4- SEQUENTIAL MATRIX MULTIPLICATION
Global a[0..l-1,0..m-1], b[0..m-1,0..n-1],  {Matrices to be multiplied}
       c[0..l-1,0..n-1],                    {Product matrix}
       t,                                   {Accumulates dot product}
       i, j, k;
Begin
  for i := 0 to l-1 do
    for j := 0 to n-1 do
      t := 0;
      for k := 0 to m-1 do
        t := t + a[i,k]*b[k,j];
      endfor k;
      c[i,j] := t;
    endfor j;
  endfor i;
End.
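The pseudocode maps directly to plain C. The program below is only an illustrative sketch: the fixed dimensions L, M, N and the sample matrices in main() are chosen arbitrarily. The triple loop performs l·m·n scalar multiplications, i.e. Θ(n³) work for square matrices.

#include <stdio.h>

#define L 2   /* rows of A and C         */
#define M 3   /* columns of A, rows of B */
#define N 2   /* columns of B and C      */

/* The same triple loop as the pseudocode above, written as plain C. */
void matmul(double a[L][M], double b[M][N], double c[L][N])
{
    for (int i = 0; i < L; i++)
        for (int j = 0; j < N; j++) {
            double t = 0.0;                  /* accumulates the dot product */
            for (int k = 0; k < M; k++)
                t += a[i][k] * b[k][j];
            c[i][j] = t;
        }
}

int main(void)
{
    double a[L][M] = {{1, 2, 3}, {4, 5, 6}};
    double b[M][N] = {{7, 8}, {9, 10}, {11, 12}};
    double c[L][N];

    matmul(a, b, c);
    for (int i = 0; i < L; i++)
        printf("%6.1f %6.1f\n", c[i][0], c[i][1]);   /* 58 64 / 139 154 */
    return 0;
}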

-5- ALGORITHMS FOR PROCESSOR ARRAYS
Matrix multiplication on the 2-D mesh SIMD model
Matrix multiplication on the hypercube SIMD model

-6- MATRIX MULTIPLICATION ON THE 2-D MESH SIMD MODEL
Gentleman (1978) has shown that multiplication of two n×n matrices on the 2-D mesh SIMD model requires O(n) routing steps.
We will consider a multiplication algorithm on a 2-D mesh SIMD model with wraparound connections.

-7- MATRIX MULTIPLICATION ON THE 2-D MESH SIMD MODEL (cont'd)
For simplicity, we suppose that:
 – the size of the mesh is n×n,
 – the size of each matrix (A and B) is n×n,
 – each processor P(i,j) in the mesh (located at row i, column j) initially contains a[i,j] and b[i,j].
At the end of the algorithm, P(i,j) will hold the element c[i,j] of the product matrix.

-8- MATRIX MULTIPLICATION ON THE 2-D MESH SIMD MODEL (cont'd)
Major phases:
(a) Initial distribution of matrices A and B (n = 4): processor P(i,j) holds a[i,j] and b[i,j].

   a[0,0] b[0,0]   a[0,1] b[0,1]   a[0,2] b[0,2]   a[0,3] b[0,3]
   a[1,0] b[1,0]   a[1,1] b[1,1]   a[1,2] b[1,2]   a[1,3] b[1,3]
   a[2,0] b[2,0]   a[2,1] b[2,1]   a[2,2] b[2,2]   a[2,3] b[2,3]
   a[3,0] b[3,0]   a[3,1] b[3,1]   a[3,2] b[3,2]   a[3,3] b[3,3]

(b) Staggering: all of A's elements in row i are shifted to the left by i positions, and all of B's elements in column j are shifted upwards by j positions (the resulting distribution is shown on the next slide).

-9- MATRIX MULTIPLICATION ON THE 2-D MESH SIMD MODEL (cont'd)
(c) Distribution of the two matrices A and B after staggering on a 2-D mesh with wraparound connections (n = 4):

   a[0,0] b[0,0]   a[0,1] b[1,1]   a[0,2] b[2,2]   a[0,3] b[3,3]
   a[1,1] b[1,0]   a[1,2] b[2,1]   a[1,3] b[3,2]   a[1,0] b[0,3]
   a[2,2] b[2,0]   a[2,3] b[3,1]   a[2,0] b[0,2]   a[2,1] b[1,3]
   a[3,3] b[3,0]   a[3,0] b[0,1]   a[3,1] b[1,2]   a[3,2] b[2,3]

In general, after staggering, processor P(i,j) holds a[i,(i+j) mod n] and b[(i+j) mod n,j], so every processor P(i,j) has a pair of elements a[i,k] and b[k,j] to multiply.

-10- MATRIX MULTIPLICATION ON THE 2-D MESH SIMD MODEL (cont'd)
The remaining steps of the algorithm, from the viewpoint of processor P(1,2):
(a) First scalar multiplication step: P(1,2) multiplies the pair it holds after staggering, a[1,3]*b[3,2].
(b) Second scalar multiplication step, after the elements of A are cycled to the left and the elements of B are cycled upwards: P(1,2) multiplies a[1,0]*b[0,2].

-11- MATRIX MULTIPLICATION ON THE 2-D MESH SIMD MODEL (cont'd)
(c) Third scalar multiplication step, after the second cycle step: P(1,2) multiplies a[1,1]*b[1,2].
(d) Fourth scalar multiplication step, after the third cycle step: P(1,2) multiplies a[1,2]*b[2,2]. At this point processor P(1,2) has computed the dot product c[1,2].

-12- MATRIX MULTIPLICATION ON THE 2-D MESH SIMD MODEL (cont'd)
Detailed algorithm
Global n,       {Dimension of matrices}
       k;
Local  a, b, c;
Begin
  {Stagger the two matrices a[0..n-1,0..n-1] and b[0..n-1,0..n-1]}
  for k := 1 to n-1 do
    forall P(i,j) where 0 ≤ i,j < n do
      if i ≥ k then a := fromleft(a);
      if j ≥ k then b := fromdown(b);
    end forall;
  endfor k;
  {Compute the dot products}
  forall P(i,j) where 0 ≤ i,j < n do
    c := a*b;
  end forall;
  for k := 1 to n-1 do
    forall P(i,j) where 0 ≤ i,j < n do
      a := fromleft(a);
      b := fromdown(b);
      c := c + a*b;
    end forall;
  endfor k;
End.
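The behaviour of this algorithm can be checked with a small sequential simulation. The sketch below is an illustration only, not a real SIMD program: the local registers of all processors P(i,j) are kept in ordinary 2-D arrays, and the fromleft/fromdown transfers are simulated by circular shifts of those arrays. The helper names (shift_row_left, shift_col_up, mesh_matmul) and the mesh size N = 4 are assumptions of this sketch, not from the book.

#include <stdio.h>

#define N 4   /* mesh size = matrix dimension (illustrative) */

/* Circularly shift row i of m one position to the left. */
static void shift_row_left(double m[N][N], int i)
{
    double first = m[i][0];
    for (int j = 0; j < N - 1; j++) m[i][j] = m[i][j + 1];
    m[i][N - 1] = first;
}

/* Circularly shift column j of m one position upwards. */
static void shift_col_up(double m[N][N], int j)
{
    double top = m[0][j];
    for (int i = 0; i < N - 1; i++) m[i][j] = m[i + 1][j];
    m[N - 1][j] = top;
}

/* Sequential simulation of the mesh algorithm: a[i][j] and b[i][j]
   play the role of the local registers of processor P(i,j). */
void mesh_matmul(double a[N][N], double b[N][N], double c[N][N])
{
    /* Staggering: row i of A is shifted left i times, column j of B up j times. */
    for (int k = 1; k <= N - 1; k++) {
        for (int i = k; i < N; i++) shift_row_left(a, i);
        for (int j = k; j < N; j++) shift_col_up(b, j);
    }
    /* First scalar multiplication step: each P(i,j) multiplies its pair. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c[i][j] = a[i][j] * b[i][j];
    /* n-1 further cycle-and-multiply steps. */
    for (int k = 1; k <= N - 1; k++) {
        for (int i = 0; i < N; i++) shift_row_left(a, i);
        for (int j = 0; j < N; j++) shift_col_up(b, j);
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                c[i][j] += a[i][j] * b[i][j];
    }
}

int main(void)
{
    double a[N][N], b[N][N], c[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = i + j;
            b[i][j] = (i == j);   /* B = identity, so C should equal A */
        }
    mesh_matmul(a, b, c);          /* note: a and b are shifted in place */
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) printf("%5.1f", c[i][j]);
        printf("\n");
    }
    return 0;
}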

-13- MATRIX MULTIPLICATION ON THE 2-D MESH SIMD MODEL (cont'd)
Can we implement the above algorithm on a 2-D mesh SIMD model without wraparound connections?

-14- MATRIX MULTIPLICATION ALGORITHM FOR MULTIPROCESSORS
Design strategy 5:
 – If load balancing is not a problem, maximize grain size.
Grain size: the amount of work performed between processor interactions.
Things to be considered:
 – Parallelizing the outermost loop of the sequential algorithm is a good choice, since the attained grain size, O(n^3/p), is the largest possible.
 – Resolve memory contention as much as possible.

-15- MATRIX MULTIPLICATION ALGORITHM FOR UMA MULTIPROCESSORS
Algorithm using p processors
Global n,                                   {Dimension of matrices}
       a[0..n-1,0..n-1], b[0..n-1,0..n-1],  {Two input matrices}
       c[0..n-1,0..n-1];                    {Product matrix}
Local  i, j, k, t;
Begin
  forall P(m) where 0 ≤ m < p do
    for i := m to n-1 step p do
      for j := 0 to n-1 do
        t := 0;
        for k := 0 to n-1 do
          t := t + a[i,k]*b[k,j];
        endfor k;
        c[i,j] := t;
      endfor j;
    endfor i;
  end forall;
End.
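On a shared-memory (UMA) machine, the forall over the outermost loop can be expressed, for example, with OpenMP. The sketch below is one possible realization under that assumption, not the slide's prescribed code: OpenMP's loop scheduling takes over the role of the explicit "i := m to n-1 step p" interleaving, and the matrix size N is illustrative.

#include <omp.h>
#include <stdio.h>

#define N 256   /* illustrative matrix dimension */

static double a[N][N], b[N][N], c[N][N];

/* Shared-memory version of the UMA algorithm above: the outermost loop is
   parallelized, so each thread computes a set of complete rows of C. */
void matmul_uma(void)
{
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double t = 0.0;
            for (int k = 0; k < N; k++)
                t += a[i][k] * b[k][j];
            c[i][j] = t;
        }
}

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = (i == j);    /* A = identity, so C should equal B */
            b[i][j] = i + j;
        }
    matmul_uma();
    printf("c[1][2] = %.1f (expected 3.0)\n", c[1][2]);
    return 0;
}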

-16- MATRIX MULTIPLICATION ALGORITHM FOR NUMA MULTIPROCESSORS
Things to be considered:
 – Try to resolve memory contention as much as possible.
 – Increase the locality of memory references to reduce memory access time.
Design strategy 6:
 – Reduce average memory latency by increasing locality.
The block matrix multiplication algorithm is a reasonable choice in this situation (a sketch of the blocking idea follows below).
 – Section 7.3, p.187, Parallel Computing: Theory and Practice
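The following C sketch illustrates the blocking (tiling) idea: C is built tile by tile so that the working set of the innermost loops, one BS×BS tile each of A, B, and C, stays in fast local memory. This is not the exact algorithm of Section 7.3, only an illustration of the locality argument; the names N and BS and their values are assumptions of this sketch.

#include <stdio.h>

#define N  256   /* matrix dimension (illustrative)                 */
#define BS 32    /* block (tile) size, assumed to divide N evenly   */

static double a[N][N], b[N][N], c[N][N];

/* Block (tiled) matrix multiplication. */
void matmul_blocked(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c[i][j] = 0.0;

    for (int ib = 0; ib < N; ib += BS)
        for (int jb = 0; jb < N; jb += BS)
            for (int kb = 0; kb < N; kb += BS)
                /* multiply the (ib,kb) tile of A by the (kb,jb) tile of B
                   and accumulate into the (ib,jb) tile of C */
                for (int i = ib; i < ib + BS; i++)
                    for (int j = jb; j < jb + BS; j++) {
                        double t = c[i][j];
                        for (int k = kb; k < kb + BS; k++)
                            t += a[i][k] * b[k][j];
                        c[i][j] = t;
                    }
}

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = (i == j);    /* A = identity, so C should equal B */
            b[i][j] = i * 0.5 + j;
        }
    matmul_blocked();
    printf("c[3][5] = %.2f (expected 6.50)\n", c[3][5]);
    return 0;
}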

-17- MATRIX MULTIPLICATION ALGORITHMS FOR MULTICOMPUTERS
We will study two algorithms on multicomputers:
 – the row-column-oriented algorithm (in class)
 – the block-oriented algorithm (to be read at home)

-18- ROW-COLUMN-ORIENTED ALGORITHM
The processes are organized as a ring:
 – Step 1: Initially, each process is given one row of matrix A and one column of matrix B.
 – Step 2: Each process uses vector multiplication to compute one element of the product matrix C.
 – Step 3: After a process has used its current column of matrix B, it fetches the next column of B from its successor in the ring.
 – Step 4: If all columns of B have already been processed, quit; otherwise, go to step 2.
An MPI sketch of this ring exchange is given after the example iterations below.

-19- ROW-COLUMN-ORIENTED ALGORITHM (cont'd)
Why do we have to organize the processes as a ring and make them use B's columns in turn?
Design strategy 7:
 – Eliminate contention for shared resources by changing the order of data access.

-20- ROW-COLUMN-ORIENTED ALGORITHM (cont'd)
Example: use 4 processes to multiply two 4×4 matrices A and B.
1st iteration: process i holds row i of A and column i of B and computes c[i,i].
[Figure: the row of A, the current column of B, and the elements of C held by each of the 4 processes.]

-21- ROW-COLUMN-ORIENTED ALGORITHM (cont'd)
Example: use 4 processes to multiply two 4×4 matrices A and B.
2nd iteration: after fetching a column from its successor, process i holds column (i+1) mod 4 of B and computes c[i,(i+1) mod 4].
[Figure: the row of A, the current column of B, and the elements of C held by each of the 4 processes.]

-22- ROW-COLUMN-ORIENTED ALGORITHM (cont'd)
Example: use 4 processes to multiply two 4×4 matrices A and B.
3rd iteration: process i holds column (i+2) mod 4 of B and computes c[i,(i+2) mod 4].
[Figure: the row of A, the current column of B, and the elements of C held by each of the 4 processes.]

-23- ROW-COLUMN-ORIENTED ALGORITHM (cont'd)
Example: use 4 processes to multiply two 4×4 matrices A and B.
4th iteration (the last): process i holds column (i+3) mod 4 of B and computes c[i,(i+3) mod 4]; each process now holds a complete row of C.
[Figure: the row of A, the current column of B, and the elements of C held by each of the 4 processes.]
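As announced on slide 18, here is one way the ring exchange could be written with MPI. It is only a sketch under simplifying assumptions (p = n processes, one row of A and one column of B per process, toy in-place initialization instead of scattering real input data); the buffer and variable names are illustrative and not taken from the book.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Row-column-oriented algorithm on a ring of p == n processes:
   the columns of B travel around the ring from successor to predecessor,
   and each process accumulates one full row of C. */
int main(int argc, char *argv[])
{
    int rank, p;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int n = p;                                    /* assume p == n          */
    double *a_row = malloc(n * sizeof(double));   /* row `rank` of A        */
    double *b_col = malloc(n * sizeof(double));   /* current column of B    */
    double *c_row = malloc(n * sizeof(double));   /* row `rank` of C        */

    /* Toy initialization: A[i][k] = i + k, B[k][j] = k * j. */
    for (int k = 0; k < n; k++) {
        a_row[k] = rank + k;        /* A[rank][k]                  */
        b_col[k] = k * rank;        /* B[k][rank] (column `rank`)  */
    }

    int succ = (rank + 1) % p;      /* process we fetch columns from */
    int pred = (rank - 1 + p) % p;  /* process we pass columns to    */

    for (int step = 0; step < n; step++) {
        int j = (rank + step) % n;  /* index of the column currently held */
        double t = 0.0;
        for (int k = 0; k < n; k++)
            t += a_row[k] * b_col[k];    /* vector (dot) product */
        c_row[j] = t;

        /* Rotate: send the used column to the predecessor and receive the
           next column from the successor (skipped after the last step). */
        if (step < n - 1)
            MPI_Sendrecv_replace(b_col, n, MPI_DOUBLE,
                                 pred, 0, succ, 0,
                                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    printf("Process %d computed row %d of C\n", rank, rank);
    free(a_row); free(b_col); free(c_row);
    MPI_Finalize();
    return 0;
}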