Parallel Matrix Multiplication and other Full Matrix Algorithms

Parallel Matrix Multiplication and other Full Matrix Algorithms
Spring Semester 2005
Geoffrey Fox, Community Grids Laboratory, Indiana University
505 N Morton Suite 224, Bloomington IN
gcf@indiana.edu
11/9/2018

Abstract of Parallel Matrix Module
This module covers basic full-matrix parallel algorithms, with a discussion of matrix multiplication and LU decomposition; the latter is covered for the banded as well as the true full case. Matrix multiplication covers the approach given in "Parallel Programming with MPI" by Pacheco (Sections 7.1 and 7.2) as well as Cannon's algorithm. We review those applications -- especially computational electromagnetics and chemistry -- where full matrices are commonly used. Note that sparse matrices are used much more than full matrices!

Matrices and Vectors
We have vectors with components xi, i = 1…n:
x = [x1, x2, …, x(n-1), xn]
A matrix A with elements aij has n² elements:
A = | a11 a12 … a1n |
    | a21 a22 … a2n |
    | …             |
    | an1 an2 … ann |
We can form y = Ax, where y is a vector with components such as
y1 = a11 x1 + a12 x2 + … + a1n xn
…
yn = an1 x1 + an2 x2 + … + ann xn

More on Matrices and Vectors
Much effort is spent on solving equations like Ax = b for x: x = A⁻¹b. We will discuss matrix multiplication C = AB, where C, A, and B are matrices. Other major activities involve finding eigenvalues λ and eigenvectors x of a matrix A: Ax = λx. Many if not the majority of scientific problems can be written in matrix notation, but the structure of A is very different in each case. Writing Laplace's equation in matrix form in two dimensions (an N by N grid) gives an N² by N² matrix with at most 5 nonzero elements in each row and column. Such matrices are sparse -- nearly all elements are zero. In some scientific fields (using "quantum theory") one writes Aij as <i|A|j>, with a bra <| and ket |> notation.

Review of Matrices seen in PDE's
Partial differential equations are written as given below for Poisson's equation; Laplace's equation is the special case ρ = 0:
∇²Φ = ∂²Φ/∂x² + ∂²Φ/∂y² = ρ in two dimensions

Examples of Full Matrices in Chemistry

Operations used with Hamiltonian operator

Examples of Full Matrices in Chemistry

Examples of Full Matrices in Electromagnetics

Notes on the use of full matrices

Introduction: Full Matrix Multiplication

Sub-block definition of Matrix Multiply
Note that indices start at 0 for rows and columns of matrices, and at 1 for rows and columns of processors.

The First Algorithm (Broadcast, Multiply, and Roll)
Called "Fox's algorithm" in Pacheco, but really due to Fox and Hey (Fox, G. C., Hey, A., and Otto, S., "Matrix Algorithms on the Hypercube I: Matrix Multiplication," Parallel Computing, 4, 17 (1987)).

The first stage -- index n = 0 in the sub-block sum -- of the algorithm on the N = 16 example

The second stage -- n = 1 in the sum over sub-block indices -- of the algorithm on the N = 16 example

Second stage, continued

Look at the whole algorithm on one element

MPI: Processor Groups and Collective Communication
We need "partial broadcasts" along rows, and rolls (shifts by 1) in columns. Both of these are collective communication. "Row broadcasts" are broadcasts in special subgroups of processors. Rolls are done as a variant of MPI_SENDRECV with "wrapped" boundary conditions. There are also special MPI routines to define the two-dimensional mesh of processors.
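A minimal MPI sketch of these two patterns, assuming a square process mesh; the variable names and test values are illustrative, and a production code would typically build the mesh with MPI_Cart_create rather than by hand.

```c
#include <mpi.h>
#include <stdio.h>

/* Sketch of the two communication patterns: a broadcast restricted to one
   row of the process mesh, and a "roll" by one position along a column via
   MPI_Sendrecv_replace with wrapped boundaries.  Run with a square number
   of processes, e.g. mpirun -np 4. */
int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int q = 1;                       /* mesh is q x q */
    while (q * q < size) q++;
    int row = rank / q, col = rank % q;

    /* Split into row subgroups: one communicator per mesh row. */
    MPI_Comm row_comm;
    MPI_Comm_split(MPI_COMM_WORLD, row, col, &row_comm);

    /* "Partial broadcast": rank 0 of each row broadcasts to its row only. */
    double ablock = (col == 0) ? 100.0 + row : 0.0;
    MPI_Bcast(&ablock, 1, MPI_DOUBLE, 0, row_comm);

    /* Column roll: send our block up one row, wrapping at the top. */
    MPI_Comm col_comm;
    MPI_Comm_split(MPI_COMM_WORLD, col, row, &col_comm);
    int up   = (row - 1 + q) % q;    /* destination within the column */
    int down = (row + 1) % q;        /* source within the column */
    double bblock = rank;
    MPI_Sendrecv_replace(&bblock, 1, MPI_DOUBLE, up, 0, down, 0,
                         col_comm, MPI_STATUS_IGNORE);

    printf("rank %d (row %d, col %d): ablock %g, rolled bblock %g\n",
           rank, row, col, ablock, bblock);

    MPI_Comm_free(&row_comm);
    MPI_Comm_free(&col_comm);
    MPI_Finalize();
    return 0;
}
```

MPI_Comm_split with the mesh row as the color is exactly the "special subgroup" mechanism the slide refers to; the same split by column gives the communicator in which the roll takes place.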

Broadcast in the Full Matrix Case
Matrix multiplication makes extensive use of broadcast operations as its communication primitives. We can use this application to discuss three approaches to broadcast: naive, logarithmic (given in the Laplace discussion), and pipe. These have different performance depending on message size and hardware architecture.

Implementation of Naive and Log Broadcast

The Pipe Broadcast Operation
When the message is large, other implementation optimizations are possible, since the broadcast message must be broken into a sequence of smaller messages. The broadcast can set up a path (or paths) from the source processor that visits every processor in the group. The message is sent from the source along the path in a pipeline, where each processor receives a block of the message from its predecessor and sends it to its successor. The performance of this broadcast is then the time to send the message to the processor at the end of the path plus the overhead of starting and finishing the pipeline:
Time = (Message Size + Packet Size × (√N − 2)) tcomm
For sufficiently large grain size the pipe broadcast is better than the log broadcast; message latency hurts the pipeline algorithm.

Schematic of Pipe Broadcast Operation

Performance Analysis of Matrix Multiplication

Cannon's Algorithm for Matrix Multiplication

Cannon's Algorithm

The Set-up Stage of Cannon’s Algorithm

The first iteration of Cannon’s algorithm

Performance Analysis of Cannon's Algorithm