Improvement to Hessenberg Reduction

Slides:

Advertisements

Similar presentations

Matrix Representation

Advertisements

3D Geometry for Computer Graphics

Lecture 19: Parallel Algorithms

Block LU Factorization Lecture 24 MA471 Fall 2003.

Parallel Matrix Operations using MPI CPS 5401 Fall 2014 Shirley Moore, Instructor November 3,

Algebraic, transcendental (i.e., involving trigonometric and exponential functions), ordinary differential equations, or partial differential equations...

Linear Algebra.

Numerical Algorithms ITCS 4/5145 Parallel Computing UNC-Charlotte, B. Wilkinson, 2009.

1 Parallel Algorithms II Topics: matrix and graph algorithms.

Lecture 13 - Eigen-analysis CVEN 302 July 1, 2002.

Autar Kaw Benjamin Rigsby Transforming Numerical Methods Education for STEM Undergraduates.

Systems of Linear Equations

Numerical Algorithms Matrix multiplication

Nequalities Takagi Factorization on a GPU using CUDA Gagandeep S. Sachdev, Vishay Vanjani & Mary W. Hall School of Computing, University of Utah What is.

Numerical Algorithms • Matrix multiplication

Maths for Computer Graphics

Linear Algebra and SVD (Some slides adapted from Octavia Camps)

Multivariable Control Systems Ali Karimpour Assistant Professor Ferdowsi University of Mashhad.

Chapter 3 Determinants and Matrices

GG313 Lecture 12 Matrix Operations Sept 29, 2005.

1 Neural Nets Applications Vectors and Matrices. 2/27 Outline 1. Definition of Vectors 2. Operations on Vectors 3. Linear Dependence of Vectors 4. Definition.

Linear Least Squares QR Factorization. Systems of linear equations Problem to solve: M x = b Given M x = b : Is there a solution? Is the solution unique?

MOHAMMAD IMRAN DEPARTMENT OF APPLIED SCIENCES JAHANGIRABAD EDUCATIONAL GROUP OF INSTITUTES.

1 Matrix Addition, C = A + B Add corresponding elements of each matrix to form elements of result matrix. Given elements of A as a i,j and elements of.

10.1 Gaussian Elimination Method

MATH 685/ CSI 700/ OR 682 Lecture Notes Lecture 6. Eigenvalue problems.

Lecture 7: Matrix-Vector Product; Matrix of a Linear Transformation; Matrix-Matrix Product Sections 2.1, 2.2.1,

Orthogonal Matrices and Spectral Representation In Section 4.3 we saw that n  n matrix A was similar to a diagonal matrix if and only if it had n linearly.

Scientific Computing QR Factorization Part 1 – Orthogonal Matrices, Reflections and Rotations.

HSDSL, Technion Spring 2014 Preliminary Design Review Matrix Multiplication on FPGA Project No. : 1998 Project B By: Zaid Abassi Supervisor: Rolf.

Exercise problems for students taking the Programming Parallel Computers course. Janusz Kowalik Piotr Arlukowicz Tadeusz Puzniakowski Informatics Institute.

Chapter 10 Review: Matrix Algebra

Graphics CSE 581 – Interactive Computer Graphics Mathematics for Computer Graphics CSE 581 – Roger Crawfis (slides developed from Korea University slides)

1 February 24 Matrices 3.2 Matrices; Row reduction Standard form of a set of linear equations: Chapter 3 Linear Algebra Matrix of coefficients: Augmented.

Eigenvalue Problems Solving linear systems Ax = b is one part of numerical linear algebra, and involves manipulating the rows of a matrix. The second main.

8.1 Vector spaces A set of vector is said to form a linear vector space V Chapter 8 Matrices and vector spaces.

Engineering Analysis ENG 3420 Fall 2009 Dan C. Marinescu Office: HEC 439 B Office hours: Tu-Th 11:00-12:00.

Digital Image Processing, 3rd ed. © 1992–2008 R. C. Gonzalez & R. E. Woods Gonzalez & Woods Matrices and Vectors Objective.

Constraint Directed CAD Tool For Automatic Latency-optimal Implementation of FPGA-based Systolic Arrays Greg Nash Reconfigurable Technology: FPGAs and.

CMPS 1371 Introduction to Computing for Engineers MATRICES.

PDCS 2007 November 20, 2007 Accelerating the Complex Hessenberg QR Algorithm with the CSX600 Floating-Point Coprocessor Yusaku Yamamoto 1 Takafumi Miyata.

Parallel Algorithms Patrick Cozzi University of Pennsylvania CIS Spring 2012.

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M

Elementary Linear Algebra Anton & Rorres, 9th Edition

Chapter 5 MATRIX ALGEBRA: DETEMINANT, REVERSE, EIGENVALUES.

Scientific Computing Singular Value Decomposition SVD.

Solving Linear Systems Solving linear systems Ax = b is one part of numerical linear algebra, and involves manipulating the rows of a matrix. Solving linear.

ES 240: Scientific and Engineering Computation. Chapter 8 Chapter 8: Linear Algebraic Equations and Matrices Uchechukwu Ofoegbu Temple University.

Introduction to Linear Algebra Mark Goldman Emily Mackevicius.

Linear Algebra Libraries: BLAS, LAPACK, ScaLAPACK, PLASMA, MAGMA

Review of Linear Algebra Optimization 1/16/08 Recitation Joseph Bradley.

Review of Matrix Operations Vector: a sequence of elements (the order is important) e.g., x = (2, 1) denotes a vector length = sqrt(2*2+1*1) orientation.

08/10/ NRL Hybrid QR Factorization Algorithm for High Performance Computing Architectures Peter Vouras Naval Research Laboratory Radar Division Professor.

Lecture 14: Pole placement (Regulator Problem) 1.

Section 2.1 Determinants by Cofactor Expansion. THE DETERMINANT Recall from algebra, that the function f (x) = x 2 is a function from the real numbers.

Linear Algebra Libraries: BLAS, LAPACK, ScaLAPACK, PLASMA, MAGMA Shirley Moore CPS5401 Fall 2013 svmoore.pbworks.com November 12, 2012.

Unsupervised Learning II Feature Extraction

1 Objective To provide background material in support of topics in Digital Image Processing that are based on matrices and/or vectors. Review Matrices.

Graphics Graphics Korea University kucg.korea.ac.kr Mathematics for Computer Graphics 고려대학교 컴퓨터 그래픽스 연구실.

Computational Physics (Lecture 7) PHY4061. Eigen Value Problems.

ALGEBRAIC EIGEN VALUE PROBLEMS

Introduction The central problems of Linear Algebra are to study the properties of matrices and to investigate the solutions of systems of linear equations.

Review of Matrix Operations

Matrices and vector spaces

Numerical Algorithms • Parallelizing matrix multiplication

Numerical Analysis Lecture 16.

Game Programming Algorithms and Techniques

Parallelized Analytic Placer

Presentation transcript:

Improvement to Hessenberg Reduction Shankar, Yang, Hao

What is Hessenberg Matrix A special square matrix has zero entries below the first subdiagonal or above the first superdiagonal.

Less computational efforts required for triangular matrix Why Hessenberg Matrix Less computational efforts required for triangular matrix Not convenient to convert to triangular directly then Hessenberg is a good transient form

Real life application Earthquake modeling Weather forecasting Real world matrices are huge. Dense solves are inefficient

Ways to convert to Hessenberg matrix Householder Transform Givens Rotation Variants of the above (block forms etc)

Method 1: Householder Reflection Setp1: Form Householder Matrix P1 P2 Setp2: Householder Reflection A=P1AP1-1 Note: P1 is orthogonal matrix, P1=P1-1 A: 4 by 4 random matrix A A= P2P1AP1P2 Original Matrix Final Matrix Target :Zero these entries P1 P2

How to form Householder Matrix Code Initial Operation Loop it

Example Householder Transform A 8x8 Random Square Matrix = Loop1: A1=P1AP1 Loop2: A2=P2A1P2 Loop3: A3=P3A2P3 Show you code, numbers Loop4: A4=P4A3P4 Loop5: A5=P5A4P5 Loop6: A6=P6A5P6

Method 2: Givens Rotation General Idea: Coordinate Transform, xy to x’y’, zeros one of axis component Suppose vector v=(a,b) in xy coordinate and rotation matrix G between xy and x’y’ Two Degree Coordinate Where Expand to N Degree Coordinate

Code by Matlab Example Original Matrix Resulted Matrix Zero this entry

Implemetion1: Hessenburg Reduction by using Householder Matrix General Householder Reflection A P1AP1 Full Again Solution: Hessenburg Reduction Form the Householder matrix from A (n-1) by (n-1) change the householder matrix as: First Transform Matrix First Transform A1=A (n-1,n-1) Iteration A2=A (n-2,n-2) … An-2

Hessenburg Reduction by Matlab

A 8x8 Random Square Matrix = Hessenburg Reduction A 8x8 Random Square Matrix = Loop1 Loop2 Loop3 Show the code Loop4 Loop5 Loop6

Givens Rotation function [g]=givens(x,j,i) % Function of Givens Rotation % x: Input matrix % i: Row affected by the zeroing operation % j: Row to be zeroed (column 1) % G: Givens rotation matrix g=eye(length(x)); %Initialize givens matrix xi=x(i,1); %Identify the ordinate pair over which the rotation happens xj=x(j,1); r=sqrt(xi^2+xj^2); %Find length of vector r from origin to this point %Populate rotation matrix with the necessary elements cost=xi/r; sint=xj/r; g(i,i)=cost; g(i,j)=-sint; g(j,i)=sint; g(j,j)=cost; end

Hessenberg Reduction through Givens Rotation function [R]=hessen1(A) % Hessenberg Reduction by using Givens Method count=1; n=size(A); G=eye(size(A)); %Gives rotation matrix accumulator R=A; %Copy A into R for j=1:n-2 %Outer loop (determines columns being zeroed out) for i=n:-1:j+2 %Inner loop (successively zeroes jth column) giv=givens(R(j:n, j:n), i-j+1, i-j); giv=blkdiag(eye(j-1), giv); %Resize rotator to full size G=giv*G; %Accumulate G which give a Q in the end %Perform similarity transform R=giv'*R; R=R*giv; count=count+1; end

Result

Performance Improvement Criteria CPU Cache Locality Memory Bandwidth Memory bound Computational Efficiency (Number of calcs) Size of data GPU Large latency overhead Parallelization Algorithms GPU Targetting

Blocked Operations (SIMD) Operate on large chunks of data Provides cache locality No pipeline bubbles Streaming extensions (SSEx) on modern processors target these operations Order of efficiency (ascending) Vector-Vector (Level 1) Vector-Matrix (Level 2) Matrix-Matrix (Level 3)

Parallel Operations Modern CPUs can operate on two/more sets of data simultaneously and independently Still share memory, so cache locality still important Important: Algorithm needs to be rewritten to find independent operations without dependencies on previous values Synchronization very important GPU perfect for this, massively parallel processing pipeline (‘stream processors’) ASIC(Application Specific Ics) and FPGAs (Field Programmable Gate Arrays) are perfect for this

Block Hessenberg

Results

Results norm1 = norm2 = 6.013108476912430e-13 norm3 = norm2 = 6.013108476912430e-13 norm3 = 66.823468331017068 eig_norm = eig_norm1 = 4.965725351883417e-14 eig_norm2 = 5.924617829737880e-14

Conclusion Hessenberg Reduction reduces complexity of dense eigen value solvers Column householder and Givens rotation Study of computing architecture tells us how to improve performance Blocking (SIMD) and parallelism Blocking algorithm fastest