Notes on Homework 1 What is Peak ? CS267 Lecture 2 CS267 Lecture 2 1.

Slides:



Advertisements
Similar presentations
Section 13-4: Matrix Multiplication
Advertisements

Section 4.2 – Multiplying Matrices Day 2
Maths for Computer Graphics
02/09/2010CS267 Lecture 71 Notes on Homework 1 Must write SIMD code to get past 50% of peak!
02/09/2010CS267 Lecture 71 Notes on Homework 1 Must write SIMD code to get past 50% of peak!
SSE for H.264 Encoder Chuck Tsen Sean Pieper. SSE– what can’t it do? Mixed scalar and vector Unaligned memory accesses Predicated execution >2 source.
Table of Contents Matrices - Multiplication Assume that matrix A is of order m  n and matrix B is of order p  q. To determine whether or not A can be.
100’s of free ppt’s from library
4.2 Operations with Matrices Scalar multiplication.
CS32310 MATRICES 1. Vector vs Matrix transformation formulae Geometric reasoning allowed us to derive vector expressions for the various transformation.
Ch X 2 Matrices, Determinants, and Inverses.
Performance Optimization Getting your programs to run faster CS 691.
In Lesson 4-2, you multiplied matrices by a number called a scalar
What does C store? >>A = [1 2 3] >>B = [1 1] >>[C,D]=meshgrid(A,B) c) a) d) b)
4.3 Matrix Multiplication 1.Multiplying a Matrix by a Scalar 2.Multiplying Matrices.
Performance Optimization Getting your programs to run faster.
Notes on Homework 1. 2x2 Matrix Multiply C 00 += A 00 B 00 + A 01 B 10 C 10 += A 10 B 00 + A 11 B 10 C 01 += A 00 B 01 + A 01 B 11 C 11 += A 10 B 01 +
Introdution to SSE or How to put your algorithms on steroids! Christian Kerl
Matrix Algebra Section 7.2. Review of order of matrices 2 rows, 3 columns Order is determined by: (# of rows) x (# of columns)
8.2 Operations With Matrices
Multiplying Matrices Algebra 2—Section 3.6. Recall: Scalar Multiplication - each element in a matrix is multiplied by a constant. Multiplying one matrix.
1.3 Solutions of Linear Systems
MATRICES Operations with Matrices Properties of Matrix Operations
SSE and SSE2 Jeremy Johnson Timothy A. Chagnon All images from Intel® 64 and IA-32 Architectures Software Developer's Manuals.
3.6 Multiplying Matrices Homework 3-17odd and odd.
CS61C L20 Thread Level Parallelism I (1) Garcia, Spring 2013 © UCB Senior Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c.
Notes Over 4.2 Finding the Product of Two Matrices Find the product. If it is not defined, state the reason. To multiply matrices, the number of columns.
Do Now: Perform the indicated operation. 1.). Algebra II Elements 11.1: Matrix Operations HW: HW: p.590 (16-36 even, 37, 44, 46)
4-3 Matrix Multiplication Objective: To multiply a matrix by a scalar multiple.
SIMD Programming CS 240A, Winter Flynn* Taxonomy, 1966 In 2013, SIMD and MIMD most common parallelism in architectures – usually both in same.
MTH108 Business Math I Lecture 20.
Elementary Matrix Theory
13.4 Product of Two Matrices
12-1 Organizing Data Using Matrices
Matrix Operations Free powerpoints at
Linear Algebra review (optional)
Matrix Operations.
Chapter 7 Matrix Mathematics
Optimizing the code using SSE intrinsics
SIMD Multimedia Extensions
Matrix Operations.
Matrix Operations Free powerpoints at
High-Performance Matrix Multiplication
Matrix Operations.
Multiplying Matrices Algebra 2—Section 3.6.
Vladimir Stojanovic & Nicholas Weaver
Getting Started with Automatic Compiler Vectorization
Vector Processing => Multimedia
SIMD Programming CS 240A, 2017.
Instructor: Michael Greenbaum
Notes on Homework 1 DBS=328 lines BLK+UNJ=68 lines BLK=48 lines
Matrix Operations Free powerpoints at
Multiplying Matrices.
Computer Programming Machine and Assembly.
CS4961 Parallel Programming Lecture 12: Data Locality, cont
Objectives Multiply two matrices.
Multiplying Matrices.
EE 193: Parallel Computing
Multi-Core Programming Assignment
Notes on Homework 1 CS267 Lecture 2 CS267 Lecture 2 1.
3.5 Perform Basic Matrix Operations
3.6 Multiply Matrices.
Linear Algebra review (optional)
Chapter 4 Matrices & Determinants
Matrix Multiplication
Multiplying Matrices.
Multiplying Matrices.
ENERGY 211 / CME 211 Lecture 29 December 3, 2008.
Multiplying Matrices.
Presentation transcript:

Notes on Homework 1 What is Peak ? CS267 Lecture 2 CS267 Lecture 2 1

Summary of SSE intrinsics #include <emmintrin.h> Vector data type: __m128d Load and store operations: _mm_load_pd _mm_store_pd _mm_loadu_pd _mm_storeu_pd Load and broadcast across vector _mm_load1_pd Arithmetic: _mm_add_pd _mm_mul_pd CS267 Lecture 2 CS267 Lecture 2 2

2x2 Matrix Multiply C00 += A00B10 + A01B00 C10 += A10B10 + A11B00 C01 += A00B11 + A01B01 C11 += A10B11 + A11B01 Rewrite as SIMD algebra C00_C10 += A00_A10 * B10_B10 C01_C11 += A00_A10 * B11_B11 C00_C10 += A01_A11 * B00_B00 C01_C11 += A01_A11 * B01_B01 CS267 Lecture 7 02/11/2009

Example: multiplying 2x2 matrices #include <emmintrin.h> c1 = _mm_loadu_pd( C+0*lda ); //load unaligned block in C c2 = _mm_loadu_pd( C+1*lda ); for( int i = 0; i < 2; i++ ) { a = _mm_load_pd( A+i*lda );//load aligned i-th column of A b1 = _mm_load1_pd( B+i+0*lda ); //load i-th row of B b2 = _mm_load1_pd( B+i+1*lda ); c1=_mm_add_pd( c1, _mm_mul_pd( a, b1 ) ); //rank-1 update c2=_mm_add_pd( c2, _mm_mul_pd( a, b2 ) ); } _mm_storeu_pd( C+0*lda, c1 ); //store unaligned block in C _mm_storeu_pd( C+1*lda, c2 ); CS267 Lecture 2 CS267 Lecture 2 4

Other Issues Checking efficiency of the compiler helps Use -S option to see the generated assembly code Inner loop should consist mostly of ADDPD and MULPD ops, ADDSD and MULSD imply scalar computations Consider using another compiler Options are PGI, PathScale and GNU Cray C compiler doesn’t seem to have any way to use SSE It is capable of auto-vectorizing loops it recognizes Look through Goto and van de Geijn’s paper Don’t ignore the other optimizations more compiler flags loop unrolling inlining keywords (ask your classmates!) CS267 Lecture 2 CS267 Lecture 2 5