

CS 584 Lecture 20
- Assignment
  - Glenda program
- Project proposal is coming up! (March 13)
  - 2 pages text + 1 page plan of action
  - 3 references
- No class March 13
  - Put your project proposal in my box.
  - Paper presentations on March 11 (Tom Abbott)

Module Composition

Case Study: Matrix Multiply
- Goal: data-distribution neutral
- Three basic ways to distribute
  - row
  - column
  - submatrix
- Question: does our library need different algorithms?
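The three distributions can be made concrete by asking which task owns a given matrix element. The sketch below is only an illustration: the function name `owner`, the rank numbering, and the divisibility assumptions (P divides N, and for the submatrix case P is a perfect square whose root divides N) are not from the slides.

```python
import math

def owner(i, j, n, p, scheme):
    """Return the rank of the task that owns element (i, j) of an n x n matrix.

    Hypothetical helper. Assumes p divides n for "row" and "column", and
    that p is a perfect square whose root divides n for "submatrix".
    """
    if scheme == "row":
        # Each task holds n/p consecutive rows.
        return i // (n // p)
    if scheme == "column":
        # Each task holds n/p consecutive columns.
        return j // (n // p)
    if scheme == "submatrix":
        # Tasks form a sqrt(p) x sqrt(p) grid, numbered row by row.
        q = math.isqrt(p)
        block = n // q
        return (i // block) * q + (j // block)
    raise ValueError(f"unknown scheme: {scheme}")

# Element (5, 2) of an 8 x 8 matrix spread over 4 tasks.
for scheme in ("row", "column", "submatrix"):
    print(scheme, owner(5, 2, 8, 4, scheme))
```

A data-distribution-neutral library would have to accept input laid out under any of these three mappings.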

Analytical Model
- Compare the two algorithms
- Ignore the computation costs
- What are the communication costs?

One Dimensional Decomposition
- Each processor "owns" the black portion.
- To compute its owned portion of the answer, each processor requires all of A.
- This affects data distribution.

1-D Decomp.
$$T = (P - 1)\left(t_s + t_w \frac{N^2}{P}\right)$$

Two Dimensional Decomposition
- Requires less data per processor
- Algorithm can be performed stepwise.

- Broadcast an A submatrix to the other processors in the row.
- Compute.
- Rotate the B submatrix upwards.

Algorithm
  Set B' = B_local
  for j = 0 to sqrt(P) - 1
    in each row i, the [(i + j) mod sqrt(P)]th task broadcasts A' = A_local to the other tasks in the row
    accumulate A' * B'
    send B' to upward neighbor
  done
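The broadcast, compute, and rotate steps can be checked with a small sequential simulation. This is a sketch under stated assumptions, not a parallel implementation: the task grid is simulated with nested lists, q stands for sqrt(P), q is assumed to divide N, and the helper names (`matmul`, `add_into`, `split`, `broadcast_multiply_roll`) are hypothetical.

```python
def matmul(x, y):
    """Plain triple-loop product of two square blocks (lists of lists)."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add_into(c, d):
    """Accumulate block d into block c in place."""
    for i, row in enumerate(d):
        for j, value in enumerate(row):
            c[i][j] += value

def split(m, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    b = len(m) // q
    return [[[row[c * b:(c + 1) * b] for row in m[r * b:(r + 1) * b]]
             for c in range(q)] for r in range(q)]

def broadcast_multiply_roll(a, b, q):
    """Simulate the 2-D algorithm on a q x q task grid (q = sqrt(P))."""
    n = len(a)
    bs = n // q
    a_blk = split(a, q)          # A_local of every task
    b_blk = split(b, q)          # plays the role of B' in every task
    c_blk = [[[[0] * bs for _ in range(bs)] for _ in range(q)]
             for _ in range(q)]
    for j in range(q):           # sqrt(P) steps
        for i in range(q):
            # In row i, task (i + j) mod q broadcasts its A block as A'.
            a_bcast = a_blk[i][(i + j) % q]
            for col in range(q):
                add_into(c_blk[i][col], matmul(a_bcast, b_blk[i][col]))
        # Every task sends B' to its upward neighbor; row 0 wraps around.
        b_blk = b_blk[1:] + b_blk[:1]
    # Stitch the blocks of C back into one n x n matrix.
    return [[c_blk[i // bs][k // bs][i % bs][k % bs] for k in range(n)]
            for i in range(n)]

a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[2, 0, 1, 3], [1, 1, 0, 2], [0, 4, 2, 1], [3, 2, 1, 0]]
print(broadcast_multiply_roll(a, b, 2) == matmul(a, b))
```

At step j the task in grid row i holds the B block that started in row (i + j) mod q, which is exactly the block that pairs with the A block being broadcast, so after sqrt(P) steps every term of the product has been accumulated once.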

2-D Decomp.
$$T = \sqrt{P}\left(1 + \log\sqrt{P}\right)\left(t_s + t_w \frac{N^2}{P}\right)$$

Redistribution
- If we only have one algorithm, we may need to redistribute the data.
- How much does this cost?

Redistribution
$$T = \left(\sqrt{P} - 1\right)\left(t_s + t_w \frac{N^2}{P\sqrt{P}}\right)$$

Analysis
- Performance analysis reveals that the 2-dimensional decomposition is better for all but very small P.
- So our matrix multiply only needs one algorithm
  - Might need a redistribution algorithm to be totally data-distribution neutral
- However, this is not the best algorithm.
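The comparison can be reproduced by plugging numbers into the three communication-cost formulas from the preceding slides. The sketch below makes several assumptions that are not in the slides: the logarithm is base 2 (tree broadcast), both A and B are redistributed before the 2-D algorithm runs, and the values of N, t_s, and t_w are made up for illustration.

```python
import math

def t_1d(p, n, t_s, t_w):
    """1-D decomposition: (P - 1) messages of N^2 / P words each."""
    return (p - 1) * (t_s + t_w * n * n / p)

def t_2d(p, n, t_s, t_w):
    """2-D decomposition: sqrt(P) steps of a tree broadcast plus one shift."""
    q = math.sqrt(p)
    # Assumption: log is base 2.
    return q * (1 + math.log2(q)) * (t_s + t_w * n * n / p)

def t_redist(p, n, t_s, t_w):
    """Redistributing one matrix from a 1-D to a 2-D layout."""
    q = math.sqrt(p)
    return (q - 1) * (t_s + t_w * n * n / (p * q))

def compare(n, t_s, t_w, procs=(4, 16, 64, 256)):
    """Return (P, 1-D cost, 2-D cost including redistribution) tuples."""
    rows = []
    for p in procs:
        one_d = t_1d(p, n, t_s, t_w)
        # Assumption: both A and B must be redistributed first.
        two_d = t_2d(p, n, t_s, t_w) + 2 * t_redist(p, n, t_s, t_w)
        rows.append((p, one_d, two_d))
    return rows

# Hypothetical machine parameters, chosen only for illustration.
for p, one_d, two_d in compare(n=1024, t_s=100.0, t_w=1.0):
    print(f"P={p:4d}  1-D: {one_d:12.1f}  2-D + redistribution: {two_d:12.1f}")
```

Under these assumptions the 1-D cost grows roughly linearly in P while the 2-D cost grows like sqrt(P) log P, so the gap widens as P increases. At very small P, such as P = 4, the 1-D cost comes out lower.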

Systolic Algorithm
$$T = 2\left(\sqrt{P} - 1\right)\left(t_s + t_w \frac{N^2}{P}\right)$$
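A short worked comparison suggests why the systolic algorithm is the better choice. It uses the 2-D cost $\sqrt{P}(1 + \log\sqrt{P})(t_s + t_w N^2/P)$ and the systolic cost $2(\sqrt{P} - 1)(t_s + t_w N^2/P)$, and assumes the logarithm is base 2, which the slides do not state. Both costs share the per-message factor, so it cancels in the ratio:

```latex
\[
\frac{T_{\text{systolic}}}{T_{\text{2-D}}}
  = \frac{2\left(\sqrt{P} - 1\right)}{\sqrt{P}\left(1 + \log_2 \sqrt{P}\right)}
\]
\[
P = 16:\quad \frac{2 \cdot 3}{4 \cdot 3} = \frac{1}{2},
\qquad
P = 64:\quad \frac{2 \cdot 7}{8 \cdot 4} = \frac{7}{16}
\]
```

Under this model the ratio stays below 1 for every P and shrinks as P grows, because the systolic version replaces each tree broadcast with a single nearest-neighbor shift.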