CSE5304 Project Proposal: Parallel Matrix Multiplication (Tian Mi)

A naive version with MPI (diagram: processors P1, P2, …, Pi, …, PN each compute one piece of the result)

A naive version with MPI (diagram: the portion of the computation handled by a single processor Pi)

- Processor 0 reads the input file
- Processor 0 distributes (scatters) one matrix
- Processor 0 broadcasts the other matrix
- All processors, in parallel, multiply their own piece of the data
- Processor 0 gathers the result
- Processor 0 writes the result to the output file
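These steps map directly onto the three collectives named on the following slides (MPI_Scatter, MPI_Bcast, MPI_Gather). Below is a minimal sketch in C, not the author's implementation: it assumes square int matrices of dimension N with N divisible by the process count, and it fills the matrices with random values instead of performing the file I/O described above; the names N, A, B, and C are illustrative.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1024   /* matrix dimension; assumed divisible by the process count */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int rows = N / nprocs;   /* rows of A (and of C) owned by each process */

    int *A = NULL, *C = NULL;
    int *B       = malloc((size_t)N * N * sizeof(int));      /* replicated on every process */
    int *A_local = malloc((size_t)rows * N * sizeof(int));
    int *C_local = malloc((size_t)rows * N * sizeof(int));

    if (rank == 0) {
        /* Process 0 would read A and B from the input file; random data stands in here. */
        A = malloc((size_t)N * N * sizeof(int));
        C = malloc((size_t)N * N * sizeof(int));
        for (long i = 0; i < (long)N * N; i++) {
            A[i] = rand() % 2001 - 1000;   /* integers in [-1000, 1000] */
            B[i] = rand() % 2001 - 1000;
        }
    }

    /* Distribute one matrix (row blocks of A) and broadcast the other (B). */
    MPI_Scatter(A, rows * N, MPI_INT, A_local, rows * N, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(B, N * N, MPI_INT, 0, MPI_COMM_WORLD);

    /* Each process multiplies its rows of A by the full matrix B. */
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < N; j++) {
            long sum = 0;
            for (int k = 0; k < N; k++)
                sum += (long)A_local[i * N + k] * B[k * N + j];
            C_local[i * N + j] = (int)sum;
        }

    /* Process 0 gathers the row blocks of C (and would write them to the output file). */
    MPI_Gather(C_local, rows * N, MPI_INT, C, rows * N, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("C[0][0] = %d\n", C[0]);

    free(A); free(B); free(C); free(A_local); free(C_local);
    MPI_Finalize();
    return 0;
}
```

Under these assumptions each process holds N/nprocs rows of A plus the whole of B, which is exactly the data movement the scatter and the broadcast express.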

MPI_Scatter
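MPI_Scatter splits a buffer on the root process into equal-sized chunks and sends the i-th chunk to rank i, which is how one matrix is distributed here. Its standard C prototype is:

```c
int MPI_Scatter(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                void *recvbuf, int recvcount, MPI_Datatype recvtype,
                int root, MPI_Comm comm);
```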

MPI_Bcast
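MPI_Bcast copies one buffer from the root process to every process in the communicator, which is how the other matrix is replicated. Its standard C prototype is:

```c
int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
              int root, MPI_Comm comm);
```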

MPI_Gather
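MPI_Gather is the inverse of MPI_Scatter: every process sends its chunk to the root, which concatenates the chunks in rank order, collecting the row blocks of the result. Its standard C prototype is:

```c
int MPI_Gather(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
               void *recvbuf, int recvcount, MPI_Datatype recvtype,
               int root, MPI_Comm comm);
```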

Data generation
- Data generated in R with the "igraph" package
- Integers in the range [-1000, 1000]

Matrix size:  512*512   1024*1024   2048*2048   4096*4096
File size:    2.69 MB   10.7 MB     43.1 MB     172 MB

Result, data size 1024*1024
(table: number of processors, per-experiment times in seconds, average time in seconds, speedup)
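For reference, the speedup column in these tables is presumably the usual ratio S(p) = T(1) / T(p), the average single-process time divided by the average time on p processes, so ideal scaling corresponds to S(p) = p.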

Result Data size: 1024*1024

Result Data size: 1024*1024

Result, data size 2048*2048
(table: number of processors, time in seconds, speedup)

Result Data size: 2048*2048

Result Data size: 2048*2048

Result, data size 4096*4096
(table: number of processors, time in seconds, speedup; some speedup entries, including the one for 128 processors, appear as #DIV/0!)

Analysis
To see superlinear speedup, the computation, which is not yet dominant enough, must be increased, e.g. with a larger matrix and larger integers. However, a larger matrix or longer integers will also increase the communication time (broadcast, scatter, gather).
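As a rough cost model (an illustration, not taken from the slides): for an n x n matrix on p processes, the local multiplication performs on the order of n^3 / p scalar operations per process, while the broadcast, scatter, and gather move on the order of n^2 elements. The computation-to-communication ratio therefore grows roughly like n / p, which is why a larger matrix makes the computation more dominant, even though the n^2 communication term also grows.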

Cannon's algorithm: example

Cannon's algorithm: still implementing and debugging; no results to share at present.
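For context, the sketch below shows the generic communication structure of Cannon's algorithm, not the implementation being debugged. It assumes the process count p is a perfect square arranged as a q x q periodic grid (q = sqrt(p)), that each process already holds one BS x BS block of A and B (zero-filled here in place of real input), and that blocks are row-major double arrays; BS and all other names are illustrative.

```c
#include <mpi.h>
#include <math.h>
#include <stdlib.h>

#define BS 256   /* hypothetical local block size */

/* C += A * B on local BS x BS blocks. */
static void local_mm(const double *A, const double *B, double *C)
{
    for (int i = 0; i < BS; i++)
        for (int k = 0; k < BS; k++)
            for (int j = 0; j < BS; j++)
                C[i * BS + j] += A[i * BS + k] * B[k * BS + j];
}

/* Circularly shift a local block by disp positions toward lower coordinates
 * along dimension dim of the periodic process grid. */
static void shift_block(double *buf, MPI_Comm grid, int dim, int disp)
{
    int src, dst;
    MPI_Cart_shift(grid, dim, -disp, &src, &dst);
    MPI_Sendrecv_replace(buf, BS * BS, MPI_DOUBLE, dst, 0, src, 0,
                         grid, MPI_STATUS_IGNORE);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    int q = (int)(sqrt((double)nprocs) + 0.5);      /* grid is q x q; p assumed a perfect square */

    /* Periodic 2-D Cartesian topology so the shifts wrap around. */
    int dims[2] = {q, q}, periods[2] = {1, 1}, coords[2];
    MPI_Comm grid;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid);
    int grid_rank;
    MPI_Comm_rank(grid, &grid_rank);
    MPI_Cart_coords(grid, grid_rank, 2, coords);

    double *A = calloc(BS * BS, sizeof(double));    /* local block of A */
    double *B = calloc(BS * BS, sizeof(double));    /* local block of B */
    double *C = calloc(BS * BS, sizeof(double));    /* local block of the result */

    /* Initial alignment: block A(i,j) moves left by i columns,
     * block B(i,j) moves up by j rows. */
    shift_block(A, grid, 1, coords[0]);
    shift_block(B, grid, 0, coords[1]);

    /* q multiply-and-shift steps: multiply the resident blocks, then
     * shift A one step left and B one step up. */
    for (int step = 0; step < q; step++) {
        local_mm(A, B, C);
        shift_block(A, grid, 1, 1);
        shift_block(B, grid, 0, 1);
    }

    free(A); free(B); free(C);
    MPI_Comm_free(&grid);
    MPI_Finalize();
    return 0;
}
```

A sketch like this would be built with something like mpicc cannon_sketch.c -o cannon -lm and run with a square number of processes, e.g. mpirun -np 16 ./cannon (file and binary names hypothetical).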

Thank you Questions & Comments?