Parallel Performance: Week 7 Lecture Notes
Steve Lantz, Computing and Information Science 4205
www.cac.cornell.edu/~slantz


Parallel Speedup

Parallel speedup: how much faster does my code run in parallel compared to serial?

Parallel speedup = serial execution time / parallel execution time

Its maximum value is p, the number of processors.
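For example, with hypothetical timings: if the serial code runs in 100 s and the parallel code runs in 16 s on p = 8 processors, the parallel speedup is 100 / 16 = 6.25, short of the ideal value of 8.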

Parallel Efficiency

Parallel efficiency: how close does the parallel speedup come to the ideal (linear) speedup of p?

Parallel efficiency = parallel speedup / p

Its maximum value is 1.
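A minimal C sketch of these two metrics, reusing the hypothetical timings from the speedup example above (the timing values are illustrative; real values would come from instrumenting the code):

    #include <stdio.h>

    /* Compute parallel speedup and efficiency from measured wall-clock times. */
    int main(void)
    {
        double t_serial   = 100.0;   /* serial execution time in seconds (hypothetical)   */
        double t_parallel =  16.0;   /* parallel execution time in seconds (hypothetical) */
        int    p          = 8;       /* number of processors                              */

        double speedup    = t_serial / t_parallel;   /* maximum value is p   */
        double efficiency = speedup / (double)p;     /* maximum value is 1.0 */

        printf("speedup    = %.2f\n", speedup);      /* prints 6.25 */
        printf("efficiency = %.2f\n", efficiency);   /* prints 0.78 */
        return 0;
    }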

Model of Parallel Execution Time

Let n represent the problem size and p the number of processors.
Execution time of the inherently sequential portion of the code: σ(n)
Execution time of the parallelizable portion of the code: φ(n) / p
Overhead due to parallel communication and synchronization: κ(n,p)

T(n,p) = σ(n) + φ(n)/p + κ(n,p)
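As a small illustration, a C sketch that evaluates this model for made-up component times (the function name model_time and the values of σ, φ, κ are assumptions for illustration, not measurements from the lecture):

    #include <stdio.h>

    /* Evaluate the execution-time model T(n,p) = sigma(n) + phi(n)/p + kappa(n,p)
       for fixed problem size n, given the three components and processor count p. */
    double model_time(double sigma, double phi, double kappa, int p)
    {
        return sigma + phi / p + kappa;
    }

    int main(void)
    {
        double sigma = 10.0, phi = 90.0;        /* hypothetical serial/parallel work (s) */
        for (int p = 1; p <= 16; p *= 2) {
            double kappa = 0.5 * (p - 1);       /* toy overhead that grows with p (s)    */
            printf("p = %2d  T = %6.2f s\n", p, model_time(sigma, phi, kappa, p));
        }
        return 0;
    }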

Model of Parallel Speedup

Divide T(n,1) by T(n,p).
Cute notation by Quinn: ψ (Greek psi) = parallel speedup

ψ(n,p) = [σ(n) + φ(n)] / [σ(n) + φ(n)/p + κ(n,p)]
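For instance, with the hypothetical component times from the sketch above (σ = 10 s, φ = 90 s, κ = 0.5(p - 1) s), T(n,1) = 100 s and T(n,8) = 10 + 90/8 + 3.5 = 24.75 s, giving ψ ≈ 100 / 24.75 ≈ 4.0, well below the ideal speedup of 8.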

Amdahl’s Law

Parallel overhead κ(n,p) is always positive, so dropping it from the denominator turns the speedup model into an upper bound:

ψ(n,p) < [σ(n) + φ(n)] / [σ(n) + φ(n)/p]

This can be expressed in terms of the sequential fraction f = σ(n) / [σ(n) + φ(n)]: divide the numerator and denominator by [σ(n) + φ(n)].

Other Expressions for Amdahl’s Law

ψ(n,p) < 1 / [f + (1 - f)/p]

ψ(n,p) < p / [1 + f(p - 1)]
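A minimal C sketch of the first form (the serial fraction f = 0.1 is an assumed value for illustration; for a real code it would come from profiling):

    #include <stdio.h>

    /* Amdahl's Law upper bound on speedup: psi < 1 / (f + (1 - f)/p),
       where f is the inherently sequential fraction of the work.       */
    double amdahl_bound(double f, int p)
    {
        return 1.0 / (f + (1.0 - f) / p);
    }

    int main(void)
    {
        double f = 0.1;                           /* hypothetical serial fraction */
        for (int p = 1; p <= 1024; p *= 4)
            printf("p = %4d  bound = %5.2f\n", p, amdahl_bound(f, p));
        return 0;
    }

As p grows, the bound approaches 1/f: with f = 0.1, the speedup can never exceed 10 no matter how many processors are used.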

Parallel Overhead: Synchronization (Example)

Suppose the parallelizable part of the code consists of n tasks, each taking the same time t.
What if n is not divisible by p? Let x = mod(n,p) = the number of “extra” tasks.
Sequential execution time: t·n = t(n - x) + t·x
For parallel execution, only the first term shrinks inversely with p.
For p processors, the second term takes either 0 or t (because x < p).
Therefore, time on p processors = t(n - x)/p + t{1 - δ(0,x)}

Matching this to the model T(n,p) = σ(n) + φ(n)/p + κ(n,p):
φ(n) = t[n - mod(n,p)],  κ(n,p) = t{1 - δ(0,mod(n,p))}
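A small C sketch of this task model (variable names are illustrative); it also prints t times the ceiling of n/p to show that the two-term formula is just that ceiling in disguise:

    #include <stdio.h>

    /* Time for n equal tasks of length t on p processors, following the model above:
       the n - x evenly divisible tasks take t*(n - x)/p, and the x = mod(n,p) leftover
       tasks (x < p) cost one extra round of length t, i.e. t*(1 - delta(0,x)).        */
    double parallel_time(int n, int p, double t)
    {
        int x = n % p;                        /* the "extra" tasks               */
        double extra = (x == 0) ? 0.0 : t;    /* t * (1 - delta(0,x))            */
        return t * (n - x) / p + extra;
    }

    int main(void)
    {
        int n = 10;                           /* hypothetical task count         */
        double t = 1.0;                       /* hypothetical time per task (s)  */
        for (int p = 1; p <= 8; p++)
            printf("p = %d  model T = %4.1f   t*ceil(n/p) = %4.1f\n",
                   p, parallel_time(n, p, t), t * ((n + p - 1) / p));
        return 0;
    }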

Aside: How Do You Code a Kronecker Delta in C?

Simple once you see it… not so simple to come up with it…
The formula assumes 0 <= x < p:

Kron(0,x) = 1 - (x + (p - x) % p) / p
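A quick C check of the identity (a sketch; the macro name KRON0 follows the slide's Kron notation and is not standard):

    #include <stdio.h>

    /* Branch-free Kronecker delta for 0 <= x < p:
       if x == 0, then x + (p - x) % p == 0; otherwise it equals p,
       so integer division by p gives 0 or 1, and KRON0 is 1 only at x == 0. */
    #define KRON0(x, p) (1 - ((x) + ((p) - (x)) % (p)) / (p))

    int main(void)
    {
        int p = 5;                            /* any p > 0 works */
        for (int x = 0; x < p; x++)
            printf("x = %d  Kron(0,x) = %d\n", x, KRON0(x, p));
        return 0;
    }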

Parallel Overhead: Jitter (Example)

Communication: Latency and Bandwidth