Parallelism and Amdahl's Law


Parallelism and Amdahl's Law
Eric Shook, Department of Geography, Kent State University

Parallel computing
[Figure: parallel computing illustrations; image sources: intel.com, http://www.nasa.gov/audience/foreducators/k-4/features/F_ESSEA_Course_K-4.html]

Inter-process Communication
Two models. Shared memory: a single memory space is shared between processing core 0 and processing core 1, so a value such as the coordinate [40.742, -74.245] is directly visible to both cores. Message passing: each processing core has its own private memory space, so core 0 must send the coordinate [40.742, -74.245] to core 1 as an explicit message.
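A minimal sketch of the two models, not part of the original slides, using Python's multiprocessing module; shared_reader and message_receiver are hypothetical names, and the coordinate is the slide's [40.742, -74.245]:

```python
from multiprocessing import Process, Queue, Array

def shared_reader(coord):
    # Shared memory: this process reads the coordinate from the same
    # (shared) memory space the parent wrote it to -- no copy is sent.
    print("shared memory read:", coord[0], coord[1])

def message_receiver(queue):
    # Message passing: this process receives its own private copy of
    # the coordinate as an explicit message.
    lat, lon = queue.get()
    print("message received:", lat, lon)

if __name__ == "__main__":
    # Shared-memory model: one memory space visible to both processes.
    coord = Array("d", [40.742, -74.245])
    p = Process(target=shared_reader, args=(coord,))
    p.start(); p.join()

    # Message-passing model: private memory spaces, data moves in messages.
    q = Queue()
    q.put((40.742, -74.245))
    p = Process(target=message_receiver, args=(q,))
    p.start(); p.join()
```

With Array the child process sees the parent's memory; with Queue it gets its own copy of the data.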

Parallel Programming Paradigms
Functional parallelism: processing core 0 runs Task A while processing core 1 runs Task B, each over the full data; this works best when the tasks have equivalent processing times. Data parallelism: every core runs the same tasks (Task A, then Task B), each over its own half of the data.
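A sketch contrasting the two paradigms under the same assumptions as above; task_a and task_b are hypothetical stand-ins for the slide's Task A and Task B:

```python
from multiprocessing import Pool, Process

def task_a(data):
    return sum(data)   # placeholder work

def task_b(data):
    return max(data)   # placeholder work

if __name__ == "__main__":
    data = list(range(1_000_000))

    # Functional parallelism: different tasks run at the same time,
    # each over the full data set.
    pa = Process(target=task_a, args=(data,))
    pb = Process(target=task_b, args=(data,))
    pa.start(); pb.start(); pa.join(); pb.join()

    # Data parallelism: the same task runs on each core, each over
    # half of the data; partial results are combined afterwards.
    halves = [data[:len(data)//2], data[len(data)//2:]]
    with Pool(2) as pool:
        total = sum(pool.map(task_a, halves))
    print("data-parallel sum:", total)
```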

Spatial Domain Decomposition
Common strategies: row or column decomposition, regular grid, quadtree, and recursive bisection (Ding & Densham, 1996).
Ding, Y., & Densham, P. J. (1996). Spatial strategies for parallel spatial modelling. International Journal of Geographical Information Systems, 10(6), 669-698.
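Row decomposition is the simplest of these strategies to sketch; the row_block helper below is hypothetical and just computes which contiguous block of rows each core owns:

```python
def row_block(n_rows, n_cores, rank):
    """Return the [start, end) row range owned by core `rank`,
    spreading any remainder rows over the first cores."""
    base, extra = divmod(n_rows, n_cores)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end

if __name__ == "__main__":
    for rank in range(4):
        print(f"core {rank} owns rows {row_block(1000, 4, rank)}")
```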

Challenges for Parallelism: Load Imbalance
When cores receive uneven amounts of data for the same task (Task A), core 0 will finish processing much sooner than core 1 and then sit idle.

Load Imbalance: Bad for Performance
Imbalanced workload: one core gets 20% of the work and the other 80%. The lightly loaded core is doing nothing when it could be processing, the other core is overloaded, and all of that idle time is lost to imbalance.
Balanced workload: each core gets 50%, so both finish together and no time is lost.
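A small sketch quantifying the lost time for the slide's 20/80 versus 50/50 splits; finish_times is a hypothetical helper, and the fixed cost per work unit is an assumption:

```python
def finish_times(shares, seconds_per_unit=1.0):
    """Each core's finish time; the run ends when the slowest core finishes."""
    times = [s * seconds_per_unit for s in shares]
    makespan = max(times)
    idle = sum(makespan - t for t in times)  # time cores sit doing nothing
    return makespan, idle

if __name__ == "__main__":
    print("imbalanced:", finish_times([20, 80]))  # finishes at 80, 60 units idle
    print("balanced:  ", finish_times([50, 50]))  # finishes at 50, 0 units idle
```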

Challenges: Not Enough Parallelism
A program with only a few tasks (Task A through Task E) may not offer enough task parallelism to keep all cores busy, and data that is too small is not worth dividing for data parallelism.

Measuring Parallel Performance: Speedup
Speedup is commonly used to assess the performance of a parallel program. Speedup on p cores is defined as the execution time on a single core (T1) divided by the execution time on p cores (Tp): Sp = T1 / Tp (Amdahl, 1967). Linear or ideal speedup is reached when Sp = p.
[Figure: speedup versus number of cores, with actual speedup falling below the linear speedup line.]
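A sketch of measuring Sp = T1 / Tp on a toy workload; count_primes is a hypothetical CPU-bound kernel, and the measured speedup will vary by machine:

```python
import time
from multiprocessing import Pool

def count_primes(bounds):
    lo, hi = bounds
    def is_prime(n):
        if n < 2:
            return False
        return all(n % d for d in range(2, int(n ** 0.5) + 1))
    return sum(1 for n in range(lo, hi) if is_prime(n))

if __name__ == "__main__":
    n, p = 200_000, 4

    t0 = time.perf_counter()
    count_primes((0, n))                      # T1: single core
    t1 = time.perf_counter() - t0

    chunks = [(i * n // p, (i + 1) * n // p) for i in range(p)]
    t0 = time.perf_counter()
    with Pool(p) as pool:
        sum(pool.map(count_primes, chunks))   # Tp: p cores
    tp = time.perf_counter() - t0

    print(f"speedup S_{p} = T1/T{p} = {t1 / tp:.2f}")
```

Note that the equal-width chunks here are themselves slightly imbalanced (primality testing costs more for larger numbers), which is one reason actual speedup falls below linear.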

Amdahl's Law: Theoretical Speedup
A program has a serial portion (e.g., Tasks A and B) and a parallel portion (e.g., Tasks C, D, and E). Assume P is the parallel portion of the program; then (1-P) is the portion that cannot be made parallel (the serial portion). Amdahl's law states that the maximum speedup on N processors is:
S(N) = 1 / ((1-P) + P/N)
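A direct transcription of the formula as a small Python function (the name amdahl is ours, not the slides'):

```python
def amdahl(p, n):
    """Maximum speedup with parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# Even a 95%-parallel program caps well below 8x on 8 processors:
print(amdahl(0.95, 8))   # ~5.9
```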

Amdahl's Law: Examples
S(N) = 1 / ((1-P) + P/N). As N tends to infinity, S(N) tends to 1/(1-P).

Parallel Portion    Maximum Speedup*
99%                 100
95%                 20
90%                 10
75%                 4
50%                 2
25%                 1.3

* Even if we have one million processing cores!
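A few lines suffice to reproduce the table from the 1/(1-P) limit:

```python
# Limit of Amdahl's law as N -> infinity for each parallel portion P.
for p in (0.99, 0.95, 0.90, 0.75, 0.50, 0.25):
    print(f"P = {p:.0%}: max speedup = {1.0 / (1.0 - p):.1f}")
```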