Parallelism and Amdahl's Law


Parallelism and Amdahl's Law
Eric Shook, Department of Geography, Kent State University

Parallel computing
[Figure: parallel computing illustrations; image sources: intel.com, http://www.nasa.gov/audience/foreducators/k-4/features/F_ESSEA_Course_K-4.html]

Inter-process Communication
Two models. Shared memory: a single memory space is shared between processing core 0 and processing core 1, so a value such as the coordinate [40.742, -74.245] is directly visible to both cores. Message passing: each processing core has its own private memory space, so core 0 must send the coordinate [40.742, -74.245] to core 1 as an explicit message.
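A minimal sketch of the two models, not part of the original slides, using Python's multiprocessing module; shared_reader and message_receiver are hypothetical names, and the coordinate is the slide's [40.742, -74.245]:

```python
from multiprocessing import Process, Queue, Array

def shared_reader(coord):
    # Shared memory: this process reads the coordinate from the same
    # (shared) memory space the parent wrote it to -- no copy is sent.
    print("shared memory read:", coord[0], coord[1])

def message_receiver(queue):
    # Message passing: this process receives its own private copy of
    # the coordinate as an explicit message.
    lat, lon = queue.get()
    print("message received:", lat, lon)

if __name__ == "__main__":
    # Shared-memory model: one memory space visible to both processes.
    coord = Array("d", [40.742, -74.245])
    p = Process(target=shared_reader, args=(coord,))
    p.start(); p.join()

    # Message-passing model: private memory spaces, data moves in messages.
    q = Queue()
    q.put((40.742, -74.245))
    p = Process(target=message_receiver, args=(q,))
    p.start(); p.join()
```

With Array the child process sees the parent's memory; with Queue it gets its own copy of the data.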

Parallel Programming Paradigms
Functional parallelism: processing core 0 runs Task A while processing core 1 runs Task B, each over the full data; this works best when the tasks have equivalent processing times. Data parallelism: every core runs the same tasks (Task A, then Task B), each over its own half of the data.
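A sketch contrasting the two paradigms under the same assumptions as above; task_a and task_b are hypothetical stand-ins for the slide's Task A and Task B:

```python
from multiprocessing import Pool, Process

def task_a(data):
    return sum(data)   # placeholder work

def task_b(data):
    return max(data)   # placeholder work

if __name__ == "__main__":
    data = list(range(1_000_000))

    # Functional parallelism: different tasks run at the same time,
    # each over the full data set.
    pa = Process(target=task_a, args=(data,))
    pb = Process(target=task_b, args=(data,))
    pa.start(); pb.start(); pa.join(); pb.join()

    # Data parallelism: the same task runs on each core, each over
    # half of the data; partial results are combined afterwards.
    halves = [data[:len(data)//2], data[len(data)//2:]]
    with Pool(2) as pool:
        total = sum(pool.map(task_a, halves))
    print("data-parallel sum:", total)
```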

Spatial Domain Decomposition
Common strategies: row or column decomposition, regular grid, quadtree, and recursive bisection (Ding & Densham, 1996).
Ding, Y., & Densham, P. J. (1996). Spatial strategies for parallel spatial modelling. International Journal of Geographical Information Systems, 10(6), 669-698.
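Row decomposition is the simplest of these strategies to sketch; the row_block helper below is hypothetical and just computes which contiguous block of rows each core owns:

```python
def row_block(n_rows, n_cores, rank):
    """Return the [start, end) row range owned by core `rank`,
    spreading any remainder rows over the first cores."""
    base, extra = divmod(n_rows, n_cores)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end

if __name__ == "__main__":
    for rank in range(4):
        print(f"core {rank} owns rows {row_block(1000, 4, rank)}")
```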

Challenges for Parallelism: Load Imbalance
When cores receive uneven amounts of data for the same task (Task A), core 0 will finish processing much sooner than core 1 and then sit idle.

Load Imbalance: Bad for Performance
Imbalanced workload: one core gets 20% of the work and the other 80%. The lightly loaded core is doing nothing when it could be processing, the other core is overloaded, and all of that idle time is lost to imbalance.
Balanced workload: each core gets 50%, so both finish together and no time is lost.
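A small sketch quantifying the lost time for the slide's 20/80 versus 50/50 splits; finish_times is a hypothetical helper, and the fixed cost per work unit is an assumption:

```python
def finish_times(shares, seconds_per_unit=1.0):
    """Each core's finish time; the run ends when the slowest core finishes."""
    times = [s * seconds_per_unit for s in shares]
    makespan = max(times)
    idle = sum(makespan - t for t in times)  # time cores sit doing nothing
    return makespan, idle

if __name__ == "__main__":
    print("imbalanced:", finish_times([20, 80]))  # finishes at 80, 60 units idle
    print("balanced:  ", finish_times([50, 50]))  # finishes at 50, 0 units idle
```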

Challenges: Not Enough Parallelism
A program with only a few tasks (Task A through Task E) may not offer enough task parallelism to keep all cores busy, and data that is too small is not worth dividing for data parallelism.

Measuring Parallel Performance: Speedup
Speedup is commonly used to assess the performance of a parallel program. Speedup on p cores is defined as the execution time on a single core (T1) divided by the execution time on p cores (Tp): Sp = T1 / Tp (Amdahl, 1967). Linear or ideal speedup is reached when Sp = p.
[Figure: speedup versus number of cores, with actual speedup falling below the linear speedup line.]
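A sketch of measuring Sp = T1 / Tp on a toy workload; count_primes is a hypothetical CPU-bound kernel, and the measured speedup will vary by machine:

```python
import time
from multiprocessing import Pool

def count_primes(bounds):
    lo, hi = bounds
    def is_prime(n):
        if n < 2:
            return False
        return all(n % d for d in range(2, int(n ** 0.5) + 1))
    return sum(1 for n in range(lo, hi) if is_prime(n))

if __name__ == "__main__":
    n, p = 200_000, 4

    t0 = time.perf_counter()
    count_primes((0, n))                      # T1: single core
    t1 = time.perf_counter() - t0

    chunks = [(i * n // p, (i + 1) * n // p) for i in range(p)]
    t0 = time.perf_counter()
    with Pool(p) as pool:
        sum(pool.map(count_primes, chunks))   # Tp: p cores
    tp = time.perf_counter() - t0

    print(f"speedup S_{p} = T1/T{p} = {t1 / tp:.2f}")
```

Note that the equal-width chunks here are themselves slightly imbalanced (primality testing costs more for larger numbers), which is one reason actual speedup falls below linear.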

Amdahl's Law: Theoretical Speedup
A program has a serial portion (e.g., Tasks A and B) and a parallel portion (e.g., Tasks C, D, and E). Assume P is the parallel portion of the program; then (1-P) is the portion that cannot be made parallel (the serial portion). Amdahl's law states that the maximum speedup on N processors is:
S(N) = 1 / ((1-P) + P/N)
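A direct transcription of the formula as a small Python function (the name amdahl is ours, not the slides'):

```python
def amdahl(p, n):
    """Maximum speedup with parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# Even a 95%-parallel program caps well below 8x on 8 processors:
print(amdahl(0.95, 8))   # ~5.9
```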

Amdahl's Law: Examples
S(N) = 1 / ((1-P) + P/N). As N tends to infinity, S(N) tends to 1/(1-P).

Parallel Portion    Maximum Speedup*
99%                 100
95%                 20
90%                 10
75%                 4
50%                 2
25%                 1.3

* Even if we have one million processing cores!
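A few lines suffice to reproduce the table from the 1/(1-P) limit:

```python
# Limit of Amdahl's law as N -> infinity for each parallel portion P.
for p in (0.99, 0.95, 0.90, 0.75, 0.50, 0.25):
    print(f"P = {p:.0%}: max speedup = {1.0 / (1.0 - p):.1f}")
```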