Arquitectura de Sistemas Paralelos e Distribuídos
Paulo Marques, Dep. Eng. Informática – Universidade de Coimbra, Ago/2007
1. Quantitative Aspects

1 Arquitectura de Sistemas Paralelos e Distribuídos. Paulo Marques, Dep. Eng. Informática – Universidade de Coimbra, Ago/2007. 1. Quantitative Aspects

2 Let’s do Human Parallel Computing!

3 Simple Task
What do you need?
- Organize into groups, e.g.: 1 group of 10 people, 2 groups of 5 people, 5 groups of 1 person
- 1 pen and 1 piece of paper for all members of the group
- 1 piece of paper for the group as a whole
Objective: to compute as fast as possible a certain number of mathematical operations (e.g. 23x12).
When the group is done, having all the results on a piece of paper, say "DONE".

4 GO
21 x 34 = ___________
1024 / 8 = ___________
53 x 12 = ___________
66 / 11 = ___________
6 x 5 = ___________
93 / 3 = ___________
89 x 12 = ___________
45 / 5 = ___________
91 x 10 = ___________
128 / 16 = ___________
SUM = _____________

5 Basic Metrics
- Speedup: S = t_s / t_p (sequential execution time divided by parallel execution time)
- Efficiency: E = S / n (speedup divided by the number of processors used)
What was the speedup and efficiency of your human computation?
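A minimal sketch of how the two metrics could be computed from measured wall-clock times. The timing values and function names are hypothetical; only the definitions above come from the slide.

```python
# Minimal sketch: speedup and efficiency from measured wall-clock times.
# The example times are made up; the formulas are the ones defined above.

def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = t_s / t_p."""
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, n: int) -> float:
    """Efficiency E = S / n."""
    return speedup(t_serial, t_parallel) / n

if __name__ == "__main__":
    t_s = 60.0  # hypothetical: one person doing all the operations (seconds)
    t_p = 20.0  # hypothetical: a group of 5 people working in parallel
    n = 5
    print(f"speedup    = {speedup(t_s, t_p):.2f}")        # 3.00
    print(f"efficiency = {efficiency(t_s, t_p, n):.2f}")  # 0.60
```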

6 Amdahl's Law
The speedup depends on the amount of code that cannot be parallelized:
- n: number of processors
- s: percentage of the code that cannot be made parallel
- t_s: time it takes to run the code serially
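The formula itself was shown as an image on the original slide; a standard statement of Amdahl's Law, consistent with the variables defined above, is:

```latex
% Amdahl's Law, using the n, s and t_s defined on the slide.
\[
  S(n) \;=\; \frac{t_s}{\, s\, t_s + \frac{(1-s)\, t_s}{n} \,}
       \;=\; \frac{1}{\, s + \frac{1-s}{n} \,},
  \qquad
  \lim_{n \to \infty} S(n) \;=\; \frac{1}{s}
\]
```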

7 Amdahl's Law – The Bad News!

8 Efficiency Using 30 Processors
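The original slide showed this as a chart; the numbers below are an illustrative reconstruction of the same idea, obtained by plugging n = 30 and a few values of s into Amdahl's Law (they are not the chart's original data points).

```python
# Illustrative sketch (not the original chart's data): efficiency with n = 30
# processors for a few values of s, using Amdahl's Law from the previous slide.

def amdahl_speedup(s: float, n: int) -> float:
    """Speedup predicted by Amdahl's Law: 1 / (s + (1 - s) / n)."""
    return 1.0 / (s + (1.0 - s) / n)

n = 30
for s in (0.0, 0.05, 0.10, 0.20, 0.25):
    sp = amdahl_speedup(s, n)
    print(f"s = {s:4.2f}  speedup = {sp:5.2f}  efficiency = {sp / n:.2f}")
# s = 0.00 gives speedup 30 and efficiency 1.0; even s = 0.05 already
# drops the efficiency to about 0.41.
```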

9 What Is That "s" Anyway?
Three slides ago… "s: percentage of code that cannot be made parallel". Actually, it's worse than that: it is the percentage of time that cannot be executed in parallel. It can be:
- Time spent communicating
- Time spent waiting for/sending jobs
- Time spent waiting for the completion of other processes
- Time spent calling the middleware for parallel programming
Remember… if s is even as small as 0.05, the maximum speedup is only 20.

10 Maximum Speedup
If you have ∞ processors, the (1-s)/n term goes to 0, so the maximum possible speedup is 1/s.

non-parallel (s) | maximum speedup
0%               | ∞ (linear speedup)
5%               | 20
10%              | 10
20%              | 5
25%              | 4

11 Load Balancing and Computation/Communication
Load balancing is always a factor to consider when developing a parallel application.
- Too big granularity → poor load balancing
- Too small granularity → too much communication
The ratio computation/communication is of crucial importance!
[Figure: timeline of Task 1, Task 2 and Task 3, each split into "Work" and "Wait" time, illustrating load imbalance]

12 Granularity
Granularity is related to the size of a process:
- Coarse granularity → many sequential instructions per process
- Fine granularity → few sequential instructions per process
Increasing granularity reduces the parallelism.
Ideal goal: design a parallel program in which it is easy to vary the granularity. This is called scalability in the Parallel Programming Book.
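A minimal sketch of the "easy to vary the granularity" idea, assuming a simple pool of independent tasks; everything here (the Pool-based split, the chunk_size knob, the work function) is illustrative and not from the course material. A large chunk size gives coarse granularity (few hand-offs, possible imbalance); a small one gives fine granularity (good balance, more communication overhead).

```python
# Illustrative sketch (not from the slides): granularity as an explicit parameter.
# A list of independent tasks is split into chunks; the chunk size controls the
# granularity of the work units handed to each worker.

from multiprocessing import Pool

def work(x: int) -> int:
    # Stand-in for a real computation (e.g. one of the multiplications above).
    return x * x

def run(tasks: list, n_workers: int, chunk_size: int) -> list:
    # chunk_size large -> coarse granularity: few hand-offs, risk of imbalance
    # chunk_size small -> fine granularity: good balance, more hand-off overhead
    with Pool(processes=n_workers) as pool:
        return pool.map(work, tasks, chunksize=chunk_size)

if __name__ == "__main__":
    tasks = list(range(1000))
    print(sum(run(tasks, n_workers=4, chunk_size=50)))  # coarse-grained split
    print(sum(run(tasks, n_workers=4, chunk_size=1)))   # fine-grained split
```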

13 Amdahl's Law
"Amdahl's Law predicted the end of parallel computing!"

14 But… How could this be when Amdahl's Law predicted otherwise?
John L. Gustafson, 1988: "… very few problems will experience even a 100-fold speedup. Yet for three very practical applications (s = percent) used at Sandia, we have achieved the speedup factors on a 1024-processor hypercube which we believe are unprecedented:
- 1021 for beam stress analysis using conjugate gradients
- 1020 for baffled surface wave simulation using finite differences
- 1016 for unstable fluid flow using flux-corrected transport."

15 Let's understand it!
Informally… "9 women cannot have a baby in 1 month, but they are able to have 9 babies in 9 months."
Amdahl's Law assumes that "s" is fixed: it does not change when the problem changes (i.e. when the program is parallelized).
Gustafson argues that "s" is not independent of "n"! When running bigger problems, the serial and non-serial parts do not scale equally: the problem size scales with the number of processors. Therefore:
- You can run bigger problems
- You can run several simultaneous jobs (you have more parallelism available)

16 Quoting Gustafson: «The expression and graph both contain the implicit assumption that p is independent of N, which is virtually never the case. One does not take a fixed-size problem and run it on various numbers of processors except when doing academic research; in practice, the problem size scales with the number of processors. When given a more powerful processor, the problem generally expands to make use of the increased facilities. Users have control over such things as grid resolution, number of time steps, difference operator complexity, and other parameters that are usually adjusted to allow the program to be run in some desired amount of time. Hence, it may be most realistic to assume that run time, not problem size, is constant.»

17 The meaning of "s"
[Figure: two diagrams, "Fixed-Size Model" and "Scaled-Size Model", contrasting how the serial fraction s is measured in each model]

18 Scaled Speedup (Gustafson-Barsis Law)
Assume that a program, after being parallelized, spends s% of its time in the serial part and p% in the parallel part, using N processors. On a serial machine the same work would take s + p*N. Since s + p = 1, the scaled speedup is s + p*N = N - (N-1)*s, which has no upper bound: unlimited speedup if we keep on adding resources.
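Writing the algebra out (the original slide showed the expressions as an image; this is the standard Gustafson-Barsis derivation using the s, p and N defined above):

```latex
% Gustafson-Barsis scaled speedup, using the s, p and N defined on the slide.
% Parallel run time (normalized): s + p = 1.  Same work on one processor: s + pN.
\[
  S_{\mathrm{scaled}}(N) \;=\; \frac{s + pN}{s + p} \;=\; s + pN \;=\; N - (N-1)\,s
\]
% S_scaled grows linearly with N: there is no fixed upper bound, unlike the 1/s
% limit of the fixed-size (Amdahl) model.
```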

19 Amdahl's Law vs. Gustafson-Barsis Law
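The original slide showed this comparison as a chart; the sketch below reproduces it numerically, assuming the two formulas from the previous slides and an example serial fraction of s = 0.1 (chosen only for illustration).

```python
# Illustrative comparison (the original slide was a chart): Amdahl's fixed-size
# speedup vs. the Gustafson-Barsis scaled speedup, for an example s = 0.1.

def amdahl(s: float, n: int) -> float:
    return 1.0 / (s + (1.0 - s) / n)

def gustafson(s: float, n: int) -> float:
    return n - (n - 1) * s

s = 0.1
for n in (1, 2, 4, 8, 16, 32, 64, 128, 1024):
    print(f"n = {n:5d}  Amdahl = {amdahl(s, n):6.2f}  Gustafson = {gustafson(s, n):7.1f}")
# Amdahl flattens out towards 1/s = 10, while the scaled speedup keeps growing
# roughly linearly with the number of processors.
```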

20 Lessons
"9 women cannot have 9 babies in one month" (Amdahl's)
"9 women can have 9 babies in 9 months" (Gustafson-Barsis')
"N women can have N babies in 9 months" (Gustafson-Barsis' – unlimited scaled speedup)

21 Exercises (1)
A given program takes 100 s to run on a single machine. When it is run on 10 machines it only takes 40 s.
- What's the speedup achieved?
- Is this a sublinear, linear or superlinear speedup?
- What is the efficiency that we are getting in this case?
- Could the time with 10 machines be 8 s instead of 40 s? How?

22 Exercises (2)
A given program that was run on a single processor took 120 s in the serial part and 80 s in the parallel part.
- What is the value of "s" according to Amdahl's Law?
- What is the speedup that we can get with 10 processors?
- What is the maximum speedup that we can ever get?

23 Exercises (3)
A given program that was run on 10 processors took 120 s in the serial part and 80 s in the parallel part.
- What is the value of "s" according to Gustafson's Law?
- What is the speedup that we are getting with 10 processors?
- What is the maximum speedup that we can ever get?
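A small sketch that can be used to check the exercises, applying only the formulas from the earlier slides to the numbers given; the interpretation of the statements is mine, so treat the output as a sanity check rather than as the official solutions.

```python
# Sanity-check sketch for the exercises, using the formulas from the slides.
# The interpretation of the exercise statements is mine, not the author's.

def speedup(t_serial: float, t_parallel: float) -> float:
    return t_serial / t_parallel

def amdahl(s: float, n: int) -> float:
    return 1.0 / (s + (1.0 - s) / n)

def gustafson(s: float, n: int) -> float:
    return n - (n - 1) * s

# Exercise 1: 100 s on one machine, 40 s on 10 machines.
print(speedup(100, 40), speedup(100, 40) / 10)  # 2.5 (sublinear), efficiency 0.25

# Exercise 2 (times measured on ONE processor): 120 s serial + 80 s parallel.
s2 = 120 / (120 + 80)                            # s = 0.6
print(amdahl(s2, 10), 1 / s2)                    # ~1.56 with 10 CPUs, max ~1.67

# Exercise 3 (times measured on TEN processors): 120 s serial + 80 s parallel.
s3 = 120 / (120 + 80)                            # s = 0.6
print(gustafson(s3, 10))                         # 4.6; no fixed bound as N grows
```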