
1 Arquitectura de Sistemas Paralelos e Distribuídos
Paulo Marques, Dep. Eng. Informática – Universidade de Coimbra
pmarques@dei.uc.pt, Aug/2007
1. Quantitative Aspects

2 Let's do Human Parallel Computing!

3 Simple Task
What do you need?
- Organize into groups, e.g. 1 group of 10 people, 2 groups of 5 people, or 5 groups of 1 person
- 1 pen and 1 piece of paper for each member of the group
- 1 piece of paper for the group as a whole
Objective: to compute a set of mathematical operations (e.g. 23x12) as fast as possible. When the group is done, having all the results on the group's piece of paper, say "DONE".

4 GO
21 x 34 = ___________
1024 / 8 = ___________
53 x 12 = ___________
66 / 11 = ___________
6 x 5 = ___________
93 / 3 = ___________
89 x 12 = ___________
45 / 5 = ___________
91 x 10 = ___________
128 / 16 = ___________
SUM = _____________
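To check the worksheet afterwards, a short Python sketch that computes each operation and the expected sum:

```python
# Check script for the worksheet: computes each operation and the total sum.
ops = [(21, '*', 34), (1024, '/', 8), (53, '*', 12), (66, '/', 11),
       (6, '*', 5), (93, '/', 3), (89, '*', 12), (45, '/', 5),
       (91, '*', 10), (128, '/', 16)]

total = 0
for a, op, b in ops:
    result = a * b if op == '*' else a // b  # all divisions here are exact
    total += result
    print(f"{a} {op} {b} = {result}")
print(f"SUM = {total}")  # 3540
```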

5 Basic Metrics
Speedup: S(n) = t_s / t_p, the ratio between the execution time on one processor (t_s) and the execution time on n processors (t_p).
Efficiency: E = S(n) / n.
What was the speedup and efficiency of your human computation?
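Both metrics transcribe directly into code; a minimal sketch, with illustrative numbers only:

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = t_serial / t_parallel."""
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, n: int) -> float:
    """Efficiency E = S / n, the fraction of ideal linear speedup achieved."""
    return speedup(t_serial, t_parallel) / n

# Example: 8 workers take 10s for a task one worker does in 60s.
print(speedup(60, 10))        # 6.0
print(efficiency(60, 10, 8))  # 0.75
```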

6 Amdahl's Law
The speedup depends on the amount of code that cannot be parallelized:

speedup(n) = t_s / (s·t_s + (1 - s)·t_s / n) = 1 / (s + (1 - s) / n)

n: number of processors
s: percentage of code that cannot be made parallel
t_s: time it takes to run the code serially
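The law itself is a one-liner; a sketch:

```python
def amdahl_speedup(s: float, n: int) -> float:
    """Amdahl's Law: speedup with n processors when a fraction s is serial."""
    return 1.0 / (s + (1.0 - s) / n)

# Even a small serial fraction caps the speedup:
for n in (2, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.05, n), 2))
# 1.9, 6.9, 16.81, 19.63 -> approaching the 1/s = 20 limit
```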

7 Amdahl's Law – The Bad News!

8 Efficiency Using 30 Processors

9 What Is That "s" Anyway?
Three slides ago: "s: percentage of code that cannot be made parallel". Actually, it's worse than that: it is the percentage of time that cannot be executed in parallel. It can be:
- Time spent communicating
- Time spent waiting for/sending jobs
- Time spent waiting for the completion of other processes
- Time spent calling the middleware for parallel programming
Remember: if s is even as small as 0.05, the maximum speedup is only 20.

10 Maximum Speedup
If you have ∞ processors, the (1 - s)/n term goes to 0, so the maximum possible speedup is 1/s.

non-parallel (s)   maximum speedup
0%                 ∞ (linear speedup)
5%                 20
10%                10
20%                5
25%                4
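The table follows directly from the 1/s limit; a quick check:

```python
def max_speedup(s: float) -> float:
    """Limit of Amdahl's Law as n -> infinity: 1/s (infinite for s = 0)."""
    return float('inf') if s == 0 else 1.0 / s

for s in (0.0, 0.05, 0.10, 0.20, 0.25):
    print(f"s = {s:4.0%}: max speedup = {max_speedup(s)}")
# s = 0%: inf, 5%: 20.0, 10%: 10.0, 20%: 5.0, 25%: 4.0
```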

11 Load Balancing and Computation/Communication
Load balancing is always a factor to consider when developing a parallel application.
- Too big a granularity → poor load balancing
- Too small a granularity → too much communication
The ratio of computation to communication is of crucial importance!
[Diagram: timeline of Work/Wait periods for Task 1, Task 2 and Task 3, showing idle time caused by load imbalance]

12 Granularity
Granularity is related to the size of a process:
- Coarse granularity → many sequential instructions per process
- Fine granularity → few sequential instructions per process
Increasing granularity reduces the parallelism.
Ideal Goal: design a parallel program in which it is easy to vary the granularity. This is called scalability in the Parallel Programming Book.

13 Amdahl's Law
"Amdahl's Law predicted the end of parallel computing!"

14 But… How Could This Be, When Amdahl's Law Predicted Otherwise?
John L. Gustafson, 1988: "… very few problems will experience even a 100-fold speedup. Yet for three very practical applications (s = 0.4 - 0.8 percent) used at Sandia, we have achieved the speedup factors on a 1024-processor hypercube which we believe are unprecedented:
- 1021 for beam stress analysis using conjugate gradients
- 1020 for baffled surface wave simulation using finite differences
- 1016 for unstable fluid flow using flux-corrected transport."

15 Let's understand it!
Informally: "9 women cannot have a baby in 1 month, but they are able to have 9 babies in 9 months."
Amdahl's Law assumes that "s" is fixed: it does not change when the problem changes (i.e. when the program is parallelized).
Gustafson argues that "s" is not independent of "n"! When running bigger problems, the serial and non-serial parts do not scale equally: the problem size scales with the number of processors. Therefore:
- You can run bigger problems
- You can run several simultaneous jobs (you have more parallelism available)

16 Quoting Gustafson:
«The expression and graph both contain the implicit assumption that p is independent of N, which is virtually never the case. One does not take a fixed-size problem and run it on various numbers of processors except when doing academic research; in practice, the problem size scales with the number of processors. When given a more powerful processor, the problem generally expands to make use of the increased facilities. Users have control over such things as grid resolution, number of time steps, difference operator complexity, and other parameters that are usually adjusted to allow the program to be run in some desired amount of time. Hence, it may be most realistic to assume that run time, not problem size, is constant.»

17 The meaning of "s"
[Diagram: two panels comparing the Fixed-Size Model (Amdahl) and the Scaled-Size Model (Gustafson)]

18 Scaled Speedup (Gustafson-Barsis Law)
Assume that a program, after being parallelized, spends a fraction s of its time in the serial part and a fraction p in the parallel part, using N processors. On a serial machine it would take s + p·N, so, since s + p = 1:

scaled speedup = (s + p·N) / (s + p) = s + p·N = N + (1 - N)·s

(unlimited speedup if we keep on adding resources)
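Again as a direct transcription of the formula, a sketch:

```python
def gustafson_speedup(s: float, n: int) -> float:
    """Gustafson-Barsis Law: scaled speedup when a fraction s of the
    parallel run is serial and the rest runs on n processors."""
    return n + (1 - n) * s  # equivalently: s + (1 - s) * n

# With s = 5%, the speedup keeps growing as processors are added:
for n in (10, 100, 1000):
    print(n, round(gustafson_speedup(0.05, n), 2))
# 9.55, 95.05, 950.05
```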

19 Amdahl's Law vs. Gustafson-Barsis Law
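To put the two laws side by side, a sketch combining the two formulas above, with s = 5% as an arbitrary example:

```python
def amdahl(s, n):
    return 1 / (s + (1 - s) / n)  # fixed-size problem

def gustafson(s, n):
    return s + (1 - s) * n        # problem scaled with n

s = 0.05
print(f"{'N':>6} {'Amdahl':>8} {'Gustafson':>10}")
for n in (1, 10, 100, 1000):
    print(f"{n:>6} {amdahl(s, n):>8.2f} {gustafson(s, n):>10.2f}")
#      N   Amdahl  Gustafson
#      1     1.00       1.00
#     10     6.90       9.55
#    100    16.81      95.05
#   1000    19.63     950.05
```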

20 Lessons
"9 women cannot have 9 babies in one month" (Amdahl)
"9 women can have 9 babies in 9 months" (Gustafson-Barsis)
"N women can have N babies in 9 months" (Gustafson-Barsis, unlimited scaled speedup)

21 Exercises (1)
A given program takes 100s to run on a single machine. When it is run on 10 machines it only takes 40s.
- What's the speedup achieved? Is this a sublinear, linear or superlinear speedup?
- What is the efficiency that we are getting in this case?
- Could the time with 10 machines be 8s instead of 40s? How?
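A quick numeric check for the first two questions:

```python
# Numbers from the exercise: 100s on 1 machine, 40s on 10 machines.
t1, t10, n = 100.0, 40.0, 10

s = t1 / t10   # speedup
e = s / n      # efficiency
print(f"speedup = {s}, efficiency = {e}")  # speedup = 2.5, efficiency = 0.25
# 2.5 < 10, so the speedup is sublinear; 8s would mean a speedup of 12.5 > 10,
# i.e. superlinear (possible, e.g., when the partitioned data fits in cache).
```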

22 Exercises (2)
A given program that was run on a single processor took 120s in the serial part and 80s in the parallel part.
- What is the value of "s" according to Amdahl's Law?
- What is the speedup that we can get with 10 processors?
- What is the maximum speedup that we can ever get?

23 Exercises (3)
A given program that was run on 10 processors took 120s in the serial part and 80s in the parallel part.
- What is the value of "s" according to Gustafson's Law?
- What is the speedup that we are getting with 10 processors?
- What is the maximum speedup that we can ever get?
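As with exercise (1), the answers to (2) and (3) can be checked numerically; a sketch using the two laws:

```python
# Exercise (2): single-processor run, 120s serial + 80s parallel.
s_amdahl = 120 / (120 + 80)  # s = 0.6
speedup_10 = 1 / (s_amdahl + (1 - s_amdahl) / 10)
print(s_amdahl, round(speedup_10, 2), round(1 / s_amdahl, 2))
# 0.6, 1.56, and the maximum speedup is 1/s = 1.67

# Exercise (3): 10-processor run, 120s serial + 80s parallel.
s_gustafson = 120 / (120 + 80)  # s = 0.6
scaled_10 = s_gustafson + (1 - s_gustafson) * 10
print(round(scaled_10, 2))
# 4.6 -- and the scaled speedup keeps growing with more processors
```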

