
1 Performance Evaluation of Parallel Processing

2 Why Performance?

3 Models of Speedup Speedup ◦ Parallel processing gain over sequential processing Scaled Speedup ◦ Parallel processing gain over sequential processing, where the problem size scales up with computing power (ensuring sufficient workload/parallelism)

4 Speedup Speedup is defined as S(p) = Ts / Tp, where Ts = time for the best serial algorithm and Tp = time for the parallel algorithm using p processors
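As a minimal sketch of this definition (the function name and timing values are illustrative assumptions, not from the slides):

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = Ts / Tp, per the definition above."""
    return t_serial / t_parallel

# Illustrative timings (assumed): a serial run of 100 time units
print(speedup(100.0, 25.0))  # 4.0   -- ideal split across 4 processors
print(speedup(100.0, 35.0))  # ~2.86 -- overhead reduces the gain
```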

5 Example [Figure: execution-time diagrams. (a) one processor runs the program in 100 time units; (b) four processors run 25 time units each, giving speedup 100/25 = 4; (c) four processors run 35 time units each, giving speedup 100/35 ≈ 2.9.]

6 Example (cont.) [Figure: execution-time diagrams. (d) four processors with unbalanced workloads of 30, 20, 40, and 10 time units, giving speedup 100/40 = 2.5; (e) a schedule in which the longest processor takes 50 time units, giving speedup 100/50 = 2.]

7 What Is “Good” Speedup? Linear speedup: S(p) = p Superlinear speedup: S(p) > p Sub-linear speedup: S(p) < p
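These three cases can be checked mechanically; a tiny helper, offered as an illustrative sketch rather than anything from the slides:

```python
def classify_speedup(s: float, p: int) -> str:
    """Classify a measured speedup s obtained on p processors."""
    if s > p:
        return "superlinear"
    if s == p:
        return "linear"
    return "sub-linear"

print(classify_speedup(100 / 25, 4))  # linear
print(classify_speedup(100 / 35, 4))  # sub-linear
```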

8 Speedup [Figure: speedup plotted against the number of processors p]

9 Ideal Speedup in a Multiprocessor System Linear speedup ─ the execution time of a program on an n-processor system would be 1/nth of the execution time on a one-processor system

10 Limitations Interprocessor communication Synchronization Load Balancing

11 Limitations of Interprocessor Communication Whenever one processor generates (computes) a value that is needed by the fraction of the program running on another processor, that value must be communicated to the processors that need it, which takes time. On a uniprocessor system, the entire program runs on one processor, so no time is lost to interprocessor communication
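One way to see the cost is a simple analytic model; this is a hedged sketch in which the linear per-processor communication term and all constants are assumptions, not from the slides:

```python
def parallel_time(t_serial: float, p: int, t_comm: float) -> float:
    """Ideal compute time plus an assumed communication cost that grows with p."""
    return t_serial / p + t_comm * p

# Speedup first rises, then falls once communication dominates
for p in (2, 4, 8, 16, 32):
    tp = parallel_time(100.0, p, 0.5)
    print(p, round(100.0 / tp, 2))
```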

12 Limitations of Synchronization It is often necessary to synchronize the processors to ensure that they have all completed some phase of the program before any processor begins working on the next phase
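As an illustrative sketch (not from the slides), the usual mechanism for this phase synchronization is a barrier; Python's threading.Barrier models it directly:

```python
import threading

barrier = threading.Barrier(4)  # all 4 workers must arrive before any proceeds

def worker(wid: int) -> None:
    # ... phase 1 of the program ...
    barrier.wait()  # early finishers idle here until the slowest arrives
    # ... phase 2 of the program ...

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```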

13 Load Balancing In many parallel applications it is difficult to divide the program evenly across the processors. When having each processor work for the same amount of time is not possible, some of the processors complete their tasks early and are then idle, waiting for the others to finish
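The idle time follows from the longest-running processor (the makespan); a short sketch reusing the unbalanced workloads from example (d) above:

```python
loads = [30, 20, 40, 10]  # per-processor work, as in example (d)
t_serial = sum(loads)     # 100 time units on one processor

makespan = max(loads)                 # the slowest processor sets the finish time
print(t_serial / makespan)            # speedup 2.5 instead of the ideal 4.0
print([makespan - w for w in loads])  # idle time per processor: [10, 20, 0, 30]
```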

14 Superlinear Speedups Achieving a speedup greater than n on an n-processor system requires each of the processors in the n-processor multiprocessor to complete its fraction of the program in less than 1/nth of the program’s execution time on a uniprocessor

15 Factors That Limit Speedup ● Software Overhead Even with a completely equivalent algorithm, software overhead arises in the concurrent implementation ● Load Balancing Speedup is generally limited by the speed of the slowest node. So an important consideration is to ensure that each node performs the same amount of work ● Communication Overhead Assuming that communication and calculation cannot be overlapped, then any time spent communicating the data between processors directly degrades the speedup

16 Degradations of Parallel Processing Unbalanced Workload Communication Delay Overhead Increases with the Ensemble Size

17 Degradations of Distributed Computing Unbalanced Computing Power and Workload Shared Computing and Communication Resource Uncertainty, Heterogeneity, and Overhead Increases with the Ensemble Size

18 Causes of Superlinear Speedup Cache size increased Overhead reduced Latency hidden Randomized algorithms Mathematical inefficiency of the serial algorithm Higher memory access cost in sequential processing X.H. Sun and J. Zhu, "Performance Considerations of Shared Virtual Memory Machines," IEEE Trans. on Parallel and Distributed Systems, Nov. 1995

19 Efficiency ● Speedup does not measure how efficiently the processors are being used ● Is it worth using 100 processors to get a speedup of 2? ● Efficiency is defined as the ratio of the speedup to the number of processors required to achieve it ● Efficiency is given by E(P, N) = S(P, N) / P
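A minimal sketch of the definition (the function name is an illustrative assumption):

```python
def efficiency(speedup: float, p: int) -> float:
    """E = S / P: the fraction of the machine doing useful work."""
    return speedup / p

# The slide's rhetorical question: 100 processors for a speedup of 2
print(efficiency(2.0, 100))  # 0.02 -- only 2% effective utilization
```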

20 If the best known serial algorithm takes 8 seconds, i.e. Ts = 8, while a parallel algorithm takes 2 seconds using 5 processors, then the speedup is S = 8 / 2 = 4 and the efficiency is E = 4 / 5 = 0.8

21 Say we have a program containing 100 operations, each of which takes 1 time unit. If 80 operations can be done in parallel, i.e. P = 80, and 20 operations must be done sequentially, i.e. S = 20, then using 80 processors the parallel run time is 20 + 80/80 = 21 time units, and the speedup is 100/21 ≈ 4.76

22 Speedup Metrics Three performance models based on three speedup metrics are commonly used: ● Amdahl’s law -- fixed-size speedup ● Gustafson’s law -- fixed-time speedup ● Sun-Ni’s law -- memory-bounded speedup Three approaches to scalability analysis are based on maintaining a constant efficiency, a constant speed, or a constant utilization

23 Amdahl’s Law The performance improvement that can be gained by a parallel implementation is limited by the fraction of time parallelism can actually be used in an application Let α = fraction of the program (algorithm) that is serial and cannot be parallelized. For instance: ◦ Loop initialization ◦ Reading/writing to a single disk ◦ Procedure call overhead Parallel run time is given by Tp = α·Ts + (1 − α)·Ts / p

24 Amdahl’s Law Amdahl’s law gives a limit on speedup in terms of α: S(p) = Ts / Tp = 1 / (α + (1 − α) / p), which approaches 1/α as p → ∞
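A short sketch of the bound (the function name is an illustrative assumption):

```python
def amdahl_speedup(alpha: float, p: int) -> float:
    """S(p) = 1 / (alpha + (1 - alpha) / p), per Amdahl's law above."""
    return 1.0 / (alpha + (1.0 - alpha) / p)

# With a 5% serial fraction the speedup is capped at 1/0.05 = 20
for p in (10, 100, 1000, 10000):
    print(p, round(amdahl_speedup(0.05, p), 2))
```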

25 Fixed-Size Speedup (Amdahl’s Law, 1967) [Figure: two charts over 1-5 processors. Left: amount of work, split into serial part W1 and parallel part Wp, remains fixed as p grows. Right: elapsed time, with the serial component T1 unchanged while the parallel component Tp shrinks as p grows.]

26 Consider the effect of the serial fraction F on the speedup produced for N = 10 and N = 1024.
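This can be tabulated with the Amdahl formula above; a hedged sketch in which the sampled values of F are illustrative assumptions, since the original slide's table is not in the transcript:

```python
def amdahl_speedup(alpha: float, p: int) -> float:
    return 1.0 / (alpha + (1.0 - alpha) / p)

print("F      N=10    N=1024")
for f in (0.0, 0.01, 0.05, 0.1, 0.5):
    print(f"{f:<6} {amdahl_speedup(f, 10):<7.2f} {amdahl_speedup(f, 1024):.2f}")
```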

27 [Figure: speedup as a function of the serial fraction F for N = 10 and N = 1024]

28 Comments on Amdahl’s Law The Amdahl fraction α in practice depends on the problem size n and the number of processors p An effective parallel algorithm has α → 0 as n → ∞ For such a case, even if one fixes p, we can get linear speedups by choosing a suitably large problem size (scalable speedup) Practically, the problem size that we can run for a particular problem is limited by the time and memory of the parallel computer
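To see the "suitably large problem size" effect, a hedged sketch assuming the serial work is a fixed 100 operations so that α(n) = 100/n shrinks as the problem grows (an illustrative model, not from the slides):

```python
def amdahl_speedup(alpha: float, p: int) -> float:
    return 1.0 / (alpha + (1.0 - alpha) / p)

p = 100
for n in (1_000, 10_000, 100_000):
    alpha = 100.0 / n  # serial fraction shrinks with problem size
    print(n, round(amdahl_speedup(alpha, p), 1))  # approaches linear speedup of 100
```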

29 Gustafson’s Law Gustafson defined two “more relevant” notions of speedup » Scaled speedup » Fixed-time speedup » and renamed Amdahl’s version “fixed-size” speedup

30 Gustafson’s Law Scaled (fixed-time) speedup: S(p) = α + p(1 − α) = p − α(p − 1), where α is the fraction of the scaled workload that is serial

31 [Figure: illustration of Gustafson’s fixed-time speedup]

32 Gustafson’s Law: Scaling for Higher Accuracy? The problem size (workload) in Amdahl’s law is fixed and cannot scale to match the available computing power as the machine size increases. Thus, Amdahl’s law leads to diminishing returns when a larger system is employed to solve a small problem. The sequential bottleneck in Amdahl’s law can be alleviated by removing the restriction of a fixed problem size. Gustafson proposed a fixed-time concept that achieves an improved speedup by scaling the problem size with the increase in machine size
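A companion sketch to the Amdahl example above (the function name is an illustrative assumption):

```python
def gustafson_speedup(alpha: float, p: int) -> float:
    """Scaled speedup S(p) = p - alpha * (p - 1), per Gustafson's law."""
    return p - alpha * (p - 1)

# With the same 5% serial fraction, scaled speedup keeps growing with p,
# unlike Amdahl's fixed-size cap of 20
for p in (10, 100, 1000):
    print(p, gustafson_speedup(0.05, p))
```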

