
5.1 Analytical Modeling of Parallel Programs
Sahalu Junaidu, ICS 573: High Performance Computing
Sources of Overhead in Parallel Programs
Performance Metrics for Parallel Systems
Formulating Maximum Speedup: Amdahl's Law
Scalability of Parallel Systems
Review of Amdahl's Law: Gustafson-Barsis' Law

5.2 Analytical Modeling: Basics
A sequential algorithm is evaluated by its runtime (in general, the asymptotic runtime as a function of input size).
The asymptotic runtime of a sequential program is identical on any serial platform.
The parallel runtime of a program, on the other hand, depends on
–the input size,
–the number of processors, and
–the communication parameters of the machine.
A parallel algorithm must therefore be analyzed in the context of the underlying platform.
A parallel system is the combination of a parallel algorithm and an underlying platform.
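As an illustration of why the platform matters, here is a minimal sketch (not from the slides) of a runtime model for summing n numbers on p processors; the per-addition cost t_c, message startup cost t_s, and per-word transfer cost t_w are assumed machine parameters, not values from the course.

# Hypothetical runtime model: local additions followed by a log2(p)-step reduction.
# t_c, t_s, t_w are assumed machine parameters.
import math

def parallel_sum_time(n, p, t_c=1.0, t_s=50.0, t_w=5.0):
    computation = t_c * (n / p)                                    # each processor adds n/p numbers
    communication = (t_s + t_w) * math.log2(p) if p > 1 else 0.0   # tree reduction
    return computation + communication

# The same algorithm on two hypothetical machines: a faster network changes the runtime.
print(parallel_sum_time(1_000_000, 64, t_s=50.0))   # slower network
print(parallel_sum_time(1_000_000, 64, t_s=5.0))    # faster network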

5.3 Sources of Overhead in Parallel Programs
If I use n processors to run my program, will it run n times faster? Not usually, because of overheads:
–Interprocessor communication and interactions: usually the most significant source of overhead
–Idling: load imbalance, synchronization, serial components
–Excess computation: a sub-optimal serial algorithm, or more aggregate computation in the parallel formulation
The goal is to minimize these overheads.
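A hedged sketch, with made-up numbers, of how load imbalance turns into idling overhead: the parallel time is set by the most loaded processor, and the gap between p times that time and the total useful work is pure overhead.

# Toy example of idling overhead due to load imbalance (assumed numbers).
work = [100, 100, 100, 180]         # useful work per processor, arbitrary units
p = len(work)
T_p = max(work)                     # parallel time is set by the most loaded processor
useful = sum(work)                  # total useful work
idling = p * T_p - useful           # time processors spend waiting
print(f"T_p = {T_p}, idling overhead = {idling}")   # T_p = 180, idling overhead = 240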

5.4 Performance Metrics for Parallel Programs
Why analyze the performance of parallel programs?
–To determine the best algorithm
–To examine the benefit of parallelism
A number of metrics are used, depending on the desired outcome of the performance analysis:
–Execution time
–Total parallel overhead
–Speedup
–Efficiency
–Cost

5.5 Performance Metrics for Parallel Programs
Parallel execution time, T_p
–The time spent to solve a problem on p processors.
Total overhead function
–T_o = pT_p − T_s
Speedup
–S = T_s / T_p
–Can we have superlinear speedup? Yes, e.g. due to exploratory computations or hardware features.
Efficiency
–E = S / p
Cost
–pT_p (the processor-time product)
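The following small sketch simply evaluates the definitions above; the timings T_s and T_p and the processor count are assumed example values, not measurements from the course.

# Evaluate the metrics above for assumed timings T_s and T_p on p processors.
def parallel_metrics(T_s, T_p, p):
    overhead = p * T_p - T_s        # T_o = p*T_p - T_s
    speedup = T_s / T_p             # S = T_s / T_p
    efficiency = speedup / p        # E = S / p
    cost = p * T_p                  # processor-time product
    return overhead, speedup, efficiency, cost

T_o, S, E, C = parallel_metrics(T_s=100.0, T_p=8.0, p=16)
print(f"T_o = {T_o}, S = {S:.1f}, E = {E:.2f}, cost = {C}")
# T_o = 28.0, S = 12.5, E = 0.78, cost = 128.0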

5.6 Performance Metrics: Working Example
(Worked example presented as a figure in the original slides.)

5.7 Performance Metrics: Example on Speedup
What is the benefit of parallelism? Consider the problem of adding n numbers using n processing elements.
If n is a power of two, the operation can be performed in log n steps by propagating partial sums up a logical binary tree of processors.
If an addition takes constant time t_c and communication of a single word takes time t_s + t_w, the parallel time is T_p = Θ(log n).
We know that T_s = Θ(n).
The speedup S is therefore S = Θ(n / log n).
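A small sketch of the log n tree reduction described above, simulated sequentially in Python; what it illustrates is the step count, not an actual parallel execution.

# Simulate the binary-tree summation of n = 2^k numbers on n processing elements.
# Each step halves the number of active partial sums, so there are log2(n) steps.
def tree_sum(values):
    steps = 0
    while len(values) > 1:
        # Pairwise combine: PE i receives from PE i+1 and adds (one parallel step).
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
        steps += 1
    return values[0], steps

total, steps = tree_sum(list(range(16)))
print(total, steps)   # 120, 4  (log2(16) = 4 steps)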

5.8 Performance Metrics: Speedup Bounds
For computing speedup, the best sequential program is taken as the baseline.
–There may be different sequential algorithms, with different asymptotic runtimes, for a given problem.
Speedup can be as low as 0 (the parallel program never terminates).
Speedup, in theory, should be bounded above by p. In practice, a speedup greater than p is possible.
–This is known as superlinear speedup.
Superlinear speedup can result
–when the serial algorithm does more computation than its parallel formulation, or
–from hardware features that put the serial implementation at a disadvantage.
Note that superlinear speedup happens only if each processing element spends less than time T_s/p solving the problem (since S = T_s/T_p > p exactly when T_p < T_s/p).
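A toy sketch, entirely made up but in the spirit of the exploratory-decomposition figure on the next slide, of how superlinear speedup can arise: a serial depth-first search wades through a large fruitless subtree before reaching the solution, while two processing elements that each take one subtree find it almost immediately.

# Toy model of exploratory decomposition: count node visits until a solution is found.
left_subtree  = [0] * 1000          # 1000 nodes, no solution here
right_subtree = [0] * 4 + [1]       # the solution is the 5th node

def visits_until_found(nodes):
    """Visit nodes in order; return how many were examined before a solution (1) appears."""
    for i, node in enumerate(nodes, start=1):
        if node == 1:
            return i
    return len(nodes)

serial_work   = visits_until_found(left_subtree + right_subtree)   # 1005 visits
parallel_work = visits_until_found(right_subtree)                  # the second PE stops after 5
print(serial_work / parallel_work)   # "speedup" of 201 on just 2 processing elements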

5.9 Performance Metrics: Superlinear Speedups
Superlinearity effect due to exploratory decomposition (illustrated by a figure in the original slides).

5.10 Cost of a Parallel System
As shown earlier, cost is the product of the parallel runtime and the number of processing elements used, pT_p.
Cost reflects the sum of the time that each processing element spends solving the problem.
A parallel system is said to be cost-optimal if the cost of solving a problem on a parallel computer is asymptotically identical to the serial cost.
Since E = T_s / (pT_p), for cost-optimal systems E = O(1).
Cost is sometimes referred to as work or the processor-time product.
The problem of adding n numbers on n processing elements is not cost-optimal.
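A short check of that last claim, combining the runtimes from slide 5.7 with the definitions above:

cost = pT_p = n · Θ(log n) = Θ(n log n), while T_s = Θ(n)
E = T_s / (pT_p) = Θ(n / (n log n)) = Θ(1 / log n), which tends to 0 as n grows

The cost therefore grows asymptotically faster than the serial runtime, so the system is not cost-optimal.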

5.11 Formulating Maximum Speedup
Assume an algorithm has some sequential parts that can only be executed on one processor.
Let f be the fraction of the computation that cannot be divided into concurrent tasks.
Assume no overhead is incurred when the computation is divided into concurrent parts.
The time to perform the computation with p processors is then:
T_p = f·T_s + (1 − f)·T_s / p
Hence, the speedup factor is (Amdahl's Law):
S(p) = T_s / T_p = T_s / (f·T_s + (1 − f)·T_s / p) = p / (1 + (p − 1)·f)
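A minimal sketch of the formula above in code; the values of f and p are arbitrary examples.

# Amdahl's Law: speedup with serial fraction f on p processors, assuming no parallel overhead.
def amdahl_speedup(f, p):
    return p / (1 + (p - 1) * f)

for p in (4, 16, 64, 1024):
    print(p, round(amdahl_speedup(0.05, p), 2))
# 4 3.48, 16 9.14, 64 15.42, 1024 19.64 -- creeping toward the 1/f = 20 limit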

5.12 Visualizing Amdahl's Law
(Figure in the original slides.)

5.13 Speedup Against Number of Processors
(Figure in the original slides.)

5.14 Speedup Against Number of Processors
From the preceding formulation, f has to be a small fraction of the overall computation if a significant increase in speedup is to occur.
Even with an infinite number of processors, the maximum speedup is limited to 1/f:
S(p) = p / (1 + (p − 1)·f) → 1/f as p → ∞
Example: with only 5% of the computation being serial, the maximum speedup is 20, irrespective of the number of processors.
Amdahl used this argument to promote single-processor machines.

5.15 Scalability
Speedup and efficiency are relative terms. They depend on
–the number of processors,
–the problem size, and
–the algorithm used.
For example, the efficiency of a parallel program often decreases as the number of processors increases.
Similarly, a parallel program may be quite efficient for solving large problems, but not for solving small problems.
A parallel program is said to scale if its efficiency is constant for a broad range of numbers of processors and problem sizes.
Finally, speedup and efficiency depend on the algorithm used:
–a parallel program might be efficient relative to one sequential algorithm but not relative to a different sequential algorithm.
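A sketch of this behaviour using the toy runtime model from the note after slide 5.2, again with assumed machine parameters: for a fixed n, efficiency falls as p grows; if n grows with p, efficiency stays roughly flat.

import math

def efficiency(n, p, t_c=1.0, t_s=50.0, t_w=5.0):
    T_s = t_c * n
    T_p = t_c * n / p + ((t_s + t_w) * math.log2(p) if p > 1 else 0.0)
    return (T_s / T_p) / p   # E = S / p

for p in (2, 8, 32, 128):
    print(p, round(efficiency(10_000, p), 2), round(efficiency(10_000 * p, p), 2))
# Fixed n: efficiency drops as p grows.  Scaled n: efficiency stays roughly constant.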

5.16 Gustafson's Law
Gustafson presented an argument based on scalability concepts
–to show that Amdahl's law was not as significant as first supposed in limiting the potential speedup.
Observation: in practice, a larger multiprocessor usually allows a larger problem to be undertaken in a reasonable execution time.
Hence, the problem size is not independent of the number of processors.
Rather than assuming that the problem size is fixed, we should assume that the parallel execution time is fixed.
Using this constant parallel execution time constraint, the resulting speedup factor is numerically different from Amdahl's speedup factor and is called a scaled speedup factor.
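To make the observation concrete, a sketch of how large a summation fits into a fixed parallel time budget as p grows; it reuses the assumed toy model parameters from the note after slide 5.2, and the helper below is hypothetical.

import math

def largest_n_within(T_budget, p, t_c=1.0, t_s=50.0, t_w=5.0):
    # Solve t_c*n/p + (t_s + t_w)*log2(p) <= T_budget for n.
    comm = (t_s + t_w) * math.log2(p) if p > 1 else 0.0
    return int(max(0.0, T_budget - comm) * p / t_c)

for p in (1, 8, 64, 512):
    print(p, largest_n_within(10_000, p))
# The affordable problem size grows almost linearly with p for a fixed time budget.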

5.17 Speedup vs. Number of Processors
(Figure in the original slides.)

5.18 Speedup vs. Number of Processors
(Figure in the original slides.)

5.19 Formulating Gustafson's Law
Assume the parallel execution time, T_p, is normalized to unity, with a fraction f of it spent in serial execution:
T_p = f + (1 − f) = 1
Assume further that the serial part of the computation stays constant as the problem is scaled up. The corresponding serial execution time is then:
T_s = f + p·(1 − f)
The scaled speedup factor (Gustafson's Law) is therefore:
S_s(p) = T_s / T_p = f + p·(1 − f) = p − (p − 1)·f
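A small numerical comparison of the two laws; f = 0.05 and the processor counts are arbitrary examples, amdahl_speedup is the fixed-size formula from slide 5.11 and gustafson_speedup the scaled formula above.

def amdahl_speedup(f, p):
    return p / (1 + (p - 1) * f)      # fixed problem size (slide 5.11)

def gustafson_speedup(f, p):
    return f + p * (1 - f)            # fixed parallel execution time (slide 5.19)

f = 0.05
for p in (4, 16, 64, 1024):
    print(p, round(amdahl_speedup(f, p), 1), round(gustafson_speedup(f, p), 1))
# Amdahl's speedup saturates near 1/f = 20; the scaled speedup keeps growing almost linearly in p.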