ICS 573: High Performance Computing. Analytical Modeling of Parallel Programs (Sahalu Junaidu)


Slide 1: Analytical Modeling of Parallel Programs
Sources of Overhead in Parallel Programs
Performance Metrics for Parallel Systems
Formulating Maximum Speedup: Amdahl's Law
Scalability of Parallel Systems
Review of Amdahl's Law: Gustafson-Barsis' Law

Slide 2: Analytical Modeling - Basics
A sequential algorithm is evaluated by its runtime (in general, its asymptotic runtime as a function of input size).
The asymptotic runtime of a sequential program is identical on any serial platform.
The parallel runtime of a program, on the other hand, depends on
–the input size,
–the number of processors, and
–the communication parameters of the machine.
An algorithm must therefore be analyzed in the context of the underlying platform.
A parallel system is the combination of a parallel algorithm and an underlying platform.

Slide 3: Sources of Overhead in Parallel Programs
If I use n processors to run my program, will it run n times faster? Usually not, because of overheads:
–Interprocessor communication and interactions: usually the most significant source of overhead
–Idling: load imbalance, synchronization, serial components
–Excess computation: a sub-optimal serial algorithm, or more aggregate computation than the serial version
The goal is to minimize these overheads.

Slide 4: Performance Metrics for Parallel Programs
Why analyze the performance of parallel programs?
–To determine the best algorithm
–To examine the benefit of parallelism
A number of metrics have been used, based on the desired outcome of the performance analysis:
–Execution time
–Total parallel overhead
–Speedup
–Efficiency
–Cost

Slide 5: Performance Metrics for Parallel Programs
Parallel execution time
–T_p: the time spent solving the problem on p processors
Total overhead function
–T_o = p T_p - T_s
Speedup
–S = T_s / T_p
–Can we have superlinear speedup? Yes, due to exploratory computations or hardware features.
Efficiency
–E = S / p
Cost
–p T_p (the processor-time product)
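
As a quick illustration of how these definitions fit together, here is a minimal Python sketch (not from the slides; the timing numbers below are made up):

    # Illustration of the metric definitions above (values are hypothetical).
    def parallel_metrics(t_s, t_p, p):
        overhead = p * t_p - t_s      # T_o = p*T_p - T_s
        speedup = t_s / t_p           # S = T_s / T_p
        efficiency = speedup / p      # E = S / p
        cost = p * t_p                # processor-time product
        return overhead, speedup, efficiency, cost

    # Example: T_s = 100 s serial, T_p = 15 s on p = 8 processors.
    print(parallel_metrics(100.0, 15.0, 8))
    # -> (20.0, 6.67, 0.83, 120.0)  (speedup and efficiency rounded)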

Slide 6: Performance Metrics: Working Example

Slide 7: Performance Metrics: Example on Speedup
What is the benefit from parallelism? Consider the problem of adding n numbers using n processing elements.
If n is a power of two, we can perform this operation in log n steps by propagating partial sums up a logical binary tree of processors.
If an addition takes constant time t_c and communicating a single word takes time t_s + t_w, the parallel time is T_P = Θ(log n).
We know that T_S = Θ(n).
The speedup S is therefore S = T_S / T_P = Θ(n / log n).
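
The summation pattern behind this bound can be sketched in Python as a serial simulation of the parallel rounds (assuming n is a power of two; a real implementation would execute each round's additions concurrently across the processing elements):

    # Each pass of the while loop models one parallel step in which pairs of
    # partial sums are combined, so n values are reduced in log2(n) rounds.
    def tree_sum(values):
        vals = list(values)
        while len(vals) > 1:
            vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
        return vals[0]

    print(tree_sum(range(8)))   # 28, reached after 3 rounds for n = 8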

Slide 8: Performance Metrics: Speedup Bounds
For computing speedup, the best sequential program is taken as the baseline.
–There may be different sequential algorithms, with different asymptotic runtimes, for a given problem.
Speedup can be as low as 0 (the parallel program never terminates).
In theory, speedup is upper-bounded by p. In practice, a speedup greater than p is possible.
–This is known as superlinear speedup.
Superlinear speedup can result
–when a serial algorithm does more computation than its parallel formulation, or
–from hardware features that put the serial implementation at a disadvantage.
Note that superlinear speedup happens only if each processing element spends less than time T_S / p solving the problem.

Slide 9: Performance Metrics: Superlinear Speedups
Superlinearity effect due to exploratory decomposition (figure)

Slide 10: Cost of a Parallel System
As shown earlier, cost is the product of parallel runtime and the number of processing elements used, p T_P.
Cost reflects the sum of the time that each processing element spends solving the problem.
A parallel system is said to be cost-optimal if the cost of solving a problem on the parallel computer is asymptotically identical to the serial cost.
Since E = T_S / (p T_P), for cost-optimal systems E = Θ(1).
Cost is sometimes referred to as work or the processor-time product.
The problem of adding n numbers on n processors is not cost-optimal.
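
For the running example from Slide 7 (adding n numbers on n processing elements), the check behind the last bullet is:

    p T_P = n \cdot \Theta(\log n) = \Theta(n \log n) \neq \Theta(n) = T_S

so the parallel cost exceeds the serial cost by a factor of Θ(log n).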

Slide 11: Formulating Maximum Speedup
Assume an algorithm has some sequential parts that are executed on only one processor.
Assume the fraction of the computation that cannot be divided into concurrent tasks is f.
Assume no overhead is incurred when the computation is divided into concurrent parts.
The time to perform the computation with p processors, and hence the speedup factor (Amdahl's Law), then follow as shown below.
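
The two equations referenced above are not reproduced in the transcript; under the stated assumptions, their standard forms are:

    T_p = f T_s + \frac{(1 - f) T_s}{p}

    S(p) = \frac{T_s}{T_p} = \frac{T_s}{f T_s + (1 - f) T_s / p} = \frac{p}{1 + (p - 1) f}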

Slide 12: Visualizing Amdahl's Law (figure)

Slide 13: Speedup Against Number of Processors (figure)

Slide 14: Speedup Against Number of Processors
From the preceding formulation, f has to be a small fraction of the overall computation if a significant increase in speedup is to occur.
Even with an infinite number of processors, the maximum speedup is limited to 1/f, since S(p) approaches 1/f as p grows.
Example: with only 5% of the computation being serial, the maximum speedup is 20, irrespective of the number of processors.
Amdahl used this argument to promote single-processor machines.
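
A short numeric check of the 1/f bound, using the Amdahl speedup formula from Slide 11 (the processor counts below are arbitrary):

    # Amdahl speedup S(p) = p / (1 + (p - 1) * f); with f = 0.05 the speedup
    # creeps toward the 1/f = 20 limit no matter how many processors are used.
    def amdahl_speedup(f, p):
        return p / (1 + (p - 1) * f)

    for p in (8, 64, 1024, 10**6):
        print(p, round(amdahl_speedup(0.05, p), 2))
    # 8 -> 5.93, 64 -> 15.42, 1024 -> 19.64, 1000000 -> 20.0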

Slide 15: Scalability
Speedup and efficiency are relative terms. They depend on
–the number of processors,
–the problem size, and
–the algorithm used.
For example, the efficiency of a parallel program often decreases as the number of processors increases.
Similarly, a parallel program may be quite efficient for solving large problems but not for solving small problems.
A parallel program is said to scale if its efficiency is constant over a broad range of processor counts and problem sizes.
Finally, speedup and efficiency depend on the algorithm used:
–a parallel program might be efficient relative to one sequential algorithm but not relative to a different sequential algorithm.

Slide 16: Gustafson's Law
Gustafson presented an argument based on scalability concepts
–to show that Amdahl's law is not as significant as first supposed in limiting the potential speedup.
Observation: in practice, a larger multiprocessor usually allows a larger problem to be undertaken in a reasonable execution time.
Hence, the problem size is not independent of the number of processors.
Rather than assuming the problem size is fixed, we should assume that the parallel execution time is fixed.
Under this constant parallel execution time constraint, the resulting speedup factor is numerically different from Amdahl's speedup factor and is called the scaled speedup factor.

Slide 17: Speedup vs Number of Processors (figure)

Slide 18: Speedup vs Number of Processors (figure)

Slide 19: Formulating Gustafson's Law
Assume the parallel execution time, T_p, is normalized to unity.
Assume also that, in the serial execution time T_s below, the serial portion f T_s is a constant.
The scaled speedup factor (Gustafson's Law) then follows as sketched below.
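
The equations themselves are not in the transcript; a sketch of the standard form, assuming f denotes the serial fraction of the normalized parallel execution time:

    T_p = f + (1 - f) = 1

    T_s = f + (1 - f) p

    S_s(p) = \frac{T_s}{T_p} = f + (1 - f) p = p - (p - 1) f

With f = 0.05 and p = 64, for example, S_s = 64 - 63(0.05) = 60.85, far above the Amdahl limit of 20 for the same serial fraction.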

