
COMP60611 Fundamentals of Parallel and Distributed Systems
Lecture 7: Scalability Analysis
John Gurd, Graham Riley
Centre for Novel Computing, School of Computer Science, University of Manchester
(Combining the strengths of UMIST and The Victoria University of Manchester)

Scalability

What do we mean by scalability?
– Scalability applies to an algorithm executing on a parallel computer, not simply to an algorithm!
How does an algorithm behave for a fixed problem size as the number of processors used increases?
– This is known as strong scaling.
How does an algorithm behave as the problem size changes, in addition to changing the number of processors?
A key insight is to look at how efficiency changes.

Efficiency and Strong Scaling

Typically, for a fixed problem size N, the efficiency of an algorithm decreases as P increases. (Why?)
– Overheads typically do not get smaller as P increases. They remain ‘fixed’ or, worse, they may grow with P (e.g. the number of communications may grow – in an all-to-all communication pattern).
Recall that:
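The formula itself is not reproduced in this transcript; a reconstruction consistent with the standard definitions used throughout the lecture (T_ref is the reference sequential time, T_P the time on P processors) is:

$$ S_{abs} = \frac{T_{ref}}{T_P}, \qquad E_{abs} = \frac{S_{abs}}{P} = \frac{T_{ref}}{P\,T_P} = \frac{T_{ref}}{T_{ref} + P\,O_P}, $$

where $P\,O_P = P\,T_P - T_{ref}$ is the total overhead in the system (O_P being the average overhead per processor).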

Efficiency and Strong Scaling

P·O_P is the total overhead in the system (the per-processor overhead O_P summed over all P processors). T_ref represents the true useful work in the algorithm. Because efficiency tends to decrease with fixed N as P grows, at some point the (absolute) efficiency E_abs (i.e. how well each processor is being utilised) will drop below some acceptable threshold – say, 50%(?)

Scalability

No ‘real’ algorithm scales for all possible numbers of processors when solving a fixed problem size on a ‘real’ computer. Even ‘embarrassingly’ parallel algorithms will have a limit on the number of processors they can use.
– For example, with a fixed N there is eventually only one ‘element’ of some large data structure to be operated on by each processor.
So we seek another approach to scalability, one which applies as both the problem size N and the number of processors P change.

Isoscaling and Isoefficiency

A system is said to isoscale if, for a given algorithm and parallel computer, a specific level of efficiency can be maintained by changing the problem size N appropriately as P increases. Not all systems isoscale!
– e.g. a binary tree-based vector reduction where N = P (see later).
This approach is called scaled problem analysis.
The function (of P) describing how the problem size N must change as P increases in order to maintain a specified efficiency is known as the isoefficiency function.
Isoscaling does not apply to all problems.
– e.g. weather modelling, where increasing the problem size (resolution) is eventually not an option,
– or image processing with a fixed number of pixels.

Weak Scaling

An alternative approach is to keep the problem size per processor fixed as P increases (the total problem size N thus increases linearly with P) and to see how the efficiency is affected.
– This is known as weak scaling.
Summary: strong scaling, weak scaling and isoscaling are three different approaches to understanding the scalability of parallel systems (algorithm + machine).
We will look at an example shortly, but first we need a means of comparing the behaviour of functions (e.g. performance functions and efficiency functions) over their entire domains.
These concepts will be explored further in lab exercise 2.

Comparison Functions: Asymptotic Analysis

Performance models are generally functions of problem size (N) and the number of processors (P).
We need relatively easy ways to compare models (functions) as N and P vary:
– Model A is ‘at most’ as fast or as big as model B;
– Model A is ‘at least’ as fast or as big as model B;
– Model A is ‘equal’ in performance/size to model B.
We will see a similar need when comparing efficiencies and in considering scalability.
These are all examples of comparison functions.
We are often interested in asymptotic behaviour, i.e. the behaviour as some key parameter (e.g. N or P) increases towards infinity.

Comparison Functions – Example

From ‘Introduction to Parallel Computing’, Grama et al. Consider the three functions below (the functions themselves appear only in a figure on the original slide and are not reproduced here):
– Think of these functions as modelling the distance travelled by three cars from time t = 0. One car has fixed speed and the others are accelerating – car C makes a standing start (zero initial speed).

Graphically

[Figure: plot of the three distance functions A(t), B(t), C(t) against time t – not reproduced in the transcript.]

We can see that:
– For t > 45, B(t) is always greater than A(t).
– For t > 20, C(t) is always greater than B(t).
– For t > 0, C(t) is always less than 1.25·B(t).

Introducing ‘big-Oh’ Notation

It is often useful to express a bound on the growth of a particular function in terms of a simpler function.
For example, since for t > 45 B(t) is always greater than A(t), we can express the relation between A(t) and B(t) using the Ο (Omicron, or ‘big-Oh’) notation: A(t) = O(B(t)). This means that A(t) is “at most” B(t) beyond some value of t.
Formally, given functions f(x) and g(x), f(x) = O(g(x)) if there exist positive constants c and x_0 such that f(x) ≤ c·g(x) for all x ≥ x_0 [definition from JaJa, not Grama! – more transparent].
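Restating the definition quoted above as a formula (the slide shows this only as an image):

$$ f(x) = O(g(x)) \iff \exists\, c > 0,\ x_0 > 0 : f(x) \le c\,g(x) \ \text{for all}\ x \ge x_0. $$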

From this definition, we can see that:
– A(t) = O(t) (“at most” or “of the order t”),
– B(t) = O(t²) (“at most” or “of the order t²”),
– Finally, C(t) = O(t²), too.
Informally, big-Oh can be used to identify the simplest function that bounds (above) a more complex function, as the parameter gets (asymptotically) bigger.

Theta and Omega

There are two other useful symbols:
– Omega (Ω), meaning “at least”;
– Theta (Θ), meaning “equals” or “goes as”.
For formal definitions, see, for example, ‘An Introduction to Parallel Algorithms’ by JaJa or ‘Highly Parallel Computing’ by Almasi and Gottlieb. Note that the definitions in Grama et al. are a little misleading!
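The definitions themselves are not reproduced in this transcript; the standard forms (in the same style as the big-Oh definition above) are:

$$ f(x) = \Omega(g(x)) \iff \exists\, c > 0,\ x_0 > 0 : f(x) \ge c\,g(x)\ \text{for all}\ x \ge x_0, $$
$$ f(x) = \Theta(g(x)) \iff f(x) = O(g(x)) \ \text{and}\ f(x) = \Omega(g(x)). $$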

Performance Modelling – Example

The following slides develop performance models for the example of a vector sum reduction. The models are then used to support basic scalability analysis of the resulting parallel systems.
Consider two parallel systems:
– First, a binary tree-based vector sum when the number of elements (N) is equal to the number of processors (P), i.e. N = P.
– Second, a version for which N >> P.
Develop performance models.
– Compare the models.
– Consider the resulting system scalability.

Vector Sum Reduction (N = P)

Assume that:
– N = P, and
– N is a power of 2.
Propagate intermediate values through a binary tree of ‘adder’ nodes (processors):
– This takes log₂N steps with N processors (one of the processors is busy at every step, waiting for a message and then doing an addition; the other processors have some idle time).
Each step thus requires time for communication of a single word (cost t_s + t_w) and a single addition (cost t_c):
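The model itself is not reproduced in the transcript; a reconstruction consistent with the description above (log₂N steps, each costing one single-word communication plus one addition) is:

$$ T_P = (t_c + t_s + t_w)\,\log_2 N. $$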

Vector Sum Speedup (N = P)

Speedup (reconstructed after this slide):
Speedup is poor, but monotonically increasing.
– If N = 128, S_abs is ~18 (E_abs = S_abs/P = ~0.14, i.e. 14%),
– If N = 1024, S_abs is ~100 (E_abs = ~0.1, i.e. 10%),
– If N = 1M, S_abs is ~52,000 (E_abs = ~0.05, i.e. 5%),
– If N = 1G, S_abs is ~35M (E_abs = ~0.035, i.e. 3.5%).
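The speedup expression is not reproduced in the transcript; with T_ref ≈ N·t_c for the sequential sum, a reconstruction consistent with the model above is:

$$ S_{abs} = \frac{T_{ref}}{T_P} = \frac{N\,t_c}{(t_c + t_s + t_w)\,\log_2 N}. $$

The figures quoted above are consistent with neglecting the communication costs, i.e. S_abs ≈ N / log₂N and E_abs ≈ 1/log₂N. As a quick check, the short sketch below (not part of the original slides) evaluates that simplified model for the quoted values of N:

```python
# Simplified speedup/efficiency model for the N = P binary-tree vector sum,
# ignoring communication costs: S_abs ~= N / log2(N), E_abs = S_abs / P with P = N.
import math

for n in (128, 1024, 2**20, 2**30):
    s_abs = n / math.log2(n)   # approximate absolute speedup
    e_abs = s_abs / n          # absolute efficiency (P = N)
    print(f"N = {n:>10}: S_abs ~ {s_abs:>12,.0f}, E_abs ~ {e_abs:.3f}")
```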

Vector Sum Scalability (N = P)

Efficiency (reconstructed after this slide): with N = P in this case, the expression depends only on P.
Strong scaling is not ‘good’, as we have seen (E_abs << 0.5). Efficiency is monotonically decreasing.
– It reaches the 50% point, E_abs = 0.5, when log₂P = 2, i.e. when P = 4.
This does not isoscale, either!
– E_abs gets smaller as P (hence N) increases, and P and N must change together.
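The efficiency expressions are not reproduced in the transcript; reconstructed from the speedup model:

$$ E_{abs} = \frac{S_{abs}}{P} = \frac{N\,t_c}{P\,(t_c + t_s + t_w)\,\log_2 N}, $$

and, substituting N = P and neglecting the communication terms,

$$ E_{abs} \approx \frac{1}{\log_2 P}, $$

which equals 0.5 exactly when log₂P = 2, i.e. P = 4, as stated above.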

When N >> P

When N >> P, each processor can be allocated N/P elements (for simplicity, assume N is exactly divisible by P). Each processor sums its local elements in a first phase. A binary tree sum of size P is then performed to sum the P partial results.
The performance model is:
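The model is not reproduced in the transcript; a reconstruction consistent with the description (a local sum of N/P elements, approximating N/P − 1 additions by N/P, followed by a binary tree over the P partial results) is:

$$ T_P = \frac{N}{P}\,t_c + (t_c + t_s + t_w)\,\log_2 P. $$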

Strong Scalability (N >> P)

Speedup (reconstructed after this slide):
Strong scaling?? For a given problem size N (>> P), the log₂P/N term is always ‘small’, so speedup will fall off ‘slowly’.
P is, of course, limited by the value of N, but we are considering the case where N >> P.
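Reconstructing the speedup from the model above (again with T_ref ≈ N·t_c):

$$ S_{abs} = \frac{N\,t_c}{\frac{N}{P}\,t_c + (t_c + t_s + t_w)\,\log_2 P}, $$

which approaches P whenever the tree term (t_c + t_s + t_w)·log₂P is small compared with the local-sum term (N/P)·t_c, i.e. whenever N is sufficiently large relative to P.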

Isoscalability (N >> P)

Efficiency (reconstructed after this slide):
Now, we can always achieve a required efficiency on P processors by a suitable choice of N.
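The efficiency expression is not reproduced in the transcript; dividing the reconstructed speedup above by P gives:

$$ E_{abs} = \frac{1}{1 + \dfrac{t_c + t_s + t_w}{t_c}\cdot\dfrac{P\,\log_2 P}{N}}, $$

so for any fixed P the second term in the denominator can be made as small as required by increasing N.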

Isoscalability (N >> P)

For example, the isoefficiency function for 50% E_abs, and the corresponding condition for E_abs > 50%, are reconstructed after this slide.
– As N gets larger for a given P, E_abs gets closer to 1!
– The ‘good’ parallel phase (the N/P work) thus dominates the log₂P phase as N gets larger, leading to relatively good (iso)scalability.
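The isoefficiency expressions are not reproduced in the transcript; using the efficiency model above and writing c = (t_c + t_s + t_w)/t_c for the machine-dependent constant, a consistent reconstruction is:

$$ E_{abs} = 0.5 \;\Leftrightarrow\; N = c\,P\,\log_2 P, \qquad E_{abs} > 0.5 \;\Leftrightarrow\; N > c\,P\,\log_2 P, $$

i.e. the isoefficiency function for this system grows as Θ(P log P).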

Summary of Performance Modelling

Performance modelling provides insight into the behaviour of parallel systems (parallel algorithms on parallel machines).
Performance modelling allows the comparison of algorithms and gives insight into their potential scalability.
Two main forms of scalability:
– Strong scaling (fixed problem size N as P varies). There is always a limit to strong scaling for real parallel systems (i.e. a value of P at which efficiency falls below an acceptable limit).
– Isoscaling (the ability to maintain a specified level of efficiency by changing N as P varies). Not all parallel systems isoscale.
Asymptotic (‘big-Oh’) analysis makes comparison easier, but BEWARE the constants!
Weak scaling is related to isoscaling – the aim is to maintain a fixed problem size per processor as P changes and look at the effect on efficiency.