Chapter 3: Principles of Scalable Performance

- Performance measures
- Speedup laws
- Scalability principles
- Scaling up vs. scaling down

Performance metrics and measures
- Parallelism profiles
- Asymptotic speedup factor
- System efficiency, utilization, and quality
- Standard performance measures

Degree of parallelism
- Reflects the matching of software and hardware parallelism
- A discrete time function: measures, for each time period, the number of processors in use
- The parallelism profile is a plot of the DOP as a function of time
- Ideally assumes unlimited resources

Factors affecting parallelism profiles
- Algorithm structure
- Program optimization
- Resource utilization
- Run-time conditions
Realistically, the profile is limited by the number of available processors, memory, and other nonprocessor resources.

Average parallelism: variables
- n: number of homogeneous processors
- m: maximum parallelism in a profile
- Δ: computing capacity of a single processor (execution rate only, no overhead)
- DOP = i: i processors are busy during an observation period

Average parallelism
- The total amount of work performed is proportional to the area under the profile curve:
  W = Δ · ∫ DOP(t) dt, integrated from t1 to t2

Average parallelism (continued)
- A = (1/(t2 − t1)) · ∫ DOP(t) dt, integrated from t1 to t2
- In discrete form: A = (Σ i · t_i) / (Σ t_i), summed over i = 1, …, m, where t_i is the total time spent at DOP = i

Example: parallelism profile and average parallelism (profile figure omitted)
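
A minimal Python sketch of the discrete computation, using a made-up profile (the figure's actual profile is not recoverable from the transcript):

    # Average parallelism A = (sum of i * t_i) / (sum of t_i), where t_i
    # is the total time the profile spends at DOP = i.
    profile = [(1, 2.0), (2, 3.0), (4, 4.0), (3, 1.0)]   # (DOP, duration) pairs

    work = sum(i * t for i, t in profile)   # proportional to area under the curve
    time = sum(t for _, t in profile)       # total observation time
    A = work / time                         # average parallelism
    print(f"A = {A:.2f}")                   # A = 2.70 for this profile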

Asymptotic speedup
- T(1) = Σ W_i / Δ: time on a single processor, where W_i is the work performed at DOP = i
- T(∞) = Σ W_i / (i · Δ): time with unlimited processors
- S∞ = T(1) / T(∞) = A in the ideal case (response time)
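
A sketch checking that S∞ equals the average parallelism A, under the ideal assumptions (Δ = 1, no overhead) and a made-up workload vector W_i corresponding to the profile above:

    # W[i] is the work executed at DOP = i (W_i = i * DELTA * t_i).
    # Work at DOP = i runs on i processors when processors are unlimited.
    DELTA = 1.0                                 # per-processor rate (ideal)
    W = {1: 2.0, 2: 6.0, 4: 16.0, 3: 3.0}       # made-up workload at each DOP

    T1 = sum(w / DELTA for w in W.values())
    Tinf = sum(w / (i * DELTA) for i, w in W.items())
    print(f"S_inf = {T1 / Tinf:.2f}")           # 2.70, equal to A above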

Performance measures
- Consider n processors executing m programs in various modes
- We want to define the mean performance of these multimode computers:
  - Arithmetic mean performance
  - Geometric mean performance
  - Harmonic mean performance

Arithmetic mean performance
- Arithmetic mean execution rate (assumes equal weighting): R_a = (1/m) Σ R_i
- Weighted arithmetic mean execution rate: R_a* = Σ f_i · R_i
- Proportional to the sum of the inverses of the execution times, so it does not reflect total elapsed time

Geometric mean performance
- Geometric mean execution rate: R_g = (Π R_i)^(1/m)
- Weighted geometric mean execution rate: R_g* = Π R_i^(f_i)
- Does not summarize real performance, since it has no inverse relation with the total execution time

Harmonic mean performance
- For program i, the mean execution time per instruction is T_i = 1/R_i
- Arithmetic mean execution time per instruction: T_a = (1/m) Σ T_i = (1/m) Σ 1/R_i

Harmonic mean performance (continued)
- Harmonic mean execution rate: R_h = 1/T_a = m / (Σ 1/R_i)
- Weighted harmonic mean execution rate: R_h* = 1 / (Σ f_i / R_i)
- Corresponds to the total number of operations divided by the total time, so it is closest to real performance
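
A short sketch contrasting the three means on made-up per-program rates, illustrating why the harmonic mean tracks total time while the others overstate performance:

    import math

    R = [10.0, 100.0, 1000.0]        # MIPS rates of m = 3 programs (made up)
    m = len(R)

    arithmetic = sum(R) / m
    geometric = math.prod(R) ** (1 / m)
    harmonic = m / sum(1 / r for r in R)

    # The harmonic mean equals total instructions / total time when each
    # program executes the same number of instructions.
    print(arithmetic, geometric, harmonic)   # 370.0, 100.0, ~27.03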

Harmonic Mean Speedup
- Ties the various modes of a program to the number of processors used
- The program is in mode i if i processors are used
- Sequential execution time: T_1 = 1/R_1 = 1
- S = T_1 / T* = 1 / (Σ f_i / R_i), summed over i = 1, …, n

Harmonic Mean Speedup Performance (figure omitted)
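
A sketch of the harmonic-mean speedup with R_i = i (mode i runs on i processors at rate i) and a hypothetical mode distribution f_i:

    # S = T1 / T* = 1 / sum(f_i / R_i), with R_i = i and T1 = 1/R1 = 1.
    # f[i] is the fraction of work executed in mode i (made-up values).
    f = {1: 0.1, 2: 0.2, 4: 0.3, 8: 0.4}     # fractions must sum to 1

    assert abs(sum(f.values()) - 1.0) < 1e-12
    S = 1.0 / sum(fi / i for i, fi in f.items())
    print(f"S = {S:.2f}")                    # ~3.08 for this distribution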

Amdahl’s Law
- Assume R_i = i and w = (α, 0, 0, …, 1 − α)
- The system is either sequential, with probability α, or fully parallel on all n processors, with probability 1 − α
- Substituting into the harmonic mean speedup gives S_n = n / (1 + (n − 1)α)
- Implies S → 1/α as n → ∞
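
A small sketch of the fixed-workload formula showing the 1/α ceiling, with an example sequential fraction:

    def amdahl_speedup(n, alpha):
        # Fixed-workload speedup with sequential fraction alpha.
        return n / (1 + (n - 1) * alpha)

    alpha = 0.1                                # 10% sequential (example value)
    for n in (1, 10, 100, 1000, 10_000):
        print(n, round(amdahl_speedup(n, alpha), 2))
    # Speedup approaches 1/alpha = 10 no matter how large n grows.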

Speedup Performance (figure omitted)

System Efficiency
- O(n): total number of unit operations performed by an n-processor system
- T(n): execution time in unit time steps
- In general, T(n) < O(n), and T(1) = O(1)
- Speedup: S(n) = T(1)/T(n); efficiency: E(n) = S(n)/n = T(1)/(n · T(n))

Redundancy and Utilization
- Redundancy signifies the extent of matching between software and hardware parallelism: R(n) = O(n)/O(1)
- Utilization indicates the percentage of resources kept busy during execution: U(n) = R(n) · E(n) = O(n)/(n · T(n))

Quality of Parallelism
- Q(n) = S(n) · E(n) / R(n) = T(1)^3 / (n · T(n)^2 · O(n))
- Directly proportional to the speedup and efficiency, inversely related to the redundancy
- Upper-bounded by the speedup: Q(n) ≤ S(n)

Example of Performance
Given O(1) = T(1) = n^3, O(n) = n^3 + n^2·log n, and T(n) = 4n^3/(n + 3):
- S(n) = (n + 3)/4
- E(n) = (n + 3)/(4n)
- R(n) = (n + log n)/n
- U(n) = (n + 3)(n + log n)/(4n^2)
- Q(n) = (n + 3)^2 / (16(n + log n))
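
These closed forms follow mechanically from the definitions; a quick numeric spot check for one machine size, reading "log n" as log base 2 (an assumption):

    import math

    n = 8                                  # machine size for the spot check
    log = math.log2                        # assuming log means log2 here

    T1 = O1 = n**3
    On = n**3 + n**2 * log(n)
    Tn = 4 * n**3 / (n + 3)

    S = T1 / Tn                            # (n+3)/4      -> 2.75
    E = S / n                              # (n+3)/(4n)   -> 0.34375
    R = On / O1                            # (n+log n)/n  -> 1.375
    U = R * E                              #              -> ~0.4727
    Q = S * E / R                          #              -> 0.6875
    print(S, E, R, U, Q)                   # matches the closed forms above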

Standard Performance Measures
- MIPS and Mflops: depend on the instruction set and the program used
- Dhrystone results: a measure of integer performance
- Whetstone results: a measure of floating-point performance
- TPS and KLIPS ratings: transaction performance and reasoning power

Parallel Processing Applications
- Drug design
- High-speed civil transport
- Ocean modeling
- Ozone depletion research
- Air pollution
- Digital anatomy

Application Models for Parallel Computers
- Fixed-load model: constant workload
- Fixed-time model: demands constant program execution time
- Fixed-memory model: limited by the memory bound

Algorithm Characteristics
- Deterministic vs. nondeterministic
- Computational granularity
- Parallelism profile
- Communication patterns and synchronization requirements
- Uniformity of operations
- Memory requirements and data structures

Isoefficiency Concept
- Relates the workload w(s) to the machine size n needed to maintain a fixed efficiency:
  E = w(s) / (w(s) + h(s, n)), where h(s, n) is the communication overhead
- The smaller the power of n in the resulting workload growth, the more scalable the system

Isoefficiency Function
- To maintain a constant E, w(s) should grow in proportion to h(s, n):
  w(s) = (E / (1 − E)) · h(s, n)
- C = E/(1 − E) is constant for fixed E, so the isoefficiency function is f_E(n) = C · h(s, n)
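
A sketch of how the isoefficiency function is used, assuming a hypothetical overhead model h(s, n) = n · log2(n) and an example target efficiency:

    import math

    E = 0.8                                  # target efficiency (example)
    C = E / (1 - E)                          # = 4 for E = 0.8

    def h(n):
        # Hypothetical overhead model h(s, n) = n * log2(n).
        return n * math.log2(n)

    # The workload needed on n processors to hold E fixed.
    for n in (2, 8, 64, 512):
        print(n, C * h(n))                   # w(s) must grow to C * h(s, n)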

Speedup Performance Laws
- Amdahl's law: for a fixed workload or fixed problem size
- Gustafson's law: for scaled problems (problem size increases with machine size)
- Speedup model for scaled problems bounded by memory capacity

Amdahl’s Law (fixed load)
- As the number of processors increases, the fixed load is distributed across more processors
- Minimal turnaround time is the primary goal
- The speedup factor is upper-bounded by a sequential bottleneck
- Two cases: DOP < n and DOP ≥ n

Fixed Load Speedup Factor
- Case 1 (DOP ≥ n): t_i(n) = (W_i / (i · Δ)) · ⌈i/n⌉
- Case 2 (DOP < n): t_i(n) = W_i / (i · Δ), since the ceiling term equals 1
- T(n) = Σ (W_i / (i · Δ)) · ⌈i/n⌉, and S_n = T(1)/T(n)
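
A sketch of the fixed-load speedup from a made-up workload vector W_i with Δ = 1; the ceiling term covers both cases at once (it is 1 whenever i ≤ n):

    import math

    def fixed_load_speedup(W, n):
        # S_n = T(1)/T(n) with t_i(n) = (W_i/(i*DELTA)) * ceil(i/n).
        DELTA = 1.0
        T1 = sum(w / DELTA for w in W.values())
        Tn = sum((w / (i * DELTA)) * math.ceil(i / n) for i, w in W.items())
        return T1 / Tn

    W = {1: 2.0, 2: 6.0, 4: 16.0, 3: 3.0}    # made-up workload at each DOP
    for n in (1, 2, 4, 8):
        print(n, round(fixed_load_speedup(W, n), 2))
    # Saturates at 2.7 = S_inf once n exceeds the maximum DOP.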

Gustafson’s Law
- Under Amdahl's law, the workload cannot scale to match the available computing power as n increases
- Gustafson's law fixes the time, allowing the problem size to increase with higher n
- The goal is not saving time, but increasing accuracy

Fixed-time Speedup
- As the machine size increases, the workload increases and a new parallelism profile results
- In general, W_i' > W_i for 2 ≤ i ≤ m', and W_1' = W_1
- Assume T(1) = T'(n): the scaled problem takes the same time on n processors as the original problem on one

Gustafson’s Scaled Speedup
- S_n' = (W_1 + n · W_n) / (W_1 + W_n)
- With serial fraction α = W_1 / (W_1 + W_n), this becomes S_n' = α + n(1 − α) = n − α(n − 1)
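
A sketch of the fixed-time scaled speedup, assuming an example serial fraction α of the (fixed) execution time:

    def gustafson_speedup(n, alpha):
        # Scaled speedup S'_n = alpha + n*(1 - alpha) = n - alpha*(n - 1).
        return n - alpha * (n - 1)

    alpha = 0.1                               # 10% serial time (example value)
    for n in (10, 100, 1000):
        print(n, gustafson_speedup(n, alpha))
    # Grows linearly in n (slope 1 - alpha), unlike Amdahl's 1/alpha ceiling.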

Memory Bounded Speedup Model
- The idea is to solve the largest problem that fits in memory
- Results in a scaled workload and higher accuracy
- With distributed memory, each node can hold only a small subproblem
- Using a large number of nodes collectively increases the memory capacity proportionally

Fixed-Memory Speedup
- Let M be the memory requirement and W the computational workload, related by W = g(M)
- Scaling memory to n nodes scales the workload: g*(nM) = G(n) · g(M) = G(n) · W_n
- S_n* = (W_1 + G(n) · W_n) / (W_1 + G(n) · W_n / n)
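
A sketch of the memory-bounded speedup, assuming a two-component workload (W_1 sequential, W_n fully parallel) and a hypothetical growth factor G(n); G(n) = 1 and G(n) = n reproduce the previous two laws, as the next slide notes:

    def memory_bounded_speedup(n, W1, Wn, G):
        # S*_n = (W1 + G(n)*Wn) / (W1 + G(n)*Wn / n).
        return (W1 + G(n) * Wn) / (W1 + G(n) * Wn / n)

    W1, Wn = 1.0, 9.0                         # made-up serial/parallel work
    for name, G in [("Amdahl,    G(n)=1    ", lambda n: 1),
                    ("Gustafson, G(n)=n    ", lambda n: n),
                    ("Memory,    G(n)=n^1.5", lambda n: n**1.5)]:
        print(name, round(memory_bounded_speedup(16, W1, Wn, G), 2))
    # 6.4, 14.5, 15.6: faster workload growth yields higher speedup.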

Relating Speedup Models
- G(n) reflects the increase in workload as memory increases n times
- G(n) = 1: fixed problem size (Amdahl)
- G(n) = n: workload increases n times when memory increases n times (Gustafson)
- G(n) > n: workload increases faster than the memory requirement

Scalability Metrics
- Machine size (n): number of processors
- Clock rate (f): determines the basic machine cycle
- Problem size (s): amount of computational workload; directly proportional to T(s, 1)
- CPU time (T(s, n)): actual CPU time for execution
- I/O demand (d): demand in moving the program, data, and results for a given run

Scalability Metrics (continued)
- Memory capacity (m): maximum number of memory words demanded
- Communication overhead (h(s, n)): time spent on interprocessor communication, synchronization, etc.
- Computer cost (c): total cost of the hardware and software resources required
- Programming overhead (p): development overhead associated with an application program

Speedup and Efficiency
- The problem size is the independent parameter
- S(s, n) = T(s, 1) / (T(s, n) + h(s, n))
- E(s, n) = S(s, n) / n

Scalable Systems
- Ideally, if E(s, n) = 1 for all algorithms and any s and n, the system is scalable
- Practically, one instead considers the scalability of a machine relative to a given algorithm