4- Performance Analysis of Parallel Programs

Performance Evaluation of Computer Systems 1. CPU time (response time metric): depends on the program and on the efficiency of the compiler, which together determine the number of executed instructions and the average number of cycles per instruction (CPI).
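As a minimal sketch (not taken from the slides), the response-time metric can be written with the standard CPU-time equation; the function name and the sample numbers below are purely illustrative:

```python
# CPU time as a response-time metric:
# T_CPU = instruction_count * CPI * cycle_time
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """Return the CPU time in seconds."""
    cycle_time = 1.0 / clock_rate_hz
    return instruction_count * cpi * cycle_time

# Illustrative numbers: 2e9 executed instructions, average CPI of 1.5,
# on a 2 GHz processor.
print(cpu_time(2e9, 1.5, 2e9))   # -> 1.5 seconds
```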

Performance Evaluation of Computer Systems 2. MIPS and MFLOPS (throughput metrics): depend on the program and on the instruction mix; MFLOPS makes no distinction between cheap and expensive floating-point operations, e.g. a floating-point add and a floating-point divide count equally.
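A small sketch of the two rate metrics, again with made-up instruction counts and runtimes for illustration:

```python
def mips(instruction_count, cpu_time_s):
    """Million instructions per second."""
    return instruction_count / (cpu_time_s * 1e6)

def mflops(flop_count, cpu_time_s):
    """Million floating-point operations per second; every floating-point
    operation counts the same, whether it is a cheap add or an expensive divide."""
    return flop_count / (cpu_time_s * 1e6)

# Illustrative numbers: 2e9 instructions, 5e8 of them floating point, in 1.5 s.
print(mips(2e9, 1.5), mflops(5e8, 1.5))
```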

Performance Evaluation of Computer Systems 3. Performance of processors with a memory hierarchy: the average memory access time has to take the cache hit rates into account; in case of multiple cache levels, the miss penalty of each level is the access time of the next lower level, down to main memory.
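The slide's formula is not reproduced in the transcript; the following sketch shows the usual recurrence for the average memory access time of a multi-level cache, with invented hit times and miss rates:

```python
def avg_access_time(levels, memory_time):
    """Average memory access time for a cache hierarchy.

    `levels` is a list of (hit_time, miss_rate) pairs ordered from L1 downwards;
    the miss penalty of each level is the access time of the next level,
    and a miss in the last level goes to main memory.
    """
    time = memory_time
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time

# Illustrative two-level hierarchy: L1 (1 ns, 5% misses), L2 (4 ns, 20% misses),
# main memory 100 ns.
print(avg_access_time([(1.0, 0.05), (4.0, 0.20)], 100.0))   # -> 2.2 ns
```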

Performance Evaluation of Computer Systems 4. Benchmark programs: Synthetic benchmarks are small artificial programs that represent a large class of real applications, such as Whetstone and Dhrystone. Kernel benchmarks are small but relevant parts of real applications, such as the Livermore Loops, or toy programs like quicksort. Real application benchmarks are collections of entire programs that reflect the workload of a typical user, called benchmark suites; examples are the SPEC benchmarks (Standard Performance Evaluation Corporation) and the EEMBC benchmarks (EDN Embedded Microprocessor Benchmark Consortium).

Performance Metrics for Parallel Programs Parallel runtime Tp(n): the time between the start of the program and the end of the execution on all participating processors. It comprises the execution of the local computations of each participating processor, the exchange of data between processors of a distributed address space, the synchronization of the participating processors when accessing shared data structures, waiting times caused by an unequal load distribution, and the parallelization overhead.
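As an illustration (not part of the slides), Tp(n) can be measured as the wall-clock time from the start of the parallel execution until the last worker has finished; the sketch below uses a Python process pool and a hypothetical local_work function as a stand-in for the per-processor computation:

```python
import time
from multiprocessing import Pool

def local_work(chunk):
    """Stand-in for the local computation of one processor."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    n, p = 1_000_000, 4
    data = list(range(n))
    chunks = [data[i::p] for i in range(p)]     # static load distribution

    start = time.perf_counter()
    with Pool(processes=p) as pool:
        partial = pool.map(local_work, chunks)  # local computation + data exchange
    result = sum(partial)                       # combining step after all workers finish
    tp = time.perf_counter() - start            # Tp(n): start until all processors are done
    print(f"Tp(n) = {tp:.3f} s, result = {result}")
```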

Performance Metrics for Parallel Programs Cost of a parallel program: Cp(n) = p * Tp(n), the total amount of work performed by all p processors. A parallel program is called cost-optimal if Cp(n) grows asymptotically no faster than T*(n), the runtime of the fastest sequential program. Speedup: Sp(n) = T*(n) / Tp(n). Efficiency: Ep(n) = Sp(n) / p = T*(n) / Cp(n).
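A short sketch of these three metrics; the timing values are invented for illustration:

```python
def cost(p, t_par):
    """Cp(n) = p * Tp(n): processor count times parallel runtime."""
    return p * t_par

def speedup(t_seq, t_par):
    """Sp(n) = T*(n) / Tp(n), with T*(n) the best sequential runtime."""
    return t_seq / t_par

def efficiency(t_seq, t_par, p):
    """Ep(n) = Sp(n) / p = T*(n) / Cp(n)."""
    return speedup(t_seq, t_par) / p

# Illustrative measurements: best sequential run 12.0 s, 4 processors need 3.5 s.
t_seq, t_par, p = 12.0, 3.5, 4
print(cost(p, t_par), speedup(t_seq, t_par), efficiency(t_seq, t_par, p))
```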

Speedup limit Amdahl's law: when a (constant) fraction f, 0 ≤ f ≤ 1, of a parallel program must be executed sequentially, the attainable speedup on p processors is bounded by Sp(n) ≤ 1 / (f + (1 - f)/p) ≤ 1/f. For example, if 20% of a program must be executed sequentially (f = 0.2), then the attainable speedup is limited to 1/f = 5, no matter how many processors are used.
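A tiny sketch that evaluates the bound for f = 0.2 and growing processor counts, showing the saturation at 1/f = 5:

```python
def amdahl_speedup(f, p):
    """Attainable speedup with sequential fraction f on p processors:
    S(p) = 1 / (f + (1 - f) / p), bounded above by 1/f."""
    return 1.0 / (f + (1.0 - f) / p)

# With f = 0.2 the speedup saturates near 1/f = 5 regardless of p.
for p in (2, 4, 16, 256, 10_000):
    print(p, round(amdahl_speedup(0.2, p), 2))
```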

Scalability For a fixed problem size n, a saturation of the speedup can be observed when the number p of processors is increased. Efficiency can, however, be kept constant if both the number p of processors and the problem size n are increased: larger problems can be solved in the same time as smaller problems if a sufficiently large number of processors is employed. Gustafson's law covers the special case that the sequential program part has a constant execution time tau_f, independent of the problem size n; if the parallelizable part T_delta(n) is shared perfectly among the p processors, the scaled speedup Sp(n) = (tau_f + T_delta(n)) / (tau_f + T_delta(n)/p) approaches p as the problem size grows.
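To see the scaled speedup approach p, the following sketch (with invented values for tau_f and the parallel workload T_delta) evaluates Gustafson's formula for a growing problem size:

```python
def gustafson_speedup(tau_f, t_delta, p):
    """Scaled speedup when the sequential part has constant runtime tau_f and
    the parallelizable part t_delta is shared perfectly among p processors:
    Sp(n) = (tau_f + t_delta) / (tau_f + t_delta / p)."""
    return (tau_f + t_delta) / (tau_f + t_delta / p)

# With tau_f fixed at 1 s, growing the parallel workload drives Sp(n) toward p = 16.
for t_delta in (1.0, 10.0, 100.0, 1000.0):
    print(t_delta, round(gustafson_speedup(1.0, t_delta, 16), 2))
```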