1 Lecture: Benchmarks, Pipelining Intro Topics: Performance equations wrap-up, Intro to pipelining.

Slides:



Advertisements
Similar presentations
1 Lecture 2: Metrics to Evaluate Performance Topics: Benchmark suites, Performance equation, Summarizing performance with AM, GM, HM Video 1: Using AM.
Advertisements

CS1104: Computer Organisation School of Computing National University of Singapore.
Lecture: Pipelining Basics
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
Evaluating Performance
Lecture 7: 9/17/2002CS170 Fall CS170 Computer Organization and Architecture I Ayman Abdel-Hamid Department of Computer Science Old Dominion University.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
Computer Organization and Architecture 18 th March, 2008.
1 Lecture 2: System Metrics and Pipelining Today’s topics: (Sections 1.5 – 1.10)  Power/Energy examples  Performance summaries  Measuring cost and dependability.
CSCE 212 Chapter 4: Assessing and Understanding Performance Instructor: Jason D. Bakos.
CIS629 Fall Lecture Performance Overview Execution time is the best measure of performance: simple, intuitive, straightforward. Two important.
Copyright © 1998 Wanda Kunkle Computer Organization 1 Chapter 2.5 Comparing and Summarizing Performance.
1 Lecture 2: System Metrics and Pipelining Today’s topics: (Sections 1.6, 1.7, 1.9, A.1)  Quantitative principles of computer design  Measuring cost.
1 Lecture 2: System Metrics and Pipelining Today’s topics: (Sections 1.6, 1.7, 1.9, A.1)  Performance summaries  Quantitative principles of computer.
1 Lecture 11: Digital Design Today’s topics:  Evaluating a system  Intro to boolean functions.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
Chapter 4 Assessing and Understanding Performance
Lecture: Pipelining Basics
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
1 Lecture 10: FP, Performance Metrics Today’s topics:  IEEE 754 representations  FP arithmetic  Evaluating a system Reminder: assignment 4 due in a.
1 Lecture 2: Measuring Performance/Cost/Power Today’s topics: (Sections 1.6, 1.4, 1.7, 1.8)  Quantitative principles of computer design  Measuring cost.
Lecture 24: CPU Design Today’s topic –Multi-Cycle ALU –Introduction to Pipelining 1.
1 Lecture 2: Metrics to Evaluate Systems Topics: Power and technology trends wrap-up, benchmark suites, performance equation, summarizing performance with.
CMSC 611: Advanced Computer Architecture Benchmarking Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Lecture 2: Technology Trends and Performance Evaluation Performance definition, benchmark, summarizing performance, Amdahl’s law, and CPI.
1 Computer Performance: Metrics, Measurement, & Evaluation.
Computer Performance Computer Engineering Department.
1 CHAPTER 2 THE ROLE OF PERFORMANCE. 2 Performance Measure, Report, and Summarize Make intelligent choices Why is some hardware better than others for.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 5810/6810: Hennessy.
CDA 3101 Fall 2013 Introduction to Computer Organization Computer Performance 28 August 2013.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
Computer Performance Computer Engineering Department.
1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.
1 Seoul National University Performance. 2 Performance Example Seoul National University Sonata Boeing 727 Speed 100 km/h 1000km/h Seoul to Pusan 10 hours.
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
Performance Performance
TEST 1 – Tuesday March 3 Lectures 1 - 8, Ch 1,2 HW Due Feb 24 –1.4.1 p.60 –1.4.4 p.60 –1.4.6 p.60 –1.5.2 p –1.5.4 p.61 –1.5.5 p.61.
1 Lecture 2: Performance, MIPS ISA Today’s topics:  Performance equations  MIPS instructions Reminder: canvas and class webpage:
Lecture 16: Basic Pipelining
4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors
1 Lecture: Metrics to Evaluate Performance Topics: Benchmark suites, Performance equation, Summarizing performance with AM, GM, HM  Video 1: Using AM.
1 Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with.
Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4.
CMSC 611: Advanced Computer Architecture Performance & Benchmarks Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some.
1 Lecture 3: Pipelining Basics Today: chapter 1 wrap-up, basic pipelining implementation (Sections C.1 - C.4) Reminders:  Sign up for the class mailing.
Intro to Computer Org. Assessing Performance. What Is Performance? What do we mean when we talk about the “performance” of a CPU?
CSE 340 Computer Architecture Summer 2016 Understanding Performance.
Performance 9 ways to fool the public #1 – Reporting Results.
June 20, 2001Systems Architecture II1 Systems Architecture II (CS ) Lecture 1: Performance Evaluation and Benchmarking * Jeremy R. Johnson Wed.
CS203 – Advanced Computer Architecture
Lecture 2: Performance Today’s topics:
Lecture 2: Performance Evaluation
Lecture 3: MIPS Instruction Set
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Lecture 16: Basic Pipelining
Lecture: Pipelining Basics
CSCE 212 Chapter 4: Assessing and Understanding Performance
CS2100 Computer Organisation
Lecture: Pipelining Basics
Lecture 16: Basic Pipelining
Lecture 2: Performance Today’s topics: Technology wrap-up
Performance of computer systems
Performance of computer systems
Lecture 3: MIPS Instruction Set
Performance of computer systems
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Parameters that affect it How to improve it and by how much
Lecture: Pipelining Basics
CS2100 Computer Organisation
Presentation transcript:

1 Lecture: Benchmarks, Pipelining Intro Topics: Performance equations wrap-up, Intro to pipelining

2 Measuring Performance Two primary metrics: wall clock time (response time for a program) and throughput (jobs performed in unit time) To optimize throughput, must ensure that there is minimal waste of resources

3 Benchmark Suites Performance is measured with benchmark suites: a collection of programs that are likely relevant to the user  SPEC CPU 2006: cpu-oriented programs (for desktops)  SPECweb, TPC: throughput-oriented (for servers)  EEMBC: for embedded processors/workloads

4 Summarizing Performance Consider 25 programs from a benchmark set – how do we capture the behavior of all 25 programs with a single number? P1 P2 P3 Sys-A Sys-B Sys-C  Sum of execution times (AM)  Sum of weighted execution times (AM)  Geometric mean of execution times (GM)

5 Sum of Weighted Exec Times – Example We fixed a reference machine X and ran 4 programs A, B, C, D on it such that each program ran for 1 second The exact same workload (the four programs execute the same number of instructions that they did on machine X) is run on a new machine Y and the execution times for each program are 0.8, 1.1, 0.5, 2 With AM of normalized execution times, we can conclude that Y is 1.1 times slower than X – perhaps, not for all workloads, but definitely for one specific workload (where all programs run on the ref-machine for an equal #cycles)

6 GM Example Computer-A Computer-B Computer-C P1 1 sec 10 secs 20 secs P secs 100 secs 20 secs Conclusion with GMs: (i) A=B (ii) C is ~1.6 times faster For (i) to be true, P1 must occur 100 times for every occurrence of P2 With the above assumption, (ii) is no longer true Hence, GM can lead to inconsistencies

7 Problem 4 Consider 3 programs from a benchmark set. Assume that system-A is the reference machine. How does the performance of system-C compare against that of system-B (for all 3 metrics)? P1 P2 P3 Sys-A Sys-B Sys-C  Sum of execution times (AM)  Sum of weighted execution times (AM)  Geometric mean of execution times (GM)

8 Problem 4 Consider 3 programs from a benchmark set. Assume that system-A is the reference machine. How does the performance of system-C compare against that of system-B (for all 3 metrics)? P1 P2 P3 S.E.T S.W.E.T GM Sys-A Sys-B Sys-C  Relative to B, C provides a speedup of 0.97 (S.W.E.T) or 0.99 (GM) or 1.07 (S.E.T)  Relative to B, C improves execution time by -3.3% (S.W.E.T) or -1% (GM) or 6.7% (S.E.T)

9 Summarizing Performance GM: does not require a reference machine, but does not predict performance very well  So we multiplied execution times and determined that sys-A is 1.2x faster…but on what workload? AM: does predict performance for a specific workload, but that workload was determined by executing programs on a reference machine  Every year or so, the reference machine will have to be updated

10 Speedup Vs. Percentage “Speedup” is a ratio = old exec time / new exec time “Improvement”, “Increase”, “Decrease” usually refer to percentage relative to the baseline = (new perf – old perf) / old perf A program ran in 100 seconds on my old laptop and in 70 seconds on my new laptop  What is the speedup? (1/70) / (1/100) = 1.42  What is the percentage increase in performance? ( 1/70 – 1/100 ) / (1/100) = 42%  What is the reduction in execution time? 30%

11 CPU Performance Equation Clock cycle time = 1 / clock speed CPU time = clock cycle time x cycles per instruction x number of instructions Influencing factors for each:  clock cycle time: technology and pipeline  CPI: architecture and instruction set design  instruction count: instruction set design and compiler CPI (cycles per instruction) or IPC (instructions per cycle) can not be accurately estimated analytically

12 Problem 5 My new laptop has an IPC that is 20% worse than my old laptop. It has a clock speed that is 30% higher than the old laptop. I’m running the same binaries on both machines. What speedup is my new laptop providing?

13 Problem 5 My new laptop has an IPC that is 20% worse than my old laptop. It has a clock speed that is 30% higher than the old laptop. I’m running the same binaries on both machines. What speedup is my new laptop providing? Exec time = cycle time * CPI * instrs Perf = clock speed * IPC / instrs Speedup = new perf / old perf = new clock speed * new IPC / old clock speed * old IPC = 1.3 * 0.8 = 1.04

14 An Alternative Perspective - I Each program is assumed to run for an equal number of cycles, so we’re fair to each program The number of instructions executed per cycle is a measure of how well a program is doing on a system The appropriate summary measure is sum of IPCs or AM of IPCs = 1.2 instr instr instr cyc cyc cyc This measure implicitly assumes that 1 instr in prog-A has the same importance as 1 instr in prog-B

15 An Alternative Perspective - II Each program is assumed to run for an equal number of instructions, so we’re fair to each program The number of cycles required per instruction is a measure of how well a program is doing on a system The appropriate summary measure is sum of CPIs or AM of CPIs = 0.8 cyc cyc cyc instr instr instr This measure implicitly assumes that 1 instr in prog-A has the same importance as 1 instr in prog-B

16 AM and HM Note that AM of IPCs = 1 / HM of CPIs and AM of CPIs = 1 / HM of IPCs So if the programs in a benchmark suite are weighted such that each runs for an equal number of cycles, then AM of IPCs or HM of CPIs are both appropriate measures If the programs in a benchmark suite are weighted such that each runs for an equal number of instructions, then AM of CPIs or HM of IPCs are both appropriate measures

17 AM vs. GM GM of IPCs = 1 / GM of CPIs AM of IPCs represents thruput for a workload where each program runs sequentially for 1 cycle each; but high-IPC programs contribute more to the AM GM of IPCs does not represent run-time for any real workload (what does it mean to multiply instructions?); but every program’s IPC contributes equally to the final measure

18 Problem 6 My new laptop has a clock speed that is 30% higher than the old laptop. I’m running the same binaries on both machines. Their IPCs are listed below. I run the binaries such that each binary gets an equal share of CPU time. What speedup is my new laptop providing? P1 P2 P3 Old-IPC New-IPC

19 Problem 6 My new laptop has a clock speed that is 30% higher than the old laptop. I’m running the same binaries on both machines. Their IPCs are listed below. I run the binaries such that each binary gets an equal share of CPU time. What speedup is my new laptop providing? P1 P2 P3 AM GM Old-IPC New-IPC AM of IPCs is the right measure. Could have also used GM. Speedup with AM would be 1.3.

20 Building a Car Start and finish a job before moving to the next Time Jobs Unpipelined

21 The Assembly Line A Time Jobs Pipelined BC ABC ABC ABC Break the job into smaller stages

22 Clocks and Latches Stage 1Stage 2

23 Clocks and Latches Stage 1Stage 2L Clk L

24 Title Bullet