
1 CHAPTER 2 THE ROLE OF PERFORMANCE

2 Performance
– Measure, report, and summarize
– Make intelligent choices
– Why is some hardware better than others for different programs?
– What factors of system performance are hardware related? (e.g., Do we need a new machine, or a new operating system?)
– How does the machine's instruction set affect performance?

3 Objectives: Performance and Benchmarks
– What do we mean by the performance of a computer, and why are we concerned with it?
– What is the best way to compare the performance of two machines?
– What are benchmarks? How useful are they?
Performance can be used to:
– Guide design decisions
– Compare architectures, implementations, and compilers
However, performance is in the eye of the beholder!
– Response (execution) time: the time between the start and completion of a task
– Throughput: the total amount of work done in a given time (e.g., the number of jobs processed per unit time)

4 Computer Performance: TIME, TIME, TIME
Response time (latency)
– How long does it take for my job to run?
– How long does it take to execute a job?
– How long must I wait for the database query?
Throughput
– How many jobs can the machine run at once?
– What is the average execution rate?
– How much work is getting done?
If we upgrade a machine with a new processor, what do we increase? If we add a new machine to the lab, what do we increase?

5 Measuring Performance
Factors that affect performance:
– How well the program uses the instructions of the machine
– How well the underlying hardware implements the instructions
– How well the memory and I/O systems perform
We will compare the performance of different machines on the same task. The performance of machine X for a given program is defined as:
$\text{Performance}(X) = \dfrac{1}{\text{Execution Time}(X)}$
If X performs better than Y, then $\text{Execution Time}(Y) > \text{Execution Time}(X)$ and $\text{Performance}(X) > \text{Performance}(Y)$, because $\dfrac{1}{\text{Execution Time}(X)} > \dfrac{1}{\text{Execution Time}(Y)}$.
Speedup of architecture X over Y:
$\dfrac{\text{Performance}(X)}{\text{Performance}(Y)} = \dfrac{\text{Execution Time}(Y)}{\text{Execution Time}(X)} = n$
meaning that X is n times faster than Y.

6 Examples
Example 1: Machine A does a task in 20 s; machine B does the same task in 25 s.
a) What is the performance of each machine? ($P_A = 1/20$, $P_B = 1/25$)
b) How much faster is A than B, i.e., what is the speedup? ($25/20 = 5/4 = 1.25$)
c) Is "performance" a meaningful metric on its own? (No: it depends on the task.)
Example 2: Machine A executes a program in 10 s.
a) If machine B is 1.3× faster than A, what is the execution time on machine B? ($1.3 = P_B/P_A = T_A/T_B$, so $T_B = 10/1.3 \approx 7.7$ s)
b) If machine C is 1.5× slower than A, what is the execution time on machine C? ($1.5 = P_A/P_C = T_C/T_A$, so $T_C = 15$ s)
But how do we measure time?
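The definitions of performance and speedup from the previous slide can be checked against these examples with a few lines of C++. This is a minimal sketch; the function names (`performance`, `speedup`, `timeIfFaster`, `timeIfSlower`) are illustrative and not from the slides, and all times are assumed to be in seconds.

```cpp
#include <cassert>

// Performance is the reciprocal of execution time (in seconds).
double performance(double execTimeSec) {
    assert(execTimeSec > 0.0);
    return 1.0 / execTimeSec;
}

// Speedup of X over Y: Performance(X) / Performance(Y) = Time(Y) / Time(X).
double speedup(double timeXSec, double timeYSec) {
    return performance(timeXSec) / performance(timeYSec);
}

// A machine n times faster than a baseline needs baseline / n seconds.
double timeIfFaster(double baselineSec, double n) {
    assert(n > 0.0);
    return baselineSec / n;
}

// A machine n times slower than a baseline needs baseline * n seconds.
double timeIfSlower(double baselineSec, double n) {
    assert(n > 0.0);
    return baselineSec * n;
}
```

Called from a `main()`, `speedup(20.0, 25.0)` gives 1.25 (Example 1b), `timeIfFaster(10.0, 1.3)` gives about 7.69 s, and `timeIfSlower(10.0, 1.5)` gives 15 s (Example 2).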

7 Measuring Computer Time
The Unix time command, run on a program, reports:
– Real time: time from invocation to termination
– User CPU time: time the CPU spends executing within this task
– System CPU time: time spent on O/S tasks performed on behalf of this task
These measures (especially elapsed time) are what users perceive. Is this response time or throughput? How do you measure portions of a program? How do you measure time on Windows?
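The `time` command measures a whole program. To time just a portion of a program, one option in standard C++ is to read a wall-clock timer and a processor-time counter around the region of interest. The sketch below assumes `std::chrono::steady_clock` for elapsed ("real") time and `std::clock()` for processor time; `RegionTiming` and `timeRegion` are illustrative names. Note that `std::clock()` is platform dependent: on POSIX systems it reports CPU time used by the process, while Microsoft's C runtime on Windows returns elapsed wall-clock time instead.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// Timing result for one region of code.
struct RegionTiming {
    double wallSec;  // elapsed ("real") time, as the user perceives it
    double cpuSec;   // processor time reported by std::clock (platform dependent)
};

// Runs `work` once and times it with two clocks.
template <typename F>
RegionTiming timeRegion(F&& work) {
    const auto wallStart = std::chrono::steady_clock::now();
    const std::clock_t cpuStart = std::clock();
    work();
    const std::clock_t cpuEnd = std::clock();
    const auto wallEnd = std::chrono::steady_clock::now();

    RegionTiming t;
    t.wallSec = std::chrono::duration<double>(wallEnd - wallStart).count();
    t.cpuSec = static_cast<double>(cpuEnd - cpuStart) / CLOCKS_PER_SEC;
    assert(t.wallSec >= 0.0);  // steady_clock never goes backwards
    return t;
}
```

For example, `timeRegion([&] { /* loop being measured */ })` returns both figures; for a CPU-bound region on an otherwise idle machine they are usually close, while waiting on I/O inflates only the wall-clock figure.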

8 Clock cycles, Clock Rate and Execution Time
Computers are constructed using a clock that runs at a constant rate and determines when events take place in hardware. These discrete time intervals are called clock cycles (also ticks or clock periods). The clock period is the time for one complete clock cycle (e.g., 2 nanoseconds, 2 ns). The clock rate is the number of cycles per second, often expressed in megahertz (MHz). Clock rate is the inverse of clock period: clock rate = 1/cycle time.
What is the clock rate for a 2 ns cycle? $1/(2\times10^{-9}\ \text{s}) = 500\times10^{6}\ \text{Hz} = 500$ MHz
What is the clock period for a machine with a clock rate of 800 MHz? For one with a clock rate of 400 MHz?
(Answer: $1/(800\times10^{6}) = 1.25\times10^{-9}$ s $= 1.25$ ns; $1/(400\times10^{6}) = 2.5\times10^{-9}$ s $= 2.5$ ns)
Relationship: the faster the clock rate, the shorter the clock period.
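Because clock rate and clock period are reciprocals, converting between them is a single division. A minimal C++ sketch, assuming rates in hertz (cycles per second) and periods in seconds; `periodFromRate` and `rateFromPeriod` are illustrative names.

```cpp
#include <cassert>

// Clock period (seconds per cycle) = 1 / clock rate (cycles per second).
double periodFromRate(double rateHz) {
    assert(rateHz > 0.0);
    return 1.0 / rateHz;
}

// Clock rate (Hz) = 1 / clock period (seconds).
double rateFromPeriod(double periodSec) {
    assert(periodSec > 0.0);
    return 1.0 / periodSec;
}
```

With these, `rateFromPeriod(2e-9)` is 500×10^6 Hz (500 MHz), `periodFromRate(800e6)` is 1.25×10^-9 s, and `periodFromRate(400e6)` is 2.5×10^-9 s, matching the answers above.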

9 Clock cycles, Clock Rate and Execution Time
Instead of reporting execution time in seconds, we often use cycles. Clock "ticks" indicate when to start activities (one abstraction):
– cycle time (clock period) = time between ticks = seconds per cycle
– clock rate (frequency) = cycles per second (1 Hz = 1 cycle/s)
A 200 MHz clock has a cycle time of $1/(200\times10^{6}\ \text{Hz}) = 5\times10^{-9}$ s $= 5$ ns.
[Figure: clock ticks along a time axis]

10 Clock cycles, Clock Rate and Execution Time
How do we calculate execution time? Factors:
– How many cycles does it take to do all the work?
– How long does each cycle take (clock period)?
Calculation of time using clock period (cycle period, cycle length):
$\text{CPU Exec Time} = \#\text{clock cycles} \times \text{clock period}$ [Units: seconds = cycles × seconds/cycle]
Example: Assume a program requires $200\times10^{6}$ cycles on a machine where each cycle takes 2 ns. What is the execution time? ($200\times10^{6} \times 2\times10^{-9} = 0.4$ s)
Calculation of time using clock rate (cycle frequency, clock frequency):
Since clock period = 1/clock rate,
$\text{Execution Time} = \#\text{clock cycles} / \text{clock rate}$ [Units: seconds = cycles / (cycles/second)]
Example: Assume a program requires $200\times10^{6}$ cycles on a machine with a clock rate of 500 MHz. What is the execution time? ($200\times10^{6}/(500\times10^{6}) = 0.4$ s)
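The two calculations on this slide are the same formula written with the period or with the rate. A minimal C++ sketch showing that both give 0.4 s for the example; `execTimeFromPeriod` and `execTimeFromRate` are illustrative names, with cycles as a plain count, periods in seconds, and rates in hertz.

```cpp
#include <cassert>

// CPU execution time = (# clock cycles) x (clock period).
double execTimeFromPeriod(double cycles, double periodSec) {
    assert(cycles >= 0.0 && periodSec > 0.0);
    return cycles * periodSec;
}

// Equivalent form: CPU execution time = (# clock cycles) / (clock rate).
double execTimeFromRate(double cycles, double rateHz) {
    assert(cycles >= 0.0 && rateHz > 0.0);
    return cycles / rateHz;
}
```

`execTimeFromPeriod(200e6, 2e-9)` and `execTimeFromRate(200e6, 500e6)` both evaluate to 0.4 (up to floating-point rounding), since a 2 ns period is a 500 MHz rate.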

11 Examples
Example 1: Machine A runs at 500 MHz. Machine B runs at 650 MHz. Program 1 requires $100\times10^{6}$ clock cycles on machine A and 1.2 times that many on machine B. Which machine is faster? By how much?
$\text{Exec}(A) = 100\times10^{6}/(500\times10^{6}) = 0.2$ s, or equivalently $100\times10^{6} \times 2\times10^{-9} = 200\times10^{-3} = 0.2$ s
$\text{Exec}(B) = 120\times10^{6}/(650\times10^{6}) \approx 0.185$ s
Machine B is $0.2/0.185 \approx 1.08$ times faster than A.
Compare: B's clock rate is $650/500 = 1.3$ times A's, but B needs 1.2 times as many cycles, so the net speedup is only $1.3/1.2 \approx 1.08$.
Example 2: A program takes 10 seconds on a 500 MHz machine.
a) How many cycles must it require? Cycles = $10\ \text{s} \times 500\times10^{6}$ cycles/s $= 5000\times10^{6}$ cycles
b) What clock rate would be needed to achieve a 1.2 times speedup (assuming the number of clock cycles stays the same)? Target execution time: $10/1.2 \approx 8.33$ s, so the clock rate must be $5000\times10^{6}/(10/1.2) = 600\times10^{6}$ Hz $= 600$ MHz.
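Both examples can be reproduced with the cycles/rate relationship. The C++ sketch below uses illustrative function names and assumes, as Example 2b does, that the cycle count does not change when the clock rate changes.

```cpp
#include <cassert>

// Cycles a program needs = execution time (s) x clock rate (cycles/s).
double cyclesFromTime(double execTimeSec, double rateHz) {
    assert(execTimeSec >= 0.0 && rateHz > 0.0);
    return execTimeSec * rateHz;
}

// Clock rate needed to finish `cycles` in `targetTimeSec`,
// assuming the cycle count stays the same at the new rate.
double requiredRateHz(double cycles, double targetTimeSec) {
    assert(targetTimeSec > 0.0);
    return cycles / targetTimeSec;
}

// How many times faster X is than Y on the same program,
// given each machine's cycle count and clock rate.
double speedupFromCycles(double cyclesX, double rateX, double cyclesY, double rateY) {
    assert(cyclesX > 0.0 && rateX > 0.0 && rateY > 0.0);
    const double timeX = cyclesX / rateX;
    const double timeY = cyclesY / rateY;
    return timeY / timeX;
}
```

`speedupFromCycles(120e6, 650e6, 100e6, 500e6)` gives about 1.083 (B over A), and `requiredRateHz(cyclesFromTime(10.0, 500e6), 10.0 / 1.2)` gives 600×10^6 Hz. Keeping the target time as the exact fraction `10.0 / 1.2` rather than a rounded 8.33 s is what avoids a slightly high answer.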

12 How many cycles are required for a program?
One could assume that # of cycles = # of instructions.
This assumption is incorrect: different instructions take different amounts of time on different machines. Why? Hint: remember that these are machine instructions, not lines of C code.
[Figure: 1st through 6th instructions laid out along a time axis]

13 Different numbers of cycles for different instructions
– Multiplication takes more time than addition.
– Floating-point operations take longer than integer ones.
– Accessing memory takes more time than accessing registers.
Important point: changing the cycle time often changes the number of cycles required for various instructions (more later).

14 Cycles per Instruction (CPI)
Knowing the number of cycles per instruction (CPI) helps software designers avoid instructions with a high CPI in favor of those with a low CPI.
Program CPI = average number of clock cycles per instruction. CPI depends on the hardware implementation and on the instruction mix. We may calculate it from instruction counts OR from relative instruction frequencies.
Example 1: Assume 3 types of instructions:
– Arithmetic (=, +, -, *, /) takes 4 cycles
– Conditional (if) takes 3 cycles
– I/O takes 5 cycles
Consider the following code segment:
cin >> num1;
cin >> num2;
num3 = num1 + num2;
if (num3 > 10) cout << "yes"; else cout << "no";
a) How many cycles to complete? Whichever branch is taken, the segment executes two input operations, one + and one =, one conditional, and one output operation: $5 + 5 + 4 + 4 + 3 + 5 = 26$ cycles.
b) What is the average number of cycles per instruction? ($26/6 \approx 4.33$ cycles; an average cannot exceed the largest per-instruction cost of 5 cycles.)
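The cycle count for this code segment is a weighted sum over instruction types. The C++ sketch below assumes the reading behind the 26-cycle total: the executed path contains two arithmetic operations (`+` and `=`), one conditional, and three I/O operations (two `cin` reads and one `cout` write). The struct and function names are illustrative.

```cpp
#include <cassert>

// How many instructions of each type the executed path contains.
struct InstructionMix {
    int arithmetic;
    int conditional;
    int io;
};

// Per-type costs from Example 1 (in clock cycles).
constexpr int kArithmeticCycles = 4;
constexpr int kConditionalCycles = 3;
constexpr int kIoCycles = 5;

int totalCycles(const InstructionMix& m) {
    return m.arithmetic * kArithmeticCycles
         + m.conditional * kConditionalCycles
         + m.io * kIoCycles;
}

int instructionCount(const InstructionMix& m) {
    return m.arithmetic + m.conditional + m.io;
}

// Average CPI = total cycles / number of instructions executed.
double averageCpi(const InstructionMix& m) {
    const int n = instructionCount(m);
    assert(n > 0);
    return static_cast<double>(totalCycles(m)) / n;
}
```

For `InstructionMix{2, 1, 3}`, `totalCycles` returns 26, `instructionCount` returns 6, and `averageCpi` returns about 4.33.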

15 Program Cycles per Instruction (CPI)
CPI calculation with instruction counts: since CPI = CPU clock cycles / instruction count, the overall program CPU clock cycles are
$\text{CPU clock cycles} = \sum_i (\text{CPI}_i \times \text{Count}_i)$
so that CPI = overall program cycles / # instructions.
Example 2: Assume class A CPI = 1, class B CPI = 2, class C CPI = 3. A program requires 5 A, 3 B, and 2 C instructions. What is the CPI?
# CPU cycles $= 5\times1 + 3\times2 + 2\times3 = 17$; # instructions $= 5 + 3 + 2 = 10$
Therefore CPI = 17 cycles / 10 instructions = 1.7 cycles/instruction.
CPI calculation with relative frequencies: let $f_i$ be the relative frequency of instruction class $i$, which takes $\text{CPI}_i$ cycles per instruction. Then
$\text{Program CPI} = \sum_i (\text{CPI}_i \times f_i)$
Example 3: Assume class A CPI = 1, class B CPI = 2, class C CPI = 3, and the program uses 50% A, 30% B, and 20% C instructions. What is the CPI?
$\text{CPI} = 0.5\times1 + 0.3\times2 + 0.2\times3 = 1.7$
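Both CPI calculations fit in two small loops. A minimal C++ sketch with illustrative function names, assuming per-class CPIs and counts (or frequencies) are passed as parallel vectors in the same class order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPI from absolute counts: sum(CPI_i * Count_i) / sum(Count_i).
double cpiFromCounts(const std::vector<double>& cpi, const std::vector<double>& count) {
    assert(cpi.size() == count.size());
    double cycles = 0.0;
    double instructions = 0.0;
    for (std::size_t i = 0; i < cpi.size(); ++i) {
        cycles += cpi[i] * count[i];
        instructions += count[i];
    }
    assert(instructions > 0.0);
    return cycles / instructions;
}

// CPI from relative frequencies f_i (expected to sum to 1): sum(CPI_i * f_i).
double cpiFromFrequencies(const std::vector<double>& cpi, const std::vector<double>& freq) {
    assert(cpi.size() == freq.size());
    double result = 0.0;
    for (std::size_t i = 0; i < cpi.size(); ++i) {
        result += cpi[i] * freq[i];
    }
    return result;
}
```

`cpiFromCounts({1, 2, 3}, {5, 3, 2})` (Example 2) and `cpiFromFrequencies({1, 2, 3}, {0.5, 0.3, 0.2})` (Example 3) both give 1.7, because 5:3:2 out of 10 instructions is exactly the 50%/30%/20% mix.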

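16 Program Cycles per Instruction (CPI)
Why is $\text{CPI} = \sum_i (\text{CPI}_i \times f_i)$ true?
$\text{CPI} = \dfrac{\text{CPU clock cycles}}{\text{Instr. count}} = \dfrac{\sum_i (\text{CPI}_i \times \text{Count}_i)}{\text{Instr. count}} = \sum_i \left(\text{CPI}_i \times \dfrac{\text{Count}_i}{\text{Instr. count}}\right) = \sum_i (\text{CPI}_i \times f_i)$
Execution Time
$\text{Execution Time} = \#\text{cycles} \times \text{cycle time} = (\text{CPI} \times \text{Instr. count}) \times \text{cycle time} = \text{Instr. count} \times \text{CPI} \times \text{cycle time} = \dfrac{\text{Instr. count} \times \text{CPI}}{\text{clock rate}}$
Example 1: How long would it take to execute a program with $100\times10^{6}$ instructions if CPI is 3 and the clock rate is 500 MHz? (Answer: Time $= 100\times10^{6} \times 3/(500\times10^{6}) = 3/5 = 0.6$ s)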
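The last form of the execution-time equation needs only three inputs. A minimal C++ sketch; `execTimeSec` is an illustrative name, and the clock rate is assumed to be in hertz so the result is in seconds.

```cpp
#include <cassert>

// Execution time = Instruction Count x CPI x cycle time
//                = (Instruction Count x CPI) / clock rate.
double execTimeSec(double instructionCount, double cpi, double clockRateHz) {
    assert(instructionCount >= 0.0 && cpi > 0.0 && clockRateHz > 0.0);
    return instructionCount * cpi / clockRateHz;
}
```

`execTimeSec(100e6, 3.0, 500e6)` gives 0.6 s for the example above. The same function covers the later Machine 1 vs. Machine 2 comparison: `execTimeSec(100e6, 2.5, 500e6)` gives 0.5 s and `execTimeSec(90e6, 3.0, 500e6)` gives 0.54 s.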

17 Improving Computer Performance
Time = Instruction Count × CPI × cycle time
Time = (Instructions / Program) × (# Cycles / Instruction) × (Seconds / Cycle)
For a given instruction set architecture, increases in CPU performance come from three sources:
– Increases in clock rate
– Improvements in processor organization that lower the CPI
– Compiler enhancements that lower the instruction count or generate a lower average CPI
Which source was used to improve performance by:
– Using an Intel Pentium III at 933 MHz instead of an Intel Pentium III at 800 MHz?
– Using an Intel Pentium 4 instead of an Intel Pentium III?
– Using release versions instead of debug versions of programs?
Very important: when comparing two machines, you must consider all three components of execution time. If some factors are identical, the comparison can be based on just the factors that differ.
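Because execution time is a product of three factors, the overall speedup is the product of three per-factor ratios, which makes it easy to see which source contributed. A minimal C++ sketch with illustrative names (`SpeedupFactors`, `decompose`); "old" and "new" are the two configurations being compared.

```cpp
#include <cassert>

// Each factor of Time = IC x CPI x cycle time, as an old/new ratio (>1 means faster).
struct SpeedupFactors {
    double fromInstructionCount;  // IC_old / IC_new
    double fromCpi;               // CPI_old / CPI_new
    double fromClockRate;         // rate_new / rate_old (= period_old / period_new)
    double total() const { return fromInstructionCount * fromCpi * fromClockRate; }
};

SpeedupFactors decompose(double icOld, double cpiOld, double rateOld,
                         double icNew, double cpiNew, double rateNew) {
    assert(icNew > 0.0 && cpiNew > 0.0 && rateOld > 0.0);
    return SpeedupFactors{icOld / icNew, cpiOld / cpiNew, rateNew / rateOld};
}
```

If, hypothetically, the 933 MHz and 800 MHz Pentium III ran a program with the same instruction count and CPI, only `fromClockRate` would differ from 1 and the total would be 933/800 ≈ 1.17. For the later Machine 2 (old) vs. Machine 1 (new) example, the factors are 0.9 from instruction count and 1.2 from CPI, for a total of 1.08.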

18 Improving Computer Performance: RISC vs. CISC
Time = (Instructions / Program) × (# Cycles / Instruction) × (Seconds / Cycle)
Computer architectures can be categorized as RISC or CISC (Reduced Instruction Set Computer vs. Complex Instruction Set Computer).
The CISC approach attempts to minimize the number of instructions per program at the cost of more cycles per instruction.
– Emphasizes improving hardware
– Includes complex, multi-clock instructions
RISC does the opposite, reducing the cycles per instruction at the cost of more instructions per program.
– Emphasizes software
– Includes only simple, single-clock (reduced) instructions
Modern architectures emphasize RISC.

19 Improving Computer Performance
Example 2: Machine 1 and Machine 2 both have a clock rate of 500 MHz.
On Machine 1, program P requires $100\times10^{6}$ instructions and has a CPI of 2.5.
On Machine 2, program P requires $90\times10^{6}$ instructions and has a CPI of 3.
Which machine is faster? By how much? ($T_1 = 100\times10^{6}\times2.5/(500\times10^{6}) = 0.5$ s, $T_2 = 90\times10^{6}\times3/(500\times10^{6}) = 0.54$ s, so Machine 1 is $0.54/0.5 = 1.08$ times faster)
Evaluating computer performance: a company that runs the same set of programs day in, day out can use those programs (its workload) to compare systems (e.g., old vs. new). What if a company does not fit this category? Use some kind of rating.

20 Evaluating Computer Performance
Goal: a simple metric where a higher rating means better performance. Some ratings are:
– Native MIPS
– Peak MIPS
– Relative MIPS
– MOPS, MFLOPS
For all these measures, there is a tendency to generalize from one rating to all programs, which is not valid.
Benchmarks: programs specifically chosen to measure performance. The organization in charge of benchmarks is the System Performance Evaluation Cooperative (SPEC), now the Standard Performance Evaluation Corporation. The rating is the SPEC ratio with respect to some reference machine: the higher the SPEC ratio, the better the machine.
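Two of these ratings reduce to short formulas. Native MIPS is instruction count divided by (execution time × 10^6), which equals clock rate / (CPI × 10^6). A SPEC ratio for one program is the reference machine's time divided by the measured time, and SPEC summarizes a suite with the geometric mean of those ratios. The C++ sketch below uses illustrative function names; any reference times passed to it are hypothetical, not real SPEC data.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Native MIPS = instruction count / (execution time x 10^6).
double nativeMips(double instructionCount, double execTimeSec) {
    assert(execTimeSec > 0.0);
    return instructionCount / (execTimeSec * 1e6);
}

// SPEC-style ratio for one program: reference time / measured time (higher is better).
double specRatio(double referenceTimeSec, double measuredTimeSec) {
    assert(measuredTimeSec > 0.0);
    return referenceTimeSec / measuredTimeSec;
}

// Geometric mean of per-program ratios, computed via logarithms.
double geometricMean(const std::vector<double>& ratios) {
    assert(!ratios.empty());
    double logSum = 0.0;
    for (double r : ratios) {
        assert(r > 0.0);
        logSum += std::log(r);
    }
    return std::exp(logSum / static_cast<double>(ratios.size()));
}
```

For Machine 1 in the earlier example, `nativeMips(100e6, 0.5)` gives 200 MIPS (equivalently 500×10^6 / (2.5 × 10^6)). With hypothetical ratios of 2 and 8, `geometricMean({2.0, 8.0})` gives 4. The geometric mean is used because its ranking of two machines does not depend on which reference machine the ratios are normalized to.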

21 SPEC ’89 for IBM Powerstation 550: Compiler “enhancements” and performance

22 Summary
– Performance of a computer can be measured by response (execution) time, the time between the start and completion of a task, and by throughput, the total amount of work done in a given time.
– The factors determining execution time are the number of cycles needed to do all the work and how long each cycle takes (clock period).
– CPI helps software designers avoid instructions with a high CPI in favor of those with a low CPI, where possible.
– Program CPI can be obtained from instruction counts or from relative instruction frequencies.
– Improving performance means decreasing
Time = Instruction Count × CPI × cycle time = (Instr. / Program) × (# Cycles / Instr.) × (Seconds / Cycle)
through increases in clock rate, improvements in processor organization that lower the CPI, and compiler enhancements that lower the instruction count or generate a lower average CPI.
– Computer performance can be rated with MIPS, MOPS, or MFLOPS, and by using benchmarks.

23 Performance Formulas