Performance What differences do we see in performance? Almost all computers operate correctly (within reason) Most computers implement useful operations.

Presentation transcript:

Performance (2.1)
The entire point of computer hardware is to “perform”:
  Operate correctly
  Implement useful operations
  Do so as fast as possible
What differences do we see in performance?
  Almost all computers operate correctly (within reason).
  Most computers implement useful operations. This is a matter of taste...
  Computers all operate at different speeds.
Speed is the most important performance metric.

Measuring speed (2.1)
Which is faster?
  School Bus: 57 MPH, 40 people
  Ferrari: 170 MPH, 2 people
Raw speed: Ferrari wins.
Throughput: School Bus wins.
  Ferrari: 340 passenger-MPH
  School Bus: 2280 passenger-MPH
Other issues: range, reliability, cost.
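
The two rankings on this slide can be reproduced with a few lines of Python. This is only an illustrative sketch: the speeds and passenger counts come from the slide, while the function and variable names are made up for the example.

```python
# Illustrative only: figures are from the slide, names are hypothetical.

def throughput(speed_mph: float, passengers: int) -> float:
    """Passenger-MPH: raw speed multiplied by the number of people carried."""
    return speed_mph * passengers

# name -> (speed in MPH, passengers)
vehicles = {
    "School Bus": (57, 40),
    "Ferrari": (170, 2),
}

# Rank once by raw speed and once by throughput.
fastest = max(vehicles, key=lambda name: vehicles[name][0])
highest_throughput = max(vehicles, key=lambda name: throughput(*vehicles[name]))

for name, (speed, people) in vehicles.items():
    print(f"{name}: {speed} MPH, {throughput(speed, people)} passenger-MPH")
print("Raw speed winner:", fastest)
print("Throughput winner:", highest_throughput)
```

The same machine can win or lose depending on which metric is chosen, which is why the metric has to be stated before comparing.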

Performance of computers (2.2)
How long does it take to run my favorite program?
To compare two computers, we compare the execution time of the same program on the two computers.
  The faster one wins.
  Lower execution time is better.
Related measures: response time, CPU time, batch throughput.

A little background... (2.2)
Computer programs are (usually) written in a high-level language (e.g. C).
The compiler converts this code into machine-language instructions.
The CPU interprets machine-language instructions and executes them.
The performance of a program depends on:
  The number and types of instructions executed
  How fast the CPU can execute those instructions

Tick-tock (2.2)
Almost all modern computers are based on a clock.
All events are controlled by and synchronized to a regular clock.
Clocks are just regular periodic waveforms.
  Cycle time: time for the waveform to repeat itself. Also known as the clock period.
  Frequency: 1/Period
Example: 10 ns clock cycle --> period = 10^-8 s
  Frequency = 1/10 ns = 1/(10^-8 s) = 10^8 cycles/sec
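
As a quick check of the period/frequency relationship, here is a minimal Python sketch. The 10 ns figure is the slide's example; the helper name is hypothetical.

```python
# Minimal sketch: frequency is the reciprocal of the clock period.

def frequency_hz(period_s: float) -> float:
    """Clock frequency in cycles per second for a period given in seconds."""
    return 1.0 / period_s

period = 10e-9                      # 10 ns clock cycle, from the slide
freq = frequency_hz(period)
print(f"{freq:.3e} cycles/sec")     # on the order of 10^8, i.e. 100 MHz
```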

Execution time (2.3)
Since the cycle time of a computer is constant, we can express time in terms of CPU cycles:
  Time = cycles * cycle time
  Time = cycles / clock frequency
Performance can be improved by:
  Decreasing the cycle time
    Hardware solution: use faster technology
  Decreasing the number of cycles for the program
    Software: write a better program
    Hardware: re-design the CPU

Instruction execution time (2.3)
Every instruction takes time to execute. Some instructions may take more or less time than others.
The time for an instruction is expressed in terms of clock cycles.
  Instruction   Cycles
  ADD           1
  MULT          4
  CMP           1
  SUB           2
The time to run a program depends on:
  How many instructions
  What type of instructions
Example: 30 ADDs and 4 MULTs --> 30 * 1 + 4 * 4 = 46 cycles
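
The cycle count in the example can be tallied mechanically. The sketch below uses the per-instruction cycle costs from the slide's table; the dictionary and function names are assumptions made for illustration.

```python
# Cycle cost per instruction, taken from the slide's table.
CYCLES = {"ADD": 1, "MULT": 4, "CMP": 1, "SUB": 2}

def total_cycles(counts: dict) -> int:
    """Sum of (instruction count * cycles per instruction) over all types."""
    return sum(CYCLES[op] * n for op, n in counts.items())

# The slide's example: 30 ADDs and 4 MULTs.
print(total_cycles({"ADD": 30, "MULT": 4}))  # 30*1 + 4*4 = 46
```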

Average CPI (2.3)
The Cycles-Per-Instruction (CPI) varies depending on what instructions are used.
Take an average CPI:
  Cycles = Number of Instructions * Average CPI
Average CPI should reflect the mix of instructions in the program.
  A large proportion of 4-cycle MULTs should raise the CPI; a large proportion of 1-cycle ADDs should lower it.
  The average should be the weighted average.

Weighting the average (2.3)
Example mix of instructions:
  Instruction   Cycles   %
  ADD           1        40
  MULT          4        10
  CMP           1        20
  SUB           2        30
Average CPI = 1 * 40% + 4 * 10% + 1 * 20% + 2 * 30% = 0.4 + 0.4 + 0.2 + 0.6 = 1.6
Notice: The average CPI depends on the code we’re executing!
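
The weighted average can be written out as a short Python sketch. The cycle costs and percentages are the ones in the slide's example mix; the names are hypothetical.

```python
# instruction -> (cycles, fraction of executed instructions), from the slide
MIX = {
    "ADD":  (1, 0.40),
    "MULT": (4, 0.10),
    "CMP":  (1, 0.20),
    "SUB":  (2, 0.30),
}

def average_cpi(mix: dict) -> float:
    """Weighted average: each instruction's cycles times its share of the mix."""
    return sum(cycles * fraction for cycles, fraction in mix.values())

print(f"Average CPI = {average_cpi(MIX):.2f}")  # 0.4 + 0.4 + 0.2 + 0.6
```

Changing the fractions changes the result, which is the slide's point: the average CPI is a property of the code being run, not only of the processor.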

How long? (2.3)
Execution time = Cycles * Cycle Time
Cycles = Average CPI * Instruction Count
Execution time = Instruction Count * CPI * Cycle Time
Remember, lower is better. Reducing any one of the three components reduces execution time:
  Cycle time: reduced through technology change, change in CPU design
  CPI: reduced through better code, better compiler, change in CPU design
  Instruction count: reduced through better code, better compiler, change in CPU design

Examples (2.3)
System A: 10 s to run a program. Clock period is 20 ns.
  --> Time A = CPI A x Period A x Instructions A = 10 s
System B: Change clock to 10 ns, no other changes. How long does it take to run the same program on System B?
  --> Time B = CPI A x Period B x Instructions A (Period B = Period A * 0.5)
  --> Time B = CPI A x Period A * 0.5 x Instructions A = Time A * 0.5 = 5 s
System C: 10 s to run a program, 20 ns clock, 400,000,000 instr. What is the CPI?
  --> CPI C = Time C / (Period C x Instr C) = 10 s / (20 x 10^-9 s x 4 x 10^8) = 1.25
System D: 400,000,000 instr., 22 ns clock and a CPI of 1.10. How long does it take to run the program on System D?
  --> Time D = CPI D x Period D x Instructions D = 1.10 x 22 ns x 4 x 10^8 = 9.68 s
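
These four systems all apply the same formula, Execution time = Instruction Count * CPI * Cycle Time. The following Python sketch recomputes them; the function names are assumptions, and System D's CPI of 1.10 is read from the slide's worked equation.

```python
def execution_time(instructions: float, cpi: float, period_s: float) -> float:
    """Execution time = instruction count * CPI * cycle time (seconds)."""
    return instructions * cpi * period_s

def cpi_from_time(time_s: float, instructions: float, period_s: float) -> float:
    """Rearranged form: CPI = time / (cycle time * instruction count)."""
    return time_s / (period_s * instructions)

# System B: same program and CPI as System A (10 s), clock period halved.
time_a = 10.0
time_b = time_a * (10e-9 / 20e-9)

# System C: 10 s, 20 ns clock, 400,000,000 instructions.
cpi_c = cpi_from_time(10.0, 4e8, 20e-9)

# System D: 400,000,000 instructions, 22 ns clock, CPI 1.10.
time_d = execution_time(4e8, 1.10, 22e-9)

print(f"Time B = {time_b:.2f} s, CPI C = {cpi_c:.2f}, Time D = {time_d:.2f} s")
```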

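Examples (2.3)
Assume an add takes 1 cycle, a mult 4 cycles, and a sub 2 cycles.
Two different compilers produce the following loops for the same code (instruction counts per iteration, as implied by the CPI figures below):
  A: 1 add, 2 mult, 1 sub (4 instructions)
  B: 4 add, 1 mult, 1 sub (6 instructions)
Each loop runs 100,000 times, the iteration count implied by the times below.
What’s the CPI?
  CPI A = (1 + 4 + 4 + 2)/4 = 2.75
  CPI B = (1 + 1 + 1 + 1 + 4 + 2)/6 = 1.67
How long does it take to run each program on a 200 MHz CPU?
  Time A = CPI A x Period A x Instructions A = 2.75 x 5 ns x 400,000 = .0055 s
  Time B = CPI B x Period B x Instructions B = 1.67 x 5 ns x 600,000 = .0050 s
Loop B executes more instructions but finishes sooner, because its CPI is lower.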
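
The comparison can be checked with a small Python sketch. Two things here are inferred rather than stated outright on the slide and should be treated as assumptions: the per-loop instruction mix (chosen to match 11 cycles over 4 instructions for A and 10 cycles over 6 for B) and the iteration count of 100,000 (chosen to match the stated times).

```python
CYCLES = {"add": 1, "mult": 4, "sub": 2}   # cycle costs from the slide

# Assumed loop bodies, consistent with CPI A = 2.75 and CPI B = 10/6.
LOOP_A = ["add", "mult", "mult", "sub"]
LOOP_B = ["add", "add", "add", "add", "mult", "sub"]

ITERATIONS = 100_000      # assumption: inferred from the stated run times
PERIOD_S = 1 / 200e6      # 200 MHz clock -> 5 ns period

def cpi(loop: list) -> float:
    """Average cycles per instruction for one pass through the loop body."""
    return sum(CYCLES[op] for op in loop) / len(loop)

def run_time(loop: list) -> float:
    """CPI * period * total instructions executed."""
    return cpi(loop) * PERIOD_S * len(loop) * ITERATIONS

for name, loop in (("A", LOOP_A), ("B", LOOP_B)):
    print(f"{name}: CPI = {cpi(loop):.3f}, time = {run_time(loop):.4f} s")
```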

Performance metrics (2.4)
I’m concerned with how long it takes to run my program.
Chances are, that number isn’t published with the specs for the computer.
Standardized metrics:
  Benchmarks (SPEC, etc.)
  MIPS
  MFLOPS

Benchmarks (2.5)
Benchmarks: programs thought to be representative of commonly-used programs.
Run a suite of benchmark programs and average the performance.
Advantages:
  Actually corresponds to execution time!
  Represents a wider range of programs
Disadvantages:
  Are they running your program?
  Who picks the benchmarks? Be wary if the manufacturer does!

SPEC Benchmarks (2.6)
SPEC (originally the System Performance Evaluation Cooperative, now the Standard Performance Evaluation Corporation) maintains a set of benchmark suites.
New tests use SPEC CPU2000:
  CINT: performance on integer programs
  CFP: performance on floating-point programs
  Larger numbers indicate better performance.
Tests prior to 2000 used CPU95.
  CPU2000 has only a few years of data.
See the SPEC web page.

[Figure: SPECint95 results for Intel processors. Axes: clock speed (MHz) vs. SPECint95. Notes: results depend on cache size, memory system, and motherboard; better cache design (on-chip vs. off-chip) improves results.]

[Figure: SPECfp95 results for Intel processors. Axes: clock speed (MHz) vs. SPECfp95. Note: results depend on cache size, memory system, and motherboard.]

[Figure: CINT2000 results for various processors. Axes: clock speed (GHz) vs. CINT2000. Notes: results depend on cache size, memory system, and motherboard; Athlon part numbers are not the CPU MHz, and part numbers are labeled on the graph.]

[Figure: CFP2000 results for various processors. Axes: clock speed (GHz) vs. CFP2000. Notes: results depend on cache size, memory system, and motherboard; Athlon part numbers are not the CPU MHz, and part numbers are labeled on the graph.]

Limited benefits... (2.7)
Assume we’re running a program that spends 40% of its time accessing memory.
Now, we upgrade the processor from 200 MHz to 800 MHz.
How much faster does the program run?
  We’ve reduced the time for 60% of the program by a factor of 4.
  But we haven’t touched the memory access time.
  New total = Old * (40% + (60% / 4)) = Old * (40% + 15%) = Old * 55%
Not even twice as fast! (The speedup is 1/0.55, about 1.8.)

Amdahl’s Law (2.7)
New execution time = (Execution time affected by improvement / Amount of improvement) + Unaffected execution time
Practical effect: “Make the common case fast”
Corollary: “Forget about the rare case”
Example: 70% of my execution time is spent on integer ADDs, and 6% on floating-point ADDs. Total execution time is 100 seconds.
  What’s the effect of making integer ADDs twice as fast?
    New time = (100 * .70) / 2 + (100 * .30) = 35 + 30 = 65 seconds
  What’s the effect of making F.P. ADDs twice as fast?
    New time = (100 * .06) / 2 + (100 * .94) = 3 + 94 = 97 seconds
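
Amdahl’s Law is easy to experiment with in code. The sketch below is one possible formulation (the function name and parameter names are assumptions); it reproduces the two ADD examples above and the 200 MHz to 800 MHz upgrade from the previous slide.

```python
def new_execution_time(old_time: float, affected_fraction: float,
                       improvement: float) -> float:
    """Amdahl's Law: only the affected part of the time shrinks.

    affected_fraction: share of old_time that the improvement applies to
    improvement: speedup factor applied to that share (2 = twice as fast)
    """
    affected = old_time * affected_fraction
    unaffected = old_time * (1 - affected_fraction)
    return affected / improvement + unaffected

# Integer ADDs (70% of 100 s) made twice as fast.
print(new_execution_time(100, 0.70, 2))
# Floating-point ADDs (6% of 100 s) made twice as fast.
print(new_execution_time(100, 0.06, 2))
# 4x faster CPU when 40% of the time is memory access (60% affected).
print(new_execution_time(1.0, 0.60, 4))
```

Improving the 70% case saves 35 seconds, while the same factor applied to the 6% case saves only 3, which is the "make the common case fast" rule in numbers.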

(Native) MIPS (2.4)
MIPS = Million Instructions Per Second
MIPS = (instructions / second) x 10^-6
     = (cycles / second) x (1 / CPI) x 10^-6
     = clock rate / (CPI x 10^6)
MIPS does not take into account how many instructions must be executed in a program.
Example: Same program, written two ways
  1. 1,000 instructions, CPI 1.2, 1.0 MHz clock
     Execution time = 1.2 ms, MIPS = 1/1.2 = .833
  2. 500 instructions, CPI 2.0, 1.0 MHz clock
     Execution time = 1.0 ms, MIPS = 1/2.0 = .500
The second version has the lower MIPS rating but the shorter execution time.
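
The mismatch between MIPS and execution time in this example can be shown in a few lines of Python. The helper names are hypothetical, and the 500-instruction count for the second version is inferred from its stated 1.0 ms execution time at CPI 2.0 on a 1.0 MHz clock.

```python
def execution_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """Execution time in seconds = instructions * CPI / clock rate."""
    return instructions * cpi / clock_hz

def mips(cpi: float, clock_hz: float) -> float:
    """Native MIPS = clock rate / (CPI * 10^6); instruction count cancels out."""
    return clock_hz / (cpi * 1e6)

CLOCK_HZ = 1.0e6  # 1.0 MHz, from the slide

# (instructions, CPI) for the two versions of the same program
versions = {"version 1": (1000, 1.2), "version 2": (500, 2.0)}

for name, (instr, cpi) in versions.items():
    t_ms = execution_time(instr, cpi, CLOCK_HZ) * 1e3
    print(f"{name}: {t_ms:.1f} ms, {mips(cpi, CLOCK_HZ):.3f} MIPS")
```

Note that `mips` does not take the instruction count as an argument at all, which is exactly why the metric cannot tell which version finishes first.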

Avoid MIPS (the metric, not the processor) (2.4)
MIPS = clock rate / (CPI * 1,000,000)
Higher MIPS doesn’t always mean better performance.
  Highest MIPS corresponds to using the smallest (fastest) instructions to lower CPI.
Peak MIPS is pointless.
  Peak MIPS is just the MIPS you get with the smallest instructions.
  Usually, CPI is 1.0 for this.
  It just re-expresses the clock rate in MHz.

MFLOPS (2.4)
MFLOPS = Million Floating-point Operations Per Second
MFLOPS is similar to MIPS.
  Measures floating-point operations (mult, divide, add, ...)
  Suffers the same problems as MIPS
  Different operations cost different amounts
Peak MFLOPS is especially bad.

Performance Summary
Execution time is the most important performance metric.
Basic formula for performance:
  Execution time = instructions * cycle time * CPI
Amdahl’s law describes how making limited improvements affects the bottom line.
  Only make improvements in areas that are commonly used.
Standard benchmarks help us to compare the performance of various computers.
  Beware of overly-simplified comparisons.

Pitfalls and Fallacies
Fallacy: Processors with the same ISA can be compared by clock rate or a single benchmark suite alone.
  We don’t know the pipeline structure and memory system.
Fallacy: Peak performance tracks observed performance.
  One processor may operate closer to peak performance most of the time than another.
Fallacy: MIPS is an accurate measure of performance.

Example
We wish to consider the performance of two different machines: M1 and M2. The clock frequencies for the two machines are as follows:
                    M1        M2
  Clock Frequency   300 MHz   200 MHz
Two programs were run on both machines and the following measurements were made:
  Program   Time on M1   Time on M2
  1         6 seconds    4 seconds
  2         8 seconds    10 seconds
In addition, the following additional measurements were made:
  Program   No. of Instructions Executed on M1   No. of Instructions Executed on M2
  1         180x10^6                             100x10^6
1. For each program, which machine is faster and by how much?
2. Find the clock cycles per instruction (CPI or average CPI) for Program 1 on both machines.
3. On M1, each multiplication instruction involves 20 clock cycles. Suppose 20% of the instructions in Program 1 running on M1 are multiplications. What percentage of the CPU time is spent doing multiplications during the execution of Program 1 on M1?
4. Find the instruction execution rate (i.e., the number of instructions executed per second) for each machine when running Program 1.
5. Assuming the CPI for the machines is constant, find the instruction count for Program 2 running on each machine using the execution times.

Solution
1. For Program 1, M2 is 2 sec or (6-4)/6 = 33% faster.
   For Program 2, M1 is 2 sec or (10-8)/10 = 20% faster.
2. t_M1P1 = INSTR_M1P1 x CPI_M1P1 x 1/f_M1
   => CPI_M1P1 = (t_M1P1 x f_M1)/INSTR_M1P1 = (6 x 300x10^6)/(180x10^6) = 10
   Likewise CPI_M2P1 = (4 x 200x10^6)/(100x10^6) = 8
3. INSTR_MULT,M1P1 = 0.2 x 180x10^6 = 36x10^6 instructions
   t_MULT,M1P1 = INSTR_MULT,M1P1 x 20 x 1/(300x10^6) = 720/300 = 2.4 sec
   t_MULT,M1P1 / t_M1P1 = 2.4/6 = 40%
4. MIPS_M1P1 = (INSTR_M1P1 / t_M1P1) / 10^6 = 180/6 = 30
   MIPS_M2P1 = (INSTR_M2P1 / t_M2P1) / 10^6 = 100/4 = 25
5. t_M1P2 = INSTR_M1P2 x CPI_M1P2 x 1/f_M1
   => INSTR_M1P2 = (t_M1P2 x f_M1)/CPI_M1P1 = (8 x 300x10^6)/10 = 240x10^6
   INSTR_M2P2 = (t_M2P2 x f_M2)/CPI_M2P1 = (10 x 200x10^6)/8 = 250x10^6
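
The arithmetic in parts 2 through 5 can be re-derived with a short Python sketch. All inputs are the measurements used in the solution (300 MHz and 200 MHz clocks, 6 s and 4 s for Program 1, 8 s and 10 s for Program 2, 180x10^6 and 100x10^6 instructions); the variable names are made up for the example.

```python
MHZ = 1e6

freq = {"M1": 300 * MHZ, "M2": 200 * MHZ}      # clock frequencies (Hz)
time_p1 = {"M1": 6.0, "M2": 4.0}               # Program 1 times (s)
time_p2 = {"M1": 8.0, "M2": 10.0}              # Program 2 times (s)
instr_p1 = {"M1": 180e6, "M2": 100e6}          # Program 1 instruction counts

# Part 2: CPI = time * frequency / instruction count
cpi_p1 = {m: time_p1[m] * freq[m] / instr_p1[m] for m in freq}

# Part 3: share of M1's time spent in 20-cycle multiplications (20% of instrs)
mult_time = 0.2 * instr_p1["M1"] * 20 / freq["M1"]
mult_share = mult_time / time_p1["M1"]

# Part 4: instruction rate in millions of instructions per second
mips_p1 = {m: instr_p1[m] / time_p1[m] / 1e6 for m in freq}

# Part 5: instruction count for Program 2, assuming the same CPI as Program 1
instr_p2 = {m: time_p2[m] * freq[m] / cpi_p1[m] for m in freq}

print("CPI:", cpi_p1)
print(f"Multiplication share on M1: {mult_share:.0%}")
print("MIPS:", mips_p1)
print("Program 2 instruction counts:", instr_p2)
```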


Review Questions
Is CPI constant for a given processor (that is, does it stay the same from one program to another)?
Two processors with the same Instruction Set Architecture have the same CPI. True / False
Is MIPS constant for a given processor (that is, does it stay the same from one program to another)?
Two processors with the same Instruction Set Architecture have the same MIPS. True / False

Review Questions
Which of the following performance metrics is generally easier for the programmer to improve?
  The instruction count
  The average CPI
  The clock frequency
  Peak MIPS
What would you consider most important when selecting the fastest processor for a certain application domain?
  The operating clock frequency
  MIPS
  Peak MIPS
  Execution time for relevant benchmarks
How can you increase a processor’s clock frequency?
  Write a better program
  Use a better compiler
  Implement the processor in a faster VLSI technology
  Use a larger memory

Example
We wish to consider the performance of two different machines: M1 and M2. The clock frequencies for the two machines are as follows:
                    M1        M2
  Clock Frequency   800 MHz   1000 MHz
A program was run on both machines and the following measurements were made:
  Time on M1    Time on M2
  2.5 seconds   2 seconds
In addition, the following additional measurements were made:
  No. of Instructions Executed on M1   No. of Instructions Executed on M2
  100x10^6                             125x10^6
Finally, the frequencies with which instructions occur in the program on M1 and M2 are shown in the following table:
  Instruction   M1 %   M2 %
  ADD           40     60
  MULT          10     8
  CMP           20     12
  SUB           30     20   (the remainder, so that each column sums to 100%)
1. Find the clock cycles per instruction (CPI or average CPI) for the program on both machines.
2. How much faster will the program run on M1 and M2 respectively if we
   a) reduce the execution time of the ADD instruction by 20%, assuming that an ADD instruction requires 5 cycles on both machines
   b) reduce the execution time of the MULT instruction by 20%, assuming a MULT instruction requires 20 cycles on M1 and 25 cycles on M2
   c) Which is better for M1 and which for M2?
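
The slides do not include a solution for this exercise. As a starting point for question 1 only, the sketch below applies the relation CPI = (time x clock frequency) / instruction count used in the earlier solution. The variable names are hypothetical, and this is one possible way to set up the calculation rather than an official answer.

```python
MHZ = 1e6

# Measurements stated in the exercise.
freq = {"M1": 800 * MHZ, "M2": 1000 * MHZ}    # clock frequencies (Hz)
time_s = {"M1": 2.5, "M2": 2.0}               # execution times (s)
instructions = {"M1": 100e6, "M2": 125e6}     # instructions executed

# CPI = total cycles / instruction count, with total cycles = time * frequency
cpi = {m: time_s[m] * freq[m] / instructions[m] for m in freq}

for machine, value in cpi.items():
    print(f"{machine}: average CPI = {value:g}")
```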