CS510 Computer Architectures: Performance
Lecture 3: Benchmarks and Performance Metrics

Measurement Tools
- Benchmarks, traces, mixes
- Cost, delay, area, power estimation
- Simulation (many levels): ISA, RT, gate, circuit
- Queuing theory
- Rules of thumb
- Fundamental laws

The Bottom Line: Performance (and Cost)
- Time to run the task (ExTime): execution time, response time, latency
- Tasks per day, hour, week, sec, ns, ... (Performance): throughput, bandwidth

| Plane | Speed | Time (DC-Paris) | Passengers | Throughput (pmph) |
|---|---|---|---|---|
| Boeing 747 | 610 mph | 6.5 hours | 470 | 286,700 |
| BAD/Sud Concorde | 1350 mph | 3.0 hours | 132 | 178,200 |

Throughput is in passenger-miles per hour (pmph): speed × passengers.
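The latency/throughput distinction can be checked with a few lines of Python. This is a minimal sketch: throughput is taken as speed × passengers (passenger-miles per hour), and the 6.5-hour 747 flight time and the 470 and 132 passenger counts are assumed from the commonly reproduced version of this table rather than from the surviving slide text.

```python
# Latency vs. throughput for the two aircraft on the slide.
# Assumption: the 747 flight time (6.5 h) and the passenger counts (470, 132)
# follow the commonly reproduced version of this table.
planes = {
    "Boeing 747": {"mph": 610, "passengers": 470, "hours_dc_paris": 6.5},
    "BAD/Sud Concorde": {"mph": 1350, "passengers": 132, "hours_dc_paris": 3.0},
}


def throughput_pmph(mph: float, passengers: int) -> float:
    """Passenger-miles per hour delivered by one plane."""
    return mph * passengers


for name, p in planes.items():
    pmph = throughput_pmph(p["mph"], p["passengers"])
    print(f"{name}: latency {p['hours_dc_paris']} h, throughput {pmph:,.0f} pmph")
```

The Concorde wins on latency (3.0 h against 6.5 h), while the 747 delivers more passenger-miles per hour, so "which plane is faster" depends on the metric.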

"X is n times faster than Y" means:
$$n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$$

Performance Terminology
"X is n% faster than Y" means:
$$\frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = 1 + \frac{n}{100}$$
so that
$$n = \frac{100 \times (\text{Performance}(X) - \text{Performance}(Y))}{\text{Performance}(Y)}$$

Example: Y takes 15 seconds to complete a task, and X takes 10 seconds. What % faster is X?
$$\frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{15}{10} = 1.5 = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = 1 + \frac{n}{100} \;\Rightarrow\; n = 100 \times (1.5 - 1.0) = 50\%$$
X is 50% faster than Y.
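The two definitions translate directly into code. A minimal Python sketch (the function names are illustrative, not from the lecture), checked against the 15 s / 10 s example:

```python
def times_faster(extime_x: float, extime_y: float) -> float:
    """n such that 'X is n times faster than Y': ExTime(Y) / ExTime(X)."""
    return extime_y / extime_x


def percent_faster(extime_x: float, extime_y: float) -> float:
    """n such that 'X is n% faster than Y': ExTime(Y) / ExTime(X) = 1 + n/100."""
    return 100.0 * (times_faster(extime_x, extime_y) - 1.0)


# Worked example from the slide: Y takes 15 s, X takes 10 s.
print(times_faster(10, 15))    # 1.5
print(percent_faster(10, 15))  # 50.0
```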

Programs to Evaluate Processor Performance
- (Toy) benchmarks: 10- to 100-line programs, e.g., sieve, puzzle, quicksort
- Synthetic benchmarks: attempt to match the average frequencies of real workloads, e.g., Whetstone, Dhrystone
- Kernels: time-critical excerpts of real programs, e.g., Livermore loops
- Real programs: e.g., gcc, spice
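As an illustration of the first category, here is a minimal Python sketch of a toy benchmark: a sieve of Eratosthenes timed with the standard `time.perf_counter` clock, keeping the best of several runs to reduce noise. The limit and repeat count are arbitrary choices. A toy program like this exercises only a tiny slice of the machine, which is why the slide lists it below real programs.

```python
import time


def sieve(limit: int) -> list[int]:
    """Sieve of Eratosthenes: return all primes <= limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Cross off multiples of p, starting at p*p.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n, prime in enumerate(is_prime) if prime]


def best_time(fn, *args, repeats: int = 5) -> float:
    """Best wall-clock time of fn(*args) over several runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


n = 100_000
print(f"{len(sieve(n))} primes <= {n}, best of 5: {best_time(sieve, n):.4f} s")
```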

Benchmarking Games
- Different configurations used to run the same workload on the two systems
- Compiler wired to optimize the workload
- Workload arbitrarily picked
- Very small benchmarks used
- Benchmarks manually translated to optimize performance

Common Benchmarking Mistakes
- Only average behavior represented in the test workload
- Ignoring monitoring overhead
- Not ensuring the same initial conditions
- "Benchmark engineering": particular optimizations, different compilers or preprocessors, runtime libraries

SPEC: System Performance Evaluation Cooperative
- First round, 1989: 10 programs yielding a single number
- Second round, 1992: SPECint92 (6 integer programs) and SPECfp92 (14 floating-point programs); reference machine VAX-11/780
- Third round, 1995: a single flag setting for all programs; a new set of programs ("benchmarks useful for 3 years"); reference machine SPARCstation 10 Model 40

SPEC First Round
One program spent 99% of its time in a single line of code, so a new compiler front end could improve its result dramatically.

How to Summarize Performance
- Arithmetic mean (weighted arithmetic mean) tracks execution time: $\frac{1}{n}\sum_{i=1}^{n} T_i$ or $\sum_{i=1}^{n} W_i T_i$
- Harmonic mean (weighted harmonic mean) of execution rates (e.g., MFLOPS) tracks execution time: $\frac{n}{\sum_{i=1}^{n} 1/R_i}$ or $\frac{1}{\sum_{i=1}^{n} W_i/R_i}$, with the weights summing to 1
- Normalized execution time is handy for scaling performance, but do not take the arithmetic mean of normalized execution times; use the geometric mean $\left(\prod_{i=1}^{n} R_i\right)^{1/n}$, where $R_i = 1/T_i$
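The summary formulas can be sketched in Python as follows. The workload (two programs of 100 MFLOP each, running at 10 and 40 MFLOPS) is hypothetical; it shows that the harmonic mean of the rates equals total work divided by total time, while the arithmetic mean of the rates overstates it.

```python
import math


def arithmetic_mean(times):
    """Tracks total execution time when each program runs once."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(times, weights):
    """Weights are assumed to sum to 1."""
    return sum(w * t for w, t in zip(weights, times))


def harmonic_mean(rates):
    """Mean of rates (e.g., MFLOPS) that tracks execution time."""
    return len(rates) / sum(1.0 / r for r in rates)


def weighted_harmonic_mean(rates, weights):
    """Weights are assumed to sum to 1."""
    return 1.0 / sum(w / r for w, r in zip(weights, rates))


def geometric_mean(values):
    """n-th root of the product, computed with logs to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical workload: two programs, each doing 100 MFLOP of work.
work_mflop = 100.0
rates = [10.0, 40.0]                     # MFLOPS of each program
times = [work_mflop / r for r in rates]  # 10 s and 2.5 s
true_rate = work_mflop * len(rates) / sum(times)

print("harmonic mean of rates:  ", harmonic_mean(rates))
print("total work / total time: ", true_rate)
print("arithmetic mean of rates:", arithmetic_mean(rates))
```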

Comparing and Summarizing Performance
For program P1, A is 10 times faster than B; for program P2, B is 10 times faster than A; and so on. From the individual programs alone, the relative performance of the computers is unclear. Total execution times:

| | Computer A | Computer B | Computer C |
|---|---|---|---|
| P1 (secs) | 1 | 10 | 20 |
| P2 (secs) | 1,000 | 100 | 20 |
| Total time (secs) | 1,001 | 110 | 40 |
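A small Python sketch of the comparison, assuming these execution times for P1 and P2: A 1 s and 1,000 s, B 10 s and 100 s, C 20 s and 20 s. The per-program ratios point in opposite directions, whereas total time yields a single ordering.

```python
# Execution times in seconds (assumed values, see the lead-in).
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}

# Per-program comparisons disagree with each other ...
print("P1: A is", times["B"]["P1"] / times["A"]["P1"], "times faster than B")
print("P2: B is", times["A"]["P2"] / times["B"]["P2"], "times faster than A")

# ... while total execution time gives a single ranking.
totals = {machine: sum(t.values()) for machine, t in times.items()}
for machine, total in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{machine}: total {total} s")
```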

Summary Measure
- Arithmetic mean: $\frac{1}{n}\sum_{i=1}^{n} \text{Execution Time}_i$. Good if the programs are run equally often in the workload.
- Harmonic mean (when performance is expressed as rates): $\frac{n}{\sum_{i=1}^{n} 1/\text{Rate}_i}$, where $\text{Rate}_i = f(1/\text{Execution Time}_i)$

Unequal Job Mix
- Weighted execution time:
  - Weighted arithmetic mean: $\sum_{i=1}^{n} \text{Weight}_i \times \text{Execution Time}_i$
  - Weighted harmonic mean: $\frac{1}{\sum_{i=1}^{n} \text{Weight}_i / \text{Rate}_i}$ (weights sum to 1)
- Relative performance (execution time normalized to a reference machine):
  - Arithmetic mean
  - Geometric mean: $\sqrt[n]{\prod_{i=1}^{n} \text{Execution Time Ratio}_i}$, each ratio normalized to the reference machine

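Weighted Arithmetic Mean
$$\text{WAM}(i) = \sum_{j=1}^{n} W(i)_j \times \text{Time}_j$$
The slide evaluates WAM(1), WAM(2), and WAM(3) for computers A, B, and C under three weightings W(1), W(2), W(3) of P1 and P2. With equal weights W(1) = (0.5, 0.5), for example, WAM(1) for A is 1 × 0.5 + 1,000 × 0.5 = 500.5 seconds.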
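The weighted arithmetic mean for three weightings can be sketched in Python. Only W(1) = (0.5, 0.5) survives on the slide; W(2) and W(3) below are assumptions, picked so that P1 and P2 take roughly equal time on B and on A respectively. The times are assumed to be A: 1 s / 1,000 s, B: 10 s / 100 s, C: 20 s / 20 s.

```python
# (P1, P2) execution times in seconds for machines A, B, C (assumed values).
TIMES = {"A": (1, 1000), "B": (10, 100), "C": (20, 20)}

# Weightings of (P1, P2). W(1) is from the slide; W(2) and W(3) are
# assumptions giving roughly equal time per program on B and on A.
WEIGHTS = {
    "W(1)": (0.5, 0.5),
    "W(2)": (0.909, 0.091),
    "W(3)": (0.999, 0.001),
}


def wam(weights, times):
    """Weighted arithmetic mean: sum over programs j of W(i)_j * Time_j."""
    return sum(w * t for w, t in zip(weights, times))


for name, w in WEIGHTS.items():
    row = {machine: round(wam(w, t), 2) for machine, t in TIMES.items()}
    print(f"WAM under {name}: {row}")
```

With these weightings each machine comes out best under one of them (C under W(1), B under W(2), A under W(3)): the weighted mean is only as meaningful as the weights.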

Normalized Execution Time
The slide's table normalizes the P1 and P2 times (and the total time) of A, B, and C to each machine in turn (normalized to A, to B, and to C) and summarizes each column with the arithmetic mean and the geometric mean, where
$$\text{Geometric Mean} = \sqrt[n]{\prod_{i=1}^{n} \text{Execution Time Ratio}_i}$$
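The normalization can be sketched in Python, assuming P1/P2 times of A 1 s / 1,000 s, B 10 s / 100 s, and C 20 s / 20 s. For each reference machine it prints the arithmetic and geometric means of the normalized times; `math.prod` needs Python 3.8 or later.

```python
from math import prod

# (P1, P2) execution times in seconds for machines A, B, C (assumed values).
TIMES = {"A": (1, 1000), "B": (10, 100), "C": (20, 20)}


def normalized_to(ref):
    """Each machine's execution times divided by the reference machine's."""
    return {m: [t / r for t, r in zip(ts, TIMES[ref])] for m, ts in TIMES.items()}


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    return prod(xs) ** (1 / len(xs))


for ref in TIMES:
    norm = normalized_to(ref)
    summary = {m: (round(arithmetic_mean(v), 2), round(geometric_mean(v), 2))
               for m, v in norm.items()}
    print(f"normalized to {ref} (AM, GM): {summary}")
```

The arithmetic means reorder the machines depending on the reference, while the ratios between geometric means stay fixed.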

Disadvantages of Arithmetic Mean
Performance varies depending on the reference machine. Using the arithmetic mean of the normalized execution times of P1 and P2:
- Normalized to A: B is 5 times slower than A, and C is the slowest.
- Normalized to B: A is 5 times slower than B.
- Normalized to C: C is the fastest.

The Pros and Cons of Geometric Means
- Independent of the running times of the individual programs
- Independent of the reference machine
- Do not predict execution time: the geometric means rate A and B the same whichever machine is the reference, but that is only true when P1 runs 100 times for every run of P2:
$$1\,(P1) \times 100 + 1000\,(P2) \times 1 = 10\,(P1) \times 100 + 100\,(P2) \times 1$$
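A quick arithmetic check of the last bullet, assuming P1/P2 times of 1 s / 1,000 s for A and 10 s / 100 s for B: both machines have the same geometric mean of raw times (√1000), and a workload takes equal total time on both only when P1 runs 100 times per run of P2.

```python
from math import sqrt

# (P1, P2) execution times in seconds for machines A and B (assumed values).
TIMES = {"A": (1, 1000), "B": (10, 100)}


def workload_time(machine, runs_p1, runs_p2):
    """Total time for a workload with the given number of runs of each program."""
    p1, p2 = TIMES[machine]
    return p1 * runs_p1 + p2 * runs_p2


# Geometric means of the raw times are identical for A and B ...
print(sqrt(1 * 1000), sqrt(10 * 100))
# ... which matches total time only for a 100:1 mix of P1 to P2 ...
print(workload_time("A", 100, 1), workload_time("B", 100, 1))
# ... while for an equal mix B is far faster.
print(workload_time("A", 1, 1), workload_time("B", 1, 1))
```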

Amdahl's Law
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:
$$\text{ExTime}(E) = \text{ExTime} \times \left[(1 - F) + \frac{F}{S}\right] \qquad \text{Speedup}(E) = \frac{1}{(1 - F) + F/S}$$

Amdahl's Law
$$\text{ExTime}_E = \text{ExTime} \times \left[(1 - \text{Fraction}_E) + \frac{\text{Fraction}_E}{\text{Speedup}_E}\right]$$
$$\text{Speedup} = \frac{\text{ExTime}}{\text{ExTime}_E} = \frac{1}{(1 - \text{Fraction}_E) + \dfrac{\text{Fraction}_E}{\text{Speedup}_E}} = \frac{1}{(1 - F) + F/S}$$

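Amdahl's Law: Example
Floating-point instructions are improved to run 2 times faster (a 100% improvement), but only 10% of actual instructions are FP (taken here as 10% of the execution time, since F is a fraction of time):
$$\text{Speedup} = \frac{1}{(1 - F) + F/S} = \frac{1}{(1 - 0.1) + 0.1/2} = \frac{1}{0.95} \approx 1.053$$
That is only about a 5.3% improvement.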
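Amdahl's Law fits in one function. A minimal Python sketch, with F interpreted as a fraction of execution time; the argument checks are an added convenience, not part of the law:

```python
def amdahl_speedup(fraction: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction F of execution time is sped up by S."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if speedup_enhanced <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / speedup_enhanced)


# FP example from the slide: F = 0.1, S = 2.
s = amdahl_speedup(0.1, 2.0)
print(f"speedup = {s:.4f} ({(s - 1) * 100:.1f}% improvement)")
# Even an infinitely fast FP unit is limited by the untouched 90%.
print(f"upper bound as S -> infinity: {1 / (1 - 0.1):.4f}")
```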

Corollary (Amdahl): Make the Common Case Fast
- All instructions require an instruction fetch; only a fraction require a data fetch/store. Optimize instruction access over data access.
- Programs exhibit locality: spatial locality and temporal locality.
- Access to small memories is faster.
- Provide a storage hierarchy (registers, cache, memory, disk/tape) such that the most frequent accesses are to the smallest (closest) memories.

Locality of Access
- Spatial locality: there is a high probability that data whose addresses are close together will be accessed within a short time of each other.
- Temporal locality: there is a high probability that recently referenced data will be referenced again in the near future.
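Locality is what makes a small cache effective. The sketch below is a toy direct-mapped cache simulator in Python with hypothetical parameters (16-byte blocks, 64 lines, 4-byte words); it counts hits for a sequential scan (spatial locality), a scan that touches each block only once (no reuse), and a small array read twice (temporal locality). It does not model any real processor.

```python
def hit_rate(addresses, block_size=16, num_lines=64):
    """Fraction of hits in a toy direct-mapped cache (hypothetical parameters)."""
    lines = [None] * num_lines  # tag held by each cache line
    hits = 0
    for addr in addresses:
        block = addr // block_size
        index = block % num_lines
        tag = block // num_lines
        if lines[index] == tag:
            hits += 1
        else:
            lines[index] = tag  # miss: fetch the block into its line
    return hits / len(addresses)


WORD = 4
sequential = [i * WORD for i in range(1024)]    # spatial locality
strided = [i * 16 for i in range(1024)]         # a new block every access
reuse = [i * WORD for i in range(64)] * 2       # temporal locality

for name, trace in [("sequential", sequential), ("strided", strided), ("reuse", reuse)]:
    print(f"{name:10} hit rate {hit_rate(trace):.3f}")
```

With these parameters the sequential scan hits on 3 of every 4 accesses, the block-stride scan never hits, and the twice-read array hits on 112 of its 128 accesses.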

Rule of Thumb
The simple case is usually the most frequent and the easiest to optimize! Do simple, fast things in hardware (faster), and make sure the rest can be handled correctly in software.

Metrics of Performance
Each level of the system, from Application, Programming Language, and Compiler through ISA, Datapath, Control, and Function Units down to Transistors, Wires, and Pins, has its own metrics:
- Answers per month
- Operations per second
- (Millions of) instructions per second: MIPS
- (Millions of) floating-point operations per second: MFLOP/s
- Megabytes per second
- Cycles per second (clock rate)

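Aspects of CPU Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

| | Inst Count | CPI | Clock Rate |
|---|---|---|---|
| Program | X | | |
| Compiler | X | (X) | |
| Inst. Set | X | X | |
| Organization | | X | X |
| Technology | | | X |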
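The CPU performance equation as a Python function. The numbers (10^9 instructions, CPI 1.5, 500 MHz) and the two "what if" scenarios are hypothetical; they show how a change at one level moves one or two factors of the product.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = Instructions/Program x Cycles/Instruction x Seconds/Cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle


scenarios = [
    ("baseline", cpu_time(1e9, 1.5, 500e6)),
    # Hypothetical compiler change: 20% fewer instructions, slightly higher CPI.
    ("better compiler", cpu_time(0.8e9, 1.6, 500e6)),
    # Hypothetical technology change: same program and CPI, faster clock.
    ("faster clock", cpu_time(1e9, 1.5, 600e6)),
]
for label, seconds in scenarios:
    print(f"{label:16} {seconds:.2f} s")
```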

Marketing Metrics
$$\text{MIPS} = \frac{\text{Instruction Count}}{\text{Time} \times 10^6} = \frac{\text{Clock Rate}}{\text{CPI} \times 10^6}$$
- Machines with different instruction sets?
- Programs with different instruction mixes? (dynamic frequency of instructions)
- Not correlated with performance

$$\text{MFLOP/s} = \frac{\text{FP Operations}}{\text{Time} \times 10^6}$$
- Machine dependent
- Often not where time is spent
- Normalized FP operation counts: add, sub, compare, mult = 1; divide, sqrt = 4; exp, sin, ... = 8
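Why MIPS need not track performance: a Python sketch of a hypothetical 500 MHz machine with three instruction classes (CPI 1, 2, and 3) and two compilations of the same program. The second compilation executes more of the cheap instructions, so its MIPS rating rises even though it takes longer. All numbers are made up for illustration.

```python
CLOCK_HZ = 500e6                         # hypothetical 500 MHz machine
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}  # hypothetical instruction classes

# Two hypothetical compilations of the same program (instruction counts).
compilations = {
    "compiler 1": {"A": 5e6, "B": 1e6, "C": 1e6},
    "compiler 2": {"A": 10e6, "B": 1e6, "C": 1e6},
}


def run(mix):
    """Return (execution time in seconds, native MIPS) for one instruction mix."""
    instructions = sum(mix.values())
    cycles = sum(count * CPI_BY_CLASS[cls] for cls, count in mix.items())
    seconds = cycles / CLOCK_HZ
    mips = instructions / (seconds * 1e6)
    return seconds, mips


for name, mix in compilations.items():
    seconds, mips = run(mix)
    print(f"{name}: {seconds * 1e3:.1f} ms, {mips:.0f} MIPS")
```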

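Cycles Per Instruction
$$\text{CPU time} = \text{Cycle Time} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$$
where $I_i$ is the number of executed instructions of type i.
Average cycles per instruction:
$$\text{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Cycles}}{\text{Instruction Count}}$$
$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i, \quad \text{where } F_i = \frac{I_i}{\text{Instruction Count}} \text{ (instruction frequency)}$$
Invest resources where time is spent!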

Organizational Trade-offs
Instruction mix, CPI, and cycle time are traded off across the levels of the system: Application, Programming Language, Compiler, ISA, Datapath, Control, Function Units, and Transistors, Wires, Pins.

Example: Calculating CPI
Typical mix for the base machine (reg/reg):

| Op | Freq | CPI(i) | Freq × CPI(i) | % Time |
|---|---|---|---|---|
| ALU | 50% | 1 | 0.5 | 33% |
| Load | 20% | 2 | 0.4 | 27% |
| Store | 10% | 2 | 0.2 | 13% |
| Branch | 20% | 2 | 0.4 | 27% |
| Total | | | 1.5 | |
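The table can be reproduced in Python as the weighted sum CPI = Σ CPI_i × F_i, with each class's share of execution time given by its contribution divided by the total:

```python
# Typical mix for the base (reg/reg) machine: op -> (frequency, CPI_i).
MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def average_cpi(mix):
    """CPI = sum_i CPI_i * F_i, with F_i the instruction frequency."""
    return sum(freq * cpi for freq, cpi in mix.values())


cpi = average_cpi(MIX)
for op, (freq, cpi_i) in MIX.items():
    contribution = freq * cpi_i
    print(f"{op:<6} {contribution:.1f} ({contribution / cpi:.0%} of time)")
print(f"CPI = {cpi:.2f}")
```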

Example
Add register/memory operations (R/M):
- One source operand in memory
- One source operand in register
- Cycle count of 2

The branch cycle count increases to 3. Some LD instructions can be eliminated by having an R/M-type ADD instruction, e.g., [ADD R1, X]. What fraction of the loads must be eliminated for this to pay off?

Base machine (reg/reg), typical mix:

| Op | Freq(i) | CPI(i) |
|---|---|---|
| ALU | 50% | 1 |
| Load | 20% | 2 |
| Store | 10% | 2 |
| Branch | 20% | 2 |

Example Solution
Exec Time = Instr Cnt × CPI × Clock

| Op | Freq(i) | CPI(i) | CPI |
|---|---|---|---|
| ALU | .5 | 1 | .5 |
| Load | .2 | 2 | .4 |
| Store | .1 | 2 | .2 |
| Branch | .2 | 2 | .4 |
| Total | 1.00 | | 1.5 |

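Example Solution (continued)
Exec Time = Instr Cnt × CPI × Clock
Let X be the fraction of the original instructions that become R/M operations; each one replaces one load and one ALU operation. CPI_NEW must be normalized to the new instruction frequency:

| Op | Old Freq(i) | Old CPI(i) | Old CPI | New Freq(i) | New CPI(i) | New CPI |
|---|---|---|---|---|---|---|
| ALU | .5 | 1 | .5 | .5 − X | 1 | .5 − X |
| Load | .2 | 2 | .4 | .2 − X | 2 | .4 − 2X |
| Store | .1 | 2 | .2 | .1 | 2 | .2 |
| Branch | .2 | 2 | .4 | .2 | 3 | .6 |
| Reg/Mem | | | | X | 2 | 2X |
| Total | 1.00 | | 1.5 | 1 − X | | (1.7 − X)/(1 − X) |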

Example Solution (continued)
With the new instruction count 1 − X and the new CPI (1.7 − X)/(1 − X) from the new-mix totals:
$$\text{Instr Cnt}_{Old} \times \text{CPI}_{Old} \times \text{Clock} = \text{Instr Cnt}_{New} \times \text{CPI}_{New} \times \text{Clock}$$
$$1.00 \times 1.5 = (1 - X) \times \frac{1.7 - X}{1 - X} \;\Rightarrow\; 1.5 = 1.7 - X \;\Rightarrow\; X = 0.2$$
Since loads are 20% of the original instructions, all LOADs must be eliminated for this to be a win!
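The break-even point can be checked with exact rational arithmetic. This Python sketch uses the standard `fractions.Fraction` type so that X comes out exactly; as in the example, each reg/mem ADD is assumed to replace one load and one ALU operation, and frequencies are expressed per original instruction.

```python
from fractions import Fraction

# op -> (frequency per original instruction, CPI)
OLD_MIX = {
    "ALU": (Fraction(1, 2), 1),
    "Load": (Fraction(1, 5), 2),
    "Store": (Fraction(1, 10), 2),
    "Branch": (Fraction(1, 5), 2),
}


def new_mix(x):
    """Mix after converting a fraction x of instructions to reg/mem ADDs.

    Each reg/mem op replaces one load plus one ALU op; branches now take 3 cycles.
    """
    return {
        "ALU": (Fraction(1, 2) - x, 1),
        "Load": (Fraction(1, 5) - x, 2),
        "Store": (Fraction(1, 10), 2),
        "Branch": (Fraction(1, 5), 3),
        "Reg/Mem": (x, 2),
    }


def cycles(mix):
    """Instr Cnt x CPI, relative to the original instruction count."""
    return sum(freq * cpi for freq, cpi in mix.values())


def new_cpi(x):
    """CPI normalized to the new instruction count, 1 - x."""
    mix = new_mix(x)
    return cycles(mix) / sum(freq for freq, _ in mix.values())


# cycles(new_mix(x)) is linear in x, so solve cycles(new) == cycles(old) directly.
c0 = cycles(new_mix(Fraction(0)))
slope = (cycles(new_mix(Fraction(1, 10))) - c0) / Fraction(1, 10)
break_even = (cycles(OLD_MIX) - c0) / slope

print("old cycles per instruction:", cycles(OLD_MIX))
print("break-even X:", break_even)
print("new CPI at break-even:", new_cpi(break_even))
```

At the break-even point the load frequency `.2 - X` reaches zero, matching the slide's conclusion.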

Fallacies and Pitfalls
- Fallacy: MIPS is an accurate measure for comparing performance among computers.
  - It depends on the instruction set.
  - It varies between programs on the same computer.
  - It can vary inversely to performance.
- Fallacy: MFLOPS is a consistent and useful measure of performance.
  - It depends on the machine and on the program.
  - It is not applicable outside floating-point performance.
  - The set of floating-point operations is not consistent across machines.