204521 Digital System Architecture - Lecture 2a: Computer Performance and Cost (Pradondet Nilagupta), 28 Oct 2015


Digital System Architecture 1 (28 Oct 2015): Lecture 2a, Computer Performance and Cost
Pradondet Nilagupta (based on lecture notes by Prof. Randy Katz & Prof. Jan M. Rabaey, UC Berkeley)

Digital System Architecture 2: Review: What is Computer Architecture?
[Diagram: the computer architect works between Technology and Applications, across Interfaces (ISA, API, Link, I/O Chan, Regs, IR), Machine Organization, and Measurement & Evaluation.]

Digital System Architecture 3: The Architecture Process
[Diagram: new concepts are created, their cost and performance are estimated, and the results are sorted into good ideas, mediocre ideas, and bad ideas.]

Digital System Architecture 4: Performance Measurement and Evaluation
- Many dimensions to computer performance
  - CPU execution time, by instruction or sequence: floating point, integer, branch performance
  - Cache bandwidth
  - Main memory bandwidth
  - I/O performance: bandwidth, seeks, pixels or polygons per second
- Relative importance depends on the application
[Diagram: processor (P), cache ($), memory (M)]

Digital System Architecture 5: Evaluation Tools
- Benchmarks, traces, & mixes
  - macrobenchmarks & suites: application execution time
  - microbenchmarks: measure one aspect of performance
  - traces: replay recorded accesses (cache, branch, register)
- Simulation at many levels (ISA, cycle accurate, RTL, gate, circuit): trade fidelity for simulation rate
- Area and delay estimation
- Analysis, e.g. queuing theory
[Example instruction mix: MOVE 39%, BR 20%, LOAD 20%, STORE 10%, ALU 11%; example trace: LD 5EA3, ST 31FF, ..., LD 1EA2, ...]

Digital System Architecture 6: Benchmarks
- Microbenchmarks measure one performance dimension (see the sketch below)
  - cache bandwidth
  - main memory bandwidth
  - procedure call overhead
  - FP performance
  - a weighted combination of microbenchmark results is a good predictor of application performance
  - gives insight into the cause of performance bottlenecks
- Macrobenchmarks measure application execution time
  - measures overall performance, but on just one application
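To make the microbenchmark idea concrete, here is a minimal sketch in Python (not from the original slides): it measures a single performance dimension, memory-copy bandwidth, using only the standard library. The buffer size and repeat count are arbitrary illustrative choices.

    # Minimal microbenchmark sketch: measures one performance dimension,
    # memory-copy bandwidth. Buffer size and repeat count are illustrative.
    import time

    def copy_bandwidth_mb_per_s(size_bytes=64 * 1024 * 1024, repeats=5):
        src = bytearray(size_bytes)              # source buffer
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            dst = bytes(src)                     # one full copy of the buffer
            best = min(best, time.perf_counter() - start)
        return (size_bytes / (1024 * 1024)) / best   # MB copied per second (best run)

    if __name__ == "__main__":
        print(f"copy bandwidth ~ {copy_bandwidth_mb_per_s():.0f} MB/s")

A macrobenchmark, by contrast, would simply time a whole application run.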

Digital System Architecture 7: Some Warnings about Benchmarks
- Benchmarks measure the whole system: application, compiler, operating system, architecture, implementation
- Popular benchmarks typically reflect yesterday's programs; computers need to be designed for tomorrow's programs
- Benchmark timings are often very sensitive to alignment in cache, location of data on disk, and values of data
- Benchmarks can lead to inbreeding or positive feedback: if you make an operation fast (slow) it will be used more (less) often, so you make it faster (slower), and it gets used even more (less), and so on

Digital System Architecture 8: Architectural Performance Laws and Rules of Thumb

Digital System Architecture 9: Measurement and Evaluation
Architecture is an iterative process: search the space of possible designs, make selections, and evaluate the selections made. Good measurement tools are required to accurately evaluate the selection.

Digital System Architecture 10: Measurement Tools
- Benchmarks, traces, mixes
- Cost, delay, area, power estimation
- Simulation at many levels (ISA, RTL, gate, circuit)
- Queuing theory
- Rules of thumb
- Fundamental laws

Digital System Architecture 11: Measuring and Reporting Performance
- What do we mean by saying one computer is faster than another? The program runs in less time.
- Response time or execution time: the time until users see the output
- Throughput: the total amount of work done in a given time

Digital System Architecture 12: Performance "increasing" and "decreasing"
We use the terms "improve performance" and "improve execution time" when we mean increase performance and decrease execution time:
- improve performance = increase performance
- improve execution time = decrease execution time

Digital System Architecture 13: Measuring Performance - Definitions of Time
- Wall clock time
- Response time
- Elapsed time: the latency to complete a task, including disk accesses, memory accesses, I/O activities, and operating system overhead

Digital System Architecture 14: Does Anybody Really Know What Time It Is?
UNIX time command output: 90.7u 12.9s 2:39 65%
- User CPU time (time spent in the program): 90.7 s
- System CPU time (time spent in the OS): 12.9 s
- Elapsed time (response time): 2:39 = 159 s
- (90.7 + 12.9)/159 * 100 = 65% of the elapsed time is CPU time; the remaining 35% was spent doing I/O or running other programs

Digital System Architecture 15: Example - UNIX time command
90.7u 12.9s 2:39 65%
- user CPU time is 90.7 sec
- system CPU time is 12.9 sec
- elapsed time is 2 min 39 sec (159 sec)
- % of elapsed time that is CPU time: (90.7 + 12.9)/159 = 65%
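As a sanity check of the arithmetic above, a minimal Python sketch (not part of the slides) recomputing the breakdown from the time output:

    # Recompute the CPU-time breakdown from "90.7u 12.9s 2:39 65%".
    user_cpu = 90.7          # user CPU time, seconds
    system_cpu = 12.9        # system CPU time, seconds
    elapsed = 2 * 60 + 39    # elapsed (wall-clock) time: 2 min 39 s = 159 s

    cpu_fraction = (user_cpu + system_cpu) / elapsed
    print(f"CPU time: {cpu_fraction:.0%} of elapsed time")      # ~65%
    print(f"I/O or other programs: {1 - cpu_fraction:.0%}")     # ~35%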

Digital System Architecture 16: Time
- CPU time: time the CPU spends computing, not including time waiting for I/O or running other programs
- User CPU time: CPU time spent in the program
- System CPU time: CPU time spent in the operating system performing tasks requested by the program
CPU time = User CPU time + System CPU time

Digital System Architecture 17: Performance
- System performance: elapsed time on an unloaded system
- CPU performance: user CPU time on an unloaded system

Digital System Architecture 18: Programs to Evaluate Processor Performance
- Real programs
- Kernels: time-critical excerpts of real programs
- Toy benchmarks: small programs such as the Sieve of Eratosthenes, puzzles, Quicksort
- Synthetic benchmarks: attempt to match the average operation frequencies of real workloads (similar to kernels, but not even pieces of real programs, whereas a kernel might be); Whetstone, Dhrystone

Digital System Architecture 19: Benchmark Suites
- A collection of benchmarks that tries to measure the performance of a processor across a variety of applications
  - SPECint92, SPECfp92
  - see Fig. 1.9

Digital System Architecture 20: Benchmarking Games
- Differing configurations used to run the same workload on two systems
- Compiler wired to optimize the workload
- Test specification written to be biased towards one machine
- Synchronized CPU/IO-intensive job sequence used
- Workload arbitrarily picked
- Very small benchmarks used
- Benchmarks manually translated to optimize performance

Digital System Architecture 21: Common Benchmarking Mistakes (I)
- Only average behavior represented in the test workload
- Skewness of device demands ignored
- Loading level controlled inappropriately
- Caching effects ignored
- Buffer sizes not appropriate
- Inaccuracies due to sampling ignored

Digital System Architecture 22: Common Benchmarking Mistakes (II)
- Ignoring monitoring overhead
- Not validating measurements
- Not ensuring the same initial conditions
- Not measuring transient (cold start) performance
- Using device utilizations for performance comparisons
- Collecting too much data but doing too little analysis

Digital System Architecture 23: SPEC - System Performance Evaluation Cooperative
- First round, 1989: 10 programs yielding a single number
- Second round, 1992: SPECint92 (6 integer programs) and SPECfp92 (14 floating point programs)
  - Compiler flags unlimited. March 93 flags for a DEC 4000 Model 610:
    spice: unix.c:/def=(sysv,has_bcopy,"bcopy(a,b,c)=memcpy(b,a,c)"
    wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
    nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
- Third round, 1995: single flag setting for all programs; new set of programs; "benchmarks useful for 3 years"

Digital System Architecture 24: SPEC - System Performance Evaluation Cooperative
- First round, 1989: 10 programs yielding a single number ("SPECmarks")
- Second round, 1992: SPECint92 (6 integer programs) and SPECfp92 (14 floating point programs)
  - Compiler flags unlimited. March 93 flags for a DEC 4000 Model 610:
    spice: unix.c:/def=(sysv,has_bcopy,"bcopy(a,b,c)=memcpy(b,a,c)"
    wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
    nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas

Digital System Architecture 25: SPEC - System Performance Evaluation Cooperative
- Third round, 1995
  - New set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point programs)
  - "Benchmarks useful for 3 years"
  - Single flag setting for all programs: SPECint_base95, SPECfp_base95

Digital System Architecture 26: How to Summarize Performance
- Arithmetic mean (weighted arithmetic mean) tracks execution time: Σ(Ti)/n (or Σ(Wi * Ti), with weights summing to 1)
- Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: n/Σ(1/Ri) (or 1/Σ(Wi/Ri))
- Normalized execution time is handy for scaling performance (e.g., "X times faster than a SPARCstation 10")
  - the arithmetic mean is impacted by the choice of reference machine
- Use the geometric mean for such comparisons: (Π Ti)^(1/n)
  - independent of the chosen reference machine
  - but not a good metric for total execution time
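A short sketch (not from the slides) of the summary statistics above, applied to made-up execution times and rates; the numbers and weights are purely illustrative, and the weights are assumed to sum to 1.

    # Arithmetic, weighted arithmetic, harmonic, and geometric means of
    # hypothetical benchmark results (times in seconds, rates in MFLOPS).
    from math import prod

    times   = [10.0, 50.0, 200.0]    # execution times T_i (illustrative)
    rates   = [120.0, 80.0, 40.0]    # rates R_i, e.g. MFLOPS (illustrative)
    weights = [0.5, 0.3, 0.2]        # workload weights W_i, summing to 1

    arithmetic     = sum(times) / len(times)                    # tracks total time
    weighted_arith = sum(w * t for w, t in zip(weights, times))
    harmonic       = len(rates) / sum(1 / r for r in rates)     # mean of rates
    weighted_harm  = 1 / sum(w / r for w, r in zip(weights, rates))
    geometric      = prod(times) ** (1 / len(times))            # for normalized ratios

    print(arithmetic, weighted_arith, harmonic, weighted_harm, geometric)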

Digital System Architecture 27: SPEC First Round
- One program spent 99% of its time in a single line of code
- A new front-end compiler could improve the result dramatically

Digital System Architecture 28: Impact of Means on SPECmark89 for the IBM 550 (without and with a special compiler option)
[Table: per-program ratio to VAX, time, and weighted time, before and after the compiler option, for gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, and tomcatv; the numeric entries did not survive the transcript. Summary row (after/before ratio of the means): geometric mean 1.33, arithmetic mean 1.16, weighted arithmetic mean 1.09.]

Digital System Architecture 29: The Bottom Line - Performance (and Cost)
- Time to run the task (ExTime): execution time, response time, latency
- Tasks per day, hour, week, sec, ns, ... (Performance): throughput, bandwidth

  Plane              Speed      DC to Paris   Passengers   Throughput (pass/hr)
  Boeing 747         610 mph    6.5 hours     470          72
  BAC/Sud Concorde   1350 mph   3 hours       132          44

Digital System Architecture 30: The Bottom Line - Performance (and Cost)
"X is n times faster than Y" means:
  ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y) = n
Performance is the reciprocal of execution time.
- Speed: Concorde vs. Boeing 747
- Throughput: Boeing 747 vs. Concorde

Digital System Architecture 31: Performance Terminology
"X is n% faster than Y" means:
  ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y) = 1 + n/100
  n = 100 * (Performance(X) - Performance(Y)) / Performance(Y)
  n = 100 * (ExTime(Y) - ExTime(X)) / ExTime(X)

Digital System Architecture 32: Example
Y takes 15 seconds to complete a task, X takes 10 seconds. What % faster is X?
  n = 100 * (ExTime(Y) - ExTime(X)) / ExTime(X)
  n = 100 * (15 - 10) / 10
  n = 50%
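A small sketch (not from the slides) of the two definitions, checked against the worked example above:

    # "n times faster" and "n% faster", as defined on the preceding slides.
    def times_faster(extime_x, extime_y):
        # "X is n times faster than Y": n = ExTime(Y) / ExTime(X)
        return extime_y / extime_x

    def percent_faster(extime_x, extime_y):
        # "X is n% faster than Y": n = 100 * (ExTime(Y) - ExTime(X)) / ExTime(X)
        return 100.0 * (extime_y - extime_x) / extime_x

    print(times_faster(10, 15))    # 1.5  -> X is 1.5 times faster than Y
    print(percent_faster(10, 15))  # 50.0 -> X is 50% faster than Y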

Digital System Architecture 33: Speedup
Speedup due to enhancement E:
  Speedup(E) = ExTime(without E) / ExTime(with E) = Performance(with E) / Performance(without E)
Suppose that enhancement E accelerates a fraction Fraction_enhanced of the task by a factor Speedup_enhanced, and the remainder of the task is unaffected. Then what are ExTime(E) and Speedup(E)?

Digital System Architecture 34: Amdahl's Law
States that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.

Digital System Architecture 35: Amdahl's Law
  ExTime_new = ExTime_old * [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
  Speedup_overall = ExTime_old / ExTime_new = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

Digital System Architecture 36: Example of Amdahl's Law
Floating point instructions are improved to run 2X faster, but only 10% of the instructions are FP.
  ExTime_new = ?
  Speedup_overall = ?

Digital System Architecture 37: Example of Amdahl's Law
Floating point instructions are improved to run 2X faster, but only 10% of the instructions are FP.
  ExTime_new = ExTime_old * (0.9 + 0.1/2) = 0.95 * ExTime_old
  Speedup_overall = 1 / 0.95 = 1.053
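A minimal Python sketch (not from the slides) of Amdahl's Law, checked against this example; the second call is an illustrative extreme showing the law's limit when only 10% of the time can be enhanced.

    # Amdahl's Law: overall speedup from enhancing a fraction of execution time.
    def amdahl_speedup(fraction_enhanced, speedup_enhanced):
        return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

    print(amdahl_speedup(0.10, 2.0))   # ~1.053, matching the slide
    print(amdahl_speedup(0.10, 1e9))   # ~1.111: even a "free" FP unit gains little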

Digital System Architecture 38: Corollary - Make the Common Case Fast
- All instructions require an instruction fetch; only a fraction require a data fetch/store
  - optimize instruction access over data access
- Programs exhibit locality: 90% of the time is spent in 10% of the code
  - spatial locality
  - temporal locality
- Access to small memories is faster
  - provide a storage hierarchy such that the most frequent accesses are to the smallest (closest) memories
[Hierarchy: registers, cache, memory, disk/tape]

Digital System Architecture 39: RISC Philosophy
- The simple case is usually the most frequent and the easiest to optimize!
- Do simple, fast things in hardware and be sure the rest can be handled correctly in software

Digital System Architecture 40: Metrics of Performance
[Diagram: metrics at each level of the system stack - Application: operations per second, answers per month; Programming language / Compiler / ISA: MIPS (millions of instructions per second), MFLOP/s (millions of FP operations per second); Datapath / Control: megabytes per second; Function units / Transistors, wires, pins: cycles per second (clock rate).]

Digital System Architecture 41: % of Instructions Responsible for 80-90% of Instructions Executed
[Chart not recoverable from the transcript.]

Digital System Architecture 42: Aspects of CPU Performance
  CPU time = Seconds / (User) Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)
Which factors affect instruction count, CPI, and clock rate?
- Program
- Compiler
- Instruction set
- Organization
- Technology

Digital System Architecture 43: Aspects of CPU Performance
  CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)

                 Inst Count   CPI    Clock Rate
  Program            X
  Compiler           X        (X)
  Instr. Set         X         X
  Organization                 X         X
  Technology                             X
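To illustrate the equation, a small sketch (not from the slides) with made-up numbers; the instruction count, CPI, and clock rate below are hypothetical.

    # CPU time = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)
    instr_count = 2_000_000     # dynamic instruction count (hypothetical)
    cpi         = 1.5           # average cycles per instruction (hypothetical)
    clock_rate  = 500e6         # 500 MHz clock, so seconds/cycle = 1 / clock_rate

    cpu_time = instr_count * cpi / clock_rate
    print(f"CPU time = {cpu_time * 1e3:.2f} ms")   # 6.00 ms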

Digital System Architecture 44: Marketing Metrics
- MIPS = Instruction Count / (Execution Time * 10^6) = Clock Rate / (CPI * 10^6)
  - Machines with different instruction sets?
  - Programs with different instruction mixes? (dynamic frequency of instructions)
  - Uncorrelated with performance
- MFLOP/s = FP Operations / (Execution Time * 10^6)
  - Machine dependent
  - Often not where time is spent
  - Normalized FP operation weights: add, sub, compare, mult = 1; divide, sqrt = 4; exp, sin, ... = 8
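A sketch (not from the slides) computing native MIPS and both native and normalized MFLOP/s from hypothetical counts; the 1/4/8 weights are the ones listed above, everything else is made up for illustration.

    # Marketing metrics from hypothetical counts and a hypothetical run time.
    exec_time   = 2.0           # seconds (illustrative)
    instr_count = 300e6         # dynamic instructions (illustrative)
    fp_ops  = {"add_sub_cmp_mul": 40e6, "div_sqrt": 5e6, "exp_sin": 1e6}
    weights = {"add_sub_cmp_mul": 1,    "div_sqrt": 4,   "exp_sin": 8}

    mips              = instr_count / (exec_time * 1e6)
    native_mflops     = sum(fp_ops.values()) / (exec_time * 1e6)
    normalized_mflops = sum(weights[k] * n for k, n in fp_ops.items()) / (exec_time * 1e6)

    print(mips, native_mflops, normalized_mflops)   # 150.0 23.0 34.0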

Digital System Architecture 45: Cycles Per Instruction
  CPI = Clock Cycles for a Program / Instruction Count   ("average cycles per instruction")
  CPU time = CycleTime * Σ (i = 1..n) CPI_i * I_i
  Average CPI = Σ (i = 1..n) CPI_i * F_i, where F_i = I_i / Instruction Count   ("instruction frequency")
Invest resources where time is spent!

Digital System Architecture 46: Example - Calculating CPI
Base machine (reg/reg), typical mix:

  Op       Freq   Cycles   CPI(i)   % Time
  ALU      50%    1        0.5      (33%)
  Load     20%    2        0.4      (27%)
  Store    10%    2        0.2      (13%)
  Branch   20%    2        0.4      (27%)

  1.5 is the average CPI for this instruction mix.
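The same calculation in a short Python sketch (not from the slides), reproducing the average CPI and each class's share of time:

    # Average CPI and % of time for the instruction mix in the table above.
    mix = {                    # op: (frequency, cycles)
        "ALU":    (0.50, 1),
        "Load":   (0.20, 2),
        "Store":  (0.10, 2),
        "Branch": (0.20, 2),
    }

    avg_cpi = sum(freq * cycles for freq, cycles in mix.values())
    for op, (freq, cycles) in mix.items():
        contrib = freq * cycles
        print(f"{op:6s} CPI(i) = {contrib:.1f}  ({contrib / avg_cpi:.0%} of time)")
    print(f"Average CPI = {avg_cpi}")   # 1.5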

Digital System Architecture 47: Organizational Trade-offs
[Diagram: the system stack (application, programming language, compiler, ISA, datapath, control, function units, transistors, wires, pins) determines the three terms of the CPU time equation: instruction mix, CPI, and cycle time.]

Digital System Architecture 48: Trade-off Example
Base machine (reg/reg), typical mix:

  Op       Freq   Cycles
  ALU      50%    1
  Load     20%    2
  Store    10%    2
  Branch   20%    2

Add register/memory operations:
- one source operand in memory, one source operand in register
- cycle count of 2
- the branch cycle count increases to 3
What fraction of the loads must be eliminated for this to pay off?

Digital System Architecture 49: Example Solution
  Exec Time = Instr Cnt x CPI x Clock
[Build slide: the Op / Freq / Cycles table for ALU, Load, Store, Branch, and Reg/Mem is filled in on the following slides.]

Digital System Architecture 50: Example Solution
  Exec Time = Instr Cnt x CPI x Clock

  Op        Freq(old)  Cycles(old) | Freq(new)  Cycles(new)  Freq x Cycles (new)
  ALU       .50        1           | .5 - X     1            .5 - X
  Load      .20        2           | .2 - X     2            .4 - 2X
  Store     .10        2           | .1         2            .2
  Branch    .20        2           | .2         3            .6
  Reg/Mem   --         --          | X          2            2X
  Total     1.00                   | 1 - X                   1.7 - X

  CPI_new = Cycles_new / Instructions_new = (1.7 - X) / (1 - X)
  (CPI_new must be normalized to the new instruction count.)

Digital System Architecture 51: Example Solution
  Exec Time = Instr Cnt x CPI x Clock
  (table as on the previous slide; CPI_new = (1.7 - X) / (1 - X))
  Instr Cnt_old x CPI_old x Clock_old = Instr Cnt_new x CPI_new x Clock_new
  1.00 x 1.5 = (1 - X) x (1.7 - X) / (1 - X)

Digital System Architecture 52: Example Solution
  Exec Time = Instr Cnt x CPI x Clock
  Instr Cnt_old x CPI_old x Clock_old = Instr Cnt_new x CPI_new x Clock_new
  1.00 x 1.5 = (1 - X) x (1.7 - X) / (1 - X)
  1.5 = 1.7 - X
  X = 0.2
ALL loads must be eliminated for this to be a win!
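A small sketch (not from the slides) verifying the break-even point; here X is the fraction of the original instructions (all of them loads) folded into the new register/memory operations.

    # Verify the trade-off: find X such that old and new execution times match.
    def new_cpi(x):
        # Per original instruction: ALU (.5 - x)*1, Load (.2 - x)*2, Store .1*2,
        # Branch .2*3 (now 3 cycles), Reg/Mem x*2; instruction count shrinks to 1 - x.
        cycles = (0.5 - x) * 1 + (0.2 - x) * 2 + 0.1 * 2 + 0.2 * 3 + x * 2
        return cycles / (1.0 - x)

    old_cpi = 0.5 * 1 + 0.2 * 2 + 0.1 * 2 + 0.2 * 2        # = 1.5
    # Break-even: 1.00 * old_cpi = (1 - x) * new_cpi(x)  =>  1.5 = 1.7 - x  =>  x = 0.2
    x = 0.2
    print(old_cpi, (1 - x) * new_cpi(x))   # both 1.5: all the loads must go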

Digital System Architecture 53: How to Summarize Performance
- Arithmetic mean (weighted arithmetic mean) tracks execution time: Σ(Ti)/n (or Σ(Wi * Ti))
- Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: n/Σ(1/Ri) (or 1/Σ(Wi/Ri))
- Normalized execution time is handy for scaling performance
- But do not take the arithmetic mean of normalized execution times; use the geometric mean of the normalized times: (Π Ti)^(1/n)

Digital System Architecture 54: Means
[Figure: properties of the three means - the geometric mean is consistent independent of the reference machine; the arithmetic and harmonic means can be weighted and represent total execution time.]

Digital System Architecture 55: Which Machine is "Better"?
[Table: execution times in seconds of Program P1, Program P2, and the total time on Computer A, Computer B, and Computer C; the numeric entries did not survive the transcript.]

Digital System Architecture 56: Weighted Arithmetic Mean
[Table: three weighting schemes for P1/P2 (the first weights P1 at .5) and the resulting weighted arithmetic means on Computers A, B, and C; the numeric entries did not survive the transcript.]

Digital System Architecture 57: Performance Evaluation
- Given that sales are a function of performance relative to the competition, there is a big investment in improving the product as reported by the performance summary
- Good products created then have:
  - good benchmarks
  - good ways to summarize performance
- If the benchmarks/summary are inadequate, then choose between improving the product for real programs vs. improving the product to get more sales; sales almost always wins!
- Execution time is the measure of performance