1
ECE 313 - Computer Organization
Performance
Feb 2005
Reading: 4.1-4.6, 4.7*
Prof. John Nestor
ECE Department, Lafayette College
Easton, Pennsylvania 18042
nestorj@lafayette.edu
Portions of these slides are derived from:
  Textbook figures © 1998 Morgan Kaufmann Publishers, all rights reserved
  Tod Amon's COD2e Slides © 1998 Morgan Kaufmann Publishers, all rights reserved
  Dave Patterson’s CS 152 Slides - Fall 1997 © UCB
  Rob Rutenbar’s 18-347 Slides - Fall 1999 CMU
  Other sources as noted
2
Roadmap for the term: major topics
  Computer Systems Overview
  Technology Trends
  Instruction sets (and Software)
  Logic & Arithmetic
  Performance
  Processor Implementation
  Memory Systems
  Input/Output
3
Performance Outline
  Motivation
  Defining Performance
  Common Performance Metrics
  Benchmarks
  Amdahl’s Law
4
Goal: Learn to “measure, report and summarize” performance of a computer system
Why study performance?
  To make intelligent decisions when choosing a system
  To make intelligent decisions when designing a system
  Understand impact of implementation decisions
Challenges
  How do we measure performance accurately?
  How do we compare performance fairly?
5
What’s a good measure of performance?
  Execution Time (a.k.a. response time, latency)
    How long it takes to complete a single task
    Example: “How long does it take to rip an MP3 file?”
  Throughput
    How many tasks are completed per unit time
    Example: “How many MP3 files can I rip per hour?”
  The measure we use depends on the application
6
Execution Time vs. Throughput
  Analogy: passenger airplanes (book Figure 4.1)
    Concorde - fastest “response time” for an individual user
    Boeing 747 - highest passenger throughput
7
Performance Outline
  Motivation
  Defining Performance
  Common Performance Metrics
  Benchmarks
  Amdahl’s Law
8
Measuring Execution Time
  Wall-clock time or elapsed time
    Includes I/O waiting
    Includes time while OS runs other jobs
  CPU time - measured by OS
    User time - time spent in user program during execution
    System time - time spent in OS during execution
  Measuring CPU time: Unix/Linux time command
    % time myprog
    …
    90.7u 12.9s 2:39 65%
    User time: 90.7 s   System time: 12.9 s   Wall-clock time: 2:39   CPU utilization: 65%
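The four fields of the sample output are related: CPU utilization is CPU time (user plus system) divided by wall-clock time. The sketch below checks this with the numbers from the slide; the function name is hypothetical and not part of the original deck.

```python
def cpu_utilization(user_s, system_s, wall_s):
    """Fraction of wall-clock time during which this program used the CPU."""
    return (user_s + system_s) / wall_s

# Values from the sample output: 90.7u 12.9s 2:39
wall = 2 * 60 + 39                      # 2:39 is 159 seconds
util = cpu_utilization(90.7, 12.9, wall)
print(f"{util:.0%}")                    # prints "65%"
```

The remaining 35% of the elapsed time went to I/O waiting and to other jobs run by the OS.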
9
Defining Performance using Execution Time
  For a given program on machine X:
    $\text{Performance}_X = \dfrac{1}{\text{Execution Time}_X}$
  Comparing performance of machines:
    $\text{Performance}_X > \text{Performance}_Y$ if $\text{Execution Time}_X < \text{Execution Time}_Y$
10
Defining Performance (cont’d)
  We say “X is n times faster than Y” if:
    $\dfrac{\text{Performance}_X}{\text{Performance}_Y} = \dfrac{\text{Execution Time}_Y}{\text{Execution Time}_X} = n$
  Use this definition to compare performance in homework & exam problems!
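A minimal sketch of the “n times faster” definition, taking performance as the reciprocal of execution time. The function names and the two execution times are hypothetical and chosen only for illustration.

```python
def performance(exec_time_s):
    """Performance is the reciprocal of execution time."""
    return 1.0 / exec_time_s

def times_faster(exec_time_x, exec_time_y):
    """Return n such that X is n times faster than Y."""
    return exec_time_y / exec_time_x

# Hypothetical execution times: X takes 10 s, Y takes 15 s on the same program.
n = times_faster(10.0, 15.0)
print(n)  # prints 1.5
```

The same value results from dividing the two performances, since the reciprocals swap numerator and denominator.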
11
Example - Performance
  Which machine is faster? By how much?
  [Figure: comparison of machines A and B]
12
Performance Outline
  Motivation
  Defining Performance
  Common Performance Metrics
  Benchmarks
  Amdahl’s Law
13
Clocks and Performance
  Every processor has a clock
    Clock frequency - MHz (or GHz)
    Clock cycle time / period - ns (or ps)
  Example: cycle time $t_{clock} = 2$ ns, so $f_{clk} = 1 / t_{clock} = 500$ MHz
  How do we relate clock to program performance?
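Clock period and clock frequency are reciprocals of each other. The sketch below converts between them using the 2 ns / 500 MHz pair from the slide; the function names are hypothetical.

```python
def period_to_frequency_hz(period_s):
    """Clock frequency in Hz for a given clock period in seconds."""
    return 1.0 / period_s

def frequency_to_period_s(frequency_hz):
    """Clock period in seconds for a given clock frequency in Hz."""
    return 1.0 / frequency_hz

f_clk = period_to_frequency_hz(2e-9)
print(f"{f_clk / 1e6:.0f} MHz")    # prints "500 MHz"

t_clock = frequency_to_period_s(500e6)
print(f"{t_clock * 1e9:.0f} ns")   # prints "2 ns"
```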
14
Example - Clock Cycles
  How many clock cycles does A execute?
  How many clock cycles does B execute?
15
Review - Clocks in Sequential Circuits
  Controls sequential circuit operation
    Register outputs change at beginning of cycle
    Combinational logic determines “next state”
    Storage elements store new state
  [Figure labels: Register Output, Combinational Logic (Adder, Mux), Register Input, Clock]
16
What Limits Clock Frequency?
  Propagation delay - $t_{prop}$
    Logic (including register outputs)
    Interconnect
  Register setup time - $t_{setup}$
  Constraint: $t_{clock} > t_{prop} + t_{setup}$
  With slack: $t_{clock} = t_{prop} + t_{setup} + t_{slack}$
  [Figure labels: Register Output, Combinational Logic (Adder, Mux), Register Input, Clock, $t_{prop}$, $t_{setup}$]
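The timing constraint can be turned into a small calculation: the sum of propagation delay and setup time bounds the clock period from below, and whatever is left over at a chosen period is slack. This is a sketch only; the delay values and function names are assumptions for illustration, not figures from the slides.

```python
def min_clock_period_ns(t_prop_ns, t_setup_ns):
    """Lower bound on the clock period; the real period must exceed it."""
    return t_prop_ns + t_setup_ns

def slack_ns(t_clock_ns, t_prop_ns, t_setup_ns):
    """Slack left in a cycle: t_clock = t_prop + t_setup + t_slack."""
    return t_clock_ns - (t_prop_ns + t_setup_ns)

# Assumed (hypothetical) delays for one register-to-register path.
T_PROP_NS = 1.5
T_SETUP_NS = 0.3

t_min = min_clock_period_ns(T_PROP_NS, T_SETUP_NS)
f_max_mhz = 1e3 / t_min   # a period in ns gives a frequency in MHz
print(f"period bound: {t_min:.1f} ns, frequency bound: {f_max_mhz:.1f} MHz")
print(f"slack at a 2 ns clock: {slack_ns(2.0, T_PROP_NS, T_SETUP_NS):.1f} ns")
```

A negative slack would mean the chosen clock period is too short for this path.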
17
Clock Cycles per Instruction (CPI)
  Consider the 68HC11 …
    ADDA - 3 cycles (IMM) -> 5 cycles (IND, Y)
    MUL - 10 cycles
    IDIV - 41 cycles
  More complex processors have other issues…
    Pipelining - parallel execution, but sometimes stalls
    Memory system issues: cache misses, page faults, etc.
  How can we combine these into an overall metric?
  [Figure: total execution time of the instruction sequence add, lda, mul, and, sta]
18
Definition: Clock Cycles per Instruction (CPI)
  Average number of clock cycles per instruction
  Measured for an entire program
  $\text{CPI} = \dfrac{\text{Clock Cycles}}{\text{Instruction Count}}$
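CPI for a whole program is total clock cycles divided by total instructions executed, which amounts to a weighted average over the instruction mix. The sketch below uses the per-instruction cycle counts quoted for the 68HC11 on the preceding slide; the dynamic instruction counts are hypothetical, as is the function name.

```python
def cpi(mix):
    """mix maps instruction name -> (cycles per instruction, times executed)."""
    total_cycles = sum(cycles * count for cycles, count in mix.values())
    total_instructions = sum(count for _, count in mix.values())
    return total_cycles / total_instructions

# Cycle counts as quoted on the slide; execution counts are assumed.
program_mix = {
    "ADDA (IMM)": (3, 80),
    "MUL": (10, 15),
    "IDIV": (41, 5),
}

program_cpi = cpi(program_mix)
print(f"CPI = {program_cpi:.2f}")  # prints "CPI = 5.95"
```

Even though IDIV is only 5% of the instructions here, it contributes 205 of the 595 cycles, which is why CPI must be measured over the entire program.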
19
Example - CPI
  What is the CPI of A?
  What is the CPI of B?
20
Definition - MIPS
  MIPS - millions of instructions per second
    $\text{MIPS} = \dfrac{\text{Instruction Count}}{\text{Execution Time} \times 10^6}$
  Once used as a general metric for performance
  But, not useful for comparing different architectures
  Often ridiculed as “meaningless indicator of performance”
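The following sketch computes MIPS and illustrates why it misleads across architectures: a machine that needs more instructions for the same task can post a higher MIPS rating while taking longer. Both machines and all numbers are hypothetical.

```python
def mips(instruction_count, exec_time_s):
    """Millions of instructions executed per second."""
    return instruction_count / (exec_time_s * 1e6)

# Hypothetical machines running the same task with different instruction sets.
# P: 100 million instructions in 2 s.  Q: 300 million instructions in 3 s.
p_mips = mips(100e6, 2.0)
q_mips = mips(300e6, 3.0)

print(f"P: {p_mips:.0f} MIPS, finishes in 2 s")  # prints "P: 50 MIPS, finishes in 2 s"
print(f"Q: {q_mips:.0f} MIPS, finishes in 3 s")  # prints "Q: 100 MIPS, finishes in 3 s"
```

Q has twice the MIPS rating of P, yet P completes the task sooner, so execution time remains the measure to compare.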
21
Example - MIPS
  What is the MIPS of A?
  What is the MIPS of B?
22
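Relating the Metrics - The Performance Equation
  $\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \dfrac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$
  The “Iron Law” of Performance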
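The performance equation multiplies instruction count, CPI, and clock cycle time. A minimal sketch follows; the function name and the sample program parameters are assumptions for illustration.

```python
def cpu_time_s(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI x clock cycle time."""
    clock_cycle_time_s = 1.0 / clock_rate_hz
    return instruction_count * cpi * clock_cycle_time_s

# Hypothetical program: 1 billion instructions, CPI of 2, 500 MHz clock.
t = cpu_time_s(1e9, 2.0, 500e6)
print(f"{t:.1f} s")  # prints "4.0 s"
```

Halving any one of the three factors while holding the others fixed halves CPU time, which is what makes the equation useful for reasoning about design tradeoffs.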
23
Clock Cycles and Performance - Example
  Program runs on Computer A:
    CPU Time: 10 seconds
    Clock: 400 MHz
  Computer B can run clock faster
    But, requires 1.2X clock cycles to perform same task
    Desired CPU Time: 6 seconds
  What should the clock frequency be to reach this target?
  Key to approach: Performance equation
24
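Clock Cycles and Performance - Example (cont’d)
  First step: find clock cycles executed by Computer A
    $\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\ \text{s} \times 400 \times 10^6\ \text{cycles/s} = 4 \times 10^9\ \text{cycles}$
  Second step: find clock cycles executed by Computer B
    $\text{Clock Cycles}_B = 1.2 \times \text{Clock Cycles}_A = 4.8 \times 10^9\ \text{cycles}$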
25
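Clock Cycles and Performance - Example (cont’d)
  Third step: given clock cycles and CPU time, solve for clock rate of Computer B
    $\text{Clock Rate}_B = \dfrac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \dfrac{1.2 \times (10\ \text{s} \times 400 \times 10^6\ \text{cycles/s})}{6\ \text{s}} = \dfrac{4.8 \times 10^9\ \text{cycles}}{6\ \text{s}} = 800\ \text{MHz}$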
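The three steps of this example can be checked with a few lines of arithmetic. The numbers are the ones given in the problem statement (10 s at 400 MHz on A, 1.2X the cycles and a 6 s target on B); the variable names are hypothetical.

```python
# Given for Computer A
cpu_time_a_s = 10.0
clock_rate_a_hz = 400e6

# Given for Computer B
cycle_ratio_b = 1.2      # B needs 1.2X the clock cycles of A
cpu_time_b_s = 6.0       # desired CPU time

# Step 1: clock cycles executed by A
cycles_a = cpu_time_a_s * clock_rate_a_hz
# Step 2: clock cycles executed by B
cycles_b = cycle_ratio_b * cycles_a
# Step 3: clock rate B needs to finish in the target time
clock_rate_b_hz = cycles_b / cpu_time_b_s

print(f"{clock_rate_b_hz / 1e6:.0f} MHz")  # prints "800 MHz"
```

B must therefore run its clock twice as fast as A to deliver a speedup of only 10/6, because it spends 20% more cycles on the same task.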
26
Performance Tradeoffs
  Program instruction count - impacted by
    Instructions available to perform basic tasks (architecture)
    Quality of compiler code generator
  CPI - impacted by
    Chip implementation
    Quality of compiler code generator
    Memory system performance
  Clock rate - impacted by
    Delay characteristics of IC technology
    Logical structure of implementation
27
Performance Outline
  Motivation
  Defining Performance
  Common Performance Metrics
  Benchmarks
  Amdahl’s Law
28
Benchmarks - Programs to Evaluate Performance
  The book defines performance in terms of a specific program
  But, which program should you use?
    Ideally, “real” programs
    Ideally, programs you will use
    But what if you don’t know or don’t have time to find out?
  Alternative - Benchmark Suites
29
Benchmark Suites
  Use a collection of small programs
  Summarize performance … how?
    Total Execution Time
    Arithmetic Mean
    Weighted Arithmetic Mean
    Geometric Mean
30
Total Execution Time
  Suppose we have two benchmarks (Fig. 2.5)
  How do we compare computers A and B?
    A is 10 times faster than B for Program 1
    B is 10 times faster than A for Program 2
  Summarizing performance: total execution time
    Reasonable comparison for even workload
31
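Summarizing Performance
  Arithmetic Mean
    $AM = \dfrac{1}{n} \sum_{i=1}^{n} \text{Time}_i$
    Example:
  Weighted Arithmetic Mean - use when some programs run more often than others
    $WAM = \sum_{i=1}^{n} w_i \times \text{Time}_i$, where $\sum_{i=1}^{n} w_i = 1$
    Example: Program 1 is 80% of workload, Program 2 is 20% of workload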
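The sketch below applies total execution time, arithmetic mean, and weighted arithmetic mean to two machines. The execution times are illustrative values chosen only to match the 10-to-1 ratios described on the preceding slide; they are not recovered from the book figure. The 80%/20% weights are the ones on this slide, and the function names are hypothetical.

```python
def arithmetic_mean(times):
    """Plain average of the execution times."""
    return sum(times) / len(times)

def weighted_arithmetic_mean(times, weights):
    """Average weighted by how often each program runs (weights sum to 1)."""
    return sum(w * t for w, t in zip(weights, times))

# Assumed execution times in seconds for [Program 1, Program 2].
times_a = [1.0, 1000.0]    # A is 10x faster on Program 1
times_b = [10.0, 100.0]    # B is 10x faster on Program 2
weights = [0.8, 0.2]       # Program 1 is 80% of workload, Program 2 is 20%

for name, times in (("A", times_a), ("B", times_b)):
    print(name,
          "total:", sum(times),
          "AM:", arithmetic_mean(times),
          "WAM:", round(weighted_arithmetic_mean(times, weights), 1))
```

With these assumed times B wins on every summary, but the size of the gap depends on the weighting, which is why the workload mix has to be stated alongside the summary number.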
32
The SPEC Benchmark Suite
  Standard Performance Evaluation Corporation (originally the System Performance Evaluation Cooperative)
    Founded by workstation vendors with goal of realistic, standardized performance test
    Philosophy: fair comparison between real systems
    Multiple versions - most recent is SPEC2000
  Basic approach
    Measure execution time of several small programs
    Normalize each to performance on a reference machine
    Combine performances using geometric mean
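The basic approach (normalize each benchmark to a reference machine, then combine with a geometric mean) can be sketched as follows. The benchmark times are hypothetical and the function names are assumptions; this is not SPEC's actual tooling or reporting procedure.

```python
import math

def normalized_ratios(reference_times, measured_times):
    """Ratio > 1 means the measured machine beats the reference machine."""
    return [ref / meas for ref, meas in zip(reference_times, measured_times)]

def geometric_mean(values):
    """n-th root of the product, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical execution times in seconds for two benchmark programs.
reference = [1400.0, 1900.0]
measured = [700.0, 475.0]

ratios = normalized_ratios(reference, measured)   # [2.0, 4.0]
score = geometric_mean(ratios)
print(f"ratios: {ratios}, geometric mean: {score:.3f}")
```

One reason to prefer the geometric mean for normalized ratios is that the ranking of two machines by this score does not depend on which machine is chosen as the reference.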
33
More about SPEC
  Separate integer & floating point suites
    SPECint - integer performance
    SPECfp - floating point performance
  Several versions
    SPEC89 - initial version
    SPEC95 - see Figure 2.6 in book
    SPEC CPU 2000 - current
      CINT 2000
      CFP 2000
34
Some Not-so-Recent SPEC Data
  Source: Berkeley CPU Information Center
  Some SPEC95 numbers:
  SPEC2000 Results - see http://www.spec.org
35
SPEC CINT2000 Benchmarks
CINT2000 (Integer Component of SPEC CPU2000):
  Benchmark     Lang.  Category
  164.gzip      C      Compression
  175.vpr       C      FPGA Circuit Placement and Routing
  176.gcc       C      C Programming Language Compiler
  181.mcf       C      Combinatorial Optimization
  186.crafty    C      Game Playing: Chess
  197.parser    C      Word Processing
  252.eon       C++    Computer Visualization
  253.perlbmk   C      PERL Programming Language
  254.gap       C      Group Theory, Interpreter
  255.vortex    C      Object-oriented Database
  256.bzip2     C      Compression
  300.twolf     C      Place and Route Simulator
36
SPEC CFP2000 Benchmarks
CFP2000 (Floating Point Component of SPEC CPU2000):
  Benchmark      Language    Category
  168.wupwise    Fortran 77  Physics / Quantum Chromodynamics
  171.swim       Fortran 77  Shallow Water Modeling
  172.mgrid      Fortran 77  Multi-grid Solver: 3D Potential Field
  173.applu      Fortran 77  Parabolic / Elliptic Partial Diff. Equations
  177.mesa       C           3-D Graphics Library
  178.galgel     Fortran 90  Computational Fluid Dynamics
  179.art        C           Image Recognition / Neural Networks
  183.equake     C           Seismic Wave Propagation Simulation
  187.facerec    Fortran 90  Image Processing: Face Recognition
  188.ammp       C           Computational Chemistry
  189.lucas      Fortran 90  Number Theory / Primality Testing
  191.fma3d      Fortran 90  Finite-element Crash Simulation
  200.sixtrack   Fortran 77  High Energy Nuclear Physics Accelerator Design
  301.apsi       Fortran 77  Meteorology: Pollutant Distribution
37
Some Other SPEC2000 Benchmarks
  SPECapc - application specific (e.g. PROEngineer)
  SPECviewperf - 3D rendering under OpenGL
  SPEC HPC - high performance computing
  SPEC OMP - multiprocessor systems
  SPEC JVM - Java virtual machine
  SPEC Web - Web services
38
Benchmark Pitfalls
  Vendors sometimes focus on making specific benchmarks fast
  Example: tuning compiler (Old Figure 2.3)
39
Performance Outline
  Motivation
  Defining Performance
  Common Performance Metrics
  Benchmarks
  Amdahl’s Law
40
Amdahl’s Law
  Improving one part of performance by a factor of N doesn’t increase overall performance by N
  Book example: Suppose a program executes in 100 seconds where:
    80 seconds are spent performing multiply operations
    20 seconds are spent performing other operations
  What happens if we speed up multiply n times?
    Before: Multiply - 80 sec, Other - 20 sec
    After (n = 2): Multiply - 40 sec, Other - 20 sec
41
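Amdahl’s Law Example (cont’d)
  Execution time after speeding up multiply:
    $\text{Execution Time}_{new} = \dfrac{80}{n} + 20$ seconds
  Bottom line: no matter what we do to multiply, execution time will always be >20 seconds!
  Speedup factor of 5 is not possible! (It would require a total time of 100/5 = 20 seconds, leaving 0 seconds for multiply.)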
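The limit can be seen numerically by evaluating the new execution time for growing values of n. This sketch uses the 80 s / 20 s split from the example; the function names are hypothetical.

```python
MULTIPLY_S = 80.0   # time spent in multiply before the improvement
OTHER_S = 20.0      # time spent in everything else (unaffected)

def time_after(n):
    """Execution time when multiply alone runs n times faster."""
    return MULTIPLY_S / n + OTHER_S

def overall_speedup(n):
    """Old total time divided by new total time."""
    return (MULTIPLY_S + OTHER_S) / time_after(n)

for n in (2, 4, 16, 1000):
    print(f"n = {n:5d}: time = {time_after(n):6.2f} s, "
          f"speedup = {overall_speedup(n):.3f}")
```

The speedup climbs toward 5 as n grows but never reaches it, because the 20 seconds of other work is untouched.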
42
Amdahl’s Law Corollary
  Make the common case fast
    In our example, biggest gains when we speed up multiply
    Speeding up “other instructions” is not as valuable
  Original: Multiply - 80 sec, Other - 20 sec
  Faster “other”: Multiply - 80 sec, Other - 15 sec
  Faster multiply: Multiply - 60 sec, Other - 20 sec
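Comparing the two improved cases from the slide (other: 20 s to 15 s, multiply: 80 s to 60 s) shows the corollary in numbers: the same 25% reduction pays off far more when applied to the larger component. The helper name is hypothetical.

```python
def speedup(old_parts, new_parts):
    """Overall speedup from the total time before and after a change."""
    return sum(old_parts) / sum(new_parts)

original = (80.0, 20.0)          # (multiply, other) in seconds
faster_other = (80.0, 15.0)      # "other" reduced by 25%
faster_multiply = (60.0, 20.0)   # multiply reduced by 25%

print(f"faster other:    {speedup(original, faster_other):.3f}x")     # prints "faster other:    1.053x"
print(f"faster multiply: {speedup(original, faster_multiply):.3f}x")  # prints "faster multiply: 1.250x"
```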
43
Roadmap for the term: major topics
  Computer Systems Overview
  Technology Trends
  Instruction sets (and Software)
  Logic & Arithmetic
  Performance
  Processor Implementation
  Memory Systems
  Input/Output