Published by Damon Francis. Modified over 9 years ago.
1
1 CS465 Performance Revisited (Chapter 1) Be able to compare performance of simple system configurations and understand the performance implications of architectural choices
2
2 Performance and Cost: Purchasing vs Design Views Our goal is to understand the cost and performance implications of architectural choices. Consider 2 views: –Purchasing perspective: given 4 machines, which measure yields the best decision: best performance, least cost, or best performance / cost? –Design perspective: select the design that yields the best performance, the least cost, or the best performance / cost. Both require –a basis for comparison –a metric for evaluation
3
3 Performance Measure, report, and summarize. Make intelligent choices. See through the marketing hype. Key to understanding underlying organizational motivation: Why is some hardware better than others for different programs? What factors of system performance are hardware related? (e.g., Do we need a new machine, or a new operating system?) How does the machine's instruction set affect performance?
4
4 Which of these determines performance? –# of cycles to execute program? –# of instructions in program? –# of cycles per second? –average # of cycles per instruction? –average # of instructions per second? Common pitfall: thinking one of the variables is indicative of performance when it really isn’t. Performance is determined by execution time 1998 Morgan Kaufmann Publishers
5
5 Two notions of “performance” Which is the best measure of performance? Passenger capacity, range, speed, throughput, travel time. Which has higher performance? Speed: Concorde is fastest; Range: Douglas DC-8-50 is longest.
Airplane            Passengers   Range (mi)   Speed (mph)
Boeing 777          375          4630         610
Boeing 747          470          4150         610
BAC/Sud Concorde    132          4000         1350
Douglas DC-8-50     146          8720         544
6
6 Two notions of “Performance” - 2 Throughput (passenger-miles per hour): Boeing 747 is highest. Cost of operation vs cost of operation per passenger-mile per hour?
Plane               Speed       DC to Paris   Passengers   Throughput (pmph)
Boeing 747          610 mph     6.5 hours     470          286,700
BAC/Sud Concorde    1350 mph    3 hours       132          178,200
CPU: Time to do the task – execution time, response time, latency. Tasks per day, hour, week, sec, ns... (capacity) – throughput, bandwidth. Response time and throughput are often in opposition.
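The throughput column above can be reproduced with a minimal sketch, assuming throughput is simply passengers multiplied by cruising speed (the function and dictionary names are illustrative, not from the slides):

```python
# Passenger throughput in passenger-miles per hour (pmph),
# assumed here to be passengers x cruising speed.
planes = {
    "Boeing 747": {"passengers": 470, "speed_mph": 610},
    "BAC/Sud Concorde": {"passengers": 132, "speed_mph": 1350},
}


def throughput_pmph(passengers, speed_mph):
    """Return passenger-miles per hour for one plane."""
    return passengers * speed_mph


for name, plane in planes.items():
    pmph = throughput_pmph(plane["passengers"], plane["speed_mph"])
    print(f"{name}: {pmph:,} pmph")
```

The Concorde wins on latency (one trip takes less time) while the 747 wins on throughput, which is the same tension as response time versus throughput on a CPU.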
7
7 How to Measure Performance Two approaches –User perspective: Response time / Execution time –Computer Center Manager perspective: Throughput based on number of jobs completed How quickly is each job completed vs Total Amount of work done We focus on execution time for a single job
8
8 Measures Performance is inversely proportional to execution time: $\text{Performance}_x = 1 / \text{Execution time}_x$ –If execution time decreases by a factor of 4, performance increases by a factor of 4. “x is n times faster than y” means that $\text{Performance}_x / \text{Performance}_y = n$
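As a small illustration of the definition above, the following sketch computes how many times faster one machine is than another. The execution times (10 s and 15 s) are hypothetical values chosen for the example.

```python
def performance(execution_time_s):
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s, time_y_s):
    """Return n such that 'x is n times faster than y'.

    n = Performance_x / Performance_y, which equals time_y / time_x.
    """
    return performance(time_x_s) / performance(time_y_s)


# Hypothetical measurements: x takes 10 s, y takes 15 s on the same program.
n = times_faster(10.0, 15.0)
print(f"x is {n:.2f} times faster than y")
```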
9
9 Measuring Execution Time Elapsed Time –includes everything (disk and memory accesses, I/O, etc.) –a useful number, but often not good for CPU assessment CPU time –doesn't include I/O or time spent running other programs –CPU time = system time + user time Our focus: user CPU time –time spent executing the lines of code that are "in" our program
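One way to see the difference between elapsed time and CPU time is to measure both around the same piece of work. This is only a sketch using Python's standard `time.perf_counter` and `os.times`; the workload `busy_work` and the 0.05 s sleep are made up for illustration.

```python
import os
import time


def busy_work(n):
    """CPU-bound loop standing in for 'our program' (illustrative only)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


wall_start = time.perf_counter()
cpu_start = os.times()

busy_work(200_000)
time.sleep(0.05)  # waiting counts toward elapsed time, not toward CPU time

cpu_end = os.times()
wall_end = time.perf_counter()

elapsed = wall_end - wall_start                   # elapsed (wall-clock) time
user_cpu = cpu_end.user - cpu_start.user          # user CPU time
system_cpu = cpu_end.system - cpu_start.system    # system CPU time

print(f"elapsed:  {elapsed:.3f} s")
print(f"CPU time: {user_cpu + system_cpu:.3f} s (user {user_cpu:.3f} + system {system_cpu:.3f})")
```

Because of the sleep, elapsed time should come out larger than user plus system CPU time.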
10
10 Clock Cycles Instead of reporting execution time in seconds, we often use cycles. Clock “ticks” indicate when to start activities (one abstraction): cycle time = time between ticks = seconds per cycle; clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec). A 200 MHz clock has a cycle time of 1 / (200 × 10^6 cycles/sec) = 5 ns.
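Since cycle time and clock rate are reciprocals, the conversion for the 200 MHz clock mentioned above can be sketched as follows (function names are illustrative):

```python
def cycle_time_seconds(clock_rate_hz):
    """Cycle time (seconds per cycle) is the inverse of the clock rate."""
    return 1.0 / clock_rate_hz


def clock_rate_hertz(cycle_time_s):
    """Clock rate (cycles per second) is the inverse of the cycle time."""
    return 1.0 / cycle_time_s


cycle_ns = cycle_time_seconds(200e6) * 1e9  # convert seconds to nanoseconds
print(f"200 MHz clock: cycle time = {cycle_ns:.1f} ns")
```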
11
11 How to Improve Performance? Performance can be enhanced by either: ________ the # of required cycles for a program, or ________ the clock cycle time, or ________ the clock rate (inverse of clock cycle time).
12
12 How many cycles are required for a program? Is it safe to assume that # of cycles = # of instructions? This assumption is incorrect. Different instructions take different amounts of time. For example: add, lw in MIPS. Why do some instructions take more time than others? Remember – these are not lines of C code – these are machine instructions. [Figure: time axis marking the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions]
13
13 Different numbers of cycles for different instructions Multiplication/division takes more time than addition. Floating point operations take longer than integer ones. Accessing memory takes more time than accessing registers. Cycles to execute an instruction can be different for different machines, even within the same family of computers. Important point: changing the cycle time often changes the number of cycles required for various instructions (more later).
14
14 Example Our favorite program runs in 10 seconds on computer A, which has a 400 MHz clock. We are trying to help a computer designer build a new machine B that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target? Don't panic: we can easily work this out from basic principles.
15
15 Example
$$\text{Clock Cycles}_A = \text{CPU time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 400 \times 10^{6}\,\text{cycles/s} = 4 \times 10^{9}\,\text{cycles}$$
$$\text{Clock Rate}_B = \frac{1.2 \times \text{Clock Cycles}_A}{\text{CPU time}_B} = \frac{1.2 \times 4 \times 10^{9}\,\text{cycles}}{6\,\text{s}} = 800 \times 10^{6}\,\text{cycles/s} = 800\,\text{MHz}$$
Execution time goes from 10 s to 6 s; the clock rate goes from 400 MHz to 800 MHz.
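The same calculation can be checked numerically. This sketch uses only the figures given in the example (10 s, 400 MHz, 6 s, and the 1.2 cycle ratio); the variable names are illustrative.

```python
cpu_time_a_s = 10.0        # program runs in 10 s on machine A
clock_rate_a_hz = 400e6    # machine A has a 400 MHz clock
cpu_time_b_s = 6.0         # target run time on machine B
cycle_ratio_b_to_a = 1.2   # B needs 1.2x as many cycles as A

# cycles = seconds x cycles per second
clock_cycles_a = cpu_time_a_s * clock_rate_a_hz
clock_cycles_b = cycle_ratio_b_to_a * clock_cycles_a

# clock rate = cycles / seconds
clock_rate_b_hz = clock_cycles_b / cpu_time_b_s

print(f"Clock cycles on A: {clock_cycles_a:.2e}")
print(f"Target clock rate for B: {clock_rate_b_hz / 1e6:.0f} MHz")
```

Note that doubling the clock rate gives only a 10/6 ≈ 1.67x reduction in run time here, because machine B also needs more cycles.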
16
16 Factors Affecting Computer Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
17
17 CPI
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
CPI = Clock Cycles to execute program / Instruction Count = (CPU Time × Clock Rate) / Instruction Count (“average cycles per instruction”)
$$\text{CPU time} = \text{ClockPeriod} \times \sum_{i=1}^{n} \text{CPI}_i \times C_i \qquad (\text{ClockPeriod} = \text{ClockCycleTime})$$
$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{C_i}{\text{Instruction Count}} \ \text{("instruction frequency")}$$
CPI_i = CPI for instruction class i; C_i = count of instructions of class i executed. Invest resources where time is spent!
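To make the two CPI formulas concrete, here is a minimal sketch that computes the average CPI both ways and confirms they agree. The instruction mix, per-class CPI values, and clock period below are hypothetical numbers chosen for illustration, not data from the slides.

```python
# Hypothetical instruction mix: class -> (count C_i, cycles per instruction CPI_i)
mix = {
    "alu": (50, 1),
    "load": (20, 5),
    "store": (10, 3),
    "branch": (20, 2),
}
clock_period_ns = 10.0  # hypothetical clock cycle time

instruction_count = sum(count for count, _ in mix.values())

# Method 1: total clock cycles divided by instruction count.
total_cycles = sum(count * cpi for count, cpi in mix.values())
cpi_from_cycles = total_cycles / instruction_count

# Method 2: sum of CPI_i x F_i, where F_i = C_i / instruction count.
cpi_from_frequencies = sum(
    cpi * (count / instruction_count) for count, cpi in mix.values()
)

# CPU time = ClockPeriod x sum(CPI_i x C_i)
cpu_time_ns = clock_period_ns * total_cycles

print(f"CPI (cycles / IC):     {cpi_from_cycles:.2f}")
print(f"CPI (sum CPI_i * F_i): {cpi_from_frequencies:.2f}")
print(f"CPU time: {cpu_time_ns:.0f} ns")

# Share of time per class shows where to invest resources.
for name, (count, cpi) in mix.items():
    print(f"{name}: {100 * count * cpi / total_cycles:.0f}% of cycles")
```

In this made-up mix, loads are only 20% of the instructions but account for the largest share of cycles, which is the point of "invest resources where time is spent".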
18
18 CPI Example For a program, Machine A has a clock cycle time of 10 ns and a CPI of 2.0; Machine B has a clock cycle time of 20 ns and a CPI of 1.2. CPU time = IC × CPI × Clock cycle time. For machine A: CPU time = IC × 2.0 × 10 ns = 20 × IC ns. For machine B: CPU time = IC × 1.2 × 20 ns = 24 × IC ns.
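A short sketch of the comparison, assuming both machines execute the same instruction count (IC) for the program so that IC cancels in the ratio. The IC value used here is arbitrary.

```python
def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    """CPU time = IC x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns


ic = 1_000_000  # arbitrary; the same program is assumed to run on both machines

time_a = cpu_time_ns(ic, cpi=2.0, cycle_time_ns=10.0)
time_b = cpu_time_ns(ic, cpi=1.2, cycle_time_ns=20.0)

# Performance_A / Performance_B = time_B / time_A
ratio = time_b / time_a
print(f"Machine A is {ratio:.1f} times faster than machine B")
```

Machine B has the lower CPI, yet machine A finishes first because its cycle time is half as long. Neither CPI nor clock rate alone determines performance.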
19
19 # of Instructions Example A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively). The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C. The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C. Which sequence will be faster? By how much? What is the CPI for each sequence?
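The questions above can be worked through with a short sketch, assuming both sequences run on the same machine so that the faster sequence is simply the one with fewer total cycles (helper names are illustrative):

```python
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts):
    """Sum cycles over instruction classes for one code sequence."""
    return sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts):
    """Average CPI = total cycles / instruction count."""
    return total_cycles(counts) / sum(counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}  # 5 instructions
sequence_2 = {"A": 4, "B": 1, "C": 1}  # 6 instructions

cycles_1 = total_cycles(sequence_1)
cycles_2 = total_cycles(sequence_2)

print(f"Sequence 1: {cycles_1} cycles, CPI = {average_cpi(sequence_1):.2f}")
print(f"Sequence 2: {cycles_2} cycles, CPI = {average_cpi(sequence_2):.2f}")
print(f"Sequence 2 is {cycles_1 / cycles_2:.2f} times faster")
```

The second sequence executes more instructions but needs fewer cycles (9 versus 10), so it is faster despite the higher instruction count.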
21
21 Example – Instruction Mix / CPI What consumes the most CPU time? Possible enhancements: –High-speed cache reduces load to 2 cycles –Branch prediction reduces branch to 1 cycle –Two ALU instructions per cycle
22
22 Amdahl's Law Compute Task = Component that can be parallelized + Component that is serial. Speedup due to enhancement E (impacts the parallelizable part):
$$\text{Speedup}(E) = \frac{\text{ExTime without } E}{\text{ExTime with } E} = \frac{\text{Performance with } E}{\text{Performance without } E}$$
Suppose that enhancement E accelerates a fraction F of the task by a factor S and the remainder of the task is unaffected. Then what is the speedup?
$$\text{ExTime}(\text{without } E) = \big((1-F) + F\big) \times \text{ExTime}(\text{without } E)$$
$$\text{ExTime}(\text{with } E) = \left((1-F) + \frac{F}{S}\right) \times \text{ExTime}(\text{without } E)$$
$$\text{Speedup}(\text{with } E) = \frac{1}{(1-F) + \dfrac{F}{S}}$$
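The speedup formula translates directly into a small function. This is a sketch; the sample values of F and S are hypothetical and chosen only to show the behaviour.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction F of the task is accelerated by S.

    Speedup = 1 / ((1 - F) + F / S)
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical cases: half of the task is sped up by increasing factors.
for s in (2, 10, 1_000_000):
    print(f"F = 0.5, S = {s}: overall speedup = {amdahl_speedup(0.5, s):.3f}")
```

However large S becomes, the overall speedup with F = 0.5 never exceeds 1 / (1 - F) = 2, because the unaffected half of the task still takes the same time.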
23
23 Summary: Instruction set design (MIPS)
–Use general purpose registers with a load-store architecture: YES
–Provide at least 16 general purpose registers plus separate floating-point registers: 31 GPR & 32 FPR
–Support basic addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register deferred: YES, 16 bits for immediate and displacement (disp=0 => register deferred)
–All addressing modes apply to all data transfer instructions: YES
–Use fixed instruction encoding if interested in performance and use variable instruction encoding if interested in code size: Fixed
–Support these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers: YES
–Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8 bits long), jump, call, and return: YES
–Aim for a minimalist instruction set: YES
24
24 How to Evaluate Instruction Sets? Metric we use: time to execute the program. NOTE: this depends on the instruction set, processor organization, and compilation techniques. Factors: Instruction Count, CPI, Cycle Time.
25
25 Some Popular Performance Measures MIPS = Millions of Instructions Per Second –Easy to understand; faster machines have higher MIPS. Problems –Different computers have different instruction sets. How does one compare MIPS across platforms? –MIPS is based on specific programs –MIPS depends on compilers. MIPS is not a function of the CPU alone. BENCHMARKS – BASIS FOR COMPARISON
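The compiler dependence can be illustrated with a sketch in which a higher MIPS rating goes with a longer run time. MIPS is computed as instruction count / (execution time × 10^6). The 100 MHz clock and the per-compiler instruction counts are hypothetical. The three instruction classes costing 1, 2, and 3 cycles follow the earlier code-sequence example.

```python
CLOCK_RATE_HZ = 100e6                        # hypothetical 100 MHz machine
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}  # cycles per instruction class


def summarize(counts):
    """Return (execution time in seconds, MIPS rating) for one compiled program."""
    instruction_count = sum(counts.values())
    cycles = sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())
    exec_time_s = cycles / CLOCK_RATE_HZ
    mips = instruction_count / (exec_time_s * 1e6)
    return exec_time_s, mips


# Hypothetical output of two compilers for the same source program.
compiler_1 = {"A": 5_000_000, "B": 1_000_000, "C": 1_000_000}
compiler_2 = {"A": 10_000_000, "B": 1_000_000, "C": 1_000_000}

time_1, mips_1 = summarize(compiler_1)
time_2, mips_2 = summarize(compiler_2)

print(f"Compiler 1: {time_1:.2f} s, {mips_1:.0f} MIPS")
print(f"Compiler 2: {time_2:.2f} s, {mips_2:.0f} MIPS")
```

Under these assumptions the second compiler's code earns the higher MIPS rating but takes longer to run, so ranking by MIPS would pick the slower option.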
26
26 MFLOPS Millions of floating point operations per second. Problems –Different machines have different sets of floating point operations –Programs require a varying mix of floating point operations.
27
27 How to evaluate? Target workload –Depends on user environment –Depends on “current” usage pattern. Standard workloads (benchmarks) –Should not be narrow: manufacturers design to the benchmark and design instruction sets to suit it! SPEC 2000 –A mix of tasks –CINT2000 – integer –CFP2000 – floating point. SPECweb 99 –Focus on web server throughput
28
28 SPEC ratings for Pentium 3 and Pentium 4 (Fig 4.6 in 3rd Edition)
29
29 Relative Performance of 3 Intel processors (Fig 4.8 in 3rd Edition)
30
30 Relative Energy Efficiency (Fig 4.9 in 3rd Edition)