Performance

These slides are based on chapter 2 of the following book: D. A. Patterson and J. L. Hennessy, Computer Organization & Design: The Hardware/Software Interface, Morgan Kaufmann, second edition, 1998.

If you need more explanation, you can find it in the book itself. The relevant slide numbers (from the chapter 2 slides) are: 11–14, 18–22, 28–30. The slides contain some examples (without solutions). We will solve some of them in class.
We will focus on user CPU time: the time spent executing the lines of code that are “in” our program (i.e. without I/O time, etc.).

Definition of performance: for some program running on machine X,
Performance_X = 1 / Execution time_X

Note that “machine X is n times faster than machine Y” => Performance_X / Performance_Y = n.

Clock cycle: the time between two consecutive (machine) clock ticks. Instead of reporting execution time in seconds, we often use cycles.
Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec).

Example: a machine with a 200 MHz clock has a clock rate of 200 * 10^6 Hz => it produces 2 * 10^8 clock cycles per second => its cycle time is 1 / (2 * 10^8) seconds = 5 nanoseconds (1 nanosecond = 10^-9 seconds).

Note: different (machine) instructions take different numbers of clock cycles, e.g. integer vs. floating-point operations, memory access vs. register access, etc.
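The definitions above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the 10-second and 15-second execution times in the last line are hypothetical values, not taken from the slides.

```python
# Illustrative helpers; the names are invented for this sketch.

def cycle_time_seconds(clock_rate_hz):
    """Clock cycle time is the reciprocal of the clock rate."""
    return 1.0 / clock_rate_hz


def performance(execution_time_seconds):
    """Performance = 1 / execution time."""
    return 1.0 / execution_time_seconds


def times_faster(exec_time_x, exec_time_y):
    """How many times faster machine X is than machine Y: P_x / P_y."""
    return performance(exec_time_x) / performance(exec_time_y)


# The 200 MHz example from the slide.
clock_rate_hz = 200e6
cycle_ns = cycle_time_seconds(clock_rate_hz) * 1e9
print(f"cycle time: {cycle_ns:.1f} ns")

# Hypothetical execution times: X takes 10 s, Y takes 15 s.
print(f"X is {times_faster(10.0, 15.0):.2f} times faster than Y")
```

Note that P_x / P_y is the same as (execution time of Y) / (execution time of X), so the faster machine is the one with the smaller execution time.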
Problem: A program runs in 10 seconds on computer A, which has a 400 MHz clock. We build a new machine B, which runs at 600 MHz, but on this machine each instruction requires 1.2 times as many clock cycles as on machine A. How long would machine B take to execute the same program?

Solution:
- clock rate = cycles per second; 400 MHz = 4 * 10^8 Hz => machine A provides 4 * 10^8 cycles per second
- the program runs for 10 seconds on machine A => program execution takes 10 * 4 * 10^8 = 4 * 10^9 cycles
- => on machine B it would take 1.2 * 4 * 10^9 = 4.8 * 10^9 cycles
- time on machine B: (4.8 * 10^9 cycles) / (6 * 10^8 Hz) = 8 seconds
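The same solution can be written as a short calculation. This is a sketch that follows the steps of the solution above; the variable names are chosen here for readability and do not come from the slides.

```python
# Machine A: known execution time and clock rate.
clock_rate_a_hz = 400e6   # 400 MHz
time_a_seconds = 10.0

# Machine B: faster clock, but 1.2 times as many cycles for the same program.
clock_rate_b_hz = 600e6   # 600 MHz
cycle_ratio_b_to_a = 1.2

# cycles = execution time * clock rate
cycles_a = time_a_seconds * clock_rate_a_hz
cycles_b = cycle_ratio_b_to_a * cycles_a

# execution time = cycles / clock rate
time_b_seconds = cycles_b / clock_rate_b_hz
print(f"machine B needs {time_b_seconds:.1f} seconds")
```

The faster clock (a factor of 1.5) is partly cancelled by the extra cycles (a factor of 1.2), so the overall gain is 10 / 8 = 1.25.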
Problem: There are two different classes of instructions: A and B.
- machine A has a clock cycle time of 10 ns (nanoseconds), a CPI (cycles per instruction) of 2.0 for class A instructions, and a CPI of 3.0 for class B instructions.
- machine B has a clock cycle time of 20 ns and a CPI of 1.25 for both instruction classes.
A given program is 50% class A instructions and 50% class B instructions. Which machine runs this program faster?

Solution (C = number of instructions in the program):
- machine A: ns per class A instruction = 2.0 * 10 = 20
- machine A: ns per class B instruction = 3.0 * 10 = 30
- machine B: ns per instruction = 1.25 * 20 = 25
- execution time on machine A: C * (0.5 * 20 + 0.5 * 30) = C * 25 ns
- execution time on machine B: C * (1 * 25) = C * 25 ns
=> the machines have the same performance for the given program
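The weighted-average step can be sketched in Python as follows. The helper name and the (fraction, CPI) list format are assumptions made for this example. The class B CPI on machine A is taken as 3.0, the value used in the solution's arithmetic.

```python
def avg_time_per_instruction_ns(cycle_time_ns, mix):
    """Average time per instruction for an instruction mix.

    mix is a list of (fraction_of_instructions, cpi) pairs whose
    fractions add up to 1.
    """
    average_cpi = sum(fraction * cpi for fraction, cpi in mix)
    return cycle_time_ns * average_cpi


# 50% class A instructions, 50% class B instructions.
machine_a_ns = avg_time_per_instruction_ns(10.0, [(0.5, 2.0), (0.5, 3.0)])
machine_b_ns = avg_time_per_instruction_ns(20.0, [(0.5, 1.25), (0.5, 1.25)])

print(f"machine A: {machine_a_ns} ns per instruction")
print(f"machine B: {machine_b_ns} ns per instruction")
```

Multiplying either result by the instruction count C gives the total execution time, so equal averages mean equal performance on this program.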
Problem: There are three different classes of instructions: class A, B and C. They require one, three and five cycles respectively. There are two code sequences:
- the first code contains 1 instruction of class A, 2 of B, and 1 of C.
- the second code contains 6 instructions of class A, 1 of B, and 1 of C.
A) Which sequence will be faster?
B) By how much?
C) What is the CPI for each sequence?

Solution:
- first code: 1*1 + 2*3 + 1*5 = 12 cycles => CPI = 12 / (1+2+1) = 3
- second code: 6*1 + 1*3 + 1*5 = 14 cycles => CPI = 14 / (6+1+1) = 1.75
A) The first code is faster.
B) By a factor of 14/12 (about 1.17).
C) 3 for the first code, 1.75 for the second code.
Note that the second code has the lower CPI but is still slower, because it executes more instructions.
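The cycle counts and CPI values can be reproduced with a small script. This is a sketch only: the dictionary layout and function names are choices made for this example.

```python
# Cycles needed by each instruction class, as given in the problem.
CYCLES_PER_CLASS = {"A": 1, "B": 3, "C": 5}


def total_cycles(instruction_counts):
    """Sum of (number of instructions * cycles per instruction) over all classes."""
    return sum(CYCLES_PER_CLASS[cls] * n for cls, n in instruction_counts.items())


def cpi(instruction_counts):
    """Average cycles per instruction for the whole sequence."""
    return total_cycles(instruction_counts) / sum(instruction_counts.values())


first_code = {"A": 1, "B": 2, "C": 1}
second_code = {"A": 6, "B": 1, "C": 1}

for name, code in (("first", first_code), ("second", second_code)):
    print(f"{name} code: {total_cycles(code)} cycles, CPI = {cpi(code)}")

# Both sequences run on the same clock, so the time ratio equals the cycle ratio.
ratio = total_cycles(second_code) / total_cycles(first_code)
print(f"first code is faster by a factor of {ratio:.2f}")
```

Comparing CPI alone would pick the wrong sequence here; total cycles (instruction count times CPI) is the quantity that decides which one is faster.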
Amdahl’s Law:
e.t. after improvement = e.t. unaffected + (e.t. affected / amount of improvement)
(e.t. = execution time)

Problem: A program runs in 100 seconds, with multiply instructions responsible for 80 seconds of this time (i.e. the program spends 80 seconds executing multiply instructions). How much do we have to improve the speed of multiplication if we want the program to run 4 times faster? How about making it 5 times faster?

Solution:
e.t. after improvement = 20 seconds + 80 seconds / x
- 4 times faster: 100 / 4 = 25 = 20 + 80 / x => 80 / x = 5 => x = 16. This means that multiplication should be executed 16 times faster!
- 5 times faster: 100 / 5 = 20 = 20 + 80 / x => 80 / x = 0 => no finite x satisfies this. This means that the multiplication should take 0 time! That’s impossible.
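Amdahl's Law can be solved for the required improvement factor x. The sketch below does this for the multiplication problem; the function name and the choice to return None when the target cannot be reached are assumptions made for this example.

```python
def required_improvement(total_time, affected_time, target_speedup):
    """Solve  total_time / target_speedup = unaffected + affected_time / x  for x.

    Returns None when no finite improvement factor x can reach the target.
    """
    unaffected_time = total_time - affected_time
    target_time = total_time / target_speedup
    # Time budget that remains for the improved part of the program.
    remaining = target_time - unaffected_time
    if remaining <= 0:
        return None
    return affected_time / remaining


print(required_improvement(100.0, 80.0, 4.0))  # multiply must be 16 times faster
print(required_improvement(100.0, 80.0, 5.0))  # None: the unaffected 20 s already use the whole budget
```

The unaffected 20 seconds set a hard limit: however fast multiplication becomes, the program can never run more than 100 / 20 = 5 times faster, and that limit is only approached, never reached.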
Problem: Suppose we want to improve performance on a well-known benchmark. We know that floating-point instructions account for 70% of the benchmark's execution time, and that the benchmark runs for 20 seconds. We enhanced the machine so that all floating-point instructions run 7 times faster, but for some reason this caused the rest of the instructions to take twice as long. What will the speedup be?

Solution: Floating-point instructions run for 14 seconds, the rest for 6 seconds.
e.t. after improvement = 6 * 2 seconds + 14 / 7 seconds = 12 + 2 = 14 seconds => speedup = 20 / 14 (about 1.43).

Summary:
- performance is specific to a particular program (or set of programs). Total execution time is a consistent summary of performance.
- for a given architecture, performance increases come from:
  - increases in clock rate (without adverse CPI effects)
  - improvements in processor organization that lower CPI
  - compiler enhancements that lower CPI and/or instruction count

Pitfall: expecting an improvement in one aspect of a machine’s performance to improve the total performance by a proportional amount.
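The pitfall can be seen in numbers by reusing the floating-point benchmark problem above. The sketch below is illustrative only: the function and its parameter names are invented here, and the second call (no slowdown of the other instructions) is a hypothetical variation that is not part of the original problem.

```python
def time_after_change(total_time, affected_fraction,
                      affected_speedup, unaffected_slowdown=1.0):
    """Amdahl's Law, extended with an optional slowdown of the unaffected part."""
    affected_time = total_time * affected_fraction
    unaffected_time = total_time - affected_time
    return unaffected_time * unaffected_slowdown + affected_time / affected_speedup


total = 20.0  # seconds, as in the benchmark problem

# As in the problem: floating point 7 times faster, everything else twice as slow.
with_slowdown = time_after_change(total, 0.7, 7.0, unaffected_slowdown=2.0)
print(f"speedup with slowdown: {total / with_slowdown:.2f}")

# Hypothetical variation: the other instructions are not slowed down at all.
without_slowdown = time_after_change(total, 0.7, 7.0)
print(f"speedup without slowdown: {total / without_slowdown:.2f}")
```

Even in the hypothetical case without any slowdown, a 7 times improvement of the floating-point instructions gives an overall speedup of only 20 / 8 = 2.5, because the remaining 6 seconds are untouched.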