Solutions Chapter 1.


Exercise 1.5

Consider two different implementations, P1 and P2, of the same instruction set. There are five classes of instructions (A, B, C, D, and E) in the instruction set. The clock rate and the CPI of each class are given below.

         Clock Rate   CPI Class A   CPI Class B   CPI Class C   CPI Class D   CPI Class E
a   P1   2.0 GHz      1             2             3             4             3
    P2   4.0 GHz      2             2             2             4             4
b   P1
    P2   3.0 GHz

1.5.1 Assume that peak performance is defined as the fastest rate at which a computer can execute any instruction sequence. What are the peak performances of P1 and P2, expressed in instructions per second?

Solution:
a. P1: $2 \times 10^9$ inst/s, P2: $2 \times 10^9$ inst/s
b. P1: $2 \times 10^9$ inst/s, P2: $3 \times 10^9$ inst/s
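The peak rate is reached when every instruction belongs to the cheapest class, so it equals the clock rate divided by the smallest CPI. A minimal Python sketch for case a follows. The CPI lists are an assumption taken from the worked solution to 1.5.2 (P1: 1, 2, 3, 4, 3 at 2.0 GHz; P2: 2, 2, 2, 4, 4 at 4.0 GHz), and the name `peak_rate` is illustrative only.

```python
def peak_rate(clock_hz, cpis):
    """Peak instructions per second: clock rate over the smallest class CPI."""
    return clock_hz / min(cpis)

# Case a. The CPI lists are assumptions read off the 1.5.2 solution.
p1_peak = peak_rate(2.0e9, [1, 2, 3, 4, 3])
p2_peak = peak_rate(4.0e9, [2, 2, 2, 4, 4])
print(p1_peak, p2_peak)
```

Both processors come out at the same peak rate here even though P2 has twice the clock rate, because its cheapest instruction costs two cycles.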

1.5.2 If the number of instructions executed in a certain program is divided equally among the classes of instructions except for class A, which occurs twice as often as each of the others, which computer is faster? How much faster is it?

Solution:
a. $T(P1)/T(P2) = \dfrac{(2 \times 1 + 2 + 3 + 4 + 3)/2}{(2 \times 2 + 2 + 2 + 4 + 4)/4} = \dfrac{14 \times 4}{16 \times 2} = \dfrac{7}{4}$; P2 is 1.75 times faster than P1.
b. $T(P2)/T(P1) = 4.66/5$; P2 is 1.07 times faster than P1.

1.5.3 If the number of instructions executed in a certain program is divided equally among the classes of instructions except for class E, which occurs twice as often as each of the others, which computer is faster? How much faster is it?

Solution:
a. $T(P2)/T(P1) = 4.5/8$; P2 is 1.78 times faster than P1.
b. $T(P2)/T(P1) = 5.33/5.5$; P2 is 1.03 times faster than P1.
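The two instruction mixes in 1.5.2 and 1.5.3 can be checked with a weighted sum of class CPIs. The sketch below covers case a only and assumes the CPI values implied by the 1.5.2 solution; `relative_time` and the weight lists are hypothetical names introduced for illustration.

```python
def relative_time(clock_ghz, cpis, weights):
    """Cycles for one weighted pass over the classes, divided by the clock rate.

    Only ratios between processors are meaningful, not the absolute value.
    """
    cycles = sum(c * w for c, w in zip(cpis, weights))
    return cycles / clock_ghz

# Case a; (clock rate in GHz, CPI per class A..E), assumed from the 1.5.2 solution.
P1 = (2.0, [1, 2, 3, 4, 3])
P2 = (4.0, [2, 2, 2, 4, 4])

mix_a_heavy = [2, 1, 1, 1, 1]  # 1.5.2: class A occurs twice as often
mix_e_heavy = [1, 1, 1, 1, 2]  # 1.5.3: class E occurs twice as often

speedup_152 = relative_time(*P1, mix_a_heavy) / relative_time(*P2, mix_a_heavy)
speedup_153 = relative_time(*P1, mix_e_heavy) / relative_time(*P2, mix_e_heavy)
print(round(speedup_152, 2), round(speedup_153, 2))  # 1.75 1.78
```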

The table below shows the instruction-type breakdown for different programs. Using this data, you will explore the performance trade-offs of different changes made to a MIPS processor.

                 No. Instructions
    Program      Compute   Load   Store   Branch   Total
a   program1     600       600    200     50       1450
b   program2     900       500    100     200      1700

1.5.4 Assuming that compute instructions take 1 cycle, load and store instructions take 10 cycles, and branches take 3 cycles, find the execution time on a 3 GHz MIPS processor.

Solution:
a. $(600 + 600 \times 10 + 200 \times 10 + 50 \times 3)/(3 \times 10^9)\ \text{s} \approx 2.92\ \mu\text{s}$
b. $(900 + 500 \times 10 + 100 \times 10 + 200 \times 3)/(3 \times 10^9)\ \text{s} = 2.50\ \mu\text{s}$
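The execution time is the total cycle count divided by the clock rate. A short Python sketch under stated assumptions: the load count of program1 (600) and the branch count of program2 (200) are inferred from the row totals, and the dictionary keys and the name `exec_time_us` are illustrative.

```python
CLOCK_HZ = 3e9  # 3 GHz MIPS processor from the problem statement

def exec_time_us(counts, cycles):
    """Execution time in microseconds for per-class instruction counts."""
    total_cycles = sum(counts[k] * cycles[k] for k in counts)
    return total_cycles / CLOCK_HZ * 1e6

# Counts per class; load=600 (program1) and branch=200 (program2) are
# assumptions inferred from the totals 1450 and 1700.
program1 = {"compute": 600, "load": 600, "store": 200, "branch": 50}
program2 = {"compute": 900, "load": 500, "store": 100, "branch": 200}

slow_memory = {"compute": 1, "load": 10, "store": 10, "branch": 3}  # 1.5.4
fast_memory = {"compute": 1, "load": 2, "store": 2, "branch": 3}    # 1.5.5

t1_slow = exec_time_us(program1, slow_memory)
t2_slow = exec_time_us(program2, slow_memory)
t1_fast = exec_time_us(program1, fast_memory)
t2_fast = exec_time_us(program2, fast_memory)
print(round(t1_slow, 2), round(t2_slow, 2), round(t1_fast, 2), round(t2_fast, 2))
```

Swapping the cycle table is enough to move from the 1.5.4 assumptions to the 1.5.5 ones.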

1.5.5 Assuming that compute instructions take 1 cycle, load and store instructions take 2 cycles, and branches take 3 cycles, find the execution time on a 3 GHz MIPS processor.

Solution:
a. 0.78 μs
b. 0.90 μs

1.5.6 Assuming that compute instructions take 1 cycle, load and store instructions take 2 cycles, and branches take 3 cycles, what is the speedup if the number of compute instructions can be reduced by one-half?

Solution:
a. Execution time drops from 0.78 μs to 0.68 μs (2350 to 2050 cycles), a speedup of about 1.15.
b. Execution time drops from 0.90 μs to 0.75 μs (2700 to 2250 cycles), a speedup of 1.20.
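Because the clock rate is unchanged, the speedup in 1.5.6 is simply the ratio of cycle counts before and after halving the compute instructions. A hedged sketch, assuming program1 has 600 loads and program2 has 200 branches (values inferred from the row totals); the helper names are hypothetical.

```python
CYCLES = {"compute": 1, "load": 2, "store": 2, "branch": 3}

def total_cycles(counts):
    """Total clock cycles for the given per-class instruction counts."""
    return sum(counts[k] * CYCLES[k] for k in counts)

def halve_compute(counts):
    """Copy of the counts with the number of compute instructions halved."""
    reduced = dict(counts)
    reduced["compute"] = counts["compute"] / 2
    return reduced

# load=600 (program1) and branch=200 (program2) are inferred assumptions.
program1 = {"compute": 600, "load": 600, "store": 200, "branch": 50}
program2 = {"compute": 900, "load": 500, "store": 100, "branch": 200}

speedup1 = total_cycles(program1) / total_cycles(halve_compute(program1))
speedup2 = total_cycles(program2) / total_cycles(halve_compute(program2))
print(round(speedup1, 2), round(speedup2, 2))  # 1.15 1.2
```

Halving the compute instructions yields far less than a 2x speedup because loads, stores, and branches dominate the cycle count.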

Exercise 1.6

Compilers can have a profound impact on the performance of an application on a given processor. This problem will explore the impact compilers have on execution time.

    Compiler A                           Compiler B
    No. Instructions   Execution Time    No. Instructions   Execution Time
a   1.00E+09           1.8 s             1.20E+09           1.8 s
b   1.00E+09           1.1 s             1.20E+09           1.5 s

1.6.1 For the same program, two different compilers are used. The table above shows the execution time of the two different compiled programs. Find the average CPI for each program given that the processor has a clock cycle time of 1 ns.

Solution: $\text{CPI} = T_{exec} \times f / \text{No. Instr}$. For Compiler A in case a, $\text{CPI} = 1.8\ \text{s} / 1\ \text{ns} / (1.00 \times 10^9) = 1.8$.
a. CPI(Compiler A) = 1.8; CPI(Compiler B) = 1.5
b. CPI(Compiler A) = 1.1; CPI(Compiler B) = 1.25
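The CPI formula above can be evaluated directly. This is a minimal sketch for case a; Compiler B's execution time of 1.8 s is an assumption implied by its stated CPI of 1.5 and instruction count of 1.20E+09, and `average_cpi` is an illustrative name.

```python
CYCLE_TIME_S = 1e-9  # 1 ns clock cycle, from the problem statement

def average_cpi(exec_time_s, instr_count):
    """Average CPI = execution time / cycle time / instruction count."""
    return exec_time_s / CYCLE_TIME_S / instr_count

# Case a. Compiler B's 1.8 s is an assumption implied by its CPI of 1.5.
cpi_compiler_a = average_cpi(1.8, 1.00e9)
cpi_compiler_b = average_cpi(1.8, 1.20e9)
print(round(cpi_compiler_a, 2), round(cpi_compiler_b, 2))  # 1.8 1.5
```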

1.6.2 Assume the average CPIs found in 1.6.1, but that the compiled programs run on two different processors. If the execution times on the two processors are the same, how much faster is the clock of the processor running compiler A's code versus the clock of the processor running compiler B's code?

Solution: $f_A/f_B = \dfrac{\text{No. Instr}(A) \times \text{CPI}(A)}{\text{No. Instr}(B) \times \text{CPI}(B)}$
a. $f_A/f_B = (1 \times 1.8)/(1.5 \times 1.2) = 1$
b. $f_A/f_B = 0.73$

1.6.3 A new compiler is developed that uses only 600 million instructions and has an average CPI of 1.1. What is the speedup of using this new compiler versus using Compiler A or B on the original processor of 1.6.1?

Solution:
a. $T_{new}/T_A = (0.6 \times 1.1)/(1 \times 1.8) \approx 0.37$; $T_{new}/T_B \approx 0.37$ (a speedup of about 2.7 in both cases)
b. $T_{new}/T_A = 0.6$; $T_{new}/T_B = 0.44$ (speedups of about 1.67 and 2.27)
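The ratios above are time ratios, and the speedup is their reciprocal. A small sketch for case b, assuming the 1 ns cycle time from 1.6.1; the function and variable names are hypothetical.

```python
CYCLE_TIME_S = 1e-9  # 1 ns clock cycle, as in 1.6.1

def exec_time(instr_count, cpi):
    """Execution time in seconds = instruction count x CPI x cycle time."""
    return instr_count * cpi * CYCLE_TIME_S

t_new = exec_time(600e6, 1.1)   # new compiler: 600 million instructions, CPI 1.1
speedup_vs_a = 1.1 / t_new      # case b: Compiler A takes 1.1 s
speedup_vs_b = 1.5 / t_new      # case b: Compiler B takes 1.5 s
print(round(t_new, 2), round(speedup_vs_a, 2), round(speedup_vs_b, 2))
```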

Consider two different implementations, P1 and P2, of the same instruction set. There are five classes of instructions (A, B, C, D, and E) in the instruction set. P1 has a clock rate of 4 GHz, and P2 has a clock rate of 6 GHz. The average number of cycles for each instruction class for P1 and P2 is listed in the following table.

CPI Class A Class B Class C Class D Class E a P1 1 2 3 4 5 P2 b 6

1.6.4 Assume that peak performance is defined as the fastest rate at which a computer can execute any instruction sequence. What are the peak performances of P1 and P2, expressed in instructions per second?

Solution:
a. P1: $4 \times 10^9$ inst/s, P2: $2 \times 10^9$ inst/s
b. P1: $4 \times 10^9$ inst/s, P2: $3 \times 10^9$ inst/s

1.6.5 If the number of instructions executed in a certain program is divided equally among the five classes of instructions except for class A, which occurs twice as often as each of the others, how much faster is P2 than P1?

Solution:
a. $T_1/T_2 = 1.09$
b. $T_1/T_2 = 1.5$

1.6.6 At what frequency does P1 have the same performance as P2 for the instruction mix given in 1.6.5?

Solution:
a. 4.37 GHz
b. 6 GHz

Exercise 1.14

Section 1.8 cites as a pitfall the utilization of a subset of the performance equation as a performance metric. To illustrate this, consider the following data for the execution of a program on different processors.

    Processor   Clock Rate   CPI    No. Instr.
a   P1          4 GHz        0.9    5.00E+06
    P2          3 GHz        0.75   1.00E+06
b   P1          3 GHz        1.1    3.00E+06
    P2          2.5 GHz      1      0.50E+06

1.14.1 One usual fallacy is to consider the computer with the largest clock rate as having the largest performance. Check if this is true for P1 and P2.

Solution:
a. $T(P1) = 5 \times 10^6 \times 0.9/(4 \times 10^9) = 1.125 \times 10^{-3}$ s
   $T(P2) = 10^6 \times 0.75/(3 \times 10^9) = 0.25 \times 10^{-3}$ s
   clock rate(P1) > clock rate(P2), but performance(P1) < performance(P2)
b. $T(P1) = 3 \times 10^6 \times 1.1/(3 \times 10^9) = 1.1 \times 10^{-3}$ s
   $T(P2) = 0.5 \times 10^6 \times 1/(2.5 \times 10^9) = 0.2 \times 10^{-3}$ s
   clock rate(P1) > clock rate(P2), but performance(P1) < performance(P2)
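The comparison rests on the full performance equation, CPU time = instruction count × CPI / clock rate. A minimal sketch for case a, with `cpu_time` as an illustrative helper name:

```python
def cpu_time(instr_count, cpi, clock_hz):
    """CPU time in seconds = instruction count x CPI / clock rate."""
    return instr_count * cpi / clock_hz

# Case a, values as used in the solution above.
t_p1 = cpu_time(5.0e6, 0.9, 4e9)
t_p2 = cpu_time(1.0e6, 0.75, 3e9)

# P1 has the higher clock rate but needs more time, so it performs worse here.
p1_has_faster_clock = 4e9 > 3e9
p1_is_faster = t_p1 < t_p2
print(t_p1, t_p2, p1_has_faster_clock, p1_is_faster)
```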

1.14.2 Another fallacy is to consider that the processor executing the largest number of instructions will need a larger CPU time. Considering that processor P1 is executing a sequence of $10^6$ instructions and that the CPIs of processors P1 and P2 do not change, determine the number of instructions that P2 can execute in the same time that P1 needs to execute $10^6$ instructions.

Solution: for $10^6$ instructions, $T(P1) = \text{No. Instr} \times \text{CPI}/\text{clock rate}$.
a. $T(P1) = 2.25 \times 10^{-4}$ s; $T(P2) = N \times 0.75/(3 \times 10^9)$, so $N = 9 \times 10^5$
b. $T(P1) = 3.67 \times 10^{-4}$ s; $T(P2) = N \times 1/(2.5 \times 10^9)$, so $N \approx 9.17 \times 10^5$
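Solving the CPU time equation for the instruction count gives N = T × clock rate / CPI. The sketch below works through case a only; `instructions_in_time` is a hypothetical helper name.

```python
def instructions_in_time(time_s, cpi, clock_hz):
    """Number of instructions a processor completes in the given time."""
    return time_s * clock_hz / cpi

# Case a: time P1 (4 GHz, CPI 0.9) needs for 10^6 instructions.
t_p1 = 1e6 * 0.9 / 4e9

# Instructions P2 (3 GHz, CPI 0.75) completes in that same time.
n_p2 = instructions_in_time(t_p1, 0.75, 3e9)
print(t_p1, round(n_p2))
```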

1.14.3 A common fallacy is to use MIPS (millions of instructions per second) to compare the performance of two different processors, and to consider that the processor with the largest MIPS has the largest performance. Check if this is true for P1 and P2.

Solution: $\text{MIPS} = \text{clock rate} \times 10^{-6}/\text{CPI}$
a. $\text{MIPS}(P1) = 4 \times 10^9 \times 10^{-6}/0.9 = 4.44 \times 10^3$
   $\text{MIPS}(P2) = 3 \times 10^9 \times 10^{-6}/0.75 = 4.0 \times 10^3$
   MIPS(P1) > MIPS(P2), but performance(P1) < performance(P2) (from 1.14.1)
b. $\text{MIPS}(P1) = 3 \times 10^9 \times 10^{-6}/1.1 = 2.73 \times 10^3$
   $\text{MIPS}(P2) = 2.5 \times 10^9 \times 10^{-6}/1 = 2.5 \times 10^3$
   MIPS(P1) > MIPS(P2), but performance(P1) < performance(P2) (from 1.14.1)
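MIPS ignores the instruction count, which is why it can rank processors differently from CPU time. A short sketch for case a under the values used in the solution; the function names are illustrative.

```python
def mips(clock_hz, cpi):
    """Millions of instructions per second = clock rate x 1e-6 / CPI."""
    return clock_hz * 1e-6 / cpi

def cpu_time(instr_count, cpi, clock_hz):
    """CPU time in seconds = instruction count x CPI / clock rate."""
    return instr_count * cpi / clock_hz

# Case a.
mips_p1 = mips(4e9, 0.9)
mips_p2 = mips(3e9, 0.75)
t_p1 = cpu_time(5.0e6, 0.9, 4e9)
t_p2 = cpu_time(1.0e6, 0.75, 3e9)

# P1 wins on MIPS yet takes longer, because it executes five times as many instructions.
print(round(mips_p1), round(mips_p2), t_p1 > t_p2)
```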

Another common performance figure is MFLOPS (millions of floating-point operations per second), defined as $\text{MFLOPS} = \text{No. FP operations}/(\text{execution time} \times 10^6)$, but this figure has the same problems as MIPS. Consider the program in the following table, running on the two processors below.

                               No. instructions       CPI
    Processor   Instr. Count   L/S   FP    Branch     L/S    FP    Branch   Clock Rate
a   P1          1.00E+06       50%   40%   10%        0.75   1.0   1.5      4 GHz
    P2          5.00E+06       40%   40%   20%        1.25   0.8   1.25     3 GHz
b   P1          5.00E+06       30%   30%   40%        1.5    1.0   2.0      4 GHz
    P2          2.00E+06       40%   30%   30%        1.25   1.0   2.5      3 GHz

1.14.4 Find the MFLOPS figures for the programs.

Solution: $\text{MFLOPS} = \text{No. FP operations} \times 10^{-6}/T$
a. $T(P1) = (5 \times 10^5 \times 0.75 + 4 \times 10^5 \times 1 + 1 \times 10^5 \times 1.5)/(4 \times 10^9) = 2.31 \times 10^{-4}$ s
   $\text{MFLOPS}(P1) = 4 \times 10^5 \times 10^{-6}/(2.31 \times 10^{-4}) = 1.73 \times 10^3$
   $T(P2) = (2 \times 10^6 \times 1.25 + 2 \times 10^6 \times 0.8 + 1 \times 10^6 \times 1.25)/(3 \times 10^9) = 1.78 \times 10^{-3}$ s
   $\text{MFLOPS}(P2) = 2 \times 10^6 \times 10^{-6}/(1.78 \times 10^{-3}) = 1.12 \times 10^3$
b. $T(P1) = (1.5 \times 10^6 \times 1.5 + 1.5 \times 10^6 \times 1 + 2 \times 10^6 \times 2)/(4 \times 10^9) = 1.94 \times 10^{-3}$ s
   $\text{MFLOPS}(P1) = 1.5 \times 10^6 \times 10^{-6}/(1.94 \times 10^{-3}) = 7.7 \times 10^2$
   $T(P2) = (0.8 \times 10^6 \times 1.25 + 0.6 \times 10^6 \times 1 + 0.6 \times 10^6 \times 2.5)/(3 \times 10^9) = 1.03 \times 10^{-3}$ s
   $\text{MFLOPS}(P2) = 0.6 \times 10^6 \times 10^{-6}/(1.03 \times 10^{-3}) = 5.8 \times 10^2$
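The MFLOPS calculation combines a per-class cycle count with the number of FP operations. The sketch below reproduces case b; the per-class counts are the instruction count multiplied by the mix percentages as they appear in the solution, and the helper names are hypothetical.

```python
def cpu_time(counts, cpis, clock_hz):
    """CPU time in seconds from per-class instruction counts and CPIs."""
    cycles = sum(n * c for n, c in zip(counts, cpis))
    return cycles / clock_hz

def mflops(fp_ops, time_s):
    """Millions of floating-point operations per second."""
    return fp_ops * 1e-6 / time_s

# Case b; every list is ordered (L/S, FP, Branch).
p1_counts = [1.5e6, 1.5e6, 2.0e6]
p2_counts = [0.8e6, 0.6e6, 0.6e6]

t_p1 = cpu_time(p1_counts, [1.5, 1.0, 2.0], 4e9)
t_p2 = cpu_time(p2_counts, [1.25, 1.0, 2.5], 3e9)

mflops_p1 = mflops(p1_counts[1], t_p1)
mflops_p2 = mflops(p2_counts[1], t_p2)
print(round(mflops_p1), round(mflops_p2), t_p1 > t_p2)
```

P1 posts the higher MFLOPS figure while also taking longer to finish, which is the mismatch the exercise is designed to expose.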

1.14.5 Find the MIPS figures for the programs.

Solution:
a. $T(P1) = (5 \times 10^5 \times 0.75 + 4 \times 10^5 \times 1 + 1 \times 10^5 \times 1.5)/(4 \times 10^9) = 2.31 \times 10^{-4}$ s
   $\text{CPI}(P1) = 2.31 \times 10^{-4} \times 4 \times 10^9/10^6 \approx 0.925$
   $\text{MIPS}(P1) = 4 \times 10^9/(0.925 \times 10^6) = 4.32 \times 10^3$
   $T(P2) = (2 \times 10^6 \times 1.25 + 2 \times 10^6 \times 0.8 + 1 \times 10^6 \times 1.25)/(3 \times 10^9) = 1.78 \times 10^{-3}$ s
   $\text{CPI}(P2) = 1.78 \times 10^{-3} \times 3 \times 10^9/(5 \times 10^6) = 1.07$
   $\text{MIPS}(P2) = 3 \times 10^9/(1.07 \times 10^6) = 2.80 \times 10^3$
b. $T(P1) = (1.5 \times 10^6 \times 1.5 + 1.5 \times 10^6 \times 1 + 2 \times 10^6 \times 2)/(4 \times 10^9) = 1.94 \times 10^{-3}$ s
   $\text{CPI}(P1) = 1.94 \times 10^{-3} \times 4 \times 10^9/(5 \times 10^6) = 1.55$
   $\text{MIPS}(P1) = 4 \times 10^9/(1.55 \times 10^6) = 2.58 \times 10^3$
   $T(P2) = (0.8 \times 10^6 \times 1.25 + 0.6 \times 10^6 \times 1 + 0.6 \times 10^6 \times 2.5)/(3 \times 10^9) = 1.03 \times 10^{-3}$ s
   $\text{CPI}(P2) = 1.03 \times 10^{-3} \times 3 \times 10^9/(2 \times 10^6) = 1.55$
   $\text{MIPS}(P2) = 3 \times 10^9/(1.55 \times 10^6) = 1.94 \times 10^3$

1.14.6 Find the performance for the programs and compare it with MIPS and MFLOPS.

Solution:
a. $T(P1) = 9.25 \times 10^5/(4 \times 10^9) = 2.31 \times 10^{-4}$ s; performance(P1) $= 1/T(P1) = 4.3 \times 10^3$
   $T(P2) = 5.35 \times 10^6/(3 \times 10^9) = 1.78 \times 10^{-3}$ s; performance(P2) $= 1/T(P2) = 5.6 \times 10^2$
   perf(P1) > perf(P2), MIPS(P1) > MIPS(P2), MFLOPS(P1) > MFLOPS(P2)
b. $T(P1) = 7.75 \times 10^6/(4 \times 10^9) = 1.94 \times 10^{-3}$ s; performance(P1) $= 1/T(P1) = 5.2 \times 10^2$
   $T(P2) = 3.1 \times 10^6/(3 \times 10^9) = 1.03 \times 10^{-3}$ s; performance(P2) $= 1/T(P2) = 9.7 \times 10^2$
   perf(P1) < perf(P2), but MIPS(P1) > MIPS(P2) and MFLOPS(P1) > MFLOPS(P2)

Exercise 1.15

Another pitfall cited in Section 1.8 is expecting to improve the overall performance of a computer by improving only one aspect of the computer. This might be true, but not always. Consider a computer running programs with CPU times shown in the following table.

    FP Instr.   INT Instr.   L/S Instr.   Branch Instr.   Total Time
a   70 s        85 s         55 s         40 s            250 s
b   40 s        90 s         60 s         20 s            210 s

1.15.1 How much is the total time reduced if the time for FP operations is reduced by 20%?

Solution:
a. $T_{fp} = 70 \times 0.8 = 56$ s; $T_{new} = 56 + 85 + 55 + 40 = 236$ s. Reduction: 5.6%
b. $T_{fp} = 40 \times 0.8 = 32$ s; $T_{new} = 32 + 90 + 60 + 20 = 202$ s. Reduction: 3.8%
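The effect of speeding up one category can be sketched by rescaling that category and re-summing. The example below uses case a; the dictionary keys and `total_after_scaling` are illustrative names.

```python
def total_after_scaling(times, category, factor):
    """Total time after multiplying one category's time by the given factor."""
    new_times = dict(times)
    new_times[category] = times[category] * factor
    return sum(new_times.values())

# Case a, CPU time in seconds per instruction category.
case_a = {"fp": 70, "int": 85, "ls": 55, "branch": 40}

old_total = sum(case_a.values())
new_total = total_after_scaling(case_a, "fp", 0.8)  # FP time reduced by 20%
reduction_pct = 100 * (old_total - new_total) / old_total
print(round(new_total, 1), round(reduction_pct, 1))
```

A 20% cut in FP time shrinks the total by only a few percent because FP accounts for 70 s of the 250 s.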

1.15.2 How much is the time for INT operations reduced if the total time is reduced by 20%?

Solution:
a. $T_{new} = 250 \times 0.8 = 200$ s; $T_{fp} + T_{l/s} + T_{branch} = 165$ s, so $T_{int} = 35$ s. Reduction of INT time: 58.8%
b. $T_{new} = 210 \times 0.8 = 168$ s; $T_{fp} + T_{l/s} + T_{branch} = 120$ s, so $T_{int} = 48$ s. Reduction of INT time: 46.7%
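Working backwards from a target total gives the INT time that must remain once the other categories are held fixed. A hedged sketch for both cases, assuming the FP time in case b is 40 s as used in the 1.15.1 solution; the names are hypothetical.

```python
def int_time_reduction_pct(times, total_reduction):
    """Percent cut in INT time needed to cut the total by total_reduction."""
    target_total = sum(times.values()) * (1 - total_reduction)
    others = sum(t for name, t in times.items() if name != "int")
    new_int = target_total - others
    return 100 * (times["int"] - new_int) / times["int"]

case_a = {"fp": 70, "int": 85, "ls": 55, "branch": 40}
case_b = {"fp": 40, "int": 90, "ls": 60, "branch": 20}  # fp=40 taken from the 1.15.1 solution

reduction_a = int_time_reduction_pct(case_a, 0.20)
reduction_b = int_time_reduction_pct(case_b, 0.20)
print(round(reduction_a, 1), round(reduction_b, 1))
```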

1.15.3 Can the total time be reduced by 20% by reducing only the time for branch instructions?

Solution:
a. $T_{new} = 250 \times 0.8 = 200$ s; $T_{fp} + T_{int} + T_{l/s} = 210$ s > 200 s: NO
b. $T_{new} = 210 \times 0.8 = 168$ s; $T_{fp} + T_{int} + T_{l/s} = 190$ s > 168 s: NO

The following table shows the instruction-type breakdown per processor of given applications executed on different numbers of processors. Assume that each processor has a 2 GHz clock rate.

                 No. Instructions per Processor                        CPI
    Processors   FP           INT           L/S          Branch        FP   INT   L/S   Branch
a   2            280 × 10^6   1000 × 10^6   640 × 10^6   128 × 10^6    1    1     4     2
b   16           50 × 10^6    110 × 10^6    80 × 10^6    16 × 10^6     1    1     4     2

1.15.4 How much must we improve the CPI of FP instructions if we want the program to run two times faster?

Solution:
$\text{clock cycles} = CPI_{fp} \times \text{No. FP instr.} + CPI_{int} \times \text{No. INT instr.} + CPI_{l/s} \times \text{No. L/S instr.} + CPI_{branch} \times \text{No. branch instr.}$
$T_{cpu} = \text{clock cycles}/\text{clock rate} = \text{clock cycles}/(2 \times 10^9)$
a. 2 processors: clock cycles $= 4{,}096 \times 10^6$; $T_{cpu} = 2.048$ s
b. 16 processors: clock cycles $= 512 \times 10^6$; $T_{cpu} = 0.256$ s

To halve the number of clock cycles by improving the CPI of FP instructions:
$CPI_{improved\ fp} \times \text{No. FP instr.} + CPI_{int} \times \text{No. INT instr.} + CPI_{l/s} \times \text{No. L/S instr.} + CPI_{branch} \times \text{No. branch instr.} = \text{clock cycles}/2$
$CPI_{improved\ fp} = (\text{clock cycles}/2 - (CPI_{int} \times \text{No. INT instr.} + CPI_{l/s} \times \text{No. L/S instr.} + CPI_{branch} \times \text{No. branch instr.}))/\text{No. FP instr.}$
a. 2 processors: $CPI_{improved\ fp} = (2{,}048 - 3{,}816)/280 < 0$, so this is not possible
b. 16 processors: $CPI_{improved\ fp} = (256 - 462)/50 < 0$, so this is not possible
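The required CPI for one instruction class follows from fixing the cycles of all other classes and solving for the remainder. A minimal sketch for the FP case; the CPI values (FP 1, INT 1, L/S 4, Branch 2) are taken from the 1.15.6 solution, and `required_cpi` is an illustrative name.

```python
def required_cpi(counts, cpis, target_class, speedup):
    """CPI the target class would need for the whole program to reach the speedup."""
    total = sum(counts[k] * cpis[k] for k in counts)
    others = total - counts[target_class] * cpis[target_class]
    return (total / speedup - others) / counts[target_class]

# Per-processor instruction counts; CPIs assumed from the 1.15.6 solution.
cpis = {"fp": 1, "int": 1, "ls": 4, "branch": 2}
counts_2p = {"fp": 280e6, "int": 1000e6, "ls": 640e6, "branch": 128e6}
counts_16p = {"fp": 50e6, "int": 110e6, "ls": 80e6, "branch": 16e6}

cpi_fp_2p = required_cpi(counts_2p, cpis, "fp", 2)
cpi_fp_16p = required_cpi(counts_16p, cpis, "fp", 2)

# A negative CPI is not physically realizable, so the target is unreachable.
print(cpi_fp_2p < 0, cpi_fp_16p < 0)  # True True
```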

1.15.5 How much must we improve the CPI of L/S instructions if we want the program to run two times faster?

Solution: using the clock cycle data from 1.15.4, to halve the number of clock cycles by improving the CPI of L/S instructions:
$CPI_{fp} \times \text{No. FP instr.} + CPI_{int} \times \text{No. INT instr.} + CPI_{improved\ l/s} \times \text{No. L/S instr.} + CPI_{branch} \times \text{No. branch instr.} = \text{clock cycles}/2$
$CPI_{improved\ l/s} = (\text{clock cycles}/2 - (CPI_{fp} \times \text{No. FP instr.} + CPI_{int} \times \text{No. INT instr.} + CPI_{branch} \times \text{No. branch instr.}))/\text{No. L/S instr.}$
a. 2 processors: $CPI_{improved\ l/s} = (2{,}048 - 1{,}536)/640 = 0.8$
b. 16 processors: $CPI_{improved\ l/s} = (256 - 192)/80 = 0.8$

1.15.6 How much is the execution time of the program improved if the CPI of INT and FP instructions is reduced by 40% and the CPI of L/S and Branch is reduced by 30%?

Solution:
$\text{clock cycles} = CPI_{fp} \times \text{No. FP instr.} + CPI_{int} \times \text{No. INT instr.} + CPI_{l/s} \times \text{No. L/S instr.} + CPI_{branch} \times \text{No. branch instr.}$
$T_{cpu} = \text{clock cycles}/\text{clock rate} = \text{clock cycles}/(2 \times 10^9)$
$CPI_{int} = 0.6 \times 1 = 0.6$; $CPI_{fp} = 0.6 \times 1 = 0.6$; $CPI_{l/s} = 0.7 \times 4 = 2.8$; $CPI_{branch} = 0.7 \times 2 = 1.4$
a. 2 processors: $T_{cpu}$ (before improvement) = 2.048 s; $T_{cpu}$ (after improvement) = 1.370 s
b. 16 processors: $T_{cpu}$ (before improvement) = 0.256 s; $T_{cpu}$ (after improvement) = 0.171 s
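The before and after CPU times can be reproduced by scaling each class CPI and recomputing the cycle count. A short sketch under the CPI values stated in the solution; the dictionary layout and names are hypothetical.

```python
CLOCK_HZ = 2e9  # each processor runs at 2 GHz

def cpu_time(counts, cpis):
    """CPU time in seconds from per-class instruction counts and CPIs."""
    return sum(counts[k] * cpis[k] for k in counts) / CLOCK_HZ

cpis = {"fp": 1, "int": 1, "ls": 4, "branch": 2}
# INT and FP CPIs reduced by 40%, L/S and Branch CPIs reduced by 30%.
scale = {"fp": 0.6, "int": 0.6, "ls": 0.7, "branch": 0.7}
new_cpis = {k: cpis[k] * scale[k] for k in cpis}

counts_2p = {"fp": 280e6, "int": 1000e6, "ls": 640e6, "branch": 128e6}
counts_16p = {"fp": 50e6, "int": 110e6, "ls": 80e6, "branch": 16e6}

before_2p, after_2p = cpu_time(counts_2p, cpis), cpu_time(counts_2p, new_cpis)
before_16p, after_16p = cpu_time(counts_16p, cpis), cpu_time(counts_16p, new_cpis)
print(round(before_2p, 3), round(after_2p, 3), round(before_16p, 3), round(after_16p, 3))
```

In both configurations the execution time falls by roughly a third.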