1
11/1/2005 — Comp 120, Fall 2005
1 November: Exam Postmortem
11 classes to go! Read Sections 7.1 and 7.2. You read 6.1 for this time, right? Pipelining, then on to the memory hierarchy.
2
1 [6] Give at least 3 different formulas for execution time using the following variables; each formula should be minimal, i.e., it should not contain any variable that is not needed. CPI; R = clock rate; T = cycle time; M = MIPS; I = number of instructions in program; C = number of cycles in program.
Answer: CPI*I*T, CPI*I/R, C*T, C/R, I/(M*10^6)
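The five formulas above can be cross-checked numerically; here is a quick Python sketch (the sample values of I, CPI, and R are made up for illustration, not taken from the exam):

```python
# Check that the execution-time formulas all agree on sample values.
I = 1_000_000        # instructions in program (illustrative value)
CPI = 2.0            # cycles per instruction (illustrative value)
R = 500e6            # clock rate in Hz (illustrative value)
T = 1 / R            # cycle time in seconds
C = CPI * I          # cycles in program
M = I / (CPI * I * T) / 1e6   # MIPS = (instructions per second) / 10^6

t1 = CPI * I * T
t2 = CPI * I / R
t3 = C * T
t4 = C / R
t5 = I / (M * 1e6)

print(t1, t2, t3, t4, t5)  # all five are the same time, ~0.004 s here
```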
3
2 [12] Soon we will all have computers with multiple processors. This will allow us to run some kinds of programs faster. Suppose your computer has N processors and that a program you want to make faster has some fraction F of its computation that is inherently sequential (either all the processors have to do the same work, duplicating effort, or the work has to be done by a single processor while the other processors are idle). What speedup can you expect to achieve on your new computer?
Answer: 1/(F + (1-F)/N)
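This is Amdahl's law, and it is easy to play with numerically. A minimal sketch (the example values of F and N are illustrative):

```python
def speedup(F, N):
    """Speedup with sequential fraction F on N processors: 1/(F + (1-F)/N)."""
    return 1.0 / (F + (1.0 - F) / N)

# With F = 0.1, even huge N cannot push the speedup past 1/F = 10.
print(speedup(0.1, 4))     # about 3.08, not 4
print(speedup(0.1, 1000))  # approaches, but never reaches, 10
```

The second print illustrates why the sequential fraction, not the processor count, ultimately limits the speedup.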
4
3 [10] Show the minimum sequence of instructions required to sum up an array of 5 integers, stored as consecutive words in memory, given the address of the first word in $a0 and leaving the sum in register $v0.
Answer: 5 loads + 4 adds = 9 instructions
5
4 [10] Which of the following cannot be EXACTLY represented by an IEEE double-precision floating-point number? (a) 0, (b) 10.2, (c) 1.625, (d) 11.5
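Each candidate can be tested mechanically: a decimal value is exactly representable as a double precisely when converting it to a float and back preserves it. A quick check using Python's exact-rational `fractions` module:

```python
from fractions import Fraction

# A decimal is exact in IEEE double precision iff round-tripping through
# float preserves its exact rational value.
for text in ["0", "10.2", "1.625", "11.5"]:
    exact = Fraction(text) == Fraction(float(Fraction(text)))
    print(text, "exact" if exact else "NOT exact")
```

Only 10.2 fails: 0.2 has a repeating binary expansion, while 0, 1.625 (= 1.101 in binary), and 11.5 (= 1011.1) terminate.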
6
5 [12] All of the following equations are true for the idealized numbers you studied in algebra. Which ones are true for IEEE floating-point numbers? Assume that all of the numbers are well within the range of largest and smallest possible numbers (that is, underflow and overflow are not a problem).
- A+B = A if and only if B = 0
- (A+B)+C = A+(B+C)
- A+B = B+A
- A*(B+C) = A*B + A*C
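Of these, only commutativity survives IEEE rounding; the others can all be broken with concrete counterexamples (the particular values below are chosen for illustration):

```python
# Commutativity holds in IEEE arithmetic; the other identities can fail.
A, B, C = 1e16, 1.0, 1.0

print(A + B == B + A)               # True: commutativity always holds
print((A + B) + C == A + (B + C))   # False: 1e16 + 1 rounds back to 1e16
print(A + B == A)                   # True, even though B != 0
print(100 * (0.1 + 0.2) == 100 * 0.1 + 100 * 0.2)  # False: distributivity fails
```

The second and third lines fail for the same underlying reason: at magnitude 1e16, a double's spacing between adjacent values is 2, so adding 1.0 is absorbed by rounding.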
7
6 [question 6 was presented as a figure; content not captured]
8
7 [8] Draw a block diagram showing how to implement 2's-complement subtraction of 3-bit numbers (A minus B) using 1-bit full-adder blocks with inputs A, B, and Cin and outputs Sum and Cout, plus any AND, OR, or INVERT blocks you may need.
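The standard design inverts each bit of B and sets the initial carry-in to 1, computing A + ~B + 1. A gate-level sketch of that diagram, modeled in Python (bit ordering and function names are my own, chosen for illustration):

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def sub3(a_bits, b_bits):
    """3-bit two's-complement A - B = A + ~B + 1, ripple-carried through
    full adders. Bits are least-significant first. Each B input passes
    through an INVERT block; the initial carry-in of 1 supplies the +1."""
    carry = 1                  # the "+1" of two's complement
    out = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b ^ 1, carry)   # b ^ 1 models the inverter
        out.append(s)
    return out                 # the final carry-out is discarded

# 5 - 3 = 2:  A = 101, B = 011  ->  010  (written LSB-first below)
print(sub3([1, 0, 1], [1, 1, 0]))  # [0, 1, 0], i.e. binary 010 = 2
```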
9
8 [10] Assume that multiply instructions take 12 cycles and account for 10% of the instructions in a typical program, and that the other 90% of the instructions require an average of 4 cycles each. What percentage of time does the CPU spend doing multiplication?
Answer: 12*0.1 / (12*0.1 + 4*0.9) = 25%
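The same arithmetic, done per 100 typical instructions to keep the numbers whole:

```python
# Per 100 typical instructions: 10 multiplies at 12 cycles, 90 others at 4 cycles.
mul_cycles = 10 * 12                               # 120 cycles multiplying
other_cycles = 90 * 4                              # 360 cycles on everything else
fraction = mul_cycles / (mul_cycles + other_cycles)
print(fraction)  # 0.25 -> the CPU spends 25% of its time multiplying
```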
10
9 [10] In a certain set of benchmark programs, about every 4th instruction is a load that fetches data from main memory. The time required for a load is 50 ns. The CPI for all other instructions is 4. Assuming the ISAs are the same, how much faster will the benchmarks run with a 1 GHz clock than with a 500 MHz clock?
At 1 GHz: 50 + 12 = 62 cycles = 62 ns per 4 instructions.
At 500 MHz: 25 + 12 = 37 cycles = 74 ns per 4 instructions.
Speedup: 74 / 62 = 1.19
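The calculation generalizes to any clock rate. A sketch (the function name and its 4-instruction granularity follow the "every 4th instruction" premise above):

```python
# Per 4 instructions: 1 load (fixed at 50 ns) + 3 other instructions at CPI 4.
def ns_per_4_instructions(clock_hz):
    cycle_ns = 1e9 / clock_hz
    load_cycles = 50 / cycle_ns      # the 50 ns memory access, in cycles
    other_cycles = 3 * 4             # three instructions at 4 cycles each
    return (load_cycles + other_cycles) * cycle_ns

t_1ghz = ns_per_4_instructions(1e9)      # (50 + 12) * 1 ns = 62 ns
t_500mhz = ns_per_4_instructions(500e6)  # (25 + 12) * 2 ns = 74 ns
print(t_500mhz / t_1ghz)                 # ~1.19: doubling the clock helps, but
                                         # the fixed memory latency limits the gain
```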
11
10 [12] Explain why floating-point addition and subtraction may result in loss of precision but multiplication and division do not.
Before we can add, we have to adjust the binary point so that the exponents are equal. This shifting can cause many low-order bits to be lost if the exponents are very different. Multiplication doesn't have this problem: all the bits participate equally.
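A concrete illustration of the alignment problem (the specific values are chosen so that the doubles' 53-bit significand is exactly exhausted):

```python
# Addition discards low-order bits when exponents differ widely;
# multiplication keeps all significant bits, up to one final rounding.
small, big = 1.0, 2.0**53

print(big + small == big)    # True: small's only bit fell off the end of the sum
print((big + small) - big)   # 0.0: the 1.0 was lost entirely
print(big * small == big)    # True, and exact -- no bits were lost
```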
12
Results: Average 72.5, Median 81
13
Chapter 6: Pipelining
14
Doing Laundry
15
Pipelining
Improve performance by increasing instruction throughput. Ideal speedup is the number of stages in the pipeline. Do we achieve this?
16
Pipeline Control
We have 5 stages. What needs to be controlled in each stage?
- Instruction Fetch and PC Increment
- Instruction Decode / Register Fetch
- Execution
- Memory Stage
- Register Write Back
How would control be handled in an automobile plant?
- a fancy control center telling everyone what to do?
- should we use a finite state machine?
17
Pipelining
What makes it easy?
- all instructions are the same length
- just a few instruction formats
- memory operands appear only in loads and stores
What makes it hard?
- structural hazards: suppose we had only one memory
- control hazards: need to worry about branch instructions
- data hazards: an instruction depends on a previous instruction
Individual instructions still take the same number of cycles, but we've improved throughput by increasing the number of simultaneously executing instructions.
18
Structural Hazards
[pipeline diagram: four overlapping instructions, each passing through the stages Inst Fetch, Reg Read, ALU, Data Access, Reg Write]
19
Data Hazards
Problem with starting the next instruction before the first is finished:
- dependencies that "go backward in time" are data hazards
20
Software Solution
Have the compiler guarantee no hazards. Where do we insert the "nops"?
sub $2, $1, $3
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)
Problem: this really slows us down!
21
Forwarding
Use temporary results; don't wait for them to be written.
- register file forwarding to handle read/write to the same register
- ALU forwarding
22
Can't Always Forward
Load word can still cause a hazard:
- an instruction tries to read a register following a load instruction that writes to the same register.
Thus, we need a hazard detection unit to "stall" the instruction.
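The hazard detection unit's check is simple enough to sketch in a few lines. A toy model (the `(op, dest, sources)` instruction encoding here is invented for illustration; it is not MIPS machine format):

```python
# Toy load-use hazard check: stall when the next instruction reads the
# register that a load immediately before it is still fetching.
def needs_stall(prev, cur):
    """True if `cur` reads the destination register of a preceding load."""
    op, dest, _ = prev
    _, _, sources = cur
    return op == "lw" and dest in sources

prog = [("lw",  "$2",  ["$1"]),         # lw  $2, 0($1)
        ("and", "$12", ["$2", "$5"])]   # and $12, $2, $5 -- reads $2 right away
print(needs_stall(prog[0], prog[1]))    # True: the hardware must insert a stall
```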
23
Stalling
We can stall the pipeline by keeping an instruction in the same stage.
24
Branch Hazards
When we decide to branch, other instructions are in the pipeline! We are predicting "branch not taken":
- need to add hardware for flushing instructions if we are wrong
25
Improving Performance
Try to avoid stalls! E.g., reorder these instructions:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)
sw $t0, 4($t1)
Add a "branch delay slot":
- the next instruction after a branch is always executed
- rely on the compiler to "fill" the slot with something useful
Superscalar: start more than one instruction in the same cycle.
26
Dynamic Scheduling
The hardware performs the "scheduling":
- hardware tries to find instructions to execute
- out-of-order execution is possible
- speculative execution and dynamic branch prediction
All modern processors are very complicated:
- Pentium 4: 20-stage pipeline, 6 simultaneous instructions
- PowerPC and Pentium: branch history table
- compiler technology remains important
27
Chapter 7 Preview: Memory Hierarchy
28
Memory Hierarchy
Memory devices come in several different flavors:
- SRAM (static RAM): fast (1 to 10 ns); expensive (>10 times DRAM); small capacity (< ¼ DRAM)
- DRAM (dynamic RAM): 16 times slower than SRAM (50–100 ns); access time varies with address; affordable ($160/gigabyte); 1 GB considered big
- Disk: slow (10 ms access time); cheap (< $1/gigabyte); big (1 TB is no problem)
29
Memory Hierarchy
Users want large and fast memories! Try to give it to them:
- build a memory hierarchy
30
Locality
A principle that makes having a memory hierarchy a good idea. If an item is referenced:
- temporal locality: it will tend to be referenced again soon
- spatial locality: nearby items will tend to be referenced soon
Why does code have locality?
31
10 [slide content was a figure]