Breaking up is hard to do….


1 Breaking up is hard to do….
Multi-cycle CPU: Breaking up is hard to do….
Reading: 4.9, just p (stop at pipelined implementation).
MC: not in the textbook. We'll get into some detail in lecture and suggested hw problems.
Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

2 Single-Cycle CPU Summary
Easy, particularly the control.
Which instruction takes the longest? By how much? Why is that a problem?
ET = IC * CPI * CT
What else can we do?
When does a multi-cycle implementation make sense? E.g., 70% of instructions take 75 ns, 30% take 200 ns? Suppose 20% overhead for extra latches.
Real machines have much more variable instruction latencies than this.
The main points I want them to get: in a single-cycle implementation CPI always equals 1.0, but the cycle time is going to be long (and determined by the longest instruction). Performance is therefore a function of the longest instruction in the ISA. In a multicycle implementation, where some insts take 3 cycles, some 4, some 5, performance is a function of the average length instead of the longest.
Calculate an estimated speedup for the example here: 200 ns vs. (200*.3 + 75*.7)*1.2 = (60 + 52.5)*1.2 = 135 ns, a speedup of about 1.5.
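The speedup estimate for this example can be checked with a few lines of Python. This is only a sketch of the slide's arithmetic: it assumes the idealized model in which a multi-cycle instruction costs its own latency times the 20% latch overhead, and the variable names are made up for illustration.

```python
# Instruction mix from the slide: (fraction of instructions, latency in ns).
instruction_mix = [(0.70, 75.0), (0.30, 200.0)]
LATCH_OVERHEAD = 1.20  # 20% overhead for the extra latches

# Single-cycle: every instruction pays for the longest one.
single_cycle_ns = max(latency for _, latency in instruction_mix)

# Multi-cycle (idealized): each instruction pays roughly its own latency,
# so the weighted average matters, inflated by the latch overhead.
multi_cycle_ns = LATCH_OVERHEAD * sum(
    fraction * latency for fraction, latency in instruction_mix
)

speedup = single_cycle_ns / multi_cycle_ns
print(f"single-cycle: {single_cycle_ns:.1f} ns per instruction")
print(f"multi-cycle:  {multi_cycle_ns:.1f} ns per instruction (average)")
print(f"estimated speedup: {speedup:.2f}x")
```

Changing the mix or the overhead shows how quickly the advantage disappears when most instructions are already close to the longest latency.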

3 You’ve been walking through history
Someone needed to run a program

4 You’ve been walking through history
Someone needed to run a program. Simple instructions were designed for very simple hardware (limited transistors).

5 You’ve been walking through history
Someone needed to run a program. Simple instructions were designed for very simple hardware (limited transistors). Someone wants to run a new program, but not create all new hardware.

6 You’ve been walking through history
Someone needed to run a program. Simple instructions were designed for very simple hardware (limited transistors). Someone wants to run a new program, but not create all new hardware. More instructions added. LAB!

7 You’ve been walking through history
Someone needed to run a program. Simple instructions were designed for very simple hardware (limited transistors). Someone wants to run a new program, but not create all new hardware. More instructions added. More transistors enable more complex hardware. More complex instructions are desired as instruction memory is limited and costly. The story continues...

8 Why a Multiple Clock Cycle CPU?
The problem: a single-cycle CPU has a cycle time long enough to complete the longest instruction in the machine.
The solution: break up execution into smaller tasks, each task taking a cycle, with different instructions requiring different numbers of cycles or tasks.
Other advantages: reuse of functional units (e.g., ALU, memory).
ET = IC * CPI * CT

9 Breaking Execution Into Clock Cycles
We will have five execution steps (not all instructions use all five):
1. fetch
2. decode & register fetch
3. execute
4. memory access
5. write-back

10 Single Cycle vs. Multi-cycle
Table to fill in: columns CPI and CT; rows Single Cycle and Multi-cycle.
Instructions: lw, sw, add (r-type).
Draw stages and how they get cut up.

11 Cutting up Single Cycle
Draw how we'd most logically cut this up. Then point out: wait, if I cut the cycle time, how do I keep what I've done?

12 Breaking Execution Into Clock Cycles
Introduces extra registers when: a signal is computed in one clock cycle and used in another, AND the inputs to the functional block that outputs this signal can change before the signal is written into a state element.
Significantly complicates control. Why?
The goal is to balance the amount of work done each cycle.

13 Multicycle datapath
Intermediate latches. One ALU. One memory (give hint about self-modifying code).

14 Multicycle datapath – Load word
Load word: write RTL below per cycle.

15 Summary of execution steps
Talk through each, esp. the early branch computation.
We can use Register-Transfer-Language (RTL) to describe these steps.
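To make the per-cycle RTL concrete, here is a small Python sketch that walks a load word through the five steps, one RTL statement group per cycle. It assumes the usual textbook-style multicycle MIPS registers (IR, A, B, ALUOut, MDR); the `cpu` dictionary, the `run_lw` helper, and the sample memory contents are hypothetical and exist only for this illustration.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_lw(cpu, rs, rt, imm):
    """Run `lw rt, imm(rs)` one RTL step per clock cycle; return the step log."""
    regs, mem = cpu["regs"], cpu["mem"]
    log = []

    # Cycle 1 (fetch): IR = M[PC]; PC = PC + 4
    cpu["IR"] = mem[cpu["PC"]]
    cpu["PC"] += 4
    log.append("fetch")

    # Cycle 2 (decode & register fetch): A = R[rs]; B = R[rt].
    # The branch target is computed early, in case this turns out to be a beq.
    cpu["A"], cpu["B"] = regs[rs], regs[rt]
    cpu["ALUOut"] = cpu["PC"] + (sign_extend16(imm) << 2)
    log.append("decode & register fetch")

    # Cycle 3 (execute): ALUOut = A + sign-extend(imm)
    cpu["ALUOut"] = cpu["A"] + sign_extend16(imm)
    log.append("execute")

    # Cycle 4 (memory access): MDR = M[ALUOut]
    cpu["MDR"] = mem[cpu["ALUOut"]]
    log.append("memory access")

    # Cycle 5 (write-back): R[rt] = MDR
    regs[rt] = cpu["MDR"]
    log.append("write-back")
    return log


cpu = {
    "PC": 0,
    "regs": {"t2": 0, "t3": 100},
    # Address 0 holds the instruction (kept as text for readability);
    # address 100 holds the data word to load.
    "mem": {0: "lw $t2, 0($t3)", 100: 42},
}
steps = run_lw(cpu, rs="t3", rt="t2", imm=0)
print(steps)
print("t2 =", cpu["regs"]["t2"], "PC =", cpu["PC"])
```

Note that cycles 1 and 2 never look at which instruction is being executed; only cycles 3 to 5 are specific to lw.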

16 Why are the first two stages always the same (best answer)?
Peer instruction: Why are the first two stages always the same (best answer)?
A. All instructions do the same thing at the start
B. The instruction is not determined until after the 2nd cycle
C. To decrease the complexity of the control logic
D. Trick question: they aren't always the same
E. None of the above

17 Complete Multicycle Datapath
(don’t be intimidated – it all makes sense…)

18 Complete Multicycle Datapath
R-type – 1st cycle Draw active path

19 Complete Multicycle Datapath
R-type – 2nd cycle

20 Complete Multicycle Datapath
R-type – 3rd cycle

21 Complete Multicycle Datapath
R-type – 4th cycle

22 Which inst. does PCWrite stuck at 1 break?
A. Lw
B. R-type
C. Beq
D. Both A & B
E. A, B, & C

23 Multicycle Control
Single-cycle control used combinational logic.
Multi-cycle control uses a Finite State Machine. The FSM defines a succession of states, transitions between states (based on inputs), and outputs (based on state).
The first two states are the same for every instruction; the next state depends on the opcode.
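A rough sketch of the transition part of such an FSM, written in Python. The state names and the transition table are assumptions that follow the usual textbook multicycle MIPS design; they are not taken from the slides, and the datapath control outputs that each state would drive are left out.

```python
# Hypothetical state names; only the next-state logic is modeled here.
AFTER_DECODE = {
    "lw": "mem_addr",
    "sw": "mem_addr",
    "rtype": "execute",
    "beq": "branch",
}

# Transitions that do not depend on the opcode. Any state that is not
# listed here finishes its instruction and returns to "fetch".
FIXED = {
    "fetch": "decode",
    "mem_read": "mem_writeback",
    "execute": "rtype_writeback",
}


def next_state(state, opcode):
    """Return the next FSM state given the current state and the opcode."""
    if state == "decode":
        return AFTER_DECODE[opcode]
    if state == "mem_addr":
        return "mem_read" if opcode == "lw" else "mem_write"
    return FIXED.get(state, "fetch")


def states_for(opcode):
    """List the states one instruction passes through, starting at fetch."""
    state, path = "fetch", ["fetch"]
    while True:
        state = next_state(state, opcode)
        if state == "fetch":
            return path
        path.append(state)


for op in ("lw", "sw", "rtype", "beq"):
    path = states_for(op)
    print(f"{op:5s} {len(path)} cycles: {' -> '.join(path)}")
```

The first two entries of every path are `fetch` and `decode`; the opcode only starts to matter in the transition out of `decode`.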

24

25 IF = 200ps ID = 50ps EX = 100ps M = 200ps WB = 50ps
Isomorphic question.
Breaking a single-cycle processor into stages, hardware engineers determine these to be the execution times per stage:
IF = 200ps, ID = 50ps, EX = 100ps, M = 200ps, WB = 50ps
The code below is the code most commonly executed by the company.
Loop: lw r1, 0 (r2)
      add r2, r3, r4
      sub r5, r1, r2
      beq r5, $zero
Your boss is interested in changing to the MIPS multi-cycle processor. He asks you whether or not this would be a good idea. You say?
Selection / Good idea? / Reason
A. Yes. CPI stays the same. CT decreases (factor of 4).
B. Yes. CPI increases (factor of 4). CT decreases (factor of 5).
C. No. CPI increases (factor of 4). CT decreases (factor of 3).
D. No. CPI decreases (factor of 5). CT increases (factor of 5).
E. No. CPI stays the same. CT stays the same. Complexity increases.
Tails - flipped.
Correct answer: C

26 IF = 200ps ID = 200ps EX = 200ps M = 200ps WB = 200ps
Breaking a single-cycle processor into stages, hardware engineers determine these to be the execution times per stage:
IF = 200ps, ID = 200ps, EX = 200ps, M = 200ps, WB = 200ps
The code below is the code most commonly executed by the company.
Loop: lw r1, 0 (r2)
      add r2, r3, r4
      sub r5, r1, r2
      beq r5, $zero
Your boss is interested in changing to the MIPS multi-cycle processor. He asks you whether or not this would be a good idea. You say?
Selection / Good idea? / Reason
A. Yes. CPI stays the same. CT decreases (factor of 4).
B. Yes. CPI increases (factor of 4). CT decreases (factor of 5).
C. No. CPI increases (factor of 4). CT decreases (factor of 3).
D. No. CPI decreases (factor of 5). CT increases (factor of 5).
E. No. CPI stays the same. CT stays the same. Complexity increases.
Correct answer: B
Primary purpose: have students calculate MC and single-cycle CPI and CT.
Concept: help students recognize the tradeoffs inherent in MC. If you can decrease CT by enough, you can afford the increase in CPI.
Expected mistakes: not understanding how to compute CT differences or CPI differences.
Post discussion: calculate CPI and CT explicitly in each case. Then bring together the discussion by noting the importance of a balanced design.
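The explicit CPI and CT calculation for both sets of stage times can be sketched as follows. The assumptions are labeled in the comments: the single-cycle clock must cover all five stages (lw uses every stage), the multi-cycle clock must cover the slowest stage, and the instructions take the usual multicycle MIPS cycle counts (lw 5, add and sub 4, beq 3). The `compare` helper is hypothetical.

```python
# Assumed cycles per instruction for the four-instruction loop:
# lw = 5, add = 4, sub = 4, beq = 3.
LOOP_CYCLES = [5, 4, 4, 3]


def compare(stage_ps, cycles_per_inst):
    """Return (single_ct, multi_ct, multi_cpi, single_et, multi_et) in ps."""
    single_ct = sum(stage_ps)  # assumption: one long cycle spans all stages
    multi_ct = max(stage_ps)   # assumption: the slowest stage sets the clock
    count = len(cycles_per_inst)
    multi_cpi = sum(cycles_per_inst) / count
    single_et = count * single_ct              # CPI is 1 for single-cycle
    multi_et = sum(cycles_per_inst) * multi_ct
    return single_ct, multi_ct, multi_cpi, single_et, multi_et


unbalanced = compare([200, 50, 100, 200, 50], LOOP_CYCLES)
balanced = compare([200, 200, 200, 200, 200], LOOP_CYCLES)

for name, result in (("unbalanced", unbalanced), ("balanced", balanced)):
    single_ct, multi_ct, multi_cpi, single_et, multi_et = result
    print(f"{name}: CT {single_ct} -> {multi_ct} ps, CPI 1 -> {multi_cpi}, "
          f"ET {single_et} -> {multi_et} ps")
```

Under these assumptions the unbalanced stages give a CT reduction of only 3x against a 4x CPI increase, while the balanced stages give a 5x CT reduction against the same CPI increase, which matches the two correct answers.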

27 Balanced cycles explanation
Draw single-cycle wasted time.
Draw multi-cycle potential wasted time (200, 50, 100, 200, 50).
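The wasted time in the two drawings can also be put into numbers. This sketch uses the stage times (200, 50, 100, 200, 50) from the slide; which stages each instruction class uses is an assumption based on the standard MIPS datapath, and the dictionary names are made up for illustration.

```python
stage_ps = {"IF": 200, "ID": 50, "EX": 100, "M": 200, "WB": 50}

# Assumed stage usage per instruction class (standard MIPS datapath).
stages_used = {
    "lw": ["IF", "ID", "EX", "M", "WB"],
    "sw": ["IF", "ID", "EX", "M"],
    "r-type": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}

# Single-cycle: the clock covers the longest instruction, so shorter
# instructions leave the rest of the cycle unused.
single_ct = max(sum(stage_ps[s] for s in used) for used in stages_used.values())
single_waste = {
    inst: single_ct - sum(stage_ps[s] for s in used)
    for inst, used in stages_used.items()
}

# Multi-cycle: the clock covers the slowest stage, so every faster
# stage idles for the remainder of its cycle.
multi_ct = max(stage_ps.values())
stage_idle = {stage: multi_ct - t for stage, t in stage_ps.items()}
multi_waste = {
    inst: sum(stage_idle[s] for s in used) for inst, used in stages_used.items()
}

print("single-cycle waste per instruction (ps):", single_waste)
print("multi-cycle idle per stage (ps):", stage_idle)
print("multi-cycle waste per instruction (ps):", multi_waste)
```

With unbalanced stages the multi-cycle design can waste more time per instruction than the single-cycle design it replaces, which is the point of balancing the work done each cycle.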

28 Multi-cycle Questions
How many cycles will it take to execute this code?
       lw $t2, 0($t3)
       lw $t3, 4($t3)
       beq $t2, $t3, Label   #assume not taken
       add $t5, $t2, $t3
       sw $t5, 8($t3)
Label: ...
Selection / Number of Cycles
A. 5
B. 21
C. 22
D. 25
E. None of the above
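One way to check a count like this is to add up per-instruction cycle counts. The sketch below assumes the usual multicycle MIPS costs (lw 5, sw 4, R-type 4, beq 3), which the slides imply but do not list in one place; the `CYCLES` table and `program` list are illustrative names.

```python
# Assumed cycle counts for the multicycle MIPS implementation.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}

program = [
    "lw $t2, 0($t3)",
    "lw $t3, 4($t3)",
    "beq $t2, $t3, Label",  # assumed not taken, so execution falls through
    "add $t5, $t2, $t3",
    "sw $t5, 8($t3)",
]

# The mnemonic is the first word of each instruction.
per_instruction = [CYCLES[inst.split()[0]] for inst in program]
total_cycles = sum(per_instruction)

for inst, cycles in zip(program, per_instruction):
    print(f"{cycles} cycles  {inst}")
print("total:", total_cycles)
```

Under these assumed counts the total is 5 + 5 + 3 + 4 + 4 cycles.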

29 Multi-cycle Questions
What is going on in cycle 8?
       lw $t2, 0($t3)
       lw $t3, 4($t3)
       beq $t2, $t3, Label   #assume not taken
       add $t5, $t2, $t3
       sw $t5, 8($t3)
Label: ...
Selection / Operation
A. PC=PC+4; IR=M[PC]
B. A=R[t3]; B=R[t3]
C. ALUOut=R[t3]+4
D. R[t3]=M[ALUOut]
E. None of the above
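Questions of the form "what happens in cycle N" come down to finding which instruction owns that cycle and which of its steps it is on. A minimal sketch, again assuming the usual multicycle MIPS cycle counts (lw 5, sw 4, R-type 4, beq 3); the `locate` helper is hypothetical.

```python
# Assumed cycle counts for the multicycle MIPS implementation.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}

program = [
    "lw $t2, 0($t3)",
    "lw $t3, 4($t3)",
    "beq $t2, $t3, Label",  # assumed not taken
    "add $t5, $t2, $t3",
    "sw $t5, 8($t3)",
]


def locate(cycle):
    """Return (instruction, step) for a 1-based global cycle number."""
    elapsed = 0
    for inst in program:
        length = CYCLES[inst.split()[0]]
        if cycle <= elapsed + length:
            return inst, cycle - elapsed
        elapsed += length
    raise ValueError("cycle is past the end of the program")


inst, step = locate(8)
print(f"cycle 8 is step {step} of: {inst}")
```

With these counts, cycle 8 falls on the third step of the second lw, which is the address computation (the ALU adds the base register and the offset 4).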

30 Your coworker thinks you are crazy. You reply?
Suppose you work on an embedded multi-cycle MIPS processor and your software team tells you that every program which executes has to go through memory and zero 1k bytes of data fairly often (averages 10% of ET). You realize you could just have a single instruction do this, called zero1k (rs), which does: M[rs] = 0 … M[rs+1020] = 0. Your coworker thinks you are crazy. You reply?
Selection / Crazy? / Reason
A. Yes. The complexity of such an instruction combined with no performance gain is silly.
B. Yes. The complexity of such an instruction combined with minimal performance gain (<5%) is silly.
C. No. The minimal performance gains (<5%) rationalize this simple instruction.
D. No. The significant performance gains (>5%) rationalize this complex instruction.
E. Maybe. None of the above.
Remember to ask about single-cycle.
Correct answer: D

31 Show code, then cycle analysis.
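One possible cycle analysis for the zero1k question is sketched below in Python. Every number other than the 10% share of ET and the 1k bytes comes from assumptions made for this example: a software loop of sw + addi + bne per word costing 4 + 4 + 3 cycles, a zero1k instruction costing 2 cycles for fetch and decode plus 2 cycles per word, and an unchanged cycle time.

```python
WORDS = 1024 // 4  # M[rs] ... M[rs+1020], one 4-byte word at a time

# Assumed software loop body per word: sw (4) + addi (4) + bne (3) cycles.
LOOP_CYCLES_PER_WORD = 4 + 4 + 3
software_cycles = WORDS * LOOP_CYCLES_PER_WORD

# Assumed zero1k cost: fetch + decode once, then 2 cycles per word
# (address update and memory write).
zero1k_cycles = 2 + WORDS * 2

ZERO_FRACTION = 0.10  # zeroing averages 10% of execution time (from the slide)

# Only the zeroing share of execution time shrinks; the rest is unchanged.
new_relative_et = (1 - ZERO_FRACTION) + ZERO_FRACTION * zero1k_cycles / software_cycles
gain = 1 - new_relative_et

print(f"software loop: {software_cycles} cycles, zero1k: {zero1k_cycles} cycles")
print(f"overall execution time reduced by about {gain:.1%}")
```

Under these assumptions the gain lands above the 5% threshold in the answer choices; a different loop or a different per-word cost for zero1k would shift the number, so the sketch is a template for the analysis and not a definitive result.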

32 Finite State Machine for Control
Implementation:

33 ROM Implementation
ROM = "Read Only Memory": values of memory locations are fixed ahead of time.
A ROM can be used to implement a truth table. If the address is m bits, we can address 2^m entries in the ROM. Our outputs are the bits of data that the address points to.
m is the "height", and n is the "width".

34 ROM Implementation
How many inputs are there? 6 bits for opcode, 4 bits for state = 10 address lines (i.e., 2^10 = 1024 different addresses).
How many outputs are there? 16 datapath-control outputs, 4 state bits = 20 outputs.
ROM is 2^10 x 20 = 20K bits (and a rather unusual size).
Rather wasteful, since for lots of the entries the outputs are the same, i.e., the opcode is often ignored.
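The ROM sizing arithmetic can be reproduced in a few lines of Python. The bit counts come from the slide; the address layout (opcode in the high bits, state in the low bits) and the `rom_address` helper are assumptions made for illustration only.

```python
OPCODE_BITS = 6
STATE_BITS = 4
CONTROL_OUTPUTS = 16   # datapath-control outputs
NEXT_STATE_BITS = 4

address_lines = OPCODE_BITS + STATE_BITS       # 10 inputs
entries = 2 ** address_lines                   # 1024 addresses
width = CONTROL_OUTPUTS + NEXT_STATE_BITS      # 20 output bits per entry
total_bits = entries * width


def rom_address(opcode, state):
    """Form a ROM address; assumed layout is opcode (high bits), state (low bits)."""
    return (opcode << STATE_BITS) | state


print(f"{entries} entries x {width} bits = {total_bits} bits "
      f"({total_bits // 1024}K bits)")
# 0b100011 is the MIPS opcode for lw; state 0 stands in for the fetch state.
print("address for opcode 0b100011 in state 0:", rom_address(0b100011, 0))
```

Because the fetch and decode states ignore the opcode, all 64 opcode values map to identical contents for those states, which is one place the waste mentioned on the slide comes from.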

35 Multicycle CPU Key Points
Performance gain achieved from variable-latency instructions (different instructions take different numbers of cycles).
ET = IC * CPI * cycle time
Required very few new state elements.
More, and more complex, control signals.
Control requires an FSM.

