1 Copyright 1998 Morgan Kaufmann Publishers, Inc. All rights reserved. The single cycle CPU
2 Performance of Single-Cycle Machines
Unit delays: Memory Unit 2 ns; ALU and Adders 2 ns; Register file (Read or Write) 1 ns

Class      Fetch  Decode  ALU  Memory  Write Back  Total
R-format   2      1       2    -       1           6 ns
LW         2      1       2    2       1           8 ns
SW         2      1       2    2       -           7 ns
Branch     2      1       2    -       -           5 ns
Jump       2      -       -    -       -           2 ns
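The class totals can be re-derived from the three unit delays. The sketch below is only an illustration: the step names and the assumption that each class uses exactly the listed units are ours, not the slide's.

```python
# Unit delays from the slide, in nanoseconds.
MEMORY_NS = 2    # instruction fetch or data memory access
ALU_NS = 2       # ALU and adders
REGFILE_NS = 1   # register file read or write

# Delay of each step (step names are illustrative).
STEP_NS = {
    "fetch": MEMORY_NS,
    "decode": REGFILE_NS,      # register read
    "alu": ALU_NS,
    "memory": MEMORY_NS,
    "write_back": REGFILE_NS,  # register write
}

# Assumed steps used by each instruction class.
STEPS = {
    "R-format": ["fetch", "decode", "alu", "write_back"],
    "LW": ["fetch", "decode", "alu", "memory", "write_back"],
    "SW": ["fetch", "decode", "alu", "memory"],
    "Branch": ["fetch", "decode", "alu"],
    "Jump": ["fetch"],
}

# Total latency per class, and the single-cycle clock (the slowest class).
latencies = {name: sum(STEP_NS[s] for s in steps) for name, steps in STEPS.items()}
single_cycle_clock_ns = max(latencies.values())

for name, total in latencies.items():
    print(f"{name}: {total} ns")
print(f"single-cycle CK: {single_cycle_clock_ns} ns")
```

The single-cycle clock must fit the slowest class, which is why LW sets the CK period.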
3 What if we had a variable CK cycle? Let’s check the following scenario: R-type: 44%, LW: 24%, SW: 12%, BRANCH: 18%, JUMP: 2%
I - number of instructions in the program
T - time of the CK cycle
CPI - number of CK cycles per instruction (=1)
Execution time = I*T*CPI, where the average T with a variable CK is 8*24% + 7*12% + 6*44% + 5*18% + 2*2% = 6.3 ns
4 The result:
EXE(single cycle) = T(single clock) * I, with T(single clock) = 8 ns
EXE(variable) = T(variable clock) * I, with T(variable clock) = 6.3 ns
We get a ratio of EXE(single cycle) / EXE(variable) = 8 / 6.3 = 1.27
The ratio is higher when more complicated instructions, e.g., floating point instructions, are also implemented. Since building a variable CK circuit is too complicated, we instead want each instruction to take as many short CK cycles as it requires.
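The averaging above can be checked with a few lines of Python. This is a minimal sketch using the instruction mix and class latencies quoted on these slides; the variable names are ours.

```python
# Instruction mix and per-class latency (ns), as quoted on the slides.
mix = {"LW": 0.24, "SW": 0.12, "R-type": 0.44, "Branch": 0.18, "Jump": 0.02}
latency_ns = {"LW": 8, "SW": 7, "R-type": 6, "Branch": 5, "Jump": 2}

# Single cycle: every instruction pays for the slowest class.
t_single = max(latency_ns.values())

# Variable CK: each instruction pays only for its own class.
t_variable = sum(mix[c] * latency_ns[c] for c in mix)

# The instruction count I cancels out of the ratio.
ratio = t_single / t_variable

print(f"T single = {t_single} ns, T variable = {t_variable:.2f} ns, ratio = {ratio:.2f}")
```

Without rounding, the average is 6.34 ns, so the ratio comes out slightly below the value obtained from the rounded 6.3 ns.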
5 Multicycle Approach
The idea of the multi-cycle approach:
- We’ll save time since each instruction takes only the necessary number of CK cycles (which are about 5 times shorter than the original CK cycle)
- We also save in components since we can use the same component in different phases of the same instruction
6 Building a Multi-Cycle CPU:
- Split the instruction into steps (phases)
- Make sure that the steps are balanced (same time required)
- Reduce the job done at each step. In each step only one chore is done.
- At the end of each CK cycle: store the result of the current step to be used by the next step. So, add more internal registers for storing the intermediate results.
7 A single cycle CPU capable of R-type & lw/sw instructions (data & control)
[Figure: PC, +4 adder, instruction memory, register file (Rs = [25:21], Rt = [20:16], Rd), Sext 16->32 of [15:0], ALU with ALU control (opcode = [31:26], funct = [5:0]), data memory (Address, D.In, D.Out); control signals RegWrite, MemWrite.]
8 A single cycle CPU capable of R-type & lw/sw instructions - Data Path only
[Figure: the same datapath without the control blocks; the lw and sw paths are marked.]
9 Timing of a single cycle CPU
10 Timing of a lw instruction in a single cycle CPU and in a multi-cycle CPU. We want to replace a long single CK cycle with 5 short ones.
[Timing diagram, single cycle CPU: fetch, decode, execute, memory, write back within one CK cycle; signals: PC, I.Mem data, Rs, Rt (ALU inputs), ALU output (address), D.Mem adrs, D.Mem data (memory output); step delays of 2 ns and 1 ns marked.]
[Timing diagram, multi-cycle CPU: CK cycles 0 to 4, with cycle 5 = cycle 0 of the next instruction; fetch, decode, execute, memory, write back; signals: PC, Mem data, IR, A,B, ALUout, MDR.]
11 Therefore we should add registers to the single cycle CPU shown below:
[Figure: single cycle datapath: PC, +4 adder, instruction memory, register file, Sext 16->32, ALU, data memory.]
12 Adding registers to “split” the instruction into 5 stages:
[Figure: the same datapath with the added registers IR, A, B, ALUout and MDR, and the PCWrite signal.]
13 Here is the book’s version of the multi-cycle CPU:
- Only PC and IR have write enable signals
- All other registers hold data for a single cycle
14 Here is our version of a multi-cycle CPU capable of R-type & lw/sw & branch instructions
[Figure: PC, a single instruction & data memory, IR, MDR, register file (Rs = IR[25:21], Rt = IR[20:16], Rd), A, B, Sext 16->32 of IR[15:0], << 2, ALU, ALUout.]
15 Let us explain the multi-cycle CPU
- First we’ll look at a CPU capable of performing only R-type instructions
- Then, we’ll add the lw instruction
- And the sw instruction
- Then, the beq instruction
- And finally, the j instruction
16 Let us remind ourselves how a single cycle CPU capable of performing R-type instructions works. Here you see the data-path and the timing of an R-type instruction.
[Figure: PC, +4 adder, instruction memory, register file (Rs = [25:21], Rt = [20:16], Rd = [15:11]), ALU; opcode = [31:26], funct = [5:0].]
17 A single cycle CPU demo: R-type instruction
[Figure: the same R-type datapath.]
18 A multi-cycle CPU capable of performing R-type instructions
[Figure: PC, instruction & data memory, IR, register file (Rs = IR[25:21], Rt = IR[20:16], Rd), A, B, ALU, ALUout.]
19 A multi-cycle CPU capable of R-type instructions: fetch (CK 0 to 1)
[Figure: the same datapath, fetch step highlighted.]
20 A multi-cycle CPU capable of R-type instructions: decode (CK 1 to 2)
[Figure: the same datapath, decode step highlighted.]
21 A multi-cycle CPU capable of R-type instructions: execute (CK 2 to 3)
[Figure: the same datapath, execute step highlighted.]
22 A multi-cycle CPU capable of R-type instructions: write back (CK 3 to 4)
[Figure: the same datapath, write back to Rd highlighted.]
23 Timing of an R-type instruction in a single cycle CPU and in a multi-cycle CPU
[Timing diagram, single cycle CPU: fetch, decode, execute, write back within one CK cycle; signals: PC, Inst. Mem data (memory output = the instruction), Rs, Rt (ALU inputs), ALU output (data = result of calc.), GPR input.]
[Timing diagram, multi-cycle CPU: CK cycles 0 to 3, with cycle 4 = cycle 0 of the next instruction; fetch, decode, execute, write back; signals: PC, Mem data, IR, A,B, ALUout; previous and current instruction marked.]
24 An R-type instruction takes 4 CKs. At the rising edge of CK:
fetch: IR = M(PC) (IRWrite)
decode: A = Rs, B = Rt
execute: ALUout = A op B
write back: Rd = ALUout
[Timing diagram: PC, Mem data, IR, A,B (GPR outputs), ALUout (ALU output) for the previous, current and next instruction.]
The state diagram: IR=M(PC) -> A=Rs, B=Rt -> ALUout = A op B -> Rd=ALUout
25 A multi-cycle CPU capable of R-type instructions (PC calc.)
[Figure: the R-type datapath with the constant 4 added as an ALU input for computing PC+4.]
26 [Timing diagram: as before, with PC = PC+4 computed during fetch and loaded at the rising edge of CK (PCWrite): current PC -> next PC = current PC+4; then ALUout = A op B and Rd = ALUout.]
27 A multi-cycle CPU capable of R-type instructions: fetch
[Figure: fetch step highlighted; the ALU adds 4 to the PC.]
28 The state diagram of a CPU capable of R-type instructions only
Fetch: IR=M(PC), PC = PC+4
Decode: A=Rs, B=Rt
ALU (R-type): ALUout=A op B
WBR: Rd = ALUout
29 The state diagram of a CPU capable of R-type and lw instructions
States: Fetch, Decode, ALU and WBR (R-type), plus for lw:
AdrCmp: ALUout = A+sext(imm)
Load: MDR = M(ALUout)
WB: Rt = MDR
30 We added registers to “split” the instruction into 5 stages. Let’s discuss the lw instruction.
[Figure: single cycle datapath with the added registers IR, A, B, ALUout, MDR and the PCWrite signal.]
In the single-cycle CPU we kept the “data flow” from left to right. Here we change that a little since, as we’ll see, we use some parts of the CPU more than once during the same instruction. So we prefer to move the data memory. All parts related to lw only are blue.
31 First we draw a multi-cycle CPU capable of R-type & lw instructions. We just moved the data memory. All parts related to lw only are blue.
[Figure: PC, instruction memory, IR, register file (Rs = IR[25:21], Rt = IR[20:16], Rd), A, B, Sext 16->32 of IR[15:0], ALU, ALUout, data memory, MDR.]
32 A multi-cycle CPU capable of R-type & lw instructions: fetch
[Figure: the same datapath, fetch step highlighted.]
33 A multi-cycle CPU capable of R-type & lw instructions: decode
[Figure: the same datapath, decode step highlighted.]
34 A multi-cycle CPU capable of R-type & lw instructions: AdrCmp
[Figure: the same datapath, address computation highlighted.]
35 A multi-cycle CPU capable of R-type & lw instructions: memory
[Figure: the same datapath, data memory read highlighted.]
36 A multi-cycle CPU capable of R-type & lw instructions: WB
[Figure: the same datapath, write back of MDR to Rt highlighted.]
37 Can we unite the Instruction & Data memories? (They are not used simultaneously as in the single cycle CPU)
[Figure: the R-type & lw datapath with separate instruction and data memories.]
38 So here is a multi-cycle CPU capable of R-type & lw instructions using a single memory for instructions & data
[Figure: the same datapath with one instruction & data memory.]
39 Timing of a lw instruction in a single cycle CPU and in a multi-cycle CPU
[Timing diagram, single cycle CPU: fetch, decode, execute, memory, write back within one CK cycle; signals: PC, I.Mem data, Rs, Rt (ALU inputs), ALU output (address), D.Mem adrs, D.Mem data (memory output).]
[Timing diagram, multi-cycle CPU: fetch, decode, execute, memory, write back; signals: PC (PC+4), Mem data, IR, A,B, ALUout (data address), MDR (data to Rt); previous and current instruction marked.]
40 The lw instruction, step by step. At the rising edge of CK:
fetch: IR = M(PC), PC = PC+4 (PCWrite, IRWrite)
decode: A = Rs, B = Rt
execute: ALUout = A+sext(imm)
memory: MDR = M(ALUout)
write back: Rt = MDR
[Timing diagram: PC, Mem data, IR, A,B (GPR outputs), ALUout (ALU output = data address), MDR (data to Rt) for the previous and current instruction.]
41 The state diagram of a CPU capable of R-type and lw instructions
Fetch: IR=M(PC), PC = PC+4
Decode: A=Rs, B=Rt
R-type: ALU: ALUout=A op B -> WBR: Rd = ALUout
lw: AdrCmp: ALUout= A+sext(imm) -> Load: MDR = M(ALUout) -> WB: Rt = MDR
42 A multi-cycle CPU capable of R-type & lw & sw instructions
[Figure: the multi-cycle datapath with a single instruction & data memory; the lw and sw paths are marked.]
43 The state diagram of a CPU capable of R-type and lw and sw instructions
Fetch: IR=M(PC), PC = PC+4
Decode: A=Rs, B=Rt
R-type: ALU: ALUout=A op B -> WBR: Rd = ALUout
lw+sw: AdrCmp: ALUout= A+sext(imm)
lw: Load: MDR = M(ALUout) -> WB: Rt = MDR
sw: Store: M(ALUout)=B
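The register transfers of this state diagram can be traced in software. The following is a minimal sketch, not the slides' hardware: the `cpu` dictionary, the helper names and the tiny test program are assumptions, only `add` and `sub` are modelled for R-type, and memory is a Python dict keyed by byte address.

```python
def sext16(value):
    """Sign-extend a 16-bit field to a signed Python int."""
    return value - 0x10000 if value & 0x8000 else value


def run_one(cpu):
    """Run one instruction state by state; return the number of CK cycles."""
    regs, mem = cpu["regs"], cpu["mem"]

    # Fetch: IR = M(PC), PC = PC+4
    ir = mem[cpu["PC"]]
    cpu["PC"] += 4
    cycles = 1

    # Decode: A = Rs, B = Rt
    opcode = ir >> 26
    rs, rt, rd = (ir >> 21) & 31, (ir >> 16) & 31, (ir >> 11) & 31
    a, b = regs[rs], regs[rt]
    cycles += 1

    if opcode == 0b000000:                      # R-type
        funct = ir & 0x3F
        if funct == 0x20:                       # add
            alu_out = (a + b) & 0xFFFFFFFF      # ALU: ALUout = A op B
        elif funct == 0x22:                     # sub
            alu_out = (a - b) & 0xFFFFFFFF
        else:
            raise ValueError("funct not modelled in this sketch")
        regs[rd] = alu_out                      # WBR: Rd = ALUout
        cycles += 2
    elif opcode in (0b100011, 0b101011):        # lw or sw
        alu_out = (a + sext16(ir & 0xFFFF)) & 0xFFFFFFFF  # AdrCmp
        cycles += 1
        if opcode == 0b100011:
            mdr = mem[alu_out]                  # Load: MDR = M(ALUout)
            regs[rt] = mdr                      # WB: Rt = MDR
            cycles += 2
        else:
            mem[alu_out] = b                    # Store: M(ALUout) = B
            cycles += 1
    else:
        raise ValueError("opcode not modelled in this sketch")
    return cycles


def r_type(rs, rt, rd, funct):
    """Encode an R-type instruction word (opcode 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def i_type(opcode, rs, rt, imm):
    """Encode an I-type instruction word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


# Hypothetical program: load two words, add them, store the sum.
cpu = {"PC": 0, "regs": [0] * 32, "mem": {
    0: i_type(0b100011, 0, 1, 100),   # lw  $1, 100($0)
    4: i_type(0b100011, 0, 2, 104),   # lw  $2, 104($0)
    8: r_type(1, 2, 3, 0x20),         # add $3, $1, $2
    12: i_type(0b101011, 0, 3, 108),  # sw  $3, 108($0)
    100: 7, 104: 35,
}}
cycle_counts = [run_one(cpu) for _ in range(4)]
print(cycle_counts, cpu["mem"][108])
```

Each branch of `run_one` follows one path through the state diagram, so the cycle count per instruction equals the number of states visited.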
44 A multi-cycle CPU capable of R-type & lw/sw & branch instructions
[Figure: the multi-cycle datapath with Sext 16->32 of IR[15:0] followed by <<2 for the branch address.]
45 Adding the instruction beq to the state diagram:
Branch: Calc Rs - Rt (just to produce the zero signal)
If zero: Calc PC=PC+sext(imm)<<2
If not zero: return to Fetch
[State diagram: Fetch, Decode, ALU, WBR, AdrCmp, Load, WB, Store, Branch; edges labelled R-type, lw+sw, lw, sw, beq, zero, not zero.]
46 Adding the instruction beq to the state diagram, a more efficient way:
Let’s use the decode state, in which the ALU is doing nothing, to compute the branch address. We’ll have to store it for 1 more CK cycle, until we know whether to branch or not! (We store it in the ALUout reg.)
Decode: Calc ALUout=PC+sext(imm)<<2
Branch: Calc Rs - Rt. If zero, load the PC with ALUout data, else do not load the PC
[State diagram: Fetch, Decode, ALU, WBR, AdrCmp, Load, WB, Store, Branch; edges labelled R-type, lw+sw, lw, sw, beq.]
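The two beq steps can be written out as arithmetic. This is a minimal sketch with hypothetical function names; it assumes the PC was already incremented in Fetch and that the shift applies to sext(imm) only (hence the explicit parentheses, since in Python `+` binds tighter than `<<`).

```python
def sext16(value):
    """Sign-extend a 16-bit field to a signed Python int."""
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, imm16):
    """Decode state: ALUout = PC + (sext(imm) << 2), kept to 32 bits."""
    return (pc + (sext16(imm16) << 2)) & 0xFFFFFFFF


def beq_next_pc(pc, imm16, rs_value, rt_value):
    """Branch state: load PC with ALUout only if Rs - Rt is zero."""
    alu_out = branch_target(pc, imm16)               # stored during Decode
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0  # ALU computes Rs - Rt
    return alu_out if zero else pc


# Taken forward branch, taken backward branch, and a branch not taken.
print(hex(beq_next_pc(0x104, 3, 5, 5)))
print(hex(beq_next_pc(0x104, 0xFFFF, 5, 5)))
print(hex(beq_next_pc(0x104, 3, 5, 6)))
```

Computing the target early costs nothing because the ALU is otherwise idle in Decode, and it shortens beq by one state.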
47 A multi-cycle CPU capable of R-type & lw/sw & branch instructions
[Figure: the multi-cycle datapath; the PC input selects between PC+4 and the branch address held in ALUout.]
48 Adding the instruction j to the state diagram:
Jump: PC = PC[31:28] || IR[25:0]<<2
[State diagram: Fetch, Decode, ALU, WBR, AdrCmp, Load, WB, Store, Branch, Jump; edges labelled R-type, lw+sw, lw, sw, beq, j.]
49 A multi-cycle CPU capable of R-type & lw/sw & branch & jump instructions
[Figure: the multi-cycle datapath; the PC input selects between PC+4 (= next address), the branch address, and the jump address, formed by concatenating PC[31:28] with IR[25:0]<<2.]
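The jump address is a concatenation of bit fields, not an addition. A minimal sketch, with a hypothetical function name and example values:

```python
def jump_target(pc, ir):
    """PC[31:28] concatenated with IR[25:0] shifted left by 2."""
    upper = pc & 0xF0000000           # PC[31:28], kept in place
    lower = (ir & 0x03FFFFFF) << 2    # IR[25:0] << 2 gives a 28-bit field
    return upper | lower


# Hypothetical j instruction: opcode 000010, 26-bit target field 0x100.
ir = (0b000010 << 26) | 0x100
print(hex(jump_target(0x40000004, ir)))
```

Because the shifted field is 28 bits wide, it never overlaps the 4 upper PC bits, so a bitwise OR implements the concatenation.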
50 The phases (steps) of all instructions
51 MultiCycle implementation with Control
Final State Machine
53 The final state diagram:
Fetch -> Decode
Decode -> ALU (R-type) -> WBR
Decode -> AdrCmp (lw+sw) -> Load (lw) -> WB
Decode -> AdrCmp (lw+sw) -> Store (sw)
Decode -> Branch (beq)
Decode -> Jump (j)
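Counting the states on each path gives the number of CK cycles per instruction class. The sketch below also computes an average CPI under the assumption that the instruction mix quoted earlier for the variable CK scenario (R-type 44%, lw 24%, sw 12%, branch 18%, jump 2%) applies here too; the slides themselves do not state this average.

```python
# One path through the final state diagram per instruction class.
PATHS = {
    "R-type": ["Fetch", "Decode", "ALU", "WBR"],
    "lw": ["Fetch", "Decode", "AdrCmp", "Load", "WB"],
    "sw": ["Fetch", "Decode", "AdrCmp", "Store"],
    "beq": ["Fetch", "Decode", "Branch"],
    "j": ["Fetch", "Decode", "Jump"],
}

# Assumed instruction mix, reused from the variable CK scenario.
MIX = {"R-type": 0.44, "lw": 0.24, "sw": 0.12, "beq": 0.18, "j": 0.02}

# One CK cycle per state visited.
cycles = {name: len(path) for name, path in PATHS.items()}
average_cpi = sum(MIX[name] * cycles[name] for name in MIX)

for name, count in cycles.items():
    print(f"{name}: {count} CKs")
print(f"average CPI = {average_cpi:.2f}")
```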
54 [Table: the value of each control signal in each state.]
55 Implementation: Finite State Machine for Control (The book’s version)
56 The Control Finite State Machine:
[Figure: the "next state calculation" block takes Opcode = IR[31:26], the condition signals (zero, neg, etc.) and the current state, and produces the next state; the State reg is clocked by ck; the Outputs decoder turns the current state into the control signals.]
For 10 states coded 0-9, we need 4 bits, i.e., [S3,S2,S1,S0]
57 The control signals decoder
We just implement the table of slide 54:
Let’s look at ALUSrcA: it is “0” in states 0 and 1 and it is “1” in states 2, 6 and 8. In all other states we don’t care.
Let’s look at PCWrite: it is “1” in states 0 and 9. In all other states it must be “0”.
And so, we’ll fill the table below and build the decoder.
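The two signals discussed above can be sketched as a decoder from the state number. The function names are ours, and the gate-level expression for PCWrite is just one possible form that we derived by treating the unused codes 10-15 as don't care; it is not taken from the slides.

```python
def alu_src_a(state):
    """ALUSrcA per state: 0, 1, or None where we don't care."""
    if state in (0, 1):
        return 0
    if state in (2, 6, 8):
        return 1
    return None


def pc_write(state):
    """PCWrite must be 1 in states 0 and 9 and 0 everywhere else."""
    return 1 if state in (0, 9) else 0


def state_bits(state):
    """Split a state number into [S3, S2, S1, S0]."""
    return [(state >> i) & 1 for i in (3, 2, 1, 0)]


def pc_write_gates(s3, s2, s1, s0):
    """Assumed gate form: NOT S2 AND NOT S1 AND (S3 XNOR S0)."""
    return (1 - s2) & (1 - s1) & (1 - (s3 ^ s0))


for state in range(10):
    print(state, state_bits(state), alu_src_a(state), pc_write(state))
```

A signal with don't-care states (ALUSrcA) leaves freedom to simplify the logic, while a write enable such as PCWrite has to be forced to 0 in every state where no write is intended.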
58 The state machine “next state calc.” logic
Opcodes: R-type=000000, lw=100011, sw=101011, beq=000100, bne=000101, lui=001111, j=000010, jal=000011, addi=001000
State codes: Fetch 0, Decode 1, AdrCmp 2, Load 3, WB 4, Store 5, ALU 6, WBR 7, Branch 8, Jump 9
[State diagram: the final state diagram with the state codes above; edges labelled R-type, lw+sw, lw, sw, beq, j.]
[Table: next state S3 S2 S1 S0 as a function of the opcode bits IR31-IR26 and the current state S3 S2 S1 S0, with X for don't-care bits; rows for R-type, lw, sw and lw+sw.]
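The next-state calculation can be sketched as a function of the current state and the opcode. The state codes and the R-type, lw, sw and beq opcodes are the ones listed on this slide; the j opcode 000010 is the standard MIPS encoding, taken here as an assumption, and the function names are hypothetical.

```python
# State codes as numbered on the slide.
FETCH, DECODE, ADRCMP, LOAD, WB, STORE, ALU, WBR, BRANCH, JUMP = range(10)

# Opcodes (IR[31:26]); OP_J is the standard MIPS value, assumed here.
OP_RTYPE, OP_LW, OP_SW, OP_BEQ, OP_J = 0b000000, 0b100011, 0b101011, 0b000100, 0b000010

# 10 states coded 0-9 need 4 state bits.
STATE_BITS = (10 - 1).bit_length()


def next_state(state, opcode):
    """Next state from the current state and the opcode."""
    if state == FETCH:
        return DECODE
    if state == DECODE:
        if opcode == OP_RTYPE:
            return ALU
        if opcode in (OP_LW, OP_SW):
            return ADRCMP
        if opcode == OP_BEQ:
            return BRANCH
        if opcode == OP_J:
            return JUMP
        raise ValueError("opcode not modelled in this sketch")
    if state == ADRCMP:
        if opcode == OP_LW:
            return LOAD
        if opcode == OP_SW:
            return STORE
        raise ValueError("AdrCmp is reachable only for lw and sw")
    if state == LOAD:
        return WB
    if state == ALU:
        return WBR
    return FETCH  # WB, Store, WBR, Branch and Jump all return to Fetch


def trace(opcode):
    """List the states one instruction visits, starting at Fetch."""
    states = [FETCH]
    while True:
        following = next_state(states[-1], opcode)
        if following == FETCH:
            return states
        states.append(following)


print(STATE_BITS, trace(OP_LW), trace(OP_SW), trace(OP_RTYPE))
```

Only Decode and AdrCmp look at the opcode; every other state has a single successor, which is why most opcode bits in the truth table are don't care.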
59 The Control Finite State Machine:
[Figure: the same structure: Opcode = IR[31:26] and the current state feed the next state calculation; the State reg is clocked by ck; the Outputs decoder produces the control signals. The decoder outputs are marked "Moore machine"; the write signal to PC, formed from PCWrite, PCWriteCond and zero, is marked "Mealy machine".]
60 Microprogramming
61 Microinstruction
Microinstruction format
63 Interrupt and exception

Type of event                                From where?  MIPS terminology
I/O device request                           External     Interrupt
Invoke operating system from user program    Internal     Exception
Arithmetic overflow                          Internal     Exception
Using an undefined instruction               Internal     Exception
Hardware malfunctions                        Either       Exception or interrupt
64 Exceptions handling

Exception type           Exception vector address (in hex)
Undefined instruction    c
Arithmetic overflow      c

We have 2 ways to handle exceptions: Cause register or Vectored interrupts
MIPS: Cause register
65 Handling exceptions 10
66 Handling exceptions
67 Handling interrupts:
[State diagram: the final state diagram (Fetch, Decode, ALU, WBR, AdrCmp, Load, WB, Store, Branch, Jump; edges R-type, lw+sw, lw, sw, beq, j) extended with the states SavePC 10, JumpInt 11 and IRET 1; new edges labelled int and iret.]
68 Handling an interrupt: remembering it in a FF until it is serviced
[Figure: a D flip-flop with D tied to “1”, clocked by irq (the interrupt source) and cleared by clr_irq~; its output Q, together with eint, produces int (to the state machine).]
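The behaviour of this flip-flop can be sketched in software. The model below is an assumption-laden illustration: the class and method names are ours, clr_irq~ is treated as an active-low clear, and int is assumed to be Q AND eint, which the flattened figure does not state explicitly.

```python
class InterruptLatch:
    """D flip-flop with D tied to 1: remembers a request until cleared."""

    def __init__(self):
        self.q = 0
        self._previous_irq = 0

    def update(self, irq, clr_irq_n=1):
        """Apply the inputs; clr_irq_n is active low (0 clears the FF)."""
        if clr_irq_n == 0:
            self.q = 0                      # request serviced: forget it
        elif irq and not self._previous_irq:
            self.q = 1                      # rising edge of irq latches "1"
        self._previous_irq = irq
        return self.q

    def int_signal(self, eint):
        """Assumed gating: int = Q AND eint (interrupts enabled)."""
        return self.q & eint


latch = InterruptLatch()
latch.update(irq=1)                 # the device raises its request
latch.update(irq=0)                 # the request line drops, Q stays set
print(latch.int_signal(eint=0), latch.int_signal(eint=1))
latch.update(irq=0, clr_irq_n=0)    # the state machine clears the FF
print(latch.q)
```

Latching the request means a short irq pulse is not lost while the state machine is still in the middle of an instruction.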
69 Jumping to the interrupt routine and returning from interrupt
[Figure: datapath activity for Interrupt (jumping to the interrupt routine) and for Iret (returning from interrupt).]
70 Jumping to the interrupt routine and returning from interrupt
[Figure: the same, with the irq and eint signals shown at levels 0 and 1.]
71 The state machine in action during interrupt
Normal instruction: Fetch > decode > ex > wb
Interrupt accepted: Fetch > Save_PC > JumpInt
End of the interrupt routine: Fetch > decode > Iret
72 End of multi-cycle implementation