Introduction to Computer Organization and Architecture
Lecture 11
By Juthawut Chantharamalee
wut_cha/home.htm
Outline
- Building a CPU
  - Basic components
  - MIPS instructions (MIPS: Microprocessor without Interlocked Pipeline Stages)
  - Basic 5 steps for CPU design
- Single-cycle design
- Multi-cycle design
- Comparison of single- and multi-cycle designs
Overview
- Brief look at digital logic
- CPU datapath
- MIPS example
Digital Logic
- D-type flip-flop: data input D, output Q, edge-triggered clock.
- Multiplexer: inputs A and B, select input S, output F; S = 0 selects A, S = 1 selects B.
- D-type flip-flop with enable: a multiplexer controlled by EN feeds the flip-flop's D input, choosing the current Q (EN = 0, hold) or the new D (EN = 1, load); edge-triggered clock.
Digital Logic: Registers
- 1 bit: one D flip-flop with enable (EN) and an edge-triggered clock.
- 4 bits: four flip-flops (D3/Q3, D2/Q2, D1/Q1, D0/Q0) sharing one clock and one EN.
- N bits: drawn as a single block with N-bit D and Q, one clock, and one EN.
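As a rough behavioral sketch (Python; the class names are hypothetical), an N-bit register is simply N enabled flip-flops clocked together, where the enable is the 2-to-1 multiplexer that chooses between holding Q and loading D. Real hardware updates all bits simultaneously; the loop here only models the end result of one clock edge.

```python
class DFlipFlopEn:
    """Edge-triggered D flip-flop with enable (behavioral model)."""

    def __init__(self) -> None:
        self.q = 0

    def rising_edge(self, d: int, en: int) -> None:
        # 2-to-1 mux in front of D: EN=0 feeds back Q (hold), EN=1 selects D (load)
        self.q = d if en else self.q


class Register:
    """N-bit register: N flip-flops sharing one clock and one EN signal."""

    def __init__(self, n_bits: int) -> None:
        self.ffs = [DFlipFlopEn() for _ in range(n_bits)]

    def rising_edge(self, value: int, en: int) -> None:
        # Bit i of the input value drives flip-flop i (D0 .. D(N-1))
        for i, ff in enumerate(self.ffs):
            ff.rising_edge((value >> i) & 1, en)

    @property
    def q(self) -> int:
        # Reassemble Q(N-1) .. Q0 into one integer
        return sum(ff.q << i for i, ff in enumerate(self.ffs))


r = Register(4)
r.rising_edge(0b1011, en=1)   # load
r.rising_edge(0b0000, en=0)   # hold: EN=0 keeps the old value
print(bin(r.q))               # 0b1011
```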
Digital Logic: Tri-state Driver (Buffer)
Inputs: in, drive. Output: out.
In  Drive  Out
0   0      Z
1   0      Z
0   1      0
1   1      1
What is Z?
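A small Python sketch, assuming Z is modeled as `None` (a modeling choice, not something the slide specifies), shows why tri-state drivers are useful. Several drivers can share one wire as long as at most one is enabled. This is one possible way to build a register-file read port, a question the slides raise later.

```python
from typing import Optional

Z = None  # high impedance: the driver is electrically disconnected


def tristate(inp: int, drive: int) -> Optional[int]:
    """Tri-state driver: Out = In when Drive = 1, otherwise Z."""
    return inp if drive else Z


def bus_value(outputs: list[Optional[int]]) -> Optional[int]:
    """Resolve a wire shared by several tri-state drivers.

    At most one driver may be enabled; two enabled drivers is bus contention.
    """
    driven = [v for v in outputs if v is not Z]
    if len(driven) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return driven[0] if driven else Z


# Put one of four register values onto a shared read bus
regs = [7, 3, 9, 4]
select = 2
print(bus_value([tristate(v, int(i == select)) for i, v in enumerate(regs)]))  # 9
```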
Digital Logic: Adder/Subtractor or ALU
Data inputs A and B, result F, plus Carry-in and Carry-out. A control input (Add/Sub for an adder/subtractor, or ALUop for an ALU) selects the operation.
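One common way to build the adder/subtractor is to reuse a single adder with two's complement: invert B and set Carry-in to 1 to compute A - B. A minimal Python sketch of that idea follows. The function name and the 32-bit default width are illustrative assumptions.

```python
def add_sub(a: int, b: int, sub: int, width: int = 32) -> tuple[int, int]:
    """Return (F, carry_out) for an adder/subtractor of the given width.

    Subtraction reuses the adder via two's complement:
    A - B = A + (~B) + 1, i.e. invert B and set carry-in to 1.
    """
    mask = (1 << width) - 1
    b_in = (~b & mask) if sub else (b & mask)
    carry_in = 1 if sub else 0
    total = (a & mask) + b_in + carry_in
    return total & mask, total >> width


print(add_sub(5, 3, sub=0))   # (8, 0)
print(add_sub(5, 3, sub=1))   # (2, 1)  carry-out 1 here means no borrow
print(add_sub(3, 5, sub=1))   # (4294967294, 0)  i.e. -2 in two's complement
```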
Overview
- Brief look at digital logic
- How to design a CPU datapath
- MIPS example
Designing a CPU: 5 Steps
1. Analyze the instruction set to determine the datapath requirements
   - MIPS: ADD, SUB, ORI, LW, SW, BR (branch)
   - The meaning of each instruction is given in RTL (register transfers)
   - 2 types of registers: CPU/ISA registers and temporary registers
2. Select the datapath components that meet the requirements
   - ALU, register file, adder, data memory, etc.
3. Assemble the datapath
   - The datapath must support the planned register transfers
   - Ensure all instructions are supported
4. Analyze the datapath control required for each instruction
5. Assemble the control logic
Step 1a: Analyze ISA
- All MIPS instructions are 32 bits long.
- Three instruction formats: R-type (registers), I-type (immediate), J-type (jumps).
- These formats were intentionally chosen to simplify the design.
R-type: op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | rd [15:11] (5 bits) | shamt [10:6] (5 bits) | funct [5:0] (6 bits)
I-type: op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | immediate [15:0] (16 bits)
J-type: op [31:26] (6 bits) | target address [25:0] (26 bits)
Step 1b: Analyze ISA
Meaning of the fields:
- op: operation of the instruction
- rs, rt, rd: source and destination register specifiers; the destination is rd (R-type) or rt (I-type)
- shamt: shift amount
- funct: selects the variant of the operation in the op field
- immediate: address offset or immediate value
- target address: target address of the jump instruction
(Field layouts for R-type, I-type, and J-type as in Step 1a.)
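The fixed field positions make decoding a matter of shifts and masks. A minimal Python sketch of this follows. The example encoding 0x00221821 (addu $3, $1, $2) uses the standard MIPS opcode/funct values, which do not appear on these slides.

```python
def decode(word: int) -> dict[str, int]:
    """Split a 32-bit MIPS instruction word into its fields.

    Every field is extracted; which ones are meaningful depends on the
    format (R, I or J), which the op field determines.
    """
    return {
        "op":     (word >> 26) & 0x3F,     # bits 31:26, all formats
        "rs":     (word >> 21) & 0x1F,     # bits 25:21, R and I
        "rt":     (word >> 16) & 0x1F,     # bits 20:16, R and I
        "rd":     (word >> 11) & 0x1F,     # bits 15:11, R only
        "shamt":  (word >> 6) & 0x1F,      # bits 10:6,  R only
        "funct":  word & 0x3F,             # bits 5:0,   R only
        "imm16":  word & 0xFFFF,           # bits 15:0,  I only
        "target": word & 0x03FF_FFFF,      # bits 25:0,  J only
    }


f = decode(0x00221821)   # addu $3, $1, $2 (standard MIPS encoding)
print(f["op"], f["rs"], f["rt"], f["rd"], hex(f["funct"]))   # 0 1 2 3 0x21
```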
MIPS ISA: Subset for Today
- ADD and SUB (R-type): addu rd, rs, rt; subu rd, rs, rt
- OR Immediate (I-type): ori rt, rs, imm16
- LOAD and STORE Word (I-type): lw rt, rs, imm16; sw rt, rs, imm16 (written lw rt, imm16(rs) and sw rt, imm16(rs) in standard assembly syntax)
- BRANCH (I-type): beq rs, rt, imm16
Step 2: Datapath Requirements: Register File
- The MIPS ISA requires 32 registers, each 32 bits wide.
- They form a register file with 32 entries of 32 bits each.
ADDU rd, rs, rt or SUBU rd, rs, rt:
- Read two sources, rs and rt
- Operation: rs + rt or rs - rt
- Write the destination: rd ← rs ± rt
Requirements: read two registers (rs, rt), perform an ALU operation, and write a third register (rd).
Register file (REGFILE) ports: RdReg1, RdReg2, WrReg (5-bit register numbers each), WrData, and RegWrite; outputs RdData1 and RdData2. ALU: control input ALUop, output Result, and a Zero? flag.
How to implement?
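A behavioral Python sketch of the register file's interface follows: two read ports and one write port gated by RegWrite. One assumption comes from the standard MIPS ISA rather than this slide: register 0 always reads as zero, so writes to it are ignored. This models behavior only, not how the ports are built in gates.

```python
class RegFile:
    """32 x 32-bit register file: two read ports, one write port.

    Assumption (standard MIPS, not on this slide): register 0 always reads 0,
    so writes to it are ignored.
    """

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, rd_reg1: int, rd_reg2: int) -> tuple[int, int]:
        # Two independent read ports produce RdData1 and RdData2
        return self.regs[rd_reg1 & 0x1F], self.regs[rd_reg2 & 0x1F]

    def clock_edge(self, wr_reg: int, wr_data: int, reg_write: int) -> None:
        # One write port: a register is updated only if RegWrite = 1
        wr_reg &= 0x1F
        if reg_write and wr_reg != 0:
            self.regs[wr_reg] = wr_data & 0xFFFF_FFFF


# ADDU $3, $1, $2: read rs and rt, add, write rd at the clock edge
rf = RegFile()
rf.regs[1], rf.regs[2] = 7, 5
a, b = rf.read(1, 2)
rf.clock_edge(wr_reg=3, wr_data=a + b, reg_write=1)
print(rf.read(3, 0))   # (12, 0)
```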
Step 3: Datapath Assembly
ADDU rd, rs, rt and SUBU rd, rs, rt:
- Need an ALU, hooked up to the REGFILE.
- The REGFILE has 2 read ports (rs, rt) and 1 write port (rd).
- The parameters (rs, rt, rd) come from instruction fields.
- Control signals depend on instruction fields, e.g., ALUop = f(Instruction) = f(op, funct).
Steps 2 and 3: ORI Instruction
ORI rt, rs, Imm16
- Needs a new ALUop for the OR function, hooked up to the REGFILE.
- Uses 1 read port (rs), 1 write port (rt), and 1 constant value (Imm16); the rd field is unused (X).
- The 16-bit Imm16 passes through a ZERO-EXTEND unit; a multiplexer controlled by ALUsrc selects the ALU's second input: the register value (0) or the extended immediate (1).
- Control signals depend on instruction fields, e.g., ALUsrc = f(Instruction) = f(op, funct).
Steps 2 and 3: Destination Register
- Must select the proper destination, rd or rt, depending on the instruction type:
  - R-type may write rd
  - I-type may write rt
- A multiplexer controlled by RegDst selects the REGFILE's WrReg input: rd (RegDst = 1) or rt (RegDst = 0).
Steps 2 and 3: Load Word
LW rt, rs, Imm16
- Needs a data memory: data ← Mem[Addr]
- Addr is rs + Imm16; Imm16 is signed, and the ALU performs the addition.
- Store in rt: rt ← Mem[rs + Imm16]
- New datapath elements: a SIGN/ZERO-EXTEND unit controlled by ExtOp; DATAMEM (input Addr, output RdData); and a multiplexer controlled by MemtoReg that selects what is written back to the REGFILE: the ALU Result (0) or the memory RdData (1).
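The SIGN/ZERO-EXTEND unit and the LW address computation can be sketched in a few lines of Python. The slides do not give ExtOp's polarity. This sketch assumes ExtOp = 1 means sign-extend (LW, SW, BEQ) and ExtOp = 0 means zero-extend (ORI). Function names are hypothetical.

```python
def extend16(imm16: int, ext_op: int) -> int:
    """Extend a 16-bit immediate to a 32-bit pattern.

    Assumed encoding: ExtOp = 1 sign-extends (LW, SW, BEQ),
    ExtOp = 0 zero-extends (ORI).
    """
    imm16 &= 0xFFFF
    if ext_op and imm16 & 0x8000:        # negative: copy bit 15 into bits 31:16
        return imm16 | 0xFFFF_0000
    return imm16


def lw_address(rs_value: int, imm16: int) -> int:
    # The ALU adds R[rs] and the sign-extended offset; wrap to 32 bits
    return (rs_value + extend16(imm16, ext_op=1)) & 0xFFFF_FFFF


print(hex(extend16(0xFFFC, ext_op=1)))   # 0xfffffffc  (-4)
print(hex(extend16(0xFFFC, ext_op=0)))   # 0xfffc
print(hex(lw_address(0x1000, 0xFFFC)))   # 0xffc       (0x1000 - 4)
```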
Steps 2 and 3: Store Word
SW rt, rs, Imm16
- Needs a data memory write: Mem[Addr] ← data
- Addr is rs + Imm16; Imm16 is signed, and the ALU performs the addition.
- Store in memory: Mem[rs + Imm16] ← rt
- The value read from rt drives the DATAMEM WrData input; the MemWrite control signal enables the write.
Writes: Need to Control Timing
Problem: writing to data memory
- Data can arrive at any time, but Addr must be stable first.
- MemWrite must come after Addr; otherwise, the write goes to the wrong address!
- Solution for now: use an ideal data memory and assume everything works.
- How to fix this for real? One solution: synchronous memory. Another: delay MemWrite so it arrives late.
Problems writing to the register file?
- Does the RegWrite signal come after the WrReg number is stable?
- When does the write to a register happen?
- What if an instruction reads the same register that is being written?
Missing Pieces: Instruction Fetching
- Where does the instruction come from? From instruction memory, of course! Recall the stored-program concept.
- Alternatives? How about hard-coding wires and switches? This is how ENIAC (Electronic Numerical Integrator and Computer) was programmed!
- How to branch? BEQ rs, rt, Imm16
Instruction Processing
Fetch instruction, execute instruction, fetch next instruction, execute next instruction, and so on.
- How to maintain the sequence? Use a counter!
- Branches (out of sequence)? Load the counter!
Instruction Processing: Program Counter
- The PC points to the current instruction and provides the address to instruction memory: Instr ← InstrMem[PC]
- Next instruction: the PC counts up by 4, because memory is byte-addressable and instructions are 4 bytes long: PC ← PC + 4
- Branch instruction: replace the PC contents
Step 1: Analyze Instructions: Register Transfer Language (RTL)
Fetch:
op | rs | rt | rd | shamt | funct = InstrMem[PC]   (R-type)
op | rs | rt | Imm16 = InstrMem[PC]                 (I-type)
Instr   Register Transfers
ADDU    R[rd] ← R[rs] + R[rt]; PC ← PC + 4
SUBU    R[rd] ← R[rs] - R[rt]; PC ← PC + 4
ORI     R[rt] ← R[rs] | zero_ext(Imm16); PC ← PC + 4
LOAD    R[rt] ← MEM[R[rs] + sign_ext(Imm16)]; PC ← PC + 4
STORE   MEM[R[rs] + sign_ext(Imm16)] ← R[rt]; PC ← PC + 4
BEQ     if (R[rs] == R[rt]) then PC ← PC + 4 + {sign_ext(Imm16) || b'00'} else PC ← PC + 4
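The RTL table can be read as an executable specification. Below is a minimal single-cycle interpreter for this subset in Python: one call equals one instruction, which equals one clock cycle. The opcode and funct values (ADDU 0x21, SUBU 0x23, ORI 0x0D, LW 0x23, SW 0x2B, BEQ 0x04) are the standard MIPS encodings and are an assumption, since the slides do not list them. Keeping register 0 at zero is also standard MIPS behavior that the slides do not mention.

```python
MASK32 = 0xFFFF_FFFF


def sign_ext(imm16: int) -> int:
    """Interpret a 16-bit field as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x1_0000 if imm16 & 0x8000 else imm16


def step(pc: int, regs: list[int], imem: dict[int, int], dmem: dict[int, int]) -> int:
    """Execute one instruction as in the RTL table; return the next PC."""
    instr = imem[pc]                                   # Instr <- InstrMem[PC]
    op, rs, rt = instr >> 26, (instr >> 21) & 0x1F, (instr >> 16) & 0x1F
    rd, funct, imm = (instr >> 11) & 0x1F, instr & 0x3F, instr & 0xFFFF
    next_pc = (pc + 4) & MASK32                        # PC <- PC + 4

    def write(r: int, value: int) -> None:
        if r != 0:                                     # $0 stays zero
            regs[r] = value & MASK32

    if op == 0x00 and funct == 0x21:                   # ADDU
        write(rd, regs[rs] + regs[rt])
    elif op == 0x00 and funct == 0x23:                 # SUBU
        write(rd, regs[rs] - regs[rt])
    elif op == 0x0D:                                   # ORI: zero_ext(Imm16)
        write(rt, regs[rs] | imm)
    elif op == 0x23:                                   # LW
        write(rt, dmem[(regs[rs] + sign_ext(imm)) & MASK32])
    elif op == 0x2B:                                   # SW
        dmem[(regs[rs] + sign_ext(imm)) & MASK32] = regs[rt]
    elif op == 0x04:                                   # BEQ
        if regs[rs] == regs[rt]:
            next_pc = (next_pc + (sign_ext(imm) << 2)) & MASK32
    else:
        raise NotImplementedError(f"unsupported instruction {instr:#010x}")
    return next_pc


program = {
    0:  0x34010005,   # ori  $1, $0, 5
    4:  0x34020007,   # ori  $2, $0, 7
    8:  0x00221821,   # addu $3, $1, $2
    12: 0xAC030000,   # sw   $3, 0($0)
    16: 0x8C040000,   # lw   $4, 0($0)
    20: 0x10640001,   # beq  $3, $4, 1   (taken: skips the word at 24)
}
regs, dmem, pc = [0] * 32, {}, 0
for _ in range(6):
    pc = step(pc, regs, program, dmem)
print(regs[3], regs[4], pc)   # 12 12 28
```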
Steps 2 and 3: Datapath & Assembly
- PC: a register
- A counter that counts by +4 (an adder adds the constant 4 to the PC)
- Provides the read address to instruction memory, which outputs Instruction[31:0]
Steps 2 and 3: Datapath & Assembly (Branch)
- PC: a register; a counter that counts by +4
- For branch instructions, we must sometimes also add {SignExtend(Imm16) || b'00'}: the extended immediate passes through a Shift Left 2 unit, and a second adder adds it to PC + 4.
- A multiplexer controlled by PCSrc selects the next PC: PC + 4 (0) or the branch target (1).
- Instruction fields used: [25:21], [20:16], [15:11], [15:0] (Imm16).
- Note: the sign-extender for Imm16 (controlled by ExtOp) is already in the datapath; everything else is new.
Steps 2 and 3: Add Previous Datapath
Figure: the complete single-cycle datapath, combining the PC, the PC + 4 adder, instruction memory, the register file (Read reg. 1, Read reg. 2, Write reg., Write data; Read data 1, Read data 2), Sign/Zero Extend, Shift Left 2, the branch adder, the ALU with ALU Control (driven by ALUOp and Instruction[5:0], funct), data memory (Address, Read data, Write data), and four multiplexers.
Control signals: RegWrite, RegDst, ALUSrc, MemWrite, PCSrc, MemtoReg, ALUOp, ExtOp.
What Have We Done?
- Created a simple CPU datapath
- Control is still missing (next slide)
- Single-cycle CPU: every instruction takes 1 clock cycle
- Clocking?
One Clock Cycle
Clock locations: the PC and REGFILE have clocks.
Operation:
- On the rising edge, the PC gets a new value; the REGFILE may also have one value updated.
- After the rising edge, the PC and REGFILE cannot change:
  - The new value comes out of the PC
  - The instruction comes out of INSTRMEM
  - The instruction selects the registers to read from the REGFILE
  - The instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc.
  - The ALU does its work
  - DataMem may be read (depending on the instruction)
  - The result value goes back to the REGFILE
  - The new PC value goes back to the PC
  - Await the next clock edge
Lots to do in only 1 clock cycle!!
Missing Steps?
Control is missing (Steps 4 and 5 mentioned earlier):
- Generate the control signals (shown in green): ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc.
- These are all f(Instruction), where f() is a logic expression
- Control strategies are covered in an upcoming lecture
Implementation details:
- How to implement the REGFILE?
  - Read port: tri-state buffers? Multiplexer? Memory?
  - Two read ports: two of the above?
  - Write port: how to write only 1 register?
- How to control writes to memory? To the register file?
More instructions:
- Shift instructions
- Jump instruction
- Etc.
1-Cycle CPU Datapath
Figure: the complete single-cycle datapath as assembled in Steps 2 and 3 (same components and control signals: RegWrite, RegDst, ALUSrc, MemWrite, PCSrc, MemtoReg, ALUOp, ExtOp).
1-cycle CPU Datapath + Control
Figure: the single-cycle datapath with a main Control unit. Control decodes Instruction[31:26] (op) and drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; the ALU control unit combines ALUOp with Instruction[5:0] (funct). PCSrc is derived from Branch and the ALU Zero output.
1-cycle CPU Control: Lookup Table
Signal      R-format  lw  sw  beq
Inputs
Op5         0         1   1   0
Op4         0         0   0   0
Op3         0         0   1   0
Op2         0         0   0   1
Op1         0         1   1   0
Op0         0         1   1   0
Outputs
RegDst      1         0   X   X
ALUSrc      0         1   1   0
MemtoReg    0         1   X   X
RegWrite    1         1   0   0
MemRead     0         1   0   0
MemWrite    0         0   1   0
Branch      0         0   0   1
ALUOp1      1         0   0   0
ALUOp0      0         0   0   1
Also needed: I-type instructions (ORI), ExtOp (sign-extend control), etc.
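Because single-cycle control is purely combinational, the lookup table translates directly into a dictionary keyed by the op field. Below is a Python sketch that copies the table above, with `None` standing in for X (don't care). The `pc_src` helper assumes the usual textbook rule PCSrc = Branch AND Zero, which the table itself does not state.

```python
X = None  # "don't care" entry in the table

# Main control lookup table, indexed by the 6-bit op field (Op5..Op0).
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1, MemRead=0,
                   MemWrite=0, Branch=0, ALUOp1=1, ALUOp0=0),   # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1,
                   MemWrite=0, Branch=0, ALUOp1=0, ALUOp0=0),   # lw
    0b101011: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0, MemRead=0,
                   MemWrite=1, Branch=0, ALUOp1=0, ALUOp0=0),   # sw
    0b000100: dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0, MemRead=0,
                   MemWrite=0, Branch=1, ALUOp1=0, ALUOp0=1),   # beq
}


def main_control(instr: int) -> dict:
    """Combinational control: outputs are a pure function of Instruction[31:26]."""
    return CONTROL[(instr >> 26) & 0x3F]


def pc_src(branch: int, zero: int) -> int:
    # Assumed rule: take the branch target only for BEQ when the ALU result is zero
    return branch & zero


print(main_control(0x8C040000)["MemRead"])   # 1 (lw $4, 0($0))
```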
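1-cycle CPU + Jump Instruction
Figure: the single-cycle CPU extended for jumps. Instruction[25:0] is shifted left by 2 and combined with PC + 4 [31..28] to form the jump address [31..0]; the other instruction fields (Instruction[31:26], [25:21], [20:16], [15:11], [15:0], [5:0]) are used as before.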
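The jump address is a concatenation {PC+4[31..28], Instruction[25:0], b'00'}, so it can only reach targets inside the current 256 MB region. A short Python sketch follows (the function name is hypothetical).

```python
def jump_address(pc: int, target26: int) -> int:
    """Jump address[31..0] = {PC+4[31..28], Instruction[25:0], b'00'}."""
    pc_plus_4 = (pc + 4) & 0xFFFF_FFFF
    return (pc_plus_4 & 0xF000_0000) | ((target26 & 0x03FF_FFFF) << 2)


print(hex(jump_address(0x0040_0000, 0x0010_0004)))   # 0x400010
print(hex(jump_address(0x9000_0000, 0x1)))           # 0x90000004
```

The upper 4 bits come from PC + 4 rather than PC. This only matters when the jump sits in the last word of a region, e.g., a jump at 0x8FFFFFFC takes its upper bits from 0x90000000.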
1-cycle CPU Problems?
- Every instruction takes 1 cycle, but some instructions "do more work"; e.g., lw must read from DATAMEM.
- All instructions must have the same clock period, so many instructions run slower than necessary.
- Tricky timing on the MemWrite and RegWrite(?) signals: the write signal must come *after* the address is stable.
- Needs extra resources: a PC+4 adder, a second adder/ALU for the BEQ instruction, and separate DATAMEM and INSTRMEM.
Performance!
Single-cycle CPU performance: executes one instruction per clock cycle (CPI = 1).
Clock cycle time? The dataflow includes:
- INSTRMEM read
- REGFILE access
- Sign extension
- ALU operation
- DATAMEM read
- REGFILE/PC write
Not every instruction uses all resources (e.g., DATAMEM read).
Can we change the clock period for each instruction? No! (Why not?)
One clock period must cover the worst case! This is why a single-cycle CPU is not good for performance.
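A worked example in Python makes the "worst case" argument concrete. The unit delays below are hypothetical, illustrative values, not taken from the slides. Sign extension is assumed to run in parallel with the register read, so it is left off the critical path.

```python
# Hypothetical unit delays in picoseconds (illustrative only, not from the slides)
DELAY = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Units on each instruction class's critical path
PATH = {
    "lw":     ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw":     ["imem", "reg_read", "alu", "dmem"],
    "r_type": ["imem", "reg_read", "alu", "reg_write"],
    "beq":    ["imem", "reg_read", "alu"],
    "j":      ["imem"],
}


def instr_time(name: str) -> int:
    """Time the instruction actually needs if it had its own clock."""
    return sum(DELAY[unit] for unit in PATH[name])


def single_cycle_period() -> int:
    # One fixed clock for every instruction: the slowest path sets it
    return max(instr_time(name) for name in PATH)


print(single_cycle_period())                  # 800
print({n: instr_time(n) for n in PATH})       # {'lw': 800, 'sw': 700, 'r_type': 600, 'beq': 500, 'j': 200}
```

With these numbers, every R-type instruction idles for 200 ps of each 800 ps cycle, and a jump idles for 600 ps.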
1-cycle CPU Datapath + Controller
Figure: the complete single-cycle CPU, including the jump path (jump address [31..0] formed from PC + 4 [31..28] and Instruction[25:0]) and the controller driven by Instruction[31:26].
1-cycle CPU Summary
Operation:
- 1 cycle per instruction
- Control signals held fixed during the entire cycle (except BRANCH)
- Only 2 registers: the PC, updated every clock cycle, and the REGFILE, updated when required
- During the clock cycle, data flows from register outputs to register inputs
- Fixed clock frequency/period
Performance:
- 1 instruction per cycle
- The slowest instruction determines the clock frequency
Outstanding issue: MemWrite timing
- Assume this signal writes to memory at the end of the clock cycle
Multi-cycle CPU Goals
Improve performance by breaking each instruction into smaller steps spread over multiple cycles:
- LW instruction: 5 cycles
- SW instruction: 4 cycles
- R-type instruction: 4 cycles
- Branch, Jump: 3 cycles
Aim for 5x the clock frequency:
- Complex instructions (e.g., LW) take 5 cycles: same performance as before
- Simple instructions (e.g., ADD) take fewer cycles: faster
Save resources (gates/transistors):
- Re-use the ALU over multiple cycles
- Put instructions and data in the same memory
MemWrite timing solved?
Multi-cycle CPU Datapath
Changes from the single-cycle datapath:
- Add multiplexers and control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB)
- Move signal paths (+4, Shift Left 2) so that the single ALU computes them
Figure components: PC; one Memory (Address, Write data, MemData); Instruction Register (fields [25:21], [20:16], [15:11], [15:0], [5:0]); Memory Data Register; Registers (RdReg1, RdReg2, Write reg, Write data; RdData1, RdData2); Sign Extend; Shift Left 2; constant 4; ALU (ALU result, Zero); A, B, and ALU Out registers; several multiplexers.
Multi-cycle CPU Datapath
- Add registers and control signals (IR, MDR, A, B, ALUOut)
- Registers with no control signal load a value every clock cycle (e.g., A, B, ALUOut, MDR)
Figure: the same components as the previous slide, with the Instruction Register, Memory Data Register, A, B, and ALU Out registers added.
Instruction Execution Example
Execute a "Load Word" instruction: LW rt, 0(rs)
5 steps:
1. Fetch instruction
2. Read registers
3. Compute address
4. Read data
5. Write registers
Load Word Instruction Sequence
1. Fetch instruction: InstructionRegister ← Mem[PC]
Figure: the multi-cycle datapath with the path from the PC through Memory (Address, MemData) into the Instruction Register highlighted.
Load Word Instruction Sequence
2. Read registers: A ← Registers[Rs]
Figure: Instruction[25:21] drives RdReg1, and RdData1 is loaded into A.
Load Word Instruction Sequence
3. Compute address: ALUOut ← A + SignExt(Imm16)
Figure: A and the sign-extended Instruction[15:0] feed the ALU, and the result is loaded into ALU Out.
Load Word Instruction Sequence
4. Read data: MDR ← Memory[ALUOut]
Figure: ALU Out drives the memory Address, and MemData is loaded into the Memory Data Register.
Load Word Instruction Sequence
5. Write registers: Registers[Rt] ← MDR
Figure: the Memory Data Register drives the register file's Write data input, and Instruction[20:16] (rt) selects the Write reg.
Load Word Instruction Sequence: All 5 Steps Shown
Figure: the multi-cycle datapath with all five LW data paths highlighted together.
Multi-cycle Load Word: Recap
1. Fetch instruction: InstructionRegister ← Mem[PC]
2. Read registers: A ← Registers[Rs]
3. Compute address: ALUOut ← A + SignExt(Imm16)
4. Read data: MDR ← Memory[ALUOut]
5. Write registers: Registers[Rt] ← MDR
Missing steps?
Multi-cycle Load Word: Recap
1. Fetch instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read registers: A ← Registers[Rs]
3. Compute address: ALUOut ← A + SignExt(Imm16)
4. Read data: MDR ← Memory[ALUOut]
5. Write registers: Registers[Rt] ← MDR
Missing steps?
- Must increment the PC
- Do it as part of the instruction fetch (in step 1)
- Needs a PCWrite control signal
Multi-cycle R-Type Instruction
1. Fetch instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read registers: A ← Registers[Rs]; B ← Registers[Rt]
3. Compute value: ALUOut ← A op B
4. Write registers: Registers[Rd] ← ALUOut
RTL describes the data flow action in each clock cycle:
- Control signals determine the precise data flow
- Each step implies unique control values
Multi-cycle R-Type Instruction: Control Signal Values
1. Fetch instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
   MemRead=1, ALUSrcA=0, IorD=0, IRWrite, ALUSrcB=01, ALUop=00, PCWrite, PCSource=00
2. Read registers: A ← Registers[Rs]; B ← Registers[Rt]
   ALUSrcA=0, ALUSrcB=11, ALUop=00
3. Compute value: ALUOut ← A op B
   ALUSrcA=1, ALUSrcB=00, ALUop=10
4. Write registers: Registers[Rd] ← ALUOut
   RegDst=1, RegWrite, MemtoReg=0
Each step implies unique control values:
- Fixed for the entire cycle
- A "default value" is implied if unspecified
Check Your Work: Is the RTL Valid?
1. Datapath check
   Within one cycle:
   - Each cycle has a valid data flow path (the path exists)
   - Each register gets only one new value
   Across multiple cycles:
   - A register value is defined before use, in a previous (earlier in time) clock cycle; e.g., "A ← 3" must occur before "B ← A"
   - Make sure a register value doesn't disappear if it was set more than 1 cycle earlier
2. Control signal check
   - Each cycle, the RTL describing the datapath flow implies a value for each control signal: 0, 1, default, or don't care
   - Each control signal gets only one fixed value for the entire cycle
3. Overall check
   - Does the sequence of steps work?
Multi-cycle BEQ Instruction
1. Fetch instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read registers, precompute target: A ← Registers[Rs]; B ← Registers[Rt]; ALUOut ← PC + {SignExt(Imm16), b'00'}
3. Compare registers, conditional branch: if ((A - B) == 0) PC ← ALUOut
Green shows the PC calculation flow (in parallel with other operations).
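The multi-cycle RTL for LW, R-type, and BEQ can be sketched in Python. Here the internal registers IR, A, B, ALUOut, and MDR are local variables that carry values from one cycle to the next, and a single `mem` holds both instructions and data. Opcode/funct values are the standard MIPS encodings (an assumption; they are not on the slides). As on the slides, the branch target is computed in cycle 2 for every instruction, even though only BEQ uses it.

```python
MASK32 = 0xFFFF_FFFF


def sign_ext(imm16: int) -> int:
    """Interpret a 16-bit field as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x1_0000 if imm16 & 0x8000 else imm16


def run_multicycle(pc: int, regs: list[int], mem: dict[int, int]) -> tuple[int, int]:
    """Execute one LW, ADDU/SUBU or BEQ following the per-cycle RTL.

    Returns (new_pc, cycles_used).
    """
    # Cycle 1: IR <- Mem[PC]; PC <- PC + 4
    ir = mem[pc]
    pc = (pc + 4) & MASK32
    op, rs, rt = ir >> 26, (ir >> 21) & 0x1F, (ir >> 16) & 0x1F
    rd, funct, imm = (ir >> 11) & 0x1F, ir & 0x3F, ir & 0xFFFF

    # Cycle 2: A <- R[rs]; B <- R[rt]; ALUOut <- PC + (SignExt(Imm16) << 2)
    a, b = regs[rs], regs[rt]
    alu_out = (pc + (sign_ext(imm) << 2)) & MASK32

    if op == 0x04:                                   # BEQ
        if ((a - b) & MASK32) == 0:                  # cycle 3: compare, branch
            pc = alu_out
        return pc, 3
    if op == 0x23:                                   # LW
        alu_out = (a + sign_ext(imm)) & MASK32       # cycle 3: compute address
        mdr = mem[alu_out]                           # cycle 4: MDR <- Mem[ALUOut]
        if rt != 0:
            regs[rt] = mdr                           # cycle 5: R[rt] <- MDR
        return pc, 5
    if op == 0x00 and funct in (0x21, 0x23):         # ADDU / SUBU
        alu_out = (a + b if funct == 0x21 else a - b) & MASK32   # cycle 3
        if rd != 0:
            regs[rd] = alu_out                       # cycle 4: R[rd] <- ALUOut
        return pc, 4
    raise NotImplementedError(f"unsupported instruction {ir:#010x}")


regs = [0] * 32
mem = {0: 0x8C040010, 16: 99}        # lw $4, 16($0); data word at address 16
print(run_multicycle(0, regs, mem), regs[4])   # (4, 5) 99
```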
Multi-cycle Datapath with Control Signals
Figure: the multi-cycle datapath, including the jump path (jump address [31..0] from PC[31..28] and Instr[25:0]), with control signals PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSrc, ALUOp, ALUSrcA, ALUSrcB, RegWrite, and RegDst, plus ALU Control.
Multi-cycle Datapath with Controller
Figure: the same datapath with a controller that takes Instr[31:26] (op) as input.
Multi-cycle CPU Control: Overview
General approach: Finite State Machine (FSM)
- Need details in each branch of control:
  - Precise outputs for each state (Mealy: outputs depend on inputs; Moore: they do not)
  - Precise "next state" for each state (can depend on inputs)
How to Implement the FSM?
Manually with logic gates + FFs:
- Bubble diagram, next-state table, state assignment
- Karnaugh map for each state bit and each output bit (painful!)
High-level language description (e.g., Verilog, VHDL):
- Describe the FSM bubble diagram (next states, output values)
- Automatically synthesized into gates + FFs
Microcode (µ-code) description:
- Sequence through many µ-ops for each CPU instruction
- One µ-op (µ-instruction) sends the correct control signals for 1 cycle
- A µ-op is similar to one bubble in the FSM
- Acts like a mini-CPU within a CPU
  - µPC: microcode program counter
  - Microcode storage memory contains the µ-ops
- Can look similar to RTL or to some new "assembly language"
FSM Specification: Bubble Diagram
Can build this by examining the RTL.
It is possible to automatically convert RTL into this form!
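The bubble diagram's next-state logic can be written as a small Python function. The state numbering and names below follow the common textbook multi-cycle FSM and are an assumption, since the diagram itself is not reproduced here. Only the next-state function is modeled. In a Moore machine, the control outputs would be an additional function of the state alone. Tracing each opcode reproduces the cycle counts quoted on the Multi-cycle CPU Goals slide.

```python
# Opcodes (standard MIPS encodings, assumed): R-type, lw, sw, beq, j
R_TYPE, LW, SW, BEQ, J = 0x00, 0x23, 0x2B, 0x04, 0x02

# State names follow the common textbook numbering (an assumption).
STATES = {
    0: "Fetch", 1: "Decode/RegRead", 2: "MemAddr", 3: "MemRead",
    4: "MemWriteBack", 5: "MemWrite", 6: "Execute", 7: "RTypeWriteBack",
    8: "BranchCompletion", 9: "JumpCompletion",
}


def next_state(state: int, op: int) -> int:
    """Next-state function of the control FSM; depends on state and opcode."""
    if state == 0:
        return 1
    if state == 1:                                  # dispatch on opcode
        return {LW: 2, SW: 2, R_TYPE: 6, BEQ: 8, J: 9}[op]
    if state == 2:
        return 3 if op == LW else 5
    if state in (3, 6):
        return state + 1
    return 0                                        # states 4, 5, 7, 8, 9: back to Fetch


def trace(op: int) -> list[str]:
    """States visited by one instruction, starting from Fetch."""
    state, visited = 0, []
    while True:
        visited.append(STATES[state])
        state = next_state(state, op)
        if state == 0:
            return visited


for name, op in [("lw", LW), ("sw", SW), ("R-type", R_TYPE), ("beq", BEQ), ("j", J)]:
    print(name, len(trace(op)))   # lw 5, sw 4, R-type 4, beq 3, j 3
```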
FSM: Gates + FFs Implementation
Figure: FSM high-level organization.
FSM: Microcode Implementation
Figure: microcode storage (memory) whose outputs are the datapath control outputs and the sequencing control. A microprogram counter, an adder (+1), and address select logic choose the next µ-op address; the address select logic also takes inputs from the opcode field of the instruction register.
Multi-cycle CPU with Control FSM
Figure: the multi-cycle datapath with the control FSM, which takes Instr[31:26] as input and drives the control outputs, including the conditional-branch control.
Control FSM: Overview
General approach: Finite State Machine (FSM)
- Need details in each branch of control
Detailed FSM
Detailed FSM: Instruction Fetch
Detailed FSM: Memory Reference (LW, SW)
Detailed FSM: R-Type Instruction
Detailed FSM: Branch Instruction
Detailed FSM: Jump Instruction
Performance Comparison: Single-cycle CPU vs Multi-cycle CPU
Simple Comparison
Instructions   Single-cycle CPU   Multi-cycle CPU
LW             1 clock cycle      5 clock cycles
SW, R-type     1 clock cycle      4 clock cycles
BEQ, J         1 clock cycle      3 clock cycles
What’s Really Happening?
Single-cycle CPU vs multi-cycle CPU (Load Word instruction): Fetch, Decode, Calc Addr, Memory, Write.
Ideally, the five steps take equal time, so five multi-cycle clock periods exactly match one single-cycle clock period.
In Practice, Steps Differ in Speed…
Load Word instruction, single-cycle vs multi-cycle timelines (Fetch, Decode, Calc Addr, Memory, Write):
- Violation! A step slower than the multi-cycle clock period does not fit in its cycle.
- Wasted time! A step faster than the clock period leaves the rest of its cycle idle.
Single-cycle vs Multi-cycle: LW Instruction Is Faster on the Single-cycle CPU
- Violation fixed! The multi-cycle clock period is lengthened to fit the slowest step.
- Now the wasted time in the multi-cycle CPU is larger!
Single-cycle vs Multi-cycle: SW Instruction, about the Same Speed
- SW uses Fetch, Decode, Calc Addr, and Memory.
- The multi-cycle CPU still has some wasted time, so the speed difference is small.
Single-cycle vs Multi-cycle: BEQ, J Instructions Are Faster on the Multi-cycle CPU
- BEQ and J use only Fetch, Decode, and Calc Addr.
- Wasted time! The single-cycle CPU idles for the rest of its long clock period, while the multi-cycle CPU finishes after 3 short cycles.
Performance Summary
Which CPU implementation is faster?
- LW: single-cycle is faster
- SW, R-type: about the same
- BEQ, J: multi-cycle is faster
Real programs use a mix of these instructions, so overall performance depends on instruction frequency!
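The dependence on instruction frequency can be made concrete with a weighted average. Below is a Python sketch that uses the cycle counts from the slides together with hypothetical clock periods and a made-up instruction mix: 800 ps for the single-cycle CPU (its slowest instruction) and 200 ps for the multi-cycle CPU (its slowest step). None of these three inputs comes from the slides.

```python
# Cycles per instruction class on the multi-cycle CPU (from the slides)
MULTI_CYCLES = {"lw": 5, "sw": 4, "r_type": 4, "beq": 3, "j": 3}


def avg_time_ps(mix: dict[str, int], single_period_ps: int,
                multi_period_ps: int) -> tuple[float, float]:
    """Average time per instruction as (single-cycle, multi-cycle).

    mix maps instruction class -> count; the single-cycle CPU has CPI = 1.
    """
    n = sum(mix.values())
    cycles = sum(MULTI_CYCLES[k] * count for k, count in mix.items())
    return float(single_period_ps), cycles * multi_period_ps / n


# Hypothetical mix per 100 instructions and hypothetical clock periods
mix = {"lw": 25, "sw": 10, "r_type": 45, "beq": 15, "j": 5}
print(avg_time_ps(mix, 800, 200))   # (800.0, 810.0)
```

With this mix, the multi-cycle CPI is 4.05, so the two designs are nearly tied. A branch-heavier mix such as {"r_type": 50, "beq": 50} drops the multi-cycle average to 700.0 ps, while an LW-heavy mix favors the single-cycle CPU.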
Implementation Summary
Single-cycle CPU:
- 1 instruction per cycle (e.g., 1 MHz → 1 MIPS)
- No "wasted time" on the most complex instruction
- Large wasted time on simpler instructions
- Simple controller (just a lookup table or memory)
- Simple instructions
Multi-cycle CPU:
- << 1 instruction per cycle (e.g., 1 MHz → 0.2 MIPS)
- Small time wasted on the most complex instruction; hence, this instruction is always slower than on the single-cycle CPU
- Small time wasted on simple instructions; eliminates the "large wasted time" by using fewer clock cycles
- Complex controller (FSM)
- Potential to create complex instructions
The End Lecture 11