Introduction to Computer Organization and Architecture Lecture 11 By Juthawut Chantharamalee wut_cha/home.htm.


Outline
- Building a CPU
  - Basic components
  - MIPS instructions (MIPS: Microprocessor without Interlocked Pipeline Stages)
  - Basic 5 steps for CPU design
  - Single-cycle design
  - Multi-cycle design
  - Comparison of single-cycle and multi-cycle designs

Overview
- Brief look
  - Digital logic
- CPU datapath
  - MIPS example

Digital Logic
- D-type flip-flop: input D, output Q, edge-triggered clock
- Multiplexer: inputs A and B (mux inputs 0 and 1), select input S, output F
- D-type flip-flop with enable: inputs D and EN (enable), output Q, edge-triggered clock; drawn as a multiplexer (inputs 0 and 1) feeding a D-type flip-flop
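The building blocks above can be sketched behaviorally. This is a minimal sketch, not a gate-level model: the names `mux2` and `DFlipFlopEnable` are hypothetical, and it assumes that select value 0 picks input A and that EN = 1 lets the flip-flop capture D.

```python
def mux2(a, b, s):
    """2-to-1 multiplexer: returns a when s == 0, b when s == 1 (assumed wiring)."""
    return b if s else a


class DFlipFlopEnable:
    """Edge-triggered D flip-flop with enable, modeled one clock edge at a time."""

    def __init__(self, q=0):
        self.q = q  # stored output Q

    def clock_edge(self, d, en):
        # The mux feeds back the old Q when EN is 0 and passes D when EN is 1.
        self.q = mux2(self.q, d, en)
        return self.q


ff = DFlipFlopEnable()
ff.clock_edge(d=1, en=0)   # enable low: Q keeps its old value
held = ff.q
ff.clock_edge(d=1, en=1)   # enable high: Q captures D
loaded = ff.q
```

Building the enable out of a multiplexer keeps the flip-flop itself unchanged: with EN low it simply reloads its own output on every clock edge.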

Digital Logic: Registers
- 1 bit: D-type flip-flop with enable (D, Q, EN, edge-triggered clock)
- 4 bits: four flip-flops (D3..D0, Q3..Q0) sharing EN and the edge-triggered clock
- N bits: drawn as one register symbol (D, Q, EN, edge-triggered clock)

Digital Logic: Tri-state Driver (Buffer)
- Signals: in, drive, out

  In  Drive  Out
  0   0      Z
  1   0      Z
  0   1      0
  1   1      1

- What is Z?

Digital Logic: Adder/Subtractor or ALU
- Inputs: A, B, Carry-in
- Control: Add/sub or ALUop
- Outputs: F, Carry-out

Overview
- Brief look
  - Digital logic
- How to design a CPU datapath
  - MIPS example

Designing a CPU: 5 Steps
1. Analyze the instruction set → datapath requirements
   - MIPS: ADD, SUB, ORI, LW, SW, BR
   - Meaning of each instruction given by RTL (register transfers)
   - 2 types of registers: CPU/ISA registers, temporary registers
2. Datapath requirements → select the datapath components
   - ALU, register file, adder, data memory, etc.
3. Assemble the datapath
   - Datapath must support the planned register transfers
   - Ensure all instructions are supported
4. Analyze the datapath control required for each instruction
5. Assemble the control logic

Step 1a: Analyze ISA
- All MIPS instructions are 32 bits long.
- Three instruction formats: R-type, I-type, J-type
  - R: registers, I: immediate, J: jumps
- These formats were intentionally chosen to simplify the design.

  R-type:  op (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
  I-type:  op (6 bits) | rs (5) | rt (5) | immediate (16)
  J-type:  op (6 bits) | target address (26)

Step 1b: Analyze ISA
- Meaning of the fields:
  - op: operation of the instruction
  - rs, rt, rd: the source and destination register specifiers
    - Destination is either rd (R-type) or rt (I-type)
  - shamt: shift amount
  - funct: selects the variant of the operation in the "op" field
  - immediate: address offset or immediate value
  - target address: target address of the jump instruction
- Formats shown: R-type, I-type, J-type (same field layout as in Step 1a)
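The field layout can be sketched as a decoder. This is an illustrative sketch only: the function name `decode` and the dictionary keys are assumptions, and the bit positions follow the Instruction[25:21], [20:16], [15:11], [15:0] and [5:0] labels used later in the lecture. Every field is extracted regardless of format, since the format is only known once op has been examined.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op":     (word >> 26) & 0x3F,   # bits 31:26
        "rs":     (word >> 21) & 0x1F,   # bits 25:21
        "rt":     (word >> 16) & 0x1F,   # bits 20:16
        "rd":     (word >> 11) & 0x1F,   # bits 15:11 (R-type)
        "shamt":  (word >> 6) & 0x1F,    # bits 10:6  (R-type)
        "funct":  word & 0x3F,           # bits 5:0   (R-type)
        "imm16":  word & 0xFFFF,         # bits 15:0  (I-type)
        "target": word & 0x03FFFFFF,     # bits 25:0  (J-type)
    }


fields = decode(0x00221821)   # encoding of: addu $3, $1, $2
```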

MIPS ISA: Subset for Today
- ADD and SUB (R-type format)
  - addU rd, rs, rt
  - subU rd, rs, rt
- OR Immediate (I-type format)
  - ori rt, rs, imm16
- LOAD and STORE Word (I-type format)
  - lw rt, rs, imm16
  - sw rt, rs, imm16
- BRANCH (I-type format)
  - beq rs, rt, imm16

Step 2: Datapath Requirements
- REGISTER FILE: the MIPS ISA requires 32 registers, 32 bits each
  - Called a register file
  - Contains 32 entries
  - Each entry is 32 bits
- AddU rd, rs, rt or SubU rd, rs, rt
  - Read two sources: rs, rt
  - Operation: rs + rt or rs - rt
  - Write destination: rd ← rs +/- rt
- Requirements
  - Read two registers (rs, rt)
  - Perform ALU operation
  - Write a third register (rd)
- How to implement?
  - REGFILE ports: RdReg1, RdReg2, WrReg (register numbers, 5 bits each), WrData, RdData1, RdData2, RegWrite
  - ALU: ALUop control input; Result and Zero? outputs

Step 3: Datapath Assembly
- ADDU rd, rs, rt and SUBU rd, rs, rt
  - Need an ALU
  - Hook it up to the REGISTER FILE
  - REGFILE has 2 read ports (rs, rt) and 1 write port (rd)
- Parameters come from instruction fields: rs, rt, rd
- Control signals depend upon instruction fields
  - Eg: ALUop = f(Instruction) = f(op, funct)
- (Figure: REGFILE with RdReg1, RdReg2, WrReg, WrData, RdData1, RdData2, RegWrite, connected to an ALU with ALUop, Result, Zero?)

Steps 2 and 3: ORI Instruction
- ORI rt, rs, Imm16
  - Need a new ALUop for the 'OR' function; hook up to REGFILE
  - 1 read port (rs), 1 write port (rt), 1 constant value (Imm16)
- From the instruction: rs and rt are used; rd is not (marked X)
- Imm16 (16 bits) passes through ZERO-EXTEND; the ALUsrc mux (inputs 0 and 1) selects the second ALU operand
- Control signals depend upon instruction fields
  - E.g.: ALUsrc = f(Instruction) = f(op, funct)

Steps 2 and 3: Destination Register
- Must select the proper destination, rd or rt
  - Depends on instruction type
  - R-type may write rd
  - I-type may write rt
- (Figure: the previous datapath with a RegDst mux in front of WrReg, choosing between rt and rd)

Steps 2 and 3: Load Word
- LW rt, rs, Imm16
  - Need Data Memory: data ← Mem[Addr]
    - Addr is rs + Imm16; Imm16 is signed; use the ALU for +
  - Store in rt: rt ← Mem[rs + Imm16]
- (Figure: the previous datapath plus DATAMEM (Addr, RdData), a MemtoReg mux, and a SIGN/ZERO-EXTEND unit controlled by ExtOp)
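The difference between the zero-extension used by ORI and the sign-extension needed for LW addresses can be sketched as follows. The encoding ExtOp = 1 for sign-extend is an assumption (the slides do not state it), and `extend16` and `lw_address` are hypothetical helper names.

```python
def extend16(imm16, ext_op):
    """Extend a 16-bit immediate to 32 bits.

    ext_op = 1 selects sign extension, 0 selects zero extension
    (this encoding of ExtOp is an assumption).
    """
    imm16 &= 0xFFFF
    if ext_op and (imm16 & 0x8000):
        return imm16 | 0xFFFF0000   # copy the sign bit into the upper 16 bits
    return imm16


def lw_address(rs_value, imm16):
    """Effective address for lw/sw: rs + sign_ext(Imm16), wrapped to 32 bits."""
    return (rs_value + extend16(imm16, 1)) & 0xFFFFFFFF


addr = lw_address(0x1000, 0xFFFC)   # Imm16 = 0xFFFC is an offset of -4
```

Wrapping the sum to 32 bits is what makes adding 0xFFFFFFFC behave like subtracting 4.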

Steps 2 and 3: Store Word
- SW rt, rs, Imm16
  - Need Data Memory: Mem[Addr] ← data
    - Addr is rs + Imm16; Imm16 is signed; use the ALU for +
  - Store in Mem: Mem[rs + Imm16] ← rt
- (Figure: the Load Word datapath plus a WrData input on DATAMEM and a MemWrite control signal)

Writes: Need to Control Timing
- Problem: write to data memory
  - Data can come anytime
  - Addr must come first
  - MemWrite must come after Addr
    - Else? Writes to the wrong Addr!
- Solution: use ideal data memory
  - Assume everything works OK
  - How to fix this for real?
  - One solution: synchronous memory
  - Another solution: delay MemWrite to come late
- Problems?: write to register file
  - Does the RegWrite signal come after the WrReg number?
  - When does the write to a register happen?
  - Read from the same register as being written?

Missing Pieces: Instruction Fetching
- Where does the instruction come from?
  - From instruction memory, of course!
  - Recall: stored-program concept
- Alternatives?
  - How about hard-coding wires and switches...?
  - This is how ENIAC (Electronic Numerical Integrator and Computer) was programmed!
- How to branch?
  - BEQ rs, rt, Imm16

Instruction Processing
- Fetch instruction
- Execute instruction
- Fetch next instruction
- Execute next instruction
- Fetch next instruction
- Execute next instruction
- Etc...
- How to maintain sequence? Use a counter!
- Branches (out of sequence)? Load the counter!

Instruction Processing
- Program Counter
  - Points to the current instruction
  - Address to instruction memory
    - Instr ← InstrMem[PC]
  - Next instruction: counts up by 4
    - Remember: memory is byte-addressable, instructions are 4 bytes
    - PC ← PC + 4
  - Branch instruction: replace PC contents

Step 1: Analyze Instructions
- Register Transfer Language
  - R-type fields: op | rs | rt | rd | shamt | funct = InstrMem[PC]
  - I-type fields: op | rs | rt | Imm16 = InstrMem[PC]

  Instr   Register Transfers
  ADDU    R[rd] ← R[rs] + R[rt]; PC ← PC + 4
  SUBU    R[rd] ← R[rs] - R[rt]; PC ← PC + 4
  ORI     R[rt] ← R[rs] OR zero_ext(Imm16); PC ← PC + 4
  LOAD    R[rt] ← MEM[ R[rs] + sign_ext(Imm16) ]; PC ← PC + 4
  STORE   MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]; PC ← PC + 4
  BEQ     if ( R[rs] == R[rt] ) then PC ← PC + 4 + { sign_ext(Imm16) || b'00' }
          else PC ← PC + 4
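The register transfers for the six instructions can be exercised with a small instruction-level interpreter. This is a sketch under stated assumptions: instructions are hypothetical pre-decoded tuples `(name, rs, rt, rd_or_imm16)` rather than real encodings, memory is a dictionary keyed by byte address, and register 0 is not treated specially.

```python
MASK = 0xFFFFFFFF


def sign_ext(imm16):
    """Sign-extend a 16-bit value to 32 bits."""
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16


def step(instr, R, MEM, pc):
    """Apply one instruction's register transfers; return the next PC."""
    name, rs, rt, x = instr          # x is rd for R-type, Imm16 otherwise
    if name == "addu":
        R[x] = (R[rs] + R[rt]) & MASK
    elif name == "subu":
        R[x] = (R[rs] - R[rt]) & MASK
    elif name == "ori":
        R[rt] = R[rs] | x            # zero_ext(Imm16) is just the 16-bit value
    elif name == "lw":
        R[rt] = MEM.get((R[rs] + sign_ext(x)) & MASK, 0)
    elif name == "sw":
        MEM[(R[rs] + sign_ext(x)) & MASK] = R[rt]
    elif name == "beq":
        if R[rs] == R[rt]:
            return (pc + 4 + (sign_ext(x) << 2)) & MASK
    else:
        raise ValueError(name)
    return (pc + 4) & MASK


R, MEM = [0] * 32, {}
program = [
    ("ori", 0, 1, 5),     # R[1] = 5
    ("ori", 0, 2, 7),     # R[2] = 7
    ("addu", 1, 2, 3),    # R[3] = R[1] + R[2]
    ("sw", 0, 3, 16),     # MEM[16] = R[3]
    ("lw", 0, 4, 16),     # R[4] = MEM[16]
    ("beq", 3, 4, 1),     # taken: skips the next instruction
    ("ori", 0, 5, 1),     # skipped
    ("subu", 4, 1, 6),    # R[6] = R[4] - R[1]
]
pc = 0
while pc // 4 < len(program):
    pc = step(program[pc // 4], R, MEM, pc)
```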

Steps 2 and 3: Datapath & Assembly
- PC: a register
  - Counter, counts by +4
  - Provides the address to Instruction Memory
- (Figure: PC drives the Read address input of Instruction Memory, which outputs Instruction[31:0]; an Add unit with constant 4 computes the next PC)

Steps 2 and 3: Datapath & Assembly
- PC: a register
  - Counter, counts by +4
  - Sometimes, must add SignExtend{Imm16 || b'00'} for branch instructions
- Note: the sign-extender for Imm16 is already in the datapath (everything else is new)
- (Figure: PC, Instruction Memory, two Add units, Shift Left 2, Sign/Zero Extend with ExtOp, and a PCSrc mux; instruction fields Instruction[25:21], [20:16], [15:11], [15:0] (Imm16))

Steps 2 and 3: Add Previous Datapath
(Figure: the complete single-cycle datapath. Components: PC, two Add units, Shift Left 2, Instruction Memory, Register File, Sign/Zero Extend, ALU, ALU Control, Data Memory, multiplexers. Instruction fields: Instruction[25:21], [20:16], [15:11], [15:0] (Imm16), [5:0] (funct). Control signals: RegWrite, RegDst, ALUSrc, MemWrite, PCSrc, MemtoReg, ALUOp, ExtOp.)

What have we done?
- Created a simple CPU datapath
  - Control still missing (next slide)
- Single-cycle CPU
  - Every instruction takes 1 clock cycle
  - Clocking?

One Clock Cycle
- Clock locations
  - PC, REGFILE have clocks
- Operation
  - On the rising edge, PC will get a new value
    - Maybe REGFILE will have one value updated as well
  - After the rising edge
    - PC and REGFILE can't change
    - New value out of PC
    - Instruction out of INSTRMEM
    - Instruction selects registers to read from REGFILE
    - Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc
    - ALU does its work
    - DataMem may be read (depending on instruction)
    - Result value goes back to REGFILE
    - New PC value goes back to PC
    - Await next clock edge
- Lots to do in only 1 clock cycle!

Missing Steps?
- Control is missing (Steps 4 and 5 we mentioned earlier)
  - Generate the green signals
    - ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc
  - These are all f(Instruction), where f() is a logic expression
  - Will look at control strategies in an upcoming lecture
- Implementation details
  - How to implement REGFILE?
    - Read port: tristate buffers? Multiplexer? Memory?
    - Two read ports: two of the above?
    - Write port: how to write only 1 register?
  - How to control writes to memory? To register file?
- More instructions
  - Shift instructions
  - Jump instruction
  - Etc

1-Cycle CPU Datapath
(Figure: the complete single-cycle datapath, with control signals RegWrite, RegDst, ALUSrc, MemWrite, PCSrc, MemtoReg, ALUOp, ExtOp.)

1-cycle CPU Datapath + Control
(Figure: the single-cycle datapath with a Control unit driven by Instruction[31:26]. Control outputs: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite. ALU control takes Instruction[5:0].)

1-cycle CPU Control: Lookup Table

  Input or Output  Signal Name  R-format  Lw  Sw  Beq
  Inputs           Op5          0         1   1   0
                   Op4          0         0   0   0
                   Op3          0         0   1   0
                   Op2          0         0   0   1
                   Op1          0         1   1   0
                   Op0          0         1   1   0
  Outputs          RegDst       1         0   X   X
                   ALUSrc       0         1   1   0
                   MemtoReg     0         1   X   X
                   RegWrite     1         1   0   0
                   MemRead      0         1   0   0
                   MemWrite     0         0   1   0
                   Branch       0         0   0   1
                   ALUOp1       1         0   0   0
                   ALUOp0       0         0   0   1

- Also: I-type instructions (ORI) & ExtOp (sign-extend control), etc.
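Because single-cycle control is a pure function of the opcode, the table can be sketched directly as a lookup. The values below are copied from the table; the names `SIGNALS`, `CONTROL` and `control`, and the choice to keep don't-cares as the character "X", are illustrative assumptions.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# Key: opcode bits Op5..Op0. Value: one character per signal, in SIGNALS order.
CONTROL = {
    0b000000: "100100010",   # R-format
    0b100011: "011110000",   # lw
    0b101011: "X1X001000",   # sw
    0b000100: "X0X000101",   # beq
}


def control(opcode):
    """Look up the control signals for a 6-bit opcode."""
    return dict(zip(SIGNALS, CONTROL[opcode]))


lw = control(0b100011)
```

A hardware implementation would map each "X" to whichever value gives the simplest logic; the sketch keeps it visible instead.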

1-cycle CPU + Jump Instruction
(Figure: the single-cycle datapath and control extended for jumps. Jump address [31..0] is formed from PC + 4 [31..28] and Instruction[25:0]; Instruction[31:26] drives the control unit.)
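The jump address labelled in the figure can be sketched as a bit concatenation. The two low-order zero bits (the 26-bit field shifted left by 2) are standard MIPS behavior rather than something stated on the slide, and `jump_address` is a hypothetical helper name.

```python
def jump_address(pc, instr):
    """Jump address[31:0] = (PC + 4)[31:28] || Instruction[25:0] || 00."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    target26 = instr & 0x03FFFFFF          # Instruction[25:0]
    return (pc_plus_4 & 0xF0000000) | (target26 << 2)


addr = jump_address(0x10000008, 0x08000010)   # j instruction with target field 0x10
```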

1-cycle CPU Problems?
- Every instruction takes 1 cycle
- Some instructions "do more work"
  - Eg, lw must read from DATAMEM
- All instructions must have the same clock period...
- Many instructions run slower than necessary
- Tricky timing on MemWrite, RegWrite(?) signals
  - Write signal must come *after* the address is stable
- Need extra resources...
  - PC+4 adder, ALU for BEQ instruction, DATAMEM + INSTRMEM

Performance!
- Single-cycle CPU performance
  - Execute one instruction per clock cycle (CPI = 1)
  - Clock cycle time? Note the dataflow includes:
    - INSTRMEM read
    - REGFILE access
    - Sign extension
    - ALU operation
    - DATAMEM read
    - REGFILE/PC write
  - Not every instruction uses all resources (eg, DATAMEM read)
  - Can we change the clock period for each instruction?
    - No! (Why not?)
  - One clock period: the worst case!
  - This is why a single-cycle CPU is not good for performance
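The "one clock period: the worst case" argument can be sketched numerically. All delays below are hypothetical round numbers chosen for illustration (they are not from this lecture), and the per-instruction paths are a simplified assumption that ignores sign extension and mux delays.

```python
# Hypothetical unit delays in picoseconds (illustration only, not from the lecture).
DELAY = {"INSTRMEM": 200, "REGREAD": 100, "ALU": 200, "DATAMEM": 200, "REGWRITE": 100}

# Units assumed to lie on the critical path of each instruction class.
PATH = {
    "R-type": ["INSTRMEM", "REGREAD", "ALU", "REGWRITE"],
    "lw":     ["INSTRMEM", "REGREAD", "ALU", "DATAMEM", "REGWRITE"],
    "sw":     ["INSTRMEM", "REGREAD", "ALU", "DATAMEM"],
    "beq":    ["INSTRMEM", "REGREAD", "ALU"],
}

latency = {name: sum(DELAY[u] for u in units) for name, units in PATH.items()}
clock_period = max(latency.values())     # the slowest instruction sets the clock
slack = {name: clock_period - t for name, t in latency.items()}   # time wasted per instruction
```

With these numbers lw sets the period, and every other instruction idles for the remainder of each cycle.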

1-cycle CPU Datapath + Controller
(Figure: the single-cycle datapath with controller and jump support. Jump address [31..0] is formed from PC + 4 [31..28] and Instruction[25:0]; Instruction[31:26] drives the controller.)

1-cycle CPU Summary
- Operation
  - 1 cycle per instruction
  - Control signals held fixed during the entire cycle (except BRANCH)
  - Only 2 registers
    - PC, updated every clock cycle
    - REGFILE, updated when required
  - During the clock cycle, data flows from register outputs to register inputs
  - Fixed clock frequency / period
- Performance
  - 1 instruction per cycle
  - Slowest instruction determines clock frequency
- Outstanding issue: MemWrite timing
  - Assume this signal writes to memory at the end of the clock cycle

Multi-cycle CPU Goals
- Improve performance
  - Break each instruction into smaller steps / multiple cycles
    - LW instruction → 5 cycles
    - SW instruction → 4 cycles
    - R-type instruction → 4 cycles
    - Branch, Jump → 3 cycles
  - Aim for 5x clock frequency
    - Complex instructions (eg, LW) → 5 cycles → same performance as before
    - Simple instructions (eg, ADD) → fewer cycles → faster
- Save resources (gates/transistors)
  - Re-use the ALU over multiple cycles
  - Put INSTR + DATA in the same memory
- MemWrite timing solved?

Multi-cycle CPU Datapath
- Add multiplexers + control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB)
- Move signal paths (+4, Shift Left 2)
- (Figure: multi-cycle datapath with PC, a single Memory (Address, Write data, MemData), Instruction Register, Memory Data Register, Registers, A, B, ALU, ALU Out, Sign Extend, Shift Left 2, and multiplexers)

Multi-cycle CPU Datapath
- Add registers + control signals (IR, MDR, A, B, ALUOut)
  - Registers with no control signal load a value every clock cycle (eg, PC)
- (Figure: the same multi-cycle datapath, highlighting the Instruction Register and Memory Data Register)

Instruction Execution Example
- Execute a "Load Word" instruction: LW rt, 0(rs)
- 5 steps
  1. Fetch instruction
  2. Read registers
  3. Compute address
  4. Read data
  5. Write registers

Load Word Instruction Sequence
1. Fetch Instruction: InstructionRegister ← Mem[PC]
(Figure: the multi-cycle datapath with the active path for this step highlighted.)

Load Word Instruction Sequence
2. Read Registers: A ← Registers[Rs]
(Figure: the multi-cycle datapath with the active path for this step highlighted.)

Load Word Instruction Sequence
3. Compute Address: ALUOut ← A + SignExt(Imm16)
(Figure: the multi-cycle datapath with the active path for this step highlighted.)

Load Word Instruction Sequence
4. Read Data: MDR ← Memory[ALUOut]
(Figure: the multi-cycle datapath with the active path for this step highlighted.)

Load Word Instruction Sequence
5. Write Registers: Registers[Rt] ← MDR
(Figure: the multi-cycle datapath with the active path for this step highlighted.)

Load Word Instruction Sequence: All 5 Steps Shown
(Figure: the multi-cycle datapath with the paths used by all five Load Word steps highlighted.)

Multi-cycle Load Word: Recap
1. Fetch Instruction: InstructionRegister ← Mem[PC]
2. Read Registers: A ← Registers[Rs]
3. Compute Address: ALUOut ← A + SignExt(Imm16)
4. Read Data: MDR ← Memory[ALUOut]
5. Write Registers: Registers[Rt] ← MDR
- Missing steps?

Multi-cycle Load Word: Recap
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers: A ← Registers[Rs]
3. Compute Address: ALUOut ← A + SignExt(Imm16)
4. Read Data: MDR ← Memory[ALUOut]
5. Write Registers: Registers[Rt] ← MDR
- Missing steps?
  - Must increment the PC
  - Do it as part of the instruction fetch (in step 1)
  - Need a PCWrite control signal
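The five Load Word steps can be sketched as one function in which each commented group stands for one clock cycle. Assumptions for illustration: a single memory modeled as a dictionary keyed by byte address (matching the shared INSTR + DATA memory goal), and hypothetical names `run_lw`, `state`, `regs`.

```python
def sign_ext(imm16):
    """Sign-extend a 16-bit value to 32 bits."""
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16


def run_lw(state, mem, regs):
    """Run the five lw steps; returns the list of steps performed."""
    trace = []
    # 1. Fetch instruction and increment the PC
    state["IR"] = mem[state["PC"]]
    state["PC"] = (state["PC"] + 4) & 0xFFFFFFFF
    trace.append("fetch")
    # 2. Read registers
    rs = (state["IR"] >> 21) & 0x1F
    rt = (state["IR"] >> 16) & 0x1F
    state["A"] = regs[rs]
    trace.append("read registers")
    # 3. Compute address
    state["ALUOut"] = (state["A"] + sign_ext(state["IR"] & 0xFFFF)) & 0xFFFFFFFF
    trace.append("compute address")
    # 4. Read data
    state["MDR"] = mem[state["ALUOut"]]
    trace.append("read data")
    # 5. Write register
    regs[rt] = state["MDR"]
    trace.append("write register")
    return trace


regs = [0] * 32
regs[1] = 0x100
mem = {0: 0x8C220008, 0x108: 1234}   # lw $2, 8($1) at address 0; data word at 0x108
state = {"PC": 0}
trace = run_lw(state, mem, regs)
```

The temporaries IR, A, ALUOut and MDR exist precisely so that a value produced in one cycle is still available in the next.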

Multi-cycle R-Type Instruction
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers: A ← Registers[Rs]; B ← Registers[Rt]
3. Compute Value: ALUOut ← A op B
4. Write Registers: Registers[Rd] ← ALUOut
- RTL describes the data flow action in each clock cycle
  - Control signals determine the precise data flow
  - Each step implies unique control values

Multi-cycle R-Type Instruction: Control Signal Values
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
   MemRead=1, ALUSrcA=0, IorD=0, IRWrite, ALUSrcB=01, ALUop=00, PCWrite, PCSource=00
2. Read Registers: A ← Registers[Rs]; B ← Registers[Rt]
   ALUSrcA=0, ALUSrcB=11, ALUop=00
3. Compute Value: ALUOut ← A op B
   ALUSrcA=1, ALUSrcB=00, ALUop=10
4. Write Registers: Registers[Rd] ← ALUOut
   RegDst=1, RegWrite, MemtoReg=0
- Each step implies unique control values
  - Fixed for the entire cycle
  - "Default value" implied if unspecified

Check Your Work: Is the RTL Valid?
1. Datapath check
   - Within one cycle...
     - Each cycle has a valid data flow path (path exists)
     - Each register gets only one new value
   - Across multiple cycles...
     - A register value must be defined in a previous (earlier in time) clock cycle before it is used
     - Eg, "A ← 3" must occur before "B ← A"
     - Make sure the register value doesn't disappear if set >1 cycle earlier
2. Control signal check
   - Each cycle, the RTL describing the datapath flow implies a value for each control signal
     - 0 or 1 or default or don't care
   - Each control signal gets only one fixed value for the entire cycle
3. Overall check
   - Does the sequence of steps work?

Multi-cycle BEQ Instruction
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers, Precompute Target: A ← Registers[Rs]; B ← Registers[Rt]; ALUOut ← PC + {SignExt{Imm16}, b'00'}
3. Compare Registers, Conditional Branch: if ((A - B) == 0) PC ← ALUOut
- Green shows the PC calculation flow (in parallel with other operations)

Multi-cycle Datapath with Control Signals
(Figure: the multi-cycle datapath with instruction fields Instr[25:21], Instr[20:16], Instr[15:11], Instr[15:0], Instruction[5:0], Instr[25:0]; Jump address [31..0] formed from PC[31..28] and Instr[25:0]; ALU Control; control signals PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSrc, ALUOp, ALUSrcA, ALUSrcB, RegWrite, RegDst.)

Multi-cycle Datapath with Controller
(Figure: the multi-cycle datapath with a controller driven by Instr[31:26]; Jump address [31..0] formed from PC[31..28] and Instr[25:0].)

Multi-cycle CPU Control: Overview
- General approach: Finite State Machine (FSM)
  - Need details in each branch of control...
    - Precise outputs for each state (Mealy outputs depend on inputs, Moore outputs do not)
    - Precise "next state" for each state (can depend on inputs)
- (Figure: FSM overview; each branch ends in Control Signal Outputs)

How to Implement FSM?
- Manually with logic gates + FFs
  - Bubble diagram, next-state table, state assignment
  - Karnaugh map for each state bit, each output bit (painful!)
- High-level language description (eg, Verilog, VHDL)
  - Describe the FSM bubble diagram (next-states, output values)
  - Automatically synthesized into gates + FFs
- Microcode (µ-code) description
  - Sequence through many µ-ops for each CPU instruction
    - One µ-op (µ-instruction) sends the correct control signals for 1 cycle
    - µ-op similar to one bubble in FSM
  - Acts like a mini-CPU within a CPU
    - µPC: microcode program counter
    - Microcode storage memory contains µ-ops
  - Can look similar to RTL or some new "assembly language"
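A next-state function for the control FSM can be sketched in software before committing to gates or microcode. The state names below are hypothetical (the lecture's bubble diagram is not reproduced in this transcript); the transitions are chosen so that the cycle counts match the lecture's figures of 5 for LW, 4 for SW and R-type, and 3 for BEQ and J.

```python
def next_state(state, op):
    """Next state of the control FSM; depends on the opcode class only where needed."""
    if state == "FETCH":
        return "DECODE"
    if state == "DECODE":
        return {"lw": "MEM_ADDR", "sw": "MEM_ADDR", "rtype": "EXECUTE",
                "beq": "BRANCH", "j": "JUMP"}[op]
    if state == "MEM_ADDR":
        return "MEM_READ" if op == "lw" else "MEM_WRITE"
    if state == "MEM_READ":
        return "MEM_WRITEBACK"
    if state == "EXECUTE":
        return "ALU_WRITEBACK"
    return "FETCH"   # the last state of every instruction returns to fetch


def cycles(op):
    """Count the clock cycles one instruction spends in the FSM."""
    state, count = "FETCH", 0
    while True:
        count += 1
        state = next_state(state, op)
        if state == "FETCH":
            return count


lw_cycles = cycles("lw")
```

In a Moore-style design each state would additionally map to one fixed set of control signal values, which is what a microcode word stores.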

FSM Specification: Bubble Diagram
- Can build this by examining the RTL
- It is possible to automatically convert RTL into this form!

FSM: Gates + FFs Implementation
(Figure: FSM high-level organization.)

FSM: Microcode Implementation
(Figure labels: Microcode Storage (memory) with inputs and outputs, Microprogram Counter, Adder (+1), Address Select Logic, datapath control outputs, sequencing control, inputs from instruction register opcode field.)

Multi-cycle CPU with Control FSM
(Figure: the multi-cycle datapath with an FSM controller driven by Instr[31:26]; labels include FSM Control Outputs and Conditional Branch.)

Control FSM: Overview
- General approach: Finite State Machine (FSM)
- Need details in each branch of control...

Detailed FSM (figure)

Detailed FSM: Instruction Fetch

Detailed FSM: Memory Reference (LW, SW)

Detailed FSM: R-Type Instruction

Detailed FSM: Branch Instruction

Detailed FSM: Jump Instruction

Performance Comparison: Single-cycle CPU vs Multi-cycle CPU

Simple Comparison
- Single-cycle CPU
  - All instructions: 1 clock cycle
- Multi-cycle CPU
  - LW: 5 clock cycles
  - SW, R-type: 4 clock cycles
  - BEQ, J: 3 clock cycles

What's really happening?
- Load Word instruction, ideally:
  - Single-cycle CPU: Fetch, Decode, Calc Addr, Memory, Write all happen within one long clock cycle
  - Multi-cycle CPU: the same five steps, one per short clock cycle

In practice, steps differ in speeds...
- Load Word instruction
  - Single-cycle CPU: Fetch, Decode, Calc Addr, Memory, Write run back to back
  - Multi-cycle CPU: each step gets one clock cycle
    - Violation! (a step does not fit in the clock period)
    - Wasted time! (a step finishes before the clock period ends)

Single-cycle vs Multi-cycle
- LW instruction: faster for single-cycle
  - Multi-cycle CPU: violation fixed!
  - Now the wasted time is larger!

Single-cycle vs Multi-cycle
- SW instruction: about the same speed
  - (Figure: single-cycle and multi-cycle timelines for Fetch, Decode, Calc Addr, Memory; labels: Wasted time!, Speed diff)

Single-cycle vs Multi-cycle
- BEQ, J instructions: faster for multi-cycle
  - (Figure: single-cycle and multi-cycle timelines for Fetch, Decode, Calc Addr; labels: Wasted time!, Speed diff)

Performance Summary
- Which CPU implementation is faster?
  - LW → single-cycle is faster
  - SW, R-type → about the same
  - BEQ, J → multi-cycle is faster
- Real programs use a mix of these instructions
- Overall performance depends on instruction frequency!
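The dependence on instruction frequency can be sketched with an average-CPI calculation. The cycle counts come from the lecture; the instruction mix is hypothetical, and `clock_ratio` (how many times faster the multi-cycle clock runs than the single-cycle clock) is an assumed parameter whose ideal value of 5 is the stated design goal.

```python
CYCLES = {"lw": 5, "sw": 4, "rtype": 4, "beq": 3, "j": 3}   # from the lecture

# Hypothetical instruction mix (fractions sum to 1); not from the lecture.
MIX = {"lw": 0.25, "sw": 0.10, "rtype": 0.45, "beq": 0.15, "j": 0.05}


def average_cpi(mix):
    """Average multi-cycle clock cycles per instruction for a given mix."""
    return sum(mix[op] * CYCLES[op] for op in mix)


def speedup(mix, clock_ratio=5.0):
    """Multi-cycle speedup over single-cycle (single-cycle CPI is 1)."""
    return clock_ratio / average_cpi(mix)


cpi = average_cpi(MIX)
ideal = speedup(MIX)           # clock exactly 5x faster
realistic = speedup(MIX, 4.0)  # clock only 4x faster because steps differ in speed
```

With this mix the multi-cycle design wins only while its clock stays more than about 4.05 times faster than the single-cycle clock, which is why uneven step delays matter.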

Implementation Summary
- Single-cycle CPU
  - 1 instruction per cycle (eg, 1 MHz → 1 MIPS)
  - No "wasted time" on the most complex instruction
  - Large wasted time on simpler instructions
  - Simple controller (just a lookup table or memory)
  - Simple instructions
- Multi-cycle CPU
  - << 1 instruction per cycle (eg, 1 MHz → 0.2 MIPS)
  - Small time wasted on the most complex instruction
    - Hence, this instruction is always slower than on the single-cycle CPU
  - Small time wasted on simple instructions
    - Eliminates "large wasted time" by using fewer clock cycles
  - Complex controller (FSM)
  - Potential to create complex instructions

The End Lecture 11