Processor: Datapath and Control (part 2)

Computer Organization, Ellen Walker, Hiram College. Figures from Computer Organization and Design, 3rd ed., D.A. Patterson & J.L. Hennessy, Morgan Kaufmann © 2005, unless otherwise specified.

A Multicycle Implementation
- Faster cycle time (but more cycles per instruction)
- Some instructions use fewer cycles than others
- Reuse of hardware functional units: memory, ALU
- Additional registers for intermediate results

Overview: Multicycle Datapath
- New registers: IR, MDR, A, B, ALUOut (the PC is kept from the single-cycle design)
- Simplified hardware: 1 memory, 1 ALU (no extra adders)

Detailed Datapath (No Branch)
This version shows the multiplexors and the additional hardware units from before, plus the instruction bits.
- Note the MUXes on both ALU inputs now: A+B, PC+4, Reg+Imm16, PC+(SignExt(Imm16)<<2)
- MUX for the memory address: PC (instruction fetch) vs. ALUOut (memory reference)
Trace through the add and load instructions:
- add cycles: 1) Mem[PC]->IR and PC+4->PC; 2) Reg[rs]->A, Reg[rt]->B; 3) A+B->ALUOut; 4) ALUOut->Reg[rd]
- lw cycles: 1) Mem[PC]->IR and PC+4->PC; 2) Reg[rs]->A; 3) A+SignExt(IR[15:0])->ALUOut; 4) Mem[ALUOut]->MDR; 5) MDR->Reg[rt]

Datapath with Control Lines (Except Branching)
Most of these we've seen before (e.g. RegWrite). New: IRWrite (when is the instruction written into IR?) and IorD (PC vs. ALUOut as the memory address).

Datapath, Control & Branch Logic
New:
- Control logic to compute all the control bits
- MUX for PC: direct ALU result (PC+4), ALUOut (beq target), or {PC[31:28], Imm26<<2} (jump)
- 2 PC write signals: PCWrite (unconditional) and PCWriteCond (conditional)

Actions for R-type Instruction
1. IR <= Memory[PC]; PC <= PC+4
2. A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]
3. ALUOut <= A op B
4. Reg[IR[15:11]] <= ALUOut
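
The four R-type steps can be traced as plain register transfers. This is a minimal sketch, assuming a hypothetical memory model (a dict from byte address to 32-bit word), a list of 32 registers, and "add" as the only ALU operation; the helper names `field` and `run_rtype_add` are illustrative, not from the slides.

```python
# Illustrative sketch: memory model, helper names and the add-only ALU are assumptions.
MASK32 = 0xFFFFFFFF


def field(word, hi, lo):
    """Return bits word[hi:lo] as an unsigned integer (like IR[25:21])."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def run_rtype_add(mem, reg, pc):
    """Execute one R-type add starting at pc; return (next_pc, cycles_used)."""
    # Step 1 (fetch): IR <= Memory[PC]; PC <= PC + 4
    ir = mem[pc]
    pc = (pc + 4) & MASK32
    # Step 2 (decode): A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]
    a = reg[field(ir, 25, 21)]
    b = reg[field(ir, 20, 16)]
    # Step 3 (execute): ALUOut <= A op B, with op fixed to add in this sketch
    alu_out = (a + b) & MASK32
    # Step 4 (completion): Reg[IR[15:11]] <= ALUOut
    reg[field(ir, 15, 11)] = alu_out
    return pc, 4  # one clock cycle per step above


# add $3, $1, $2  (opcode 0, rs=1, rt=2, rd=3, funct=0x20)
mem = {0: (1 << 21) | (2 << 16) | (3 << 11) | 0x20}
reg = [0] * 32
reg[1], reg[2] = 5, 7
next_pc, cycles = run_rtype_add(mem, reg, 0)
print(reg[3], next_pc, cycles)  # 12 4 4
```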

Actions for Load
1. IR <= Memory[PC]; PC <= PC+4
2. A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]
3. ALUOut <= A+SignExt(IR[15:0])
4. MDR <= Memory[ALUOut]
5. Reg[IR[20:16]] <= MDR
Step 3 could be in the same step as 2 because it uses no shared data or logic. But we'll see later that it's convenient to use the ALU in step 2 for another instruction.
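
The five load steps, as a sketch under the same assumptions (dict memory, list of registers, hypothetical helper names). The address is formed from register A plus the sign-extended 16-bit offset, and the loaded word goes to the rt field, IR[20:16].

```python
# Illustrative sketch: memory model and helper names are assumptions, not from the slides.
MASK32 = 0xFFFFFFFF


def field(word, hi, lo):
    """Return bits word[hi:lo] as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_ext16(imm16):
    """SignExt(IR[15:0]): widen a 16-bit two's-complement field to a 32-bit pattern."""
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def run_lw(mem, reg, pc):
    """Execute one lw starting at pc; return (next_pc, cycles_used)."""
    cycles = 0
    # Step 1: IR <= Memory[PC]; PC <= PC + 4
    ir, pc = mem[pc], (pc + 4) & MASK32
    cycles += 1
    # Step 2: A <= Reg[rs]; B <= Reg[rt] (B is loaded even though lw ignores it)
    a, b = reg[field(ir, 25, 21)], reg[field(ir, 20, 16)]
    cycles += 1
    # Step 3: ALUOut <= A + SignExt(IR[15:0])
    alu_out = (a + sign_ext16(field(ir, 15, 0))) & MASK32
    cycles += 1
    # Step 4: MDR <= Memory[ALUOut]
    mdr = mem[alu_out]
    cycles += 1
    # Step 5: Reg[IR[20:16]] <= MDR
    reg[field(ir, 20, 16)] = mdr
    cycles += 1
    return pc, cycles


# lw $5, -4($4): opcode 0x23, rs=4, rt=5, imm16=0xFFFC (-4)
mem = {0: (0x23 << 26) | (4 << 21) | (5 << 16) | 0xFFFC, 0x100: 42}
reg = [0] * 32
reg[4] = 0x104
next_pc, cycles = run_lw(mem, reg, 0)
print(reg[5], next_pc, cycles)  # 42 4 5
```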

Actions for Store
1. IR <= Memory[PC]; PC <= PC+4
2. A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]
3. ALUOut <= A+SignExt(IR[15:0])
4. Memory[ALUOut] <= B
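
Store differs from load only after step 3: the value in B is written to memory and the instruction ends, so it needs 4 cycles. A minimal sketch, again assuming the hypothetical dict memory and helper names used for illustration only.

```python
# Illustrative sketch: memory model and helper names are assumptions, not from the slides.
MASK32 = 0xFFFFFFFF


def field(word, hi, lo):
    """Return bits word[hi:lo] as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_ext16(imm16):
    """SignExt(IR[15:0]) as a 32-bit pattern."""
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def run_sw(mem, reg, pc):
    """Execute one sw starting at pc; return (next_pc, cycles_used)."""
    cycles = 0
    # Step 1: IR <= Memory[PC]; PC <= PC + 4
    ir, pc = mem[pc], (pc + 4) & MASK32
    cycles += 1
    # Step 2: A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]
    a, b = reg[field(ir, 25, 21)], reg[field(ir, 20, 16)]
    cycles += 1
    # Step 3: ALUOut <= A + SignExt(IR[15:0])
    alu_out = (a + sign_ext16(field(ir, 15, 0))) & MASK32
    cycles += 1
    # Step 4: Memory[ALUOut] <= B (no register write, so store ends here)
    mem[alu_out] = b
    cycles += 1
    return pc, cycles


# sw $5, 8($4): opcode 0x2B, rs=4, rt=5, imm16=8
mem = {0: (0x2B << 26) | (4 << 21) | (5 << 16) | 8}
reg = [0] * 32
reg[4], reg[5] = 0x100, 99
next_pc, cycles = run_sw(mem, reg, 0)
print(mem[0x108], next_pc, cycles)  # 99 4 4
```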

Actions for Branch (beq)
1. IR <= Memory[PC]; PC <= PC+4
2. A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; ALUOut <= PC+(SignExt(IR[15:0])<<2)
3. If (A==B) PC <= ALUOut
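
Because the branch target is computed during decode, beq only needs one more cycle to compare A and B. A minimal sketch, assuming the same hypothetical memory model; note that PC already holds PC+4 when the target is formed.

```python
# Illustrative sketch: memory model and helper names are assumptions, not from the slides.
MASK32 = 0xFFFFFFFF


def field(word, hi, lo):
    """Return bits word[hi:lo] as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_ext16(imm16):
    """SignExt(IR[15:0]) as a 32-bit pattern."""
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def run_beq(mem, reg, pc):
    """Execute one beq starting at pc; return (next_pc, cycles_used)."""
    cycles = 0
    # Step 1: IR <= Memory[PC]; PC <= PC + 4
    ir, pc = mem[pc], (pc + 4) & MASK32
    cycles += 1
    # Step 2: A, B, and the branch target "just in case":
    #         ALUOut <= PC + (SignExt(IR[15:0]) << 2), where PC already holds PC + 4
    a, b = reg[field(ir, 25, 21)], reg[field(ir, 20, 16)]
    alu_out = (pc + (sign_ext16(field(ir, 15, 0)) << 2)) & MASK32
    cycles += 1
    # Step 3: if (A == B) PC <= ALUOut (conditional PC write)
    if a == b:
        pc = alu_out
    cycles += 1
    return pc, cycles


# beq $1, $2, 3 stored at address 0x40: opcode 4, rs=1, rt=2, imm16=3
mem = {0x40: (4 << 26) | (1 << 21) | (2 << 16) | 3}
reg = [0] * 32
taken_pc, cycles = run_beq(mem, reg, 0x40)       # $1 == $2, so the branch is taken
reg[1] = 1
not_taken_pc, _ = run_beq(mem, reg, 0x40)        # $1 != $2, falls through to PC + 4
print(hex(taken_pc), hex(not_taken_pc), cycles)  # 0x50 0x44 3
```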

Actions for Jump
1. IR <= Memory[PC]; PC <= PC+4
2. (Decode: A, B and ALUOut are computed but not used)
3. PC <= {PC[31:28], IR[25:0], "00"}
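
The jump target concatenation {PC[31:28], IR[25:0], "00"} can be written with masks and a shift. A small sketch; `jump_target` is a hypothetical helper name, and it assumes the PC passed in is the already-incremented value from step 1.

```python
def jump_target(pc_plus_4, ir):
    """{PC[31:28], IR[25:0], 00}: keep the top 4 PC bits, append the 26-bit field shifted by 2."""
    return (pc_plus_4 & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)


ir = (0x02 << 26) | 0x100   # j with a 26-bit target field of 0x100
print(hex(jump_target(0x40000004, ir)))  # 0x40000400
```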

All Instructions Together
- Step 1: Fetch (all instructions)
- Step 2: "Decode" (all instructions): compute A, B and the branch address (just in case it's needed)
- Step 3: Computation: compute the ALU op or memory address, or check equality for beq
- Step 4: Memory access or register write
- Step 5: Load completion
Jump and beq take 3 cycles (for jump, cycle 2 does useless work). R-type takes 4 cycles, store takes 4 cycles, and load takes 5 cycles (Fig. 5.30).
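
With different cycle counts per class, the average CPI depends on the instruction mix. A minimal sketch using the cycle counts above; the mix percentages are purely hypothetical, chosen only to show the weighted-average arithmetic.

```python
from fractions import Fraction

# Cycles per instruction class, as listed on the slide.
CYCLES = {"lw": 5, "sw": 4, "rtype": 4, "beq": 3, "j": 3}


def average_cpi(mix_percent):
    """Weighted CPI for a mix given as {class: percent}; percents must sum to 100."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("mix must sum to 100 percent")
    total = sum(CYCLES[k] * pct for k, pct in mix_percent.items())
    return Fraction(total, 100)


# Hypothetical mix, for illustration only (not measured data)
mix = {"lw": 25, "sw": 10, "rtype": 45, "beq": 15, "j": 5}
cpi = average_cpi(mix)
print(cpi, float(cpi))  # 81/20 4.05
```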

Computing Control Signals
Control signals now depend on the "step" as well as on the instruction being executed. States in the state machine correspond to steps of instructions.

State Transitions for Mealy Machine
[Figure: states Fetch, Decode, Compute, Mem/R write, Load Compl.; transitions labeled Load, Beq or j, Other]
- Each step is a state (simple next-state logic)
- Control signals depend on state AND instruction
Mealy machines are more complex with regard to control signal generation, and we can trade off more states (extra bits in the state register) for much simpler logic...

State Transitions (Overview) for Moore Machine
[Figure: Fetch, Decode, then per-instruction states: Load/store 3 (lw, sw) leading to Load 4 and Load 5, or Store 4; R-type 3, R-type 4; Beq 3; Jump 3]
- More dependence on the instruction for transitions
- Different states for different control signals
- One state per control step per instruction (combined if actions are identical)

Fetch and Decode States
Open the book to page 323 (Figure 5.28) to follow along.
- Fetch: ALUSrcA = PC, ALUSrcB = constant 4, ALUOp = add, PCSource = ALU
- Decode: ALUSrcA = PC, ALUSrcB = imm16 sign-extended and shifted left 2, ALUOp = add (branch target for beq)
Note that A and B are written on every cycle, so we don't have signals for them here.

Read and Write States
- State 2: ALUSrcA = A, ALUSrcB = SignExt(imm16), ALUOp = add
- State 3: IorD = 1 (ALUOut -> memory address), MemRead
- State 4: MemtoReg = 1 (memory data connected to register write data), RegDst = 2nd register field (rt) is written, RegWrite
- State 5: same as state 3 (IorD = 1), except MemWrite instead of MemRead and different transitions (back to Fetch)

States for R-Type Instructions
- State 6: ALUSrcA = A, ALUSrcB = B, ALUOp = 10 (R-type)
- State 7: RegDst = 3rd instruction register field (rd), MemtoReg = ALUOut, RegWrite

States for Branch and Jump
- Branch: ALUSrcA = A, ALUSrcB = B, ALUOp = 01 (subtract, for branch equal), PCWriteCond (write PC if Zero), PCSource = ALUOut
- Jump: PCWrite (not gated), PCSource = {PC[31:28], IR[25:0], "00"}

Complete Diagram (10 States: 0-9)
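
The complete diagram can be sketched as a next-state function. This assumes the state numbering used in the later slides (0 fetch, 1 decode, 2 address computation, 3 memory read, 4 load write-back, 5 memory write, 6 R-type execute, 7 R-type completion, 8 branch, 9 jump) and the standard MIPS opcodes; the function names are illustrative.

```python
# Standard MIPS opcodes for the five instruction classes covered here.
R_TYPE, LW, SW, BEQ, J = 0b000000, 0b100011, 0b101011, 0b000100, 0b000010


def next_state(state, op):
    """Next state of the multicycle control FSM (states 0-9).

    Opcodes outside the five classes raise KeyError in this sketch.
    """
    if state == 0:                      # fetch -> decode
        return 1
    if state == 1:                      # decode: branch on opcode
        return {LW: 2, SW: 2, R_TYPE: 6, BEQ: 8, J: 9}[op]
    if state == 2:                      # memory address computation
        return {LW: 3, SW: 5}[op]
    if state == 3:                      # memory read -> load write-back
        return 4
    if state == 6:                      # R-type execute -> completion
        return 7
    return 0                            # states 4, 5, 7, 8, 9 finish the instruction


def cycles_for(op):
    """Count transitions from fetch until the FSM returns to state 0."""
    state, cycles = 0, 0
    while True:
        state = next_state(state, op)
        cycles += 1
        if state == 0:
            return cycles


for name, op in [("lw", LW), ("sw", SW), ("R-type", R_TYPE), ("beq", BEQ), ("j", J)]:
    print(name, cycles_for(op))
```

The printed cycle counts (5, 4, 4, 3, 3) agree with the per-instruction totals given earlier.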

Control Signal Computations
Two sets of combinational logic: state -> control signals, and (state, instruction) -> next state.
[Figure: state register feeding the control-signal logic and the next-state logic; the instruction register also feeds the next-state logic]

Combined Block Diagram (C.3.2)
- 10 states (0-9) means we need 4 bits for the state
- Only 6 bits of the IR (the opcode) are needed for the next-state logic

Determining Control Signals
For each control signal, determine the conditions where the signal is 1 rather than 0:
- PCWrite is 1 in states 0 and 9 (only)
- ALUSrcA is 1 in states 2, 6 and 8
- ALUSrcB1 is 1 in states 1 and 2
- ALUSrcB0 is 1 in states 0 and 1
- (etc.)
Remember, we've designed the machine so that control signals depend ONLY on state.

Inputs for All "1" Outputs
Simplification: we're using complete state numbers, and we're not encoding states yet. Signals that have the same equation can be combined: PCWriteCond = PCSource0 = ALUOp0 (state 8).
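
The "which states assert each signal" tables can be generated by inverting a per-state list of asserted bits. A sketch assuming the 16 single-bit control outputs of the textbook design (multi-bit fields split into bits, e.g. ALUSrcB = 11 asserts both ALUSrcB1 and ALUSrcB0); the table contents follow the state descriptions above, and the function names are hypothetical.

```python
# Asserted (=1) control bits per state; every bit not listed is 0 in that state.
ASSERTED = {
    0: {"MemRead", "IRWrite", "ALUSrcB0", "PCWrite"},     # fetch (IorD=0, ALUOp=00, PCSource=00)
    1: {"ALUSrcB1", "ALUSrcB0"},                          # decode: PC + (SignExt(imm16) << 2)
    2: {"ALUSrcA", "ALUSrcB1"},                           # A + SignExt(imm16)
    3: {"MemRead", "IorD"},                               # lw memory read
    4: {"RegWrite", "MemtoReg"},                          # lw write-back (RegDst = 0, rt)
    5: {"MemWrite", "IorD"},                              # sw memory write
    6: {"ALUSrcA", "ALUOp1"},                             # R-type execute (ALUSrcB = 00)
    7: {"RegDst", "RegWrite"},                            # R-type completion (MemtoReg = 0)
    8: {"ALUSrcA", "ALUOp0", "PCWriteCond", "PCSource0"},  # beq
    9: {"PCWrite", "PCSource1"},                          # jump
}


def states_where_asserted(table):
    """Invert the table: signal -> sorted list of states where it is 1."""
    result = {}
    for state, signals in table.items():
        for sig in signals:
            result.setdefault(sig, []).append(state)
    return {sig: sorted(states) for sig, states in result.items()}


def shared_equations(by_signal):
    """Group signals whose state lists are identical (they can share one output)."""
    groups = {}
    for sig, states in by_signal.items():
        groups.setdefault(tuple(states), []).append(sig)
    return [sorted(g) for g in groups.values() if len(g) > 1]


by_signal = states_where_asserted(ASSERTED)
print(by_signal["PCWrite"], by_signal["ALUSrcA"])  # [0, 9] [2, 6, 8]
print(shared_equations(by_signal))                 # [['ALUOp0', 'PCSource0', 'PCWriteCond']]
```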

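PCWrite = State0 + State9
[Truth table: S3 S2 S1 S0 -> PCWrite; rows 0000 -> 1 and 1001 -> 1]
This needs to be done for each signal. Or you can simply look at the encodings 0000 and 1001: PCWrite = (~S3 & ~S2 & ~S1 & ~S0) + (S3 & ~S2 & ~S1 & S0). If the unused states 10-15 are treated as don't-cares, the second term shrinks to (S3 & S0).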
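
Both forms of the PCWrite equation can be checked exhaustively over the ten valid state encodings. A small sketch; the function names are illustrative.

```python
def bit(state, i):
    """Bit i of a 4-bit state number (S0 is bit 0)."""
    return (state >> i) & 1


def pcwrite_sop(s):
    """PCWrite = ~S3~S2~S1~S0 + S3~S2~S1S0 (states 0000 and 1001)."""
    s3, s2, s1, s0 = (bit(s, i) for i in (3, 2, 1, 0))
    return bool((not s3 and not s2 and not s1 and not s0)
                or (s3 and not s2 and not s1 and s0))


def pcwrite_dont_care(s):
    """Simplified form, valid only if states 10-15 never occur (don't-cares)."""
    s3, s2, s1, s0 = (bit(s, i) for i in (3, 2, 1, 0))
    return bool((not s3 and not s2 and not s1 and not s0) or (s3 and s0))


print([s for s in range(10) if pcwrite_sop(s)])                         # [0, 9]
print(all(pcwrite_sop(s) == pcwrite_dont_care(s) for s in range(10)))  # True
```

The two forms differ only on the unused encodings 11, 13 and 15, which is exactly what the don't-care assumption allows.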

Determining Next State Outputs
- Same general idea as for the control outputs
- Include the opcode bits from the instruction in the truth table
- Consider each bit of the state output separately. For example, bit 0 of the next state is true for every odd next state
- The book uses internal NextState0 - NextState9 terms, though these are not necessary

NextState2 = State1 & (op = 'lw' | op = 'sw')
[Truth table for NS2: S3 S2 S1 S0 = 0001 with op = 100011 -> 1; 0001 with op = 101011 -> 1; any other -> 0]
NextState2 = State1 & op5 & ~op4 & ~op2 & op1 & op0 (op3 is a don't-care)
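
The bit-level NextState2 equation can be checked against "opcode is lw or sw" for all 64 opcodes. A sketch; `next_state2` and `op_bit` are hypothetical helper names.

```python
LW, SW = 0b100011, 0b101011


def op_bit(op, i):
    """Bit i of the 6-bit opcode (op0 is bit 0)."""
    return (op >> i) & 1


def next_state2(state, op):
    """NextState2 = State1 & op5 & ~op4 & ~op2 & op1 & op0 (op3 is a don't-care)."""
    return (state == 1 and op_bit(op, 5) == 1 and op_bit(op, 4) == 0
            and op_bit(op, 2) == 0 and op_bit(op, 1) == 1 and op_bit(op, 0) == 1)


# The bit-level equation matches "op is lw or sw" for every 6-bit opcode
print(all(next_state2(1, op) == (op in (LW, SW)) for op in range(64)))  # True
print(next_state2(0, LW))  # False: only the decode state (1) moves to state 2
```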

Alternatives for Control and Next-State Logic
- Implement the Boolean equations directly with gates (this is what we've been doing)
- Implement the "sparse" truth table as a 2-level PLA: OR of ANDs, with only the rows that have a 1 output
- Implement the complete truth table as a ROM

Truth Tables for Control Signals
Only rows with "1" for the signal are shown here.

Truth Tables for Next State
One table per bit.

Truth Table in ROM
- ROM height = number of entries in the truth table; ROM width = number of outputs
- Encoding: the address of a ROM cell is the set of input bits; the contents of the cell are the set of output bits
- If the equation has "don't cares", rows will be duplicated: every possible combination of inputs needs an output, even if that combination makes no sense
- For state transitions, usually set the next state of all non-existent states to the start state

TT in ROM: Example
4 ROM cells compute AND, OR, XOR:
  Address  Contents (AND OR XOR)
  00       000
  01       011
  10       011
  11       110
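
The example ROM can be generated by evaluating the three functions at every address, which is exactly the "encode every input combination" rule. A sketch with a hypothetical `build_rom` helper; bit 2 of each word is AND, bit 1 is OR, bit 0 is XOR.

```python
def build_rom():
    """4-word ROM: address = (a, b), contents = (a AND b, a OR b, a XOR b)."""
    rom = []
    for address in range(4):
        a, b = (address >> 1) & 1, address & 1
        rom.append(((a & b) << 2) | ((a | b) << 1) | (a ^ b))
    return rom


rom = build_rom()
for address, word in enumerate(rom):
    print(f"{address:02b} {word:03b}")
```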

ROM for Our State Machine
- 10 bits of address = 1024 words: 4 bits of current state + 6 bits of opcode
- Each word is 20 bits: 16 datapath control bits + 4 next-state bits
- Each combination of datapath control bits is duplicated 2^6 = 64 times: the 6 opcode bits are don't-cares for them, but the ROM must encode all combinations
- The datapath control bits could be reduced to 14, given the 3 identical (state 8) bits
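
The ROM dimensions follow directly from the bit counts on the slide. A short arithmetic sketch; the constant names are illustrative.

```python
STATE_BITS, OPCODE_BITS = 4, 6
CONTROL_BITS, NEXT_STATE_BITS = 16, 4

address_bits = STATE_BITS + OPCODE_BITS          # 10 address lines
words = 2 ** address_bits                        # 1024 words
width = CONTROL_BITS + NEXT_STATE_BITS           # 20 bits per word
copies = 2 ** OPCODE_BITS                        # each control word repeated 64 times

print(words, width, words * width, copies)  # 1024 20 20480 64
```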

Control Bits in ROM

Next State Bits in ROM
Typically, illegal or don't-care entries are set to 0000 (to restart the machine in case of a bad bit).

Separate Control and Next State ROMs
Separate ROMs for control-signal computation and next-state computation:
- Control signals: 16 x 16 bits
- Next state: 1024 x 4 bits
- Space saved: 1008 x 16 bits

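Programmable Logic Array
- The truth table is very sparse: most input combinations yield 0 outputs
- A ROM implements every combination; a PLA implements an OR of ANDs (minterms), and repeated minterms are included only once
- Size = (inputs*minterms) + (minterms*outputs) "cells"
- In this case: 10*17 + 17*20 = 510
- A cell is slightly larger than a 1-bit memory cell, but still compare 510 with 4096 + 256 = 4352 bits for the separate ROMs!
Note that we can do better than 510 by splitting again: a control PLA of 4*10 + 10*16 = 200 cells plus a next-state PLA of 10*10 + 10*4 = 140 cells (10 inputs; the 7 opcode-dependent minterms plus State0, State3 and State6), 340 in total.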
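
The PLA cost model is easy to tabulate. A sketch using the slide's formula; the split-PLA figures assume the next-state PLA has 10 inputs (4 state + 6 opcode bits) and 10 product terms (the 7 opcode-dependent ones plus State0, State3 and State6), which is this example's own assumption.

```python
def pla_cells(inputs, minterms, outputs):
    """Slide's cost model: inputs*minterms (AND plane) + minterms*outputs (OR plane)."""
    return inputs * minterms + minterms * outputs


single_pla = pla_cells(10, 17, 20)                        # 170 + 340
split_roms = 2**4 * 16 + 2**10 * 4                        # 256 + 4096 ROM bits
split_plas = pla_cells(4, 10, 16) + pla_cells(10, 10, 4)  # 200 + 140 (assumed sizes)
print(single_pla, split_roms, split_plas)  # 510 4352 340
```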

PLA Implementation

Next State Sequences
- Often, a counter can generate the next state
- More sequencing occurs in more complex instructions, e.g. floating-point computation
  State          Next
  0              1
  1              2, 6, 8, 9
  2              3, 5
  3              4
  6              7
  4, 5, 7, 8, 9  0
5 transitions come from the counter (state + 1); 9 do not.
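
Counting which transitions a simple incrementer can produce confirms the 5 / 9 split. A sketch assuming the transition table above, with illustrative variable names.

```python
# state -> possible next states for the 10-state FSM
TRANSITIONS = {0: [1], 1: [2, 6, 8, 9], 2: [3, 5], 3: [4], 4: [0],
               5: [0], 6: [7], 7: [0], 8: [0], 9: [0]}

edges = [(s, n) for s, nexts in TRANSITIONS.items() for n in nexts]
from_counter = [(s, n) for s, n in edges if n == s + 1]   # incrementer suffices
other = [(s, n) for s, n in edges if n != s + 1]          # needs reset or dispatch
print(len(from_counter), len(other))  # 5 9
print(from_counter)                   # [(0, 1), (1, 2), (2, 3), (3, 4), (6, 7)]
```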

Control Unit With Sequencer
- Address select logic allows "branches", where the opcode (instead of the adder) gives the next state
- The PLA or ROM must compute one more signal: AddrCtl

Selecting Next State
The controller specifies whether to use the counter or an external table to determine the next state:
- AddrCtl = 0: set state to 0
- AddrCtl = 1: dispatch with ROM 1
- AddrCtl = 2: dispatch with ROM 2
- AddrCtl = 3: use the incremented state
The dispatch ROMs are controlled by the opcode (only). State 0 is a special case because it's so common.

Dispatch ROMs
Dispatch ROM 1:
  Op      Value
  000000  0110  (R-type -> 6)
  000010  1001  (j -> 9)
  000100  1000  (beq -> 8)
  100011  0010  (lw -> 2)
  101011  0010  (sw -> 2)
Dispatch ROM 2:
  Op      Value
  100011  0011  (lw -> 3)
  101011  0101  (sw -> 5)
The logic for AddrCtl is easy now: 0 in states 4, 5, 7, 8, 9; 1 in state 1; 2 in state 2; 3 in states 0, 3, 6.
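
The sequencer replaces the full next-state table with AddrCtl, an incrementer and two small dispatch ROMs. A sketch that models each ROM as a Python dict (an illustrative assumption) and traces the states each instruction class visits.

```python
R_TYPE, LW, SW, BEQ, J = 0b000000, 0b100011, 0b101011, 0b000100, 0b000010

DISPATCH_ROM1 = {R_TYPE: 6, J: 9, BEQ: 8, LW: 2, SW: 2}   # used when AddrCtl = 1
DISPATCH_ROM2 = {LW: 3, SW: 5}                            # used when AddrCtl = 2
# AddrCtl per state: 0 = go to state 0, 1 = ROM 1, 2 = ROM 2, 3 = increment
ADDR_CTL = {0: 3, 1: 1, 2: 2, 3: 3, 4: 0, 5: 0, 6: 3, 7: 0, 8: 0, 9: 0}


def sequencer_next(state, op):
    """Address select logic: choose the next state according to AddrCtl."""
    ctl = ADDR_CTL[state]
    if ctl == 0:
        return 0
    if ctl == 1:
        return DISPATCH_ROM1[op]
    if ctl == 2:
        return DISPATCH_ROM2[op]
    return state + 1  # ctl == 3: the incrementer (adder)


def trace(op):
    """States visited from fetch until the sequencer returns to state 0."""
    states = [0]
    while True:
        nxt = sequencer_next(states[-1], op)
        if nxt == 0:
            return states
        states.append(nxt)


print(trace(LW))  # [0, 1, 2, 3, 4]
print(trace(SW))  # [0, 1, 2, 5]
```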

Address Select Logic
A PLA could replace either or both dispatch ROMs.

Reducing Logic Further
- Logic minimization: reduce gates using Karnaugh maps
- Reduce states using finite state machine minimization (not covered here)
- Improved state assignment: pick state numbers so that control bits map to as few bits of state as possible (e.g. RegWrite in states 4, 7 vs. 8, 9)