Processor: Datapath and Control (part 2)

Computer Organization, Ellen Walker, Hiram College
Figures from Computer Organization and Design, 3rd ed., D.A. Patterson & J.L. Hennessy, Morgan Kaufmann © 2005, unless otherwise specified

A Multicycle Implementation
Faster cycle time (but more cycles)
Some instructions use fewer cycles than others
Reuse of hardware functional units
  Memory
  ALU
Additional registers
  For intermediate results

Overview Multicycle Datapath
Registers: PC (as before), plus new A, B, IR, MDR, ALUOut
Simplified hardware: 1 memory, 1 ALU (no extra adders)

Detailed Datapath (No Branch)
This version shows multiplexors and additional hardware units (from before), plus instruction bits.
Note the MUXes on both ALU inputs now: A+B, PC+4, Reg+Imm16, PC+(Imm16 shifted left 2)
MUX for memory address: PC (if instruction fetch) vs. ALUOut (if memory reference)
Trace through the Add and Load instructions.
Add cycles: 1) Mem[PC]->IR and PC+4->PC; 2) rs->A, rt->B; 3) A+B->ALUOut; 4) ALUOut->rd
Load cycles: 1) Mem[PC]->IR and PC+4->PC; 2) rs->A; 3) A+sign-ext(IR[15-0])->ALUOut; 4) Mem[ALUOut]->MDR; 5) MDR->rt

Datapath with Control Lines (Except Branching)
Most of these we’ve seen before. New: IRWrite (when is instruction written?), RegWrite,

Datapath, Control & Branch Logic
New: control logic to compute all the control bits
MUX for PC source: direct ALU result (PC+4), ALUOut (beq target), or {PC[31:28], Imm26 shifted left 2} (jump)
2 PC write signals (one unconditional, one conditional)

Actions for R-type Instruction
1. IR <= Memory[PC]; PC <= PC+4
2. A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]
3. ALUOut <= A op B
4. Reg[IR[15:11]] <= ALUOut
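The four register transfers for an R-type instruction can be traced in software. The following is a minimal sketch, not the hardware itself: the function name `run_rtype`, the dictionary-based memory, and the `alu` callback are illustrative assumptions, and the example instruction word encodes `add $3, $1, $2`.

```python
def run_rtype(mem, regs, pc, alu):
    """Trace one R-type instruction as four register-transfer steps."""
    # Step 1: fetch the instruction and increment the PC
    ir = mem[pc]
    pc = (pc + 4) & 0xFFFFFFFF
    # Step 2: read both source registers into A and B
    a = regs[(ir >> 21) & 0x1F]   # IR[25:21] = rs
    b = regs[(ir >> 16) & 0x1F]   # IR[20:16] = rt
    # Step 3: the ALU operates on A and B
    alu_out = alu(a, b) & 0xFFFFFFFF
    # Step 4: write ALUOut to the destination register
    regs[(ir >> 11) & 0x1F] = alu_out   # IR[15:11] = rd
    return pc

# add $3, $1, $2: opcode 0, rs=1, rt=2, rd=3, funct=0x20
ADD_3_1_2 = (1 << 21) | (2 << 16) | (3 << 11) | 0x20
mem = {0: ADD_3_1_2}          # hypothetical one-word program
regs = [0] * 32
regs[1], regs[2] = 5, 7
new_pc = run_rtype(mem, regs, 0, lambda a, b: a + b)
```

Each commented step corresponds to one clock cycle of the multicycle datapath; the local variables stand in for the IR, A, B and ALUOut registers.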

Actions for Load
1. IR <= Memory[PC]; PC <= PC+4
2. A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]
3. ALUOut <= A+SignExt(IR[15:0])
4. MDR <= Mem[ALUOut]
5. Reg[IR[20:16]] <= MDR
Step 3 could be in the same step as 2 because it uses no shared data or logic. But we'll see later that it's convenient to use the ALU in step 2 for another instruction.

Actions for Store
1. IR <= Memory[PC]; PC <= PC+4
2. A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]
3. ALUOut <= A+SignExt(IR[15:0])
4. Mem[ALUOut] <= B
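Load and store share the same address computation: the base register value in A plus the sign-extended 16-bit immediate. A small sketch of that step, with the helper names `sign_extend16` and `effective_address` chosen here for illustration:

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def effective_address(a, ir):
    """ALUOut for lw/sw: base register value plus SignExt(IR[15:0])."""
    return (a + sign_extend16(ir & 0xFFFF)) & 0xFFFFFFFF

# Hypothetical examples: base 0x1000 with offsets +8 and -4
addr_pos = effective_address(0x1000, 0x0008)
addr_neg = effective_address(0x1000, 0xFFFC)
```

The negative-offset case shows why the immediate must be sign-extended rather than zero-extended before the add.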

Actions for Branch (beq)
1. IR <= Memory[PC]; PC <= PC+4
2. A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; ALUOut <= PC+(SignExt(IR[15:0])<<2)
3. If (A==B) PC <= ALUOut
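The branch target is computed from the already-incremented PC, so the offset is relative to the instruction after the branch. A minimal sketch under that reading; `branch_target` and `take_branch` are illustrative names, not part of the slides:

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def branch_target(pc_plus_4, ir):
    """ALUOut in the decode step: PC + (SignExt(IR[15:0]) << 2)."""
    offset = sign_extend16(ir & 0xFFFF) * 4   # word offset -> byte offset
    return (pc_plus_4 + offset) & 0xFFFFFFFF

def take_branch(pc_plus_4, ir, a, b):
    """Step 3 of beq: replace the PC only when A equals B."""
    return branch_target(pc_plus_4, ir) if a == b else pc_plus_4

# Hypothetical beq at address 0x100 with offset -1 (branches to itself)
next_pc_taken = take_branch(0x104, 0xFFFF, 3, 3)
next_pc_not_taken = take_branch(0x104, 0xFFFF, 3, 4)
```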

Actions for Jump
1. IR <= Memory[PC]; PC <= PC+4
2. (Decode step; its results are not used by jump)
3. PC <= {PC[31:28], IR[25:0], "00"}
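The jump address is a concatenation rather than an addition. A short sketch of the bit manipulation, assuming the PC has already been incremented in the fetch step (the function name `jump_target` is illustrative):

```python
def jump_target(pc_plus_4, ir):
    """Concatenate PC[31:28], IR[25:0] and two zero bits."""
    upper = pc_plus_4 & 0xF0000000          # keep the top 4 bits of the PC
    lower = (ir & 0x03FFFFFF) << 2          # 26-bit field shifted left by 2
    return upper | lower

# Hypothetical j instruction (opcode 000010) with a 26-bit field of 0x100
J_INSTR = (0b000010 << 26) | 0x100
target = jump_target(0x40000004, J_INSTR)
```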

All Instructions Together
Step 1: Fetch (all instructions)
Step 2: "Decode" (all instructions)
  Compute A, B and branch address (just in case it's needed)
Step 3: Computation
  Compute ALU op or memory address, or check equality for beq
Step 4: Memory access or register write
Step 5: Load completion
Cycle counts (fig. 5.30)
  Jump takes 3 cycles (cycle 2 does useless work)
  Beq takes 3 cycles
  R-type takes 4 cycles
  Store takes 4 cycles
  Load takes 5 cycles
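Because instruction classes take different numbers of cycles, the average cycles per instruction depends on the instruction mix. The sketch below uses the cycle counts listed above; the mix itself is a made-up example, not data from the slides or the book.

```python
# Cycles per instruction class in the multicycle implementation
CYCLES = {"load": 5, "store": 4, "rtype": 4, "beq": 3, "jump": 3}

def average_cpi(mix):
    """Weighted average of cycle counts; mix maps class -> fraction."""
    return sum(CYCLES[kind] * fraction for kind, fraction in mix.items())

# Hypothetical instruction mix (fractions sum to 1.0)
example_mix = {"load": 0.25, "store": 0.10, "rtype": 0.50,
               "beq": 0.10, "jump": 0.05}
cpi = average_cpi(example_mix)
```

For this assumed mix the average works out to about 4.1 cycles per instruction, lower than the 5 cycles that every instruction would need if all were as long as a load.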

Computing Control Signals
Control signals now depend on "step" as well as on the instruction being executed
States in the state machine correspond to steps of instructions

State Transitions for Mealy Machine
[State diagram: states Fetch, Decode, Compute, Mem/R write, Load Compl.; edge labels: Beq or j, Load, Other]
Each step is a state (simple next-state logic)
Control signals depend on state AND instruction
Mealy machines are more complex with regard to control signal generation, and we can trade off more states (extra bits in the state register) for much simpler logic...

State Transitions (Overview) for Moore Machine
[State diagram: Fetch -> Decode; Decode -> Load/store 3 (lw, sw), R-type 3 (R-type), Beq 3 (beq), Jump 3 (j); Load/store 3 -> Load 4 (load) -> Load 5; Load/store 3 -> Store 4 (store); R-type 3 -> R-type 4]
More dependence on instruction for transitions
Different states for different control signals
One state per control step per instruction (combined if actions are identical)

Fetch and Decode States
Open book to page 323 (Figure 5.28) to follow along.
Fetch: ALUSrcA = PC, ALUSrcB = const. 4, ALUOp = add, PCSource = ALU
Decode: ALUSrcA = PC, ALUSrcB = imm16 sign-extended & shifted, ALUOp = add (for beq)
Note that both A and B are written on every cycle, so we don't have signals for them here.

Read and Write States
State 2: ALUSrcA = A, ALUSrcB = signext(imm16), ALUOp = add
State 3: IorD = 1 (ALUOut -> memory address)
State 4: MemtoReg = 1 (memory data connected to register write data), RegDst = 2nd register field (rt) is written to
State 5 (store): same IorD setting as state 3, but memory is written rather than read, and the transitions differ

States for R-Type Instructions
State 6: ALUSrcA = A, ALUSrcB = B, ALUOp = 10 (R-type)
State 7: RegDst = 3rd instruction register field (rd), MemtoReg = ALUOut

States for Branch and Jump
Branch: ALUSrcA = A, ALUSrcB = B, ALUOp = branch equal, PCWriteCond (write PC if zero), PCSrc = ALUOut
Jump: PCWrite (not gated), PCSrc = {PC[31:28], imm26, "00"}
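The Moore-machine transitions can be written as a next-state function of the current state and the opcode. This is a sketch only: the state numbering 0 to 9 follows the numbers used in these slides, while the state names, the `cycles` helper, and the choice to send unknown opcodes back to fetch are assumptions made for illustration.

```python
# State numbers as used in the slides; names are illustrative
(FETCH, DECODE, MEM_ADDR, MEM_READ, LOAD_WB,
 MEM_WRITE, EXEC, R_WB, BRANCH, JUMP) = range(10)

OP_RTYPE, OP_J, OP_BEQ = 0b000000, 0b000010, 0b000100
OP_LW, OP_SW = 0b100011, 0b101011

def next_state(state, opcode):
    """Next state depends on the current state and (sometimes) the opcode."""
    if state == FETCH:
        return DECODE
    if state == DECODE:
        table = {OP_LW: MEM_ADDR, OP_SW: MEM_ADDR, OP_RTYPE: EXEC,
                 OP_BEQ: BRANCH, OP_J: JUMP}
        return table.get(opcode, FETCH)   # assumption: unknown opcode restarts
    if state == MEM_ADDR:
        return MEM_READ if opcode == OP_LW else MEM_WRITE
    if state == MEM_READ:
        return LOAD_WB
    if state == EXEC:
        return R_WB
    return FETCH   # states 4, 5, 7, 8, 9 finish the instruction

def cycles(opcode):
    """Count the states visited before returning to fetch."""
    state, count = FETCH, 0
    while True:
        state = next_state(state, opcode)
        count += 1
        if state == FETCH:
            return count
```

Running `cycles` for each opcode reproduces the cycle counts given earlier: 5 for load, 4 for store and R-type, 3 for beq and jump.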

Complete Diagram (10 States)

Control Signal Computations
[Block diagram: the state register feeds two sets of combinational logic. Control logic: state -> control signals. Next-state logic: state + instruction register (opcode) -> next state.]

Combined Block Diagram (C.3.2)
10 states (numbered 0-9) means we need 4 bits for the state
Only 6 bits of the IR (opcode) are needed for the state logic

Determining Control Signals
For each control signal, determine the conditions where the signal is 1 rather than 0
  PCWrite is 1 in states 0 and 9 (only)
  ALUSrcA is 1 in states 2, 6 and 8
  ALUSrcB1 is 1 in states 1 and 2
  ALUSrcB0 is 1 in states 0 and 1
  (etc.)
Remember, we've designed the machine so that control signals depend ONLY on state

Inputs for All "1" Outputs
Simplification: we're using complete state numbers; also, we're not encoding states yet.
BTW, signals that have the same equation can be combined: PCWriteCond = PCSource0 = ALUOp0 (state 8)

PCWrite = State0 + State9
S3 S2 S1 S0 | PCWrite
 0  0  0  0 | 1
 1  0  0  1 | 1
This needs to be done for each signal. Or you can simply read the equation off the encodings: 0000 and 1001 give PCWrite = ~S3&~S2&~S1&~S0 + S3&~S2&~S1&S0
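Turning "the signal is 1 in these states" into a sum-of-products equation over the encoded state bits is mechanical. The sketch below does this for ALUSrcA (states 2, 6 and 8); the function names and the `~`/`&`/`+` notation are illustrative choices, and no logic minimization is attempted.

```python
def sum_of_products(asserted_states, nbits=4):
    """Build one product term per state in which the signal is 1."""
    terms = []
    for state in sorted(asserted_states):
        literals = []
        for bit in range(nbits - 1, -1, -1):
            prefix = "" if (state >> bit) & 1 else "~"
            literals.append(prefix + "S" + str(bit))
        terms.append("&".join(literals))
    return " + ".join(terms)

def signal_value(asserted_states, s3, s2, s1, s0):
    """Evaluate the signal for one encoded state (each bit is 0 or 1)."""
    state = (s3 << 3) | (s2 << 2) | (s1 << 1) | s0
    return int(state in asserted_states)

ALUSRCA_STATES = {2, 6, 8}
equation = sum_of_products(ALUSRCA_STATES)
```

Each product term is one row of the signal's truth table, which is exactly the form a PLA implements.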

Determining Next State Outputs
Same general idea as control outputs
Include opcode bits from the instruction in the truth table
Consider each bit of the state output separately. For example, bit 0 of the state is true for every odd state
The book uses internal NextState0 - NextState9 terms, though these are not necessary

NextState2 = State1&(op=‘lw’|op=‘sw’)
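[Truth table: NS2 = 1 for the input combination below, 0 for any other]
NextState2 = State1 & op5 & ~op4 & ~op2 & op1 & op0
(op3 is a don't care: lw = 100011 and sw = 101011 differ only in that bit)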
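The opcode-bit form of NextState2 can be checked by enumerating all 64 opcodes and confirming that only lw and sw satisfy it. A brief sketch, with `next_state2` as an illustrative name and the state passed as a number rather than as encoded bits:

```python
def next_state2(state, opcode):
    """NextState2 = State1 & op5 & ~op4 & ~op2 & op1 & op0 (op3 don't care)."""
    op = [(opcode >> i) & 1 for i in range(6)]   # op[0] is the low bit
    return int(state == 1 and op[5] == 1 and op[4] == 0
               and op[2] == 0 and op[1] == 1 and op[0] == 1)

# Every opcode that moves the machine from state 1 to state 2
matching_opcodes = [opc for opc in range(64) if next_state2(1, opc)]
```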

Alternatives for Control and Next-State Logic
Implement Boolean equations directly with gates
  This is what we've been doing
Implement "sparse" truth table as a 2-level PLA
  OR of ANDs; only rows with a 1 output
Implement complete truth table as a ROM

Truth Tables for Control Signals
Only rows with “1” for the signal are shown here.

Truth Tables for Next State
One table per bit.

Truth Table in ROM
ROM height = number of entries of the truth table
ROM width = number of outputs
Encoding
  Address of a ROM cell is the set of input bits
  Contents of a ROM cell is the set of output bits
If an equation has "don't cares", rows will be duplicated.
Every possible combination of inputs needs an output, even if that combination makes no sense.
  For state transitions, usually set the next state of all non-existent states to the start state

TT in ROM: Example
4 ROM cells compute AND, OR, XOR
Address  Contents (AND OR XOR)
00       000
01       011
10       011
11       110
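The same four-word ROM can be modeled as a list indexed by the two input bits, with each word holding the AND, OR and XOR outputs. This is a small sketch; the names `ROM` and `rom_lookup` are illustrative.

```python
# Address = {a, b}; contents = {AND, OR, XOR}
ROM = [0b000,   # 00
       0b011,   # 01
       0b011,   # 10
       0b110]   # 11

def rom_lookup(a, b):
    """Use the inputs as the address and unpack the three output bits."""
    word = ROM[(a << 1) | b]
    return (word >> 2) & 1, (word >> 1) & 1, word & 1
```

No logic is evaluated at lookup time: the ROM simply stores the precomputed truth table, one word per input combination.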

ROM for Our State Machine
10 bits of address = 1024 words
  4 bits of current state
  6 bits of opcode
Each word is 20 bits
  16 datapath control bits
  4 next-state bits
Each combination of datapath bits is duplicated 2^6 = 64 times
  The 6 opcode bits are don't cares
  Encode all combinations
Datapath control bits could be reduced to 14 given the 3 identical (state 8) bits.

Control Bits in ROM

Next State Bits in ROM
Typically, set illegal or don't-care entries to 0000 (to restart the machine in case of a bad bit)

Separate Control and Next State ROMS
Separate ROMs for control signal computation and next-state computation
Control signals: 16 x 16 bits
  Saved space: 1008 x 16 bits
Next state: 1024 x 4 bits
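The ROM sizes follow directly from the address and word widths. A short arithmetic sketch (the helper name `rom_bits` is illustrative) comparing the single ROM with the split control and next-state ROMs:

```python
def rom_bits(address_bits, word_bits):
    """Total storage: one word for every possible address."""
    return (2 ** address_bits) * word_bits

single_rom = rom_bits(10, 20)        # state + opcode -> control + next state
control_rom = rom_bits(4, 16)        # state only -> 16 control bits
next_state_rom = rom_bits(10, 4)     # state + opcode -> 4 next-state bits
split_total = control_rom + next_state_rom
saved = single_rom - split_total
```

The saving equals 1008 x 16 bits because the control outputs no longer need to be repeated for every opcode value.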

Programmable Logic Array
Truth table is very sparse
  Most input combinations yield 0 outputs
ROM implements every combination
PLA implements OR of ANDs (minterms)
  Repeated minterms are included only once
  Size = (inputs*minterms) + (minterms*outputs) "cells"
  In this case: 10*17 + 17*20 = 510
  A cell is slightly larger than a 1-bit memory cell, but still, compare 510 cells with the 4352 bits of the split ROMs!
Note, we can do better than 510 by splitting again! 4*10+10*16, 20*7+7*4 (= 368)
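The PLA size formula is easy to evaluate for different configurations. A minimal sketch, with `pla_cells` as an illustrative helper; the two calls reproduce the single-PLA figure and the control-signal part of the split given above.

```python
def pla_cells(inputs, minterms, outputs):
    """AND plane (inputs x minterms) plus OR plane (minterms x outputs)."""
    return inputs * minterms + minterms * outputs

# Single PLA: 10 inputs (4 state + 6 opcode), 17 minterms, 20 outputs
single_pla = pla_cells(10, 17, 20)

# Control-signal PLA after splitting: 4 state inputs, 10 minterms, 16 outputs
control_pla = pla_cells(4, 10, 16)
```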

PLA Implementation

Next State Sequences
Often, a counter can generate the next state
More sequencing in more complex instructions, e.g. floating point computation
State  Next
0      1
1      2, 6, 8, 9
2      3, 5
3      4
4      0
5      0
6      7
7      0
8      0
9      0
5 transitions from counter, 9 not from counter

Control Unit With Sequencer
Address select logic allows "branches" where the opcode, instead of the adder, gives the state
PLA or ROM must compute one more signal: AddrCtl

Selecting Next State
Controller specifies whether to use the counter or an external table to determine the next state
  AddrCtl = 0: set state to 0
  AddrCtl = 1: dispatch with ROM 1
  AddrCtl = 2: dispatch with ROM 2
  AddrCtl = 3: use incremented state
Dispatch ROMs are controlled by the opcode (only)
Special case for 0 because it's so common

Dispatch ROMs
ROM1:
OP      Val
000000  0110
000010  1001
000100  1000
100011  0010
101011  0010
ROM2:
OP      Val
100011  0011
101011  0101
Logic for AddrCtl is easy now: 0: states 4, 5, 7, 8, 9; 1: state 1; 2: state 2; 3: states 0, 3, 6
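The sequencer can be simulated with one AddrCtl value per state and two dispatch tables. This is a sketch under stated assumptions: the lw/sw dispatch entries and the AddrCtl value for state 0 are filled in from the state transitions described earlier, unknown opcodes are sent back to state 0, and the names `ADDR_CTL`, `next_state` and `trace` are illustrative.

```python
# AddrCtl per state: 0 = go to state 0, 1 = ROM1, 2 = ROM2, 3 = increment
ADDR_CTL = {0: 3, 1: 1, 2: 2, 3: 3, 4: 0, 5: 0, 6: 3, 7: 0, 8: 0, 9: 0}

# Dispatch ROMs indexed by opcode
DISPATCH_ROM1 = {0b000000: 6, 0b000010: 9, 0b000100: 8,
                 0b100011: 2, 0b101011: 2}
DISPATCH_ROM2 = {0b100011: 3, 0b101011: 5}

def next_state(state, opcode):
    """Address select logic: pick the next state according to AddrCtl."""
    ctl = ADDR_CTL[state]
    if ctl == 0:
        return 0
    if ctl == 1:
        return DISPATCH_ROM1.get(opcode, 0)   # assumption: unknown -> state 0
    if ctl == 2:
        return DISPATCH_ROM2.get(opcode, 0)
    return state + 1                          # counter increments the state

def trace(opcode):
    """List the states visited by one instruction, starting at fetch."""
    states = [0]
    while True:
        nxt = next_state(states[-1], opcode)
        if nxt == 0:
            return states
        states.append(nxt)
```

For example, `trace(0b100011)` walks a load through states 0, 1, 2, 3, 4, using the counter twice, each dispatch ROM once, and the "set state to 0" case at the end.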

Address Select Logic
A PLA could replace either or both dispatch ROMs

Reducing Logic Further
Logic minimization
  Reduce gates using Karnaugh maps
  Reduce states using finite state machine minimization (not covered here)
Improved state assignment
  Pick state numbers so that control bits map to as few bits of state as possible (e.g. RegWrite in 4, 7 vs. 8, 9)
