Download presentation
Presentation is loading. Please wait.
Published byJunior Bryan Modified over 9 years ago
1
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania 18042 nestorj@lafayette.edu ECE 313 - Computer Organization Lecture 11 - Processor Implementation: Overview, Single-Cycle Design Fall 2004 Reading: 5.1-5.4 Homework Due 10/27: 4.1, 4.2, 4.3, 4.6, 4.19 - 4.22 Portions of these slides are derived from: Textbook figures © 1998 Morgan Kaufmann Publishers all rights reserved Tod Amon's COD2e Slides © 1998 Morgan Kaufmann Publishers all rights reserved Dave Patterson’s CS 152 Slides - Fall 1997 © UCB Rob Rutenbar’s 18-347 Slides - Fall 1999 CMU other sources as noted
2
ECE 313 Fall 2004Lecture 11 - Processor Design2 Roadmap for the Term: Major Topics Computer Systems Overview Technology Trends Performance Instruction Sets (and Software) Logic and Arithmetic Processor Implementation Memory Systems Input/Output
3
ECE 313 Fall 2004Lecture 11 - Processor Design3 Outline - Processor Implementation Overview Review of Processor Operation Steps in Processor Design Implementation Styles The “MIPS Lite” Instruction Subset Single-Cycle Implementation Multi-Cycle Implementation Pipelined Implementation
4
ECE 313 Fall 2004Lecture 11 - Processor Design4 Review: The “Five Classic Components” Processor Datapath Control Memory Input Output Input Processor Control Datapath Output Memory 1001010010110000 0010100101010001 1111011101100110 1001010010110000
5
ECE 313 Fall 2004Lecture 11 - Processor Design5 Review: Processor Operation Executing Programs - the “fetch/execute” cycle Processor fetches instruction from memory Processor executes “machine language” instruction Perform calculation Read/write data Repeat with “next” instruction Processor Control Datapath 1001010010110000 0010100101010001 1111011101100110 1001010010110000 Memory 1111011101100110 1001010010110000 PC Address Instruction
6
ECE 313 Fall 2004Lecture 11 - Processor Design6 Processor Design Goals Design hardware that: Fetches instructions from memory Executes instructions as specified by ISA Design considerations Cost Speed Power
7
ECE 313 Fall 2004Lecture 11 - Processor Design7 Steps in Processor Design 1.Analyze instruction set; get datapath requirements 2.Select datapath components and establish clocking methodology 3.Assemble datapath that meets requirements 4.Determine control signal values for each instruction 5.Assemble control logic to generate control signals
8
ECE 313 Fall 2004Lecture 11 - Processor Design8 Processor Implementation Styles Single Cycle Perform each instruction in 1 clock cycle Disadvantage: only as fast as “slowest” instruction Multi-Cycle Break fetch/execute cycle into multple steps Perform 1 step in each clock cycle Pipelined Execute each instruction in multiple steps Perform 1 step / instruction in each clock cycle Process multiple instructions in parallel - “assembly line”
9
ECE 313 Fall 2004Lecture 11 - Processor Design9 “MIPS Lite” - A Pedagogical Example Use a MIPS to illustrate processor design Limit initial design to a subset of instructions: Memory access: lw, sw Arithmetic/Logical: add, sub, and, or, slt Branch/Jump: beq, j Add instructions as we go along (e.g., addi )
10
ECE 313 Fall 2004Lecture 11 - Processor Design10 Review - MIPS Instruction Formats Field definitions: op: instruction opcode rs, rt, rd: source (2) and destination (1) register numbers shamt: shift amount funct: function code (works with opcode to specify op) offset/immediate: address offset or immediate value address: target address for jumps oprsrtoffset 6 bits5 bits 16 bits oprsrtrdfunctshamt 6 bits5 bits 6 bits R-Format I-Format opaddress 6 bits26 bits J-Format
11
ECE 313 Fall 2004Lecture 11 - Processor Design11 MIPS Instruction Subset Arithmetic & Logical Instructions add $s0, $s1, $s2 sub $s0, $s1, $s2 and $s0, $s1, $s2 or $s0, $s1, $s2 Data Transfer Instructions lw $s1, offset($s0) sw $s2, offset($s3) Branch beq $s0, offset j address
12
ECE 313 Fall 2004Lecture 11 - Processor Design12 MIPS Instruction Execution General Procedure 1.Fetch Instruction from memory 2.Decode Instruction, read register values 3.If necessary, perform an ALU operation 4.If load or store, do memory access 5.Write results back to register file and increment PC Register Transfers provide a concise description
13
ECE 313 Fall 2004Lecture 11 - Processor Design13 Register Transfers for the MIPS Subset Instruction Fetch Instruction <- MEM[PC] Instruction Execution Instr.Register Transfers add R[rd] <- R[rs] + R[rt];PC <- PC + 4 sub R[rd] <- R[rs] - R[rt];PC <- PC + 4 and R[rd] <- R[rs] & R[rt];PC <- PC + 4 or R[rd] <- R[rs] | R[rt];PC <- PC + 4 lw R[rt] <- MEM[R[rs] + s_extend(offset)]; PC<- PC + 4 sw MEM[R[rs] + sign_extend(offset)] <- R[rt];PC <- PC + 4 beq if (R[rs] == R[rt]) then PC <- PC+4 + s_extend(offset<<2) else PC <- PC + 4 j PC <- upper(PC)@(address << 2)
14
ECE 313 Fall 2004Lecture 11 - Processor Design14 Outline - Processor Implementation Overview Single-Cycle Implementation 1.Analyze instruction set; get datapath requirements 2.Select datapath components and establish clocking methodology 3.Assemble datapath that meets requirements 4.Determine control signal values for each instruction 5.Assemble control logic to generate control signals Multi-Cycle Implementation Pipelined Implementation
15
ECE 313 Fall 2004Lecture 11 - Processor Design15 1. Instruction Set Requirements Memory Read Instructions Read and Write Data Registers - 32 read (from rs field in instruction) read (from rt field in instruction) write (from rd or rt field in instruction) PC Sign Extender Add and Subtract (register values) Add 4 or extended immediate to PC Review register transfers for details!
16
ECE 313 Fall 2004Lecture 11 - Processor Design16 Outline - Processor Implementation Overview Single-Cycle Implementation 1.Analyze instruction set; get datapath requirements 2.Select datapath components and establish clocking methodology 3.Assemble datapath that meets requirements 4.Determine control signal values for each instruction 5.Assemble control logic to generate control signals Multi-Cycle Implementation Pipelined Implementation
17
ECE 313 Fall 2004Lecture 11 - Processor Design17 2. (a) Choose Datapath Components Combinational Components Adder ALU Multiplexer Sign Extender Storage Components Registers Register File Memory
18
ECE 313 Fall 2004Lecture 11 - Processor Design18 Datapath Combinational Components NOTES: - Blue-green inputs are control lines - Blue lines often hidden to suppress detail AdderALU Multiplexer Sign Extender
19
ECE 313 Fall 2004Lecture 11 - Processor Design19 Datapath Storage - Registers Registers store multiple bit values New value loaded on clock edge when EN asserted
20
ECE 313 Fall 2004Lecture 11 - Processor Design20 Datapath Storage: Idealized Memory Data Read Place Address on ADDR Assert MemRead Data Available on RD after memory “access time” Data Write Place address on ADDR Place data input on WD Assert MemWrite Data written on clock edge
21
ECE 313 Fall 2004Lecture 11 - Processor Design21 Datapath Storage: Register File Register File - 32 registers (including $zero ) Two data outputs RD1, RD2 Assert register number RN1/RN2 Read output RD1/RD2 after “access time” (propagation delay) One data input WD Assert register number WN Assert value on WD Assert RegWrite Value loaded on clock edge Implemented as a small multiport memory
22
ECE 313 Fall 2004Lecture 11 - Processor Design22 2. (b) Choose Clocking Methodology Clocking methodology defines When signals can be read from storage elements When signals can be written to storage elements Typical clocking methodologies Single-Phase Edge Triggered Single-Phase Level Triggered Multiple-Phase Level Triggered Authors’ choice: Single-Phase Edge Triggered All registers updated on one edge of clock cycle Simplest to work with
23
ECE 313 Fall 2004Lecture 11 - Processor Design23 Review: Edge-Triggered Clocking Controls sequential circuit operation Register outputs change after first clock edge Combinational logic determines “next state” Storage elements store new state on next clock edge Adder Mux Combinational LogicRegister Output Register Input Clock
24
ECE 313 Fall 2004Lecture 11 - Processor Design24 Review: Edge-Triggered Clocking Propagation delay - t prop Logic (including register outputs) Interconnect Register setup time - t setup Clock Adder Mux Combinational LogicRegister Output Register Input t prop t setup t clock > t prop + t setup t clock = t prop + t setup + t slack
25
ECE 313 Fall 2004Lecture 11 - Processor Design25 Outline - Processor Implementation Overview Single-Cycle Implementation 1.Analyze instruction set; get datapath requirements 2.Select datapath components and establish clocking methodology 3.Assemble datapath that meets requirements 4.Determine control signal values for each instruction 5.Assemble control logic to generate control signals Multi-Cycle Implementation Pipelined Implementation
26
ECE 313 Fall 2004Lecture 11 - Processor Design26 3. Assemble Datapath Tasks processor must implement 1.Fetch Instruction from memory 2.Decode Instruction, read register values 3.If necessary, perform an ALU operation 4.If memory address, perform load/store 5.Write results back to register file and increment PC How can we do this with the datapath hardware?
27
ECE 313 Fall 2004Lecture 11 - Processor Design27 Datapath for Instruction Fetch Instruction <- MEM[PC] PC <- PC + 4
28
ECE 313 Fall 2004Lecture 11 - Processor Design28 Datapath for R-Type Instructions add rd, rs, rt R[rd] <- R[rs] + R[rt];
29
ECE 313 Fall 2004Lecture 11 - Processor Design29 Datapath for Load/Store Instructions lw rt, offset(rs) R[rt] <- MEM[R[rs] + s_extend(offset)];
30
ECE 313 Fall 2004Lecture 11 - Processor Design30 Datapath for Load/Store Instructions sw rt, offset(rs) MEM[R[rs] + sign_extend(offset)] <- R[rt]
31
ECE 313 Fall 2004Lecture 11 - Processor Design31 Datapath for Branch Instructions beq rs, rt, offset if (R[rs] == R[rt]) then PC <- PC+4 + s_extend(offset<<2)
32
ECE 313 Fall 2004Lecture 11 - Processor Design32 Putting it all together… Goal: merge datapaths for each function Instruction Fetch R-Type Instructions Load/Store Instructions Branch instructions Add multiplexers to steer data as needed
33
ECE 313 Fall 2004Lecture 11 - Processor Design33 Example: combine R-Type and Load/Store Datapaths Select an ALU input from either Register File output RD2 (for R-Type) Sign-extender output (for LW/SW) Select Register File input WD1 from either ALU output (for R-Type) Memory output RD (for LW)
34
ECE 313 Fall 2004Lecture 11 - Processor Design34 Combined Datapath: R-Type and Load/Store Instructions
35
ECE 313 Fall 2004Lecture 11 - Processor Design35 Combined Datapath: Executing a R-Type Instruction add rd,rs,rt
36
ECE 313 Fall 2004Lecture 11 - Processor Design36 Combined Datapath: Executing a load instruction lw rt,offset(rs)
37
ECE 313 Fall 2004Lecture 11 - Processor Design37 Combined Datapath: Executing a store instruction sw rt,offset(rs)
38
ECE 313 Fall 2004Lecture 11 - Processor Design38 Complete Single-Cycle Datapath
39
ECE 313 Fall 2004Lecture 11 - Processor Design39 Complete Datapath Executing add add rd, rs, rt
40
ECE 313 Fall 2004Lecture 11 - Processor Design40 Complete Datapath Executing load lw rt,offset(rs)
41
ECE 313 Fall 2004Lecture 11 - Processor Design41 Complete Datapath Executing store sw rt,offset(rs)
42
ECE 313 Fall 2004Lecture 11 - Processor Design42 Complete Datapath Executing branch beq r1,r2,offset
43
ECE 313 Fall 2004Lecture 11 - Processor Design43 Refining the Complete Datapath Depending on the instruction, register file input WN is fed by different fields of the instruction R-Type Instructions: rd field (bits 15:11) Load Instructin: rt field (bits 21:16) Result: need an additional multiplexer on WN input oprsrtoffset 6 bits5 bits 16 bits oprsrtrdfunctshamt 6 bits5 bits 6 bits R-Format I-Format
44
ECE 313 Fall 2004Lecture 11 - Processor Design44 Complete Datapath (Refined)
45
ECE 313 Fall 2004Lecture 11 - Processor Design45 Complete Single-Cycle Datapath Control signals shown in blue
46
ECE 313 Fall 2004Lecture 11 - Processor Design46 Outline - Processor Implementation Overview Single-Cycle Implementation 1.Analyze instruction set; get datapath requirements 2.Select datapath components and establish clocking methodology 3.Assemble datapath that meets requirements 4.Determine control signal values for each instruction 5.Assemble control logic to generate control signals Multi-Cycle Implementation Pipelined Implementation
47
ECE 313 Fall 2004Lecture 11 - Processor Design47 Control Unit Design Desired function: Given an instruction word…. Generate control signals needed to execute instruction Implemented as a combinational logic function: Inputs Instruction word - op and funct fields ALU status output - Zero Outputs - processor control points ALU control signals Multiplexer control signals Register File & memory control signal
48
ECE 313 Fall 2004Lecture 11 - Processor Design48 Determining Control Points For each instruction type, determine proper value for each control point (control signal) 00 11 X ( don’t care - either 1 or 0 ) Ultimately … use these values to build a truth table
49
ECE 313 Fall 2004Lecture 11 - Processor Design49 Review: ALU Control Signals Functions: Fig B.5.13 (also in Ch. 5 - p. 301) ALU control inputFunction 000AND 001OR 010add 110subtract 111set on less than
50
ECE 313 Fall 2004Lecture 11 - Processor Design50 Control Signals - R-Type Instruction Control signals shown in blue 1 0 0 0 1 ??? Value depends on funct 0 0
51
ECE 313 Fall 2004Lecture 11 - Processor Design51 0 Control Signals - lw Instruction Control signals shown in blue 0 010 1 1 1 0 1
52
ECE 313 Fall 2004Lecture 11 - Processor Design52 0 Control Signals - sw Instruction Control signals shown in blue X 010 1 X 0 1 0
53
ECE 313 Fall 2004Lecture 11 - Processor Design53 Control Signals - beq Instruction Control signals shown in blue X 110 0 X 0 0 0 1 if Zero=1
54
ECE 313 Fall 2004Lecture 11 - Processor Design54 Outline - Processor Implementation Overview Single-Cycle Implementation 1.Analyze instruction set; get datapath requirements 2.Select datapath components and establish clocking methodology 3.Assemble datapath that meets requirements 4.Determine control signal values for each instruction 5.Assemble control logic to generate control signals Multi-Cycle Implementation Pipelined Implementation
55
ECE 313 Fall 2004Lecture 11 - Processor Design55 Control Unit Structure
56
ECE 313 Fall 2004Lecture 11 - Processor Design56 More notes about Control Unit Structure Control unit as shown: one huge logic block Idea: decompose into smaller logic blocks Smaller blocks can be faster Smaller blocks are easier to work with Observation (rephrased): The only control signal that depends on the funct field is the ALU Operation signal Idea: separate logic for ALU control
57
ECE 313 Fall 2004Lecture 11 - Processor Design57 Modified Control Unit Structure This is called “derived control” or “Local decoding”
58
ECE 313 Fall 2004Lecture 11 - Processor Design58 Datapath with Modified Control Unit
59
ECE 313 Fall 2004Lecture 11 - Processor Design59 Review from Ch. 4: ALU Function Functions: Fig B.5.13 (also in Ch. 5 - p. 301) ALU control inputFunction 000AND 001OR 010add 110subtract 111set on less than
60
ECE 313 Fall 2004Lecture 11 - Processor Design60 ALU Usage in Processor Design Usage depends on instruction type Instruction type (specified by opcode) funct field (r-type instructions only) Encode instruction type in ALUOp signal OperationDesired Action lwadd swadd beqsubtract add subsubtract and or slt and or set on less than ALU Ctl. 010 110 010 110 000 001 111 funct XXXXXX 100000 100010 100100 100101 101010 Instr. type data transfer branch r-type ALUOp 00 01 10 XXXXXX means “don’t care”
61
ECE 313 Fall 2004Lecture 11 - Processor Design61 ALU Control - Truth Table (Fig. 5-13) Use don’t care values to minimize length Ignore F5, F4 (they are always “10”) Assume ALUOp never equals “11” Operation 010 110 010 110 000 001 111 ALUOp1 0 X 1 1 1 1 1 ALUOp0 0 1 X X X X X F5 X X F4 X F3 X 0 0 0 0 1 F2 X 0 0 1 1 0 F1 X 0 1 0 0 1 F0 X 0 0 0 1 0 XXXXX XX XX XX XX XX
62
ECE 313 Fall 2004Lecture 11 - Processor Design62 ALU Control - Implementation Figure C.2.3, page C-6
63
ECE 313 Fall 2004Lecture 11 - Processor Design63 One More Modification - for Branch BEQ instruction depends on Zero output of ALU No other instruction uses Zero output Local decoding Implement with new "Branch" control signal Add AND gate to generate PCSelect
64
ECE 313 Fall 2004Lecture 11 - Processor Design64 Processor Design - Branch Modification
65
ECE 313 Fall 2004Lecture 11 - Processor Design65 Control Unit Implementation Review: Opcodes for key instructions Control Unit Truth Table: Fill in the blanks (or see Fig. 5-18, p. 308) Implementation: Decoder + 2 Gates (Fig. C.2.5) Op5Op4Op3Op2Op1Op0 RegDstALUSrcMemtoRegRegWriteMemReadMemWriteBranchALUOp1ALUOp0 000000 100011 101011 000100 OP RT lw sw beq InputOutput
66
ECE 313 Fall 2004Lecture 11 - Processor Design66 Control Unit Implementation Source: Tod Amon's COD2e Slides ©Morgan Kaufmann Publishers
67
ECE 313 Fall 2004Lecture 11 - Processor Design67 Final Extension: Implementing j (jump) Instruction Format Register Transfer: PC <- (PC + 4)[31:28] @ ( I[25:0] << 2 ) Remember, it’s unconditional 000010address 6 bits26 bits J-Format
68
ECE 313 Fall 2004Lecture 11 - Processor Design68 Final Extension: Implementing jump
69
ECE 313 Fall 2004Lecture 11 - Processor Design69 The Problem with Single-Cycle Processor Implementation: Performance Performance is limited by the slowest instruction Example: suppose we have the following delays Memory read/write200ps ALU and adders100ps Register File read/write50ps What is the critical path for each instruction? R-format200 + 50 + 100 + 0 + 50400ps Load word200 + 50 + 100 + 200 + 50600ps Store word200 + 50 + 100 + 200550ps Branch200 + 50 + 100350ps Jump200200ps
70
ECE 313 Fall 2004Lecture 11 - Processor Design70 Alternatives to Single-Cycle Multicycle Processor Implementation Shorter clock cycle Multiple clock cycles per instruction Some instructions take more cycles then others Less hardware required Pipelined Implementation Overlap execution of instructions Try to get short cycle times and low CPI More hardware required … but also more performance!
71
ECE 313 Fall 2004Lecture 11 - Processor Design71 Outline - Processor Implementation Overview Single-Cycle Implementation Multi-Cycle Implementation 1.Analyze instruction set; get datapath requirements 2.Select datapath components and establish clocking methodology 3.Assemble datapath that meets requirements 4.Determine control signal values for each instruction 5.Assemble control logic to generate control signals Pipelined Implementation
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.