Published by Alexia Charles, modified over 9 years ago
1
Computer Organization
Lecture Set – 05.1
Chapter 5
Huei-Yung Lin
2
H.Y. Lin, CCUEE Computer Organization 2
Roadmap for the Term: Major Topics
- Computer Systems Overview
- Technology Trends
- Performance
- Instruction Sets (and Software)
- Logic and Arithmetic
- Processor Implementation
- Memory Systems
- Input/Output
3
Outline - Processor Implementation
- Overview
  - Review of Processor Operation
  - Steps in Processor Design
  - Implementation Styles
  - The “MIPS Lite” Instruction Subset
- Single-Cycle Implementation
- Multi-Cycle Implementation
- Pipelined Implementation
4
Review: The “Five Classic Components”
- Processor: Control and Datapath
- Memory
- Input
- Output
[Figure: processor (control + datapath) connected to memory, input, and output; memory holds binary instruction and data words]
5
Review: Processor Operation
Executing programs: the “fetch/execute” cycle
- Processor fetches an instruction from memory (the PC supplies the address)
- Processor executes the “machine language” instruction
  - Perform a calculation
  - Read/write data
- Repeat with the “next” instruction
[Figure: processor (control + datapath) sends the PC to memory as an address; memory returns the instruction]
6
Processor Design Goals
- Design hardware that:
  - Fetches instructions from memory
  - Executes instructions as specified by the ISA
- Design considerations: cost, speed, power
7
Steps in Processor Design
1. Analyze the instruction set; get datapath requirements
2. Select datapath components and establish a clocking methodology
3. Assemble a datapath that meets the requirements
4. Determine control signal values for each instruction
5. Assemble control logic to generate the control signals
8
Processor Implementation Styles
- Single-cycle
  - Perform each instruction in one clock cycle
  - Disadvantage: only as fast as the “slowest” instruction
- Multi-cycle
  - Break the fetch/execute cycle into multiple steps
  - Perform one step in each clock cycle
- Pipelined
  - Execute each instruction in multiple steps
  - Perform one step of each instruction in every clock cycle
  - Process multiple instructions in parallel, like an “assembly line”
9
“MIPS Lite”: A Pedagogical Example
- Use MIPS to illustrate processor design
- Limit the initial design to a subset of instructions:
  - Memory access: lw, sw
  - Arithmetic/logical: add, sub, and, or, slt
  - Branch/jump: beq, j
- Add instructions as we go along (e.g., addi)
10
Review: MIPS Instruction Formats
R-format: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
I-format: op (6 bits) | rs (5 bits) | rt (5 bits) | offset/immediate (16 bits)
J-format: op (6 bits) | address (26 bits)
Field definitions:
- op: instruction opcode
- rs, rt, rd: source (2) and destination (1) register numbers
- shamt: shift amount
- funct: function code (works with the opcode to specify the operation)
- offset/immediate: address offset or immediate value
- address: target address for jumps
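The field boundaries above can be checked with a short decoder. This is a minimal Python sketch; `decode` and `sign_extend16` are hypothetical helper names, not part of the lecture. The word 0x02328020 is the standard encoding of `add $s0, $s1, $s2` ($s0 = register 16, $s1 = 17, $s2 = 18, funct 0x20), and 0x8E11FFF8 encodes `lw $s1, -8($s0)`.

```python
def decode(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its R-, I-, and J-format fields."""
    return {
        "op":      (word >> 26) & 0x3F,   # bits 31:26
        "rs":      (word >> 21) & 0x1F,   # bits 25:21
        "rt":      (word >> 16) & 0x1F,   # bits 20:16
        "rd":      (word >> 11) & 0x1F,   # bits 15:11 (R-format)
        "shamt":   (word >> 6) & 0x1F,    # bits 10:6  (R-format)
        "funct":   word & 0x3F,           # bits 5:0   (R-format)
        "imm":     word & 0xFFFF,         # bits 15:0  (I-format offset/immediate)
        "address": word & 0x03FFFFFF,     # bits 25:0  (J-format target)
    }


def sign_extend16(value: int) -> int:
    """Interpret a 16-bit offset/immediate field as a two's-complement integer."""
    return value - 0x10000 if value & 0x8000 else value


fields = decode(0x02328020)           # add $s0, $s1, $s2
print(fields["op"], fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))
# prints: 0 17 18 16 0x20

lw_fields = decode(0x8E11FFF8)        # lw $s1, -8($s0)
print(lw_fields["rs"], lw_fields["rt"], sign_extend16(lw_fields["imm"]))
# prints: 16 17 -8
```

Every field is extracted from every word; which ones are meaningful depends on the format selected by op, just as the hardware wires all fields out and lets control decide which to use.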
11
MIPS Instruction Subset
- Arithmetic & logical instructions
  - add $s0, $s1, $s2
  - sub $s0, $s1, $s2
  - and $s0, $s1, $s2
  - or $s0, $s1, $s2
- Data transfer instructions
  - lw $s1, offset($s0)
  - sw $s2, offset($s3)
- Branch/jump
  - beq $s0, $s1, offset
  - j address
12
MIPS Instruction Execution
General procedure:
1. Fetch the instruction from memory
2. Decode the instruction and read register values
3. If necessary, perform an ALU operation
4. If load or store, do the memory access
5. Write results back to the register file and increment the PC
Register transfers provide a concise description.
13
Register Transfers for the MIPS Subset
Instruction fetch: Instruction <= MEM[PC]
Instruction execution:
Instr.  Register transfers
add     R[rd] <= R[rs] + R[rt];  PC <= PC + 4
sub     R[rd] <= R[rs] - R[rt];  PC <= PC + 4
and     R[rd] <= R[rs] & R[rt];  PC <= PC + 4
or      R[rd] <= R[rs] | R[rt];  PC <= PC + 4
lw      R[rt] <= MEM[R[rs] + sign_extend(offset)];  PC <= PC + 4
sw      MEM[R[rs] + sign_extend(offset)] <= R[rt];  PC <= PC + 4
beq     if (R[rs] == R[rt]) then PC <= PC + 4 + (sign_extend(offset) << 2)
        else PC <= PC + 4
j       PC <= (PC + 4)[31:28] @ (address << 2)
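The register transfers above fully specify the subset, so they can be executed directly as an instruction-level interpreter. The following is a minimal Python sketch under stated assumptions: `step` and its helpers are hypothetical names, memory is modeled as a dict from byte address to 32-bit word, and the opcode/funct values (lw 0x23, sw 0x2B, beq 0x04, j 0x02; add 0x20, sub 0x22, and 0x24, or 0x25, slt 0x2A) are the standard MIPS encodings. It models only what each instruction does, not how the datapath does it.

```python
MASK32 = 0xFFFFFFFF  # registers and memory words hold 32-bit unsigned values


def sign_extend16(value: int) -> int:
    """Interpret a 16-bit offset field as a two's-complement integer."""
    return value - 0x10000 if value & 0x8000 else value


def to_signed(value: int) -> int:
    """Reinterpret a 32-bit unsigned value as two's complement (used by slt)."""
    return value - (1 << 32) if value & 0x80000000 else value


def step(regs: list, mem: dict, pc: int) -> int:
    """Apply the register transfers for one instruction and return the new PC.

    regs: 32 register values (regs[0] is $zero); mem: byte address -> 32-bit word.
    """
    instr = mem[pc]                                   # Instruction <= MEM[PC]
    op, funct = instr >> 26, instr & 0x3F
    rs, rt, rd = (instr >> 21) & 0x1F, (instr >> 16) & 0x1F, (instr >> 11) & 0x1F
    offset = sign_extend16(instr & 0xFFFF)
    next_pc = (pc + 4) & MASK32                       # default: PC <= PC + 4
    dest, value = None, 0

    if op == 0x00:                                    # R-format
        a, b = regs[rs], regs[rt]
        # compute every result, then select one by funct (like a multiplexer)
        results = {0x20: a + b, 0x22: a - b, 0x24: a & b, 0x25: a | b,
                   0x2A: int(to_signed(a) < to_signed(b))}
        dest, value = rd, results[funct]
    elif op == 0x23:                                  # lw
        dest, value = rt, mem[(regs[rs] + offset) & MASK32]
    elif op == 0x2B:                                  # sw
        mem[(regs[rs] + offset) & MASK32] = regs[rt]
    elif op == 0x04:                                  # beq
        if regs[rs] == regs[rt]:
            next_pc = (next_pc + (offset << 2)) & MASK32
    elif op == 0x02:                                  # j
        next_pc = (next_pc & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)
    else:
        raise ValueError(f"opcode {op:#x} is not in the MIPS Lite subset")

    if dest:                                          # writes to $zero are discarded
        regs[dest] = value & MASK32
    return next_pc


# Usage: a short hypothetical program
regs = [0] * 32
regs[16] = 0x100                      # $s0 = 0x100
mem = {
    0x00: 0x8E110000,                 # lw  $s1, 0($s0)
    0x04: 0x02319020,                 # add $s2, $s1, $s1
    0x08: 0xAE120004,                 # sw  $s2, 4($s0)
    0x0C: 0x12520001,                 # beq $s2, $s2, 1   (skips 0x10)
    0x14: 0x08000000,                 # j   0
    0x100: 5,
}
pc = 0
for _ in range(5):
    pc = step(regs, mem, pc)
print(regs[17], regs[18], mem[0x104], pc)   # prints: 5 10 10 0
```

Each branch of the `if` chain corresponds to one row of the register-transfer table; the single-cycle datapath built in the following slides has to perform the same transfers with shared hardware.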
14
Outline - Processor Implementation
- Overview
- Single-Cycle Implementation
  1. Analyze instruction set; get datapath requirements
  2. Select datapath components and establish clocking methodology
  3. Assemble datapath that meets requirements
  4. Determine control signal values for each instruction
  5. Assemble control logic to generate control signals
- Multi-Cycle Implementation
- Pipelined Implementation
15
1. Instruction Set Requirements
- Memory
  - Read instructions
  - Read and write data
- Registers (32)
  - Read (register number from the rs field of the instruction)
  - Read (register number from the rt field of the instruction)
  - Write (register number from the rd or rt field of the instruction)
- PC
- Sign extender
- Add and subtract (register values)
- Add 4 or the sign-extended immediate to the PC
16
Outline: Single-Cycle Implementation, step 2 (select datapath components and establish clocking methodology)
17
2. (a) Choose Datapath Components
- Combinational components: adder, ALU, multiplexer, sign extender
- Storage components: registers, register file, memory
18
Datapath Combinational Components
[Figure: adder, ALU, multiplexer, sign extender]
NOTES:
- Blue-green inputs are control lines
- Blue lines are often hidden to suppress detail
19
Datapath Storage: Registers
- Registers store multi-bit values
- A new value is loaded on the clock edge when EN is asserted
20
Datapath Storage: Idealized Memory
- Data read
  - Place the address on ADDR
  - Assert MemRead
  - Data is available on RD after the memory “access time”
- Data write
  - Place the address on ADDR
  - Place the data input on WD
  - Assert MemWrite
  - Data is written on the clock edge
21
Datapath Storage: Register File
- Register file: 32 registers (including $zero)
- Two data outputs, RD1 and RD2
  - Assert register number RN1/RN2
  - Read output RD1/RD2 after the “access time” (propagation delay)
- One data input, WD
  - Assert register number WN
  - Assert the value on WD
  - Assert RegWrite
  - The value is loaded on the clock edge
- Implemented as a small multiport memory
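The port behavior described above (combinational reads, a write only on the clock edge when RegWrite is asserted) can be sketched as a small Python class. `RegisterFile`, `read`, and `clock_edge` are hypothetical names; ignoring writes to register 0 follows the MIPS convention that $zero always reads as 0.

```python
class RegisterFile:
    """Hypothetical model: 32 x 32-bit registers, two read ports, one write port."""

    def __init__(self) -> None:
        self._regs = [0] * 32

    def read(self, rn1: int, rn2: int) -> tuple:
        """Combinational read: RD1 = R[RN1], RD2 = R[RN2]."""
        return self._regs[rn1], self._regs[rn2]

    def clock_edge(self, wn: int, wd: int, reg_write: bool) -> None:
        """On the clock edge, load WD into register WN only if RegWrite is asserted."""
        if reg_write and wn != 0:        # $zero (register 0) always reads as 0
            self._regs[wn] = wd & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(wn=8, wd=42, reg_write=True)
rf.clock_edge(wn=9, wd=7, reg_write=False)   # RegWrite deasserted: no change
rf.clock_edge(wn=0, wd=99, reg_write=True)   # write to $zero is ignored
print(rf.read(8, 9), rf.read(0, 8))          # prints: (42, 0) (0, 42)
```

Separating `read` from `clock_edge` mirrors the edge-triggered methodology: values read during a cycle never see a write from the same cycle until the edge arrives.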
22
2. (b) Choose Clocking Methodology
- A clocking methodology defines
  - When signals can be read from storage elements
  - When signals can be written to storage elements
- Typical clocking methodologies
  - Single-phase edge-triggered
  - Single-phase level-triggered
  - Multiple-phase level-triggered
- Authors’ choice: single-phase edge-triggered
  - All registers are updated on one edge of the clock cycle
  - Simplest to work with
23
Review: Edge-Triggered Clocking
- Controls sequential circuit operation
  - Register outputs change after the first clock edge
  - Combinational logic determines the “next state”
  - Storage elements store the new state on the next clock edge
[Figure: register output feeds combinational logic (adder, mux), which feeds a register input; both registers are driven by the clock]
24
Review: Edge-Triggered Clocking
- Propagation delay, $t_{prop}$
  - Logic (including register outputs)
  - Interconnect
- Register setup time, $t_{setup}$
[Figure: register output, combinational logic (adder, mux), and register input, annotated with $t_{prop}$ and $t_{setup}$ within one clock period]
$t_{clock} > t_{prop} + t_{setup}$
$t_{clock} = t_{prop} + t_{setup} + t_{slack}$
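The two timing relations above can be applied to a set of register-to-register paths: the slowest path sets $t_{prop}$, and the slack is whatever is left of the clock period. This Python sketch uses hypothetical function names and hypothetical path delays, chosen only to illustrate the arithmetic.

```python
def clock_slack(t_clock_ps: float, path_delays_ps: list, t_setup_ps: float) -> float:
    """t_slack = t_clock - (t_prop + t_setup), with t_prop taken from the slowest path."""
    t_prop = max(path_delays_ps)          # critical (longest) register-to-register path
    return t_clock_ps - (t_prop + t_setup_ps)


def meets_timing(t_clock_ps: float, path_delays_ps: list, t_setup_ps: float) -> bool:
    """The edge-triggered constraint t_clock > t_prop + t_setup."""
    return clock_slack(t_clock_ps, path_delays_ps, t_setup_ps) > 0


paths = [350, 420, 180]   # hypothetical propagation delays (ps), including register outputs
print(clock_slack(500, paths, 30), meets_timing(500, paths, 30))   # prints: 50 True
print(clock_slack(440, paths, 30), meets_timing(440, paths, 30))   # prints: -10 False
```

Only the longest path matters: shortening the 180 ps path would not let the clock run any faster.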
25
Outline: Single-Cycle Implementation, step 3 (assemble datapath that meets requirements)
26
3. Assemble Datapath
Tasks the processor must implement:
1. Fetch the instruction from memory
2. Decode the instruction and read register values
3. If necessary, perform an ALU operation
4. If load or store, perform the memory access
5. Write results back to the register file and increment the PC
How can we do this with the datapath hardware?
27
Datapath for Instruction Fetch
Instruction <= MEM[PC]
PC <= PC + 4
28
Datapath for R-Type Instructions
add rd, rs, rt
R[rd] <= R[rs] + R[rt];
29
Datapath for Load/Store Instructions
lw rt, offset(rs)
R[rt] <= MEM[R[rs] + sign_extend(offset)];
30
Datapath for Load/Store Instructions
sw rt, offset(rs)
MEM[R[rs] + sign_extend(offset)] <= R[rt];
31
Datapath for Branch Instructions
beq rs, rt, offset
if (R[rs] == R[rt]) then PC <= PC + 4 + (sign_extend(offset) << 2)
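The branch adder's job can be sketched in a few lines of Python: sign-extend the 16-bit offset, shift it left by 2 (word offset to byte offset), and add it to PC + 4. `branch_target` is a hypothetical name, and the 32-bit wraparound is an assumption about how the adder's carry-out is discarded.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Interpret the 16-bit offset field as a two's-complement integer."""
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, offset16: int) -> int:
    """PC + 4 + (sign_extend(offset) << 2): the offset counts words relative to PC + 4."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & MASK32


print(hex(branch_target(0x1000, 3)))        # prints: 0x1010 (3 words past PC + 4)
print(hex(branch_target(0x1000, 0xFFFF)))   # prints: 0x1000 (offset -1 branches to itself)
```

Because the offset is relative to PC + 4, an offset of 0 falls through to the next instruction, and negative offsets (backward branches for loops) rely on the sign extension.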
32
Putting It All Together …
- Goal: merge the datapaths for each function
  - Instruction fetch
  - R-type instructions
  - Load/store instructions
  - Branch instructions
- Add multiplexers to steer data as needed
33
Example: Combine R-Type and Load/Store Datapaths
- Select an ALU input from either
  - Register file output RD2 (for R-type)
  - Sign-extender output (for lw/sw)
- Select the register file input WD from either
  - ALU output (for R-type)
  - Memory output RD (for lw)
34
Combined Datapath: R-Type and Load/Store Instructions
35
Combined Datapath: Executing an R-Type Instruction
add rd, rs, rt
36
Combined Datapath: Executing a Load Instruction
lw rt, offset(rs)
37
Combined Datapath: Executing a Store Instruction
sw rt, offset(rs)
38
Complete Single-Cycle Datapath
39
Complete Datapath Executing add
add rd, rs, rt
40
Complete Datapath Executing load
lw rt, offset(rs)
41
Complete Datapath Executing store
sw rt, offset(rs)
42
Complete Datapath Executing branch
beq r1, r2, offset
43
Refining the Complete Datapath
- Depending on the instruction, the register file input WN is fed by different fields of the instruction
  - R-type instructions: rd field (bits 15:11)
  - Load instructions: rt field (bits 20:16)
- Result: an additional multiplexer is needed on the WN input
R-format: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
I-format: op (6 bits) | rs (5 bits) | rt (5 bits) | offset (16 bits)
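The extra multiplexer on WN is a 2-to-1 selection between two instruction fields. In this Python sketch the select input is named RegDst, as in this lecture's control truth table; `write_register` is a hypothetical name. The example words are the standard encodings of `add $s0, $s1, $s2` and `lw $s1, -8($s0)`.

```python
def write_register(instr: int, reg_dst: int) -> int:
    """2-to-1 multiplexer on WN: RegDst = 1 selects rd (R-type), 0 selects rt (load)."""
    rt = (instr >> 16) & 0x1F    # bits 20:16
    rd = (instr >> 11) & 0x1F    # bits 15:11
    return rd if reg_dst else rt


print(write_register(0x02328020, reg_dst=1))   # add $s0, $s1, $s2 -> prints: 16 ($s0)
print(write_register(0x8E11FFF8, reg_dst=0))   # lw $s1, -8($s0)   -> prints: 17 ($s1)
```

Both fields are always extracted; only the select signal changes per instruction, which is why RegDst becomes one more control output.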
44
Complete Datapath (Refined)
45
Complete Single-Cycle Datapath
Control signals shown in blue
46
Outline: Single-Cycle Implementation, step 4 (determine control signal values for each instruction)
47
Control Unit Design
- Desired function:
  - Given an instruction word …
  - Generate the control signals needed to execute the instruction
- Implemented as a combinational logic function:
  - Inputs
    - Instruction word: op and funct fields
    - ALU status output: Zero
  - Outputs: processor control points
    - ALU control signals
    - Multiplexer control signals
    - Register file & memory control signals
48
Determining Control Points
- For each instruction type, determine the proper value for each control point (control signal):
  - 0
  - 1
  - X (don’t care: either 1 or 0)
- Ultimately … use these values to build a truth table
49
Review: ALU Control Signals
Functions: Figure B.5.13 (also in Ch. 5, p. 301)
ALU control input   Function
000                 AND
001                 OR
010                 add
110                 subtract
111                 set on less than
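The five ALU control codes in the table can be modeled as a small Python function that also produces the Zero status output used later by beq. `alu` and `to_signed` are hypothetical names; treating slt as a signed comparison follows MIPS semantics for slt.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value: int) -> int:
    """Reinterpret a 32-bit value as two's complement."""
    return value - (1 << 32) if value & 0x80000000 else value


def alu(control: int, a: int, b: int) -> tuple:
    """Return (result, Zero) for the 3-bit ALU control codes in the table above."""
    if control == 0b000:
        result = a & b
    elif control == 0b001:
        result = a | b
    elif control == 0b010:
        result = a + b
    elif control == 0b110:
        result = a - b
    elif control == 0b111:
        result = int(to_signed(a) < to_signed(b))   # set on less than (signed)
    else:
        raise ValueError(f"unused ALU control code {control:03b}")
    result &= MASK32                                 # keep a 32-bit result
    return result, int(result == 0)


print(alu(0b110, 7, 7))              # prints: (0, 1)  subtract to compare, Zero = 1
print(alu(0b111, 0xFFFFFFFF, 1))     # prints: (1, 0)  -1 < 1
```

Subtraction is what makes beq work: equal operands give a zero difference, and the Zero output reports it.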
50
Control Signals: R-Type Instruction
Control signals shown in blue
- RegDst = 1, ALUSrc = 0, MemtoReg = 0, RegWrite = 1
- MemRead = 0, MemWrite = 0, PCSelect = 0
- ALU operation = ??? (value depends on funct)
51
Control Signals: lw Instruction
Control signals shown in blue
- RegDst = 0, ALUSrc = 1, MemtoReg = 1, RegWrite = 1
- MemRead = 1, MemWrite = 0, PCSelect = 0
- ALU operation = 010 (add)
52
Control Signals: sw Instruction
Control signals shown in blue
- RegDst = X, ALUSrc = 1, MemtoReg = X, RegWrite = 0
- MemRead = 0, MemWrite = 1, PCSelect = 0
- ALU operation = 010 (add)
53
Control Signals: beq Instruction
Control signals shown in blue
- RegDst = X, ALUSrc = 0, MemtoReg = X, RegWrite = 0
- MemRead = 0, MemWrite = 0, PCSelect = 1 if Zero = 1
- ALU operation = 110 (subtract)
54
Outline: Single-Cycle Implementation, step 5 (assemble control logic to generate control signals)
55
Control Unit Structure
56
More Notes About Control Unit Structure
- Control unit as shown: one huge logic block
- Idea: decompose it into smaller logic blocks
  - Smaller blocks can be faster
  - Smaller blocks are easier to work with
- Observation (rephrased): the only control signal that depends on the funct field is the ALU operation signal
- Idea: separate logic for ALU control
57
Modified Control Unit Structure
This is called “derived control” or “local decoding”.
58
Datapath with Modified Control Unit
59
Review from Ch. 4: ALU Function
Functions: Figure B.5.13 (also in Ch. 5, p. 301)
ALU control input   Function
000                 AND
001                 OR
010                 add
110                 subtract
111                 set on less than
60
ALU Usage in Processor Design
- Usage depends on the instruction type
  - Instruction type (specified by the opcode)
  - funct field (R-type instructions only)
- Encode the instruction type in the ALUOp signal

Instr. type     ALUOp  Operation  funct   Desired action    ALU ctl.
data transfer   00     lw         XXXXXX  add               010
data transfer   00     sw         XXXXXX  add               010
branch          01     beq        XXXXXX  subtract          110
R-type          10     add        100000  add               010
R-type          10     sub        100010  subtract          110
R-type          10     and        100100  and               000
R-type          10     or         100101  or                001
R-type          10     slt        101010  set on less than  111
XXXXXX means “don’t care”.
61
ALU Control: Truth Table (Fig. 5-13)
- Use don’t-care values to minimize the table length
- Ignore F5 and F4 (they are always “10”)
- Assume ALUOp never equals “11”

ALUOp1 ALUOp0 | F5 F4 F3 F2 F1 F0 | Operation
0      0      | X  X  X  X  X  X  | 010
X      1      | X  X  X  X  X  X  | 110
1      X      | X  X  0  0  0  0  | 010
1      X      | X  X  0  0  1  0  | 110
1      X      | X  X  0  1  0  0  | 000
1      X      | X  X  0  1  0  1  | 001
1      X      | X  X  1  0  1  0  | 111
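The truth table above translates directly into a function of ALUOp and funct. This Python sketch (hypothetical name `alu_control`) relies on the same two simplifications the slide states: ALUOp is never 11, and only F3..F0 need to be examined for R-type instructions.

```python
def alu_control(alu_op: int, funct: int) -> int:
    """ALUOp (2 bits) and funct (6 bits) -> 3-bit ALU control input."""
    if alu_op == 0b00:              # lw/sw: add to form the address
        return 0b010
    if alu_op & 0b01:               # ALUOp0 = 1 (beq): subtract to compare
        return 0b110
    # ALUOp1 = 1 (R-type): F5 F4 are always 10, so only F3..F0 matter
    by_low_funct = {0b0000: 0b010,  # add
                    0b0010: 0b110,  # sub
                    0b0100: 0b000,  # and
                    0b0101: 0b001,  # or
                    0b1010: 0b111}  # slt
    return by_low_funct[funct & 0b1111]


for name, funct in [("add", 0b100000), ("sub", 0b100010), ("and", 0b100100),
                    ("or", 0b100101), ("slt", 0b101010)]:
    print(name, format(alu_control(0b10, funct), "03b"))
# prints (one per line): add 010, sub 110, and 000, or 001, slt 111
```

Checking ALUOp0 before ALUOp1 is only safe because ALUOp = 11 never occurs, which is exactly the don't-care the table exploits.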
62
ALU Control: Implementation
Figure C.2.3, page C-6
63
One More Modification: Branch
- The beq instruction depends on the Zero output of the ALU
- No other instruction uses the Zero output
- Local decoding:
  - Implement with a new "Branch" control signal
  - Add an AND gate to generate PCSelect
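The AND gate and the PC multiplexer it drives can be sketched in Python. `pc_select` is a hypothetical name, and the target address is passed in as already computed by the branch adder. The third example shows why Zero alone is not enough: an R-type sub whose result is zero also sets Zero, but must not redirect the PC.

```python
def pc_select(branch: int, zero: int, pc_plus_4: int, branch_target: int) -> int:
    """PCSelect = Branch AND Zero drives the PC multiplexer."""
    pc_sel = branch & zero                      # the added AND gate
    return branch_target if pc_sel else pc_plus_4


print(hex(pc_select(1, 1, 0x1004, 0x1010)))   # beq, operands equal    -> prints: 0x1010
print(hex(pc_select(1, 0, 0x1004, 0x1010)))   # beq, operands differ   -> prints: 0x1004
print(hex(pc_select(0, 1, 0x1004, 0x1010)))   # sub with a zero result -> prints: 0x1004
```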
64
Processor Design: Branch Modification
65
Control Unit Implementation
- Review: opcodes for key instructions
- Control unit truth table: fill in the blanks (or see Fig. 5-18, p. 308)
- Implementation: decoder + 2 gates (Fig. C.2.5)

Instr.    Input: Op5..Op0   Outputs: RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0
R-format  000000            (fill in)
lw        100011            (fill in)
sw        101011            (fill in)
beq       000100            (fill in)
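One way to read "decoder + 2 gates" is sketched below in Python: a decoder produces one line per recognized opcode, most outputs are wired directly to a single decoder line, and two OR gates combine lines for ALUSrc and RegWrite. This is a sketch under assumptions, not a copy of Fig. C.2.5: `main_control` is a hypothetical name, and every don't-care (X) is resolved to 0, which is one legal choice among several.

```python
def main_control(op: int) -> dict:
    """Opcode (6 bits) -> main control signals; don't-cares are driven to 0."""
    # Decoder: one output line per recognized opcode
    r_format = int(op == 0b000000)
    lw = int(op == 0b100011)
    sw = int(op == 0b101011)
    beq = int(op == 0b000100)
    return {
        "RegDst": r_format,
        "ALUSrc": lw | sw,          # OR gate 1
        "MemtoReg": lw,
        "RegWrite": r_format | lw,  # OR gate 2
        "MemRead": lw,
        "MemWrite": sw,
        "Branch": beq,
        "ALUOp1": r_format,
        "ALUOp0": beq,
    }


for name, op in [("R-format", 0b000000), ("lw", 0b100011),
                 ("sw", 0b101011), ("beq", 0b000100)]:
    print(name, main_control(op))
```

Comparing the printed rows with the per-instruction control values on the earlier slides is a quick check of the filled-in truth table.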
66
Control Unit Implementation
67
Final Extension: Implementing j (jump)
- Instruction format (J-format): op = 000010 (6 bits) | address (26 bits)
- Register transfer: PC <= (PC + 4)[31:28] @ (I[25:0] << 2), where @ denotes concatenation
- Remember, it’s unconditional
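The jump register transfer can be checked in Python: the 26-bit field shifted left by 2 fills bits 27:0, and the top 4 bits come from PC + 4. `jump_target` is a hypothetical name. The second example is chosen so that PC and PC + 4 differ in their top 4 bits, which shows why the transfer uses PC + 4.

```python
def jump_target(pc: int, instr: int) -> int:
    """PC <= (PC + 4)[31:28] @ (I[25:0] << 2), where @ is bit concatenation."""
    upper4 = (pc + 4) & 0xF0000000                 # top 4 bits of PC + 4
    return upper4 | ((instr & 0x03FFFFFF) << 2)    # 26-bit field << 2 fills bits 27:0


print(hex(jump_target(0x00400018, 0x08100000)))   # prints: 0x400000
print(hex(jump_target(0x3FFFFFFC, 0x08000001)))   # prints: 0x40000004 (uses PC + 4, not PC)
```

Because the shifted field never reaches bits 31:28, a j instruction can only reach targets within the same 256 MB region as PC + 4.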
68
Final Extension: Implementing jump
69
The Problem with Single-Cycle Processor Implementation: Performance
- Performance is limited by the slowest instruction
- Example: suppose we have the following delays
  - Memory read/write: 200 ps
  - ALU and adders: 100 ps
  - Register file read/write: 50 ps
- What is the critical path for each instruction?

Instruction   Instr. fetch  Reg. read  ALU  Memory  Reg. write  Total
R-format      200           50         100  0       50          400 ps
Load word     200           50         100  200     50          600 ps
Store word    200           50         100  200                 550 ps
Branch        200           50         100                      350 ps
Jump          200                                               200 ps
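The totals in the table can be recomputed from the three unit delays, which makes the single-cycle penalty explicit: the clock must fit the slowest instruction, so every faster instruction idles for the remainder of the cycle. This Python sketch lists the functional units on each critical path in fetch/execute order; the dictionary names are hypothetical.

```python
# Unit delays from the slide (ps)
DELAY_PS = {"mem": 200, "alu": 100, "reg": 50}

# Units on each instruction's critical path, in fetch/execute order
PATH = {
    "R-format":   ["mem", "reg", "alu", "reg"],         # fetch, reg read, ALU, reg write
    "Load word":  ["mem", "reg", "alu", "mem", "reg"],  # plus the data memory read
    "Store word": ["mem", "reg", "alu", "mem"],         # no register write
    "Branch":     ["mem", "reg", "alu"],                # compare in the ALU
    "Jump":       ["mem"],                              # fetch only
}

totals = {name: sum(DELAY_PS[u] for u in units) for name, units in PATH.items()}
clock_ps = max(totals.values())   # single-cycle: every instruction gets the slowest path

for name, t in totals.items():
    print(f"{name:<10} {t:>4} ps  (idle {clock_ps - t} ps per cycle)")
print("clock period:", clock_ps, "ps")   # prints: clock period: 600 ps
```

A jump, for example, finishes its useful work in 200 ps but still occupies a full 600 ps cycle, which motivates the alternatives on the next slide.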
70
Alternatives to Single-Cycle
- Multicycle processor implementation
  - Shorter clock cycle
  - Multiple clock cycles per instruction
  - Some instructions take more cycles than others
  - Less hardware required
- Pipelined implementation
  - Overlap execution of instructions
  - Try to get short cycle times and low CPI
  - More hardware required … but also more performance!