Computer Organization Lecture Set – 05.1 Chapter 5 Huei-Yung Lin.


1 Computer Organization Lecture Set – 05.1 Chapter 5 Huei-Yung Lin

2 Roadmap for the Term: Major Topics (H.Y. Lin, CCUEE Computer Organization)
- Computer Systems Overview
- Technology Trends
- Performance
- Instruction Sets (and Software)
- Logic and Arithmetic
- Processor Implementation (current)
- Memory Systems
- Input/Output

3 Outline - Processor Implementation
- Overview (current)
  - Review of Processor Operation
  - Steps in Processor Design
  - Implementation Styles
  - The “MIPS Lite” Instruction Subset
- Single-Cycle Implementation
- Multi-Cycle Implementation
- Pipelined Implementation

4 Review: The “Five Classic Components”
- Processor
  - Datapath
  - Control
- Memory
- Input
- Output
[Figure: Input, Processor (Control, Datapath), Output, and Memory holding binary instruction words]

5 Review: Processor Operation
Executing Programs - the “fetch/execute” cycle
- Processor fetches instruction from memory
- Processor executes “machine language” instruction
  - Perform calculation
  - Read/write data
- Repeat with “next” instruction
[Figure: Processor (Control, Datapath, PC) sends Address to Memory and receives Instruction]

6 Processor Design Goals
Design hardware that:
- Fetches instructions from memory
- Executes instructions as specified by ISA
Design considerations
- Cost
- Speed
- Power

7 Steps in Processor Design
1. Analyze instruction set; get datapath requirements
2. Select datapath components and establish clocking methodology
3. Assemble datapath that meets requirements
4. Determine control signal values for each instruction
5. Assemble control logic to generate control signals

8 Processor Implementation Styles
- Single Cycle
  - Perform each instruction in 1 clock cycle
  - Disadvantage: only as fast as “slowest” instruction
- Multi-Cycle
  - Break fetch/execute cycle into multiple steps
  - Perform 1 step in each clock cycle
- Pipelined
  - Execute each instruction in multiple steps
  - Perform 1 step / instruction in each clock cycle
  - Process multiple instructions in parallel - “assembly line”

9 “MIPS Lite” - A Pedagogical Example
Use MIPS to illustrate processor design
Limit initial design to a subset of instructions:
- Memory access: lw, sw
- Arithmetic/Logical: add, sub, and, or, slt
- Branch/Jump: beq, j
Add instructions as we go along (e.g., addi)

10 Review - MIPS Instruction Formats
R-Format: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
I-Format: op (6 bits) | rs (5 bits) | rt (5 bits) | offset (16 bits)
J-Format: op (6 bits) | address (26 bits)
Field definitions:
- op: instruction opcode
- rs, rt, rd: source (2) and destination (1) register numbers
- shamt: shift amount
- funct: function code (works with opcode to specify op)
- offset/immediate: address offset or immediate value
- address: target address for jumps
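
The field layout above can be sketched as a small decoder. This is only an illustration: the function name `decode` and the dictionary keys are assumptions, not part of the lecture. It extracts every field from a 32-bit word; which fields are meaningful depends on the format.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its format fields."""
    return {
        "op":      (instr >> 26) & 0x3F,   # bits 31:26, all formats
        "rs":      (instr >> 21) & 0x1F,   # bits 25:21, R- and I-format
        "rt":      (instr >> 16) & 0x1F,   # bits 20:16, R- and I-format
        "rd":      (instr >> 11) & 0x1F,   # bits 15:11, R-format
        "shamt":   (instr >> 6) & 0x1F,    # bits 10:6,  R-format
        "funct":   instr & 0x3F,           # bits 5:0,   R-format
        "offset":  instr & 0xFFFF,         # bits 15:0,  I-format
        "address": instr & 0x03FFFFFF,     # bits 25:0,  J-format
    }

# add $s0, $s1, $s2 encodes as 0x02328020 ($s0=16, $s1=17, $s2=18)
fields = decode(0x02328020)
print(fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))
```

Decoding all fields unconditionally mirrors the hardware, where the instruction bits are simply wired to every unit that might need them.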

11 MIPS Instruction Subset
Arithmetic & Logical Instructions
- add $s0, $s1, $s2
- sub $s0, $s1, $s2
- and $s0, $s1, $s2
- or $s0, $s1, $s2
Data Transfer Instructions
- lw $s1, offset($s0)
- sw $s2, offset($s3)
Branch/Jump
- beq $s0, $s1, offset
- j address

12 MIPS Instruction Execution
General Procedure
1. Fetch Instruction from memory
2. Decode Instruction, read register values
3. If necessary, perform an ALU operation
4. If load or store, do memory access
5. Write results back to register file and increment PC
Register Transfers provide a concise description

13 Register Transfers for the MIPS Subset
Instruction Fetch: Instruction <= MEM[PC]
Instruction Execution:
Instr.   Register Transfers
add      R[rd] <= R[rs] + R[rt]; PC <= PC + 4
sub      R[rd] <= R[rs] - R[rt]; PC <= PC + 4
and      R[rd] <= R[rs] & R[rt]; PC <= PC + 4
or       R[rd] <= R[rs] | R[rt]; PC <= PC + 4
lw       R[rt] <= MEM[R[rs] + sign_extend(offset)]; PC <= PC + 4
sw       MEM[R[rs] + sign_extend(offset)] <= R[rt]; PC <= PC + 4
beq      if (R[rs] == R[rt]) then PC <= PC + 4 + (sign_extend(offset) << 2) else PC <= PC + 4
j        PC <= upper(PC)@(address << 2)
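
The register transfers above can be read as a tiny instruction-level simulator. The sketch below is illustrative only: `step`, `sign_extend`, and the use of Python dictionaries for instruction and data memory are assumptions, and the jump takes its upper bits from PC + 4, which is the usual MIPS convention.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value16: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement integer."""
    return value16 - 0x10000 if value16 & 0x8000 else value16

def step(pc: int, regs: list, dmem: dict, imem: dict) -> int:
    """Apply the register transfers for the instruction at pc; return the next PC."""
    instr = imem[pc]                              # Instruction <= MEM[PC]
    op = (instr >> 26) & 0x3F
    rs, rt, rd = (instr >> 21) & 0x1F, (instr >> 16) & 0x1F, (instr >> 11) & 0x1F
    funct = instr & 0x3F
    offset = sign_extend(instr & 0xFFFF)
    next_pc = (pc + 4) & MASK32                   # default: PC <= PC + 4

    if op == 0x00:                                # R-type: add, sub, and, or
        a, b = regs[rs], regs[rt]
        results = {0x20: a + b, 0x22: a - b, 0x24: a & b, 0x25: a | b}
        regs[rd] = results[funct] & MASK32
    elif op == 0x23:                              # lw
        regs[rt] = dmem.get((regs[rs] + offset) & MASK32, 0)
    elif op == 0x2B:                              # sw
        dmem[(regs[rs] + offset) & MASK32] = regs[rt]
    elif op == 0x04:                              # beq
        if regs[rs] == regs[rt]:
            next_pc = (pc + 4 + (offset << 2)) & MASK32
    elif op == 0x02:                              # j
        next_pc = (next_pc & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)
    else:
        raise ValueError(f"opcode {op:#x} is outside the subset")
    regs[0] = 0                                   # $zero always reads as 0
    return next_pc

regs = [0] * 32
regs[17], regs[18] = 5, 7                         # $s1 = 5, $s2 = 7
pc = step(0, regs, {}, {0: 0x02328020})           # add $s0, $s1, $s2
print(pc, regs[16])
```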

14 Outline - Processor Implementation
- Overview
- Single-Cycle Implementation
  1. Analyze instruction set; get datapath requirements (current)
  2. Select datapath components and establish clocking methodology
  3. Assemble datapath that meets requirements
  4. Determine control signal values for each instruction
  5. Assemble control logic to generate control signals
- Multi-Cycle Implementation
- Pipelined Implementation

15 1. Instruction Set Requirements
- Memory
  - Read Instructions
  - Read and Write Data
- Registers - 32
  - read (from rs field in instruction)
  - read (from rt field in instruction)
  - write (from rd or rt field in instruction)
- PC
- Sign Extender
- Add and Subtract (register values)
- Add 4 or extended immediate to PC

16 Outline - Processor Implementation
- Overview
- Single-Cycle Implementation
  1. Analyze instruction set; get datapath requirements
  2. Select datapath components and establish clocking methodology (current)
  3. Assemble datapath that meets requirements
  4. Determine control signal values for each instruction
  5. Assemble control logic to generate control signals
- Multi-Cycle Implementation
- Pipelined Implementation

17 2. (a) Choose Datapath Components
Combinational Components
- Adder
- ALU
- Multiplexer
- Sign Extender
Storage Components
- Registers
- Register File
- Memory

18 Datapath Combinational Components
[Figure: Adder, ALU, Multiplexer, Sign Extender]
NOTES:
- Blue-green inputs are control lines
- Blue lines often hidden to suppress detail

19 Datapath Storage - Registers
- Registers store multiple bit values
- New value loaded on clock edge when EN asserted

20 Datapath Storage: Idealized Memory
Data Read
- Place Address on ADDR
- Assert MemRead
- Data Available on RD after memory “access time”
Data Write
- Place address on ADDR
- Place data input on WD
- Assert MemWrite
- Data written on clock edge

21 Datapath Storage: Register File
Register File - 32 registers (including $zero)
Two data outputs RD1, RD2
- Assert register number RN1/RN2
- Read output RD1/RD2 after “access time” (propagation delay)
One data input WD
- Assert register number WN
- Assert value on WD
- Assert RegWrite
- Value loaded on clock edge
Implemented as a small multiport memory
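
A behavioral sketch of this register file may help: reads are combinational, while the write only takes effect when the clock edge is modeled. The class and method names are hypothetical, and hardwiring register 0 to zero is the standard MIPS convention rather than something this slide states.

```python
class RegisterFile:
    """32 x 32-bit register file: two read ports, one clocked write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rn1: int, rn2: int) -> tuple:
        """Combinational read: RD1 and RD2 follow RN1 and RN2."""
        return self.regs[rn1], self.regs[rn2]

    def clock_edge(self, wn: int, wd: int, reg_write: bool) -> None:
        """On the clock edge, load WD into register WN if RegWrite is asserted."""
        if reg_write and wn != 0:          # register 0 ($zero) stays 0
            self.regs[wn] = wd & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(wn=16, wd=42, reg_write=True)
rf.clock_edge(wn=17, wd=99, reg_write=False)   # ignored: RegWrite not asserted
print(rf.read(16, 17))
```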

22 2. (b) Choose Clocking Methodology
Clocking methodology defines
- When signals can be read from storage elements
- When signals can be written to storage elements
Typical clocking methodologies
- Single-Phase Edge Triggered
- Single-Phase Level Triggered
- Multiple-Phase Level Triggered
Authors’ choice: Single-Phase Edge Triggered
- All registers updated on one edge of clock cycle
- Simplest to work with

23 Review: Edge-Triggered Clocking
Controls sequential circuit operation
- Register outputs change after first clock edge
- Combinational logic determines “next state”
- Storage elements store new state on next clock edge
[Figure: Register Output feeds Combinational Logic (Adder, Mux), which feeds Register Input; both registers share Clock]

24 Review: Edge-Triggered Clocking
- Propagation delay - t_prop
  - Logic (including register outputs)
  - Interconnect
- Register setup time - t_setup
Timing constraint:
  t_clock > t_prop + t_setup
  t_clock = t_prop + t_setup + t_slack
[Figure: timing diagram of Clock, Register Output, Combinational Logic (Adder, Mux), and Register Input showing t_prop and t_setup]
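
The timing constraint can be checked numerically. In this sketch the function names and the delay values (550 ps of logic, 20 ps setup, 30 ps slack) are made-up illustrations, not figures from the lecture.

```python
def clock_period_ps(t_prop_ps: float, t_setup_ps: float, t_slack_ps: float = 0.0) -> float:
    """t_clock = t_prop + t_setup + t_slack"""
    return t_prop_ps + t_setup_ps + t_slack_ps

def meets_timing(t_clock_ps: float, t_prop_ps: float, t_setup_ps: float) -> bool:
    """The register input must be stable t_setup before the next edge."""
    return t_clock_ps > t_prop_ps + t_setup_ps

# Hypothetical numbers for illustration only
period = clock_period_ps(t_prop_ps=550, t_setup_ps=20, t_slack_ps=30)
print(period, meets_timing(period, 550, 20))
```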

25 Outline - Processor Implementation
- Overview
- Single-Cycle Implementation
  1. Analyze instruction set; get datapath requirements
  2. Select datapath components and establish clocking methodology
  3. Assemble datapath that meets requirements (current)
  4. Determine control signal values for each instruction
  5. Assemble control logic to generate control signals
- Multi-Cycle Implementation
- Pipelined Implementation

26 3. Assemble Datapath
Tasks processor must implement
1. Fetch Instruction from memory
2. Decode Instruction, read register values
3. If necessary, perform an ALU operation
4. If memory address, perform load/store
5. Write results back to register file and increment PC
How can we do this with the datapath hardware?

27 Datapath for Instruction Fetch
Instruction <= MEM[PC]; PC <= PC + 4

28 Datapath for R-Type Instructions
add rd, rs, rt
R[rd] <= R[rs] + R[rt]

29 Datapath for Load/Store Instructions
lw rt, offset(rs)
R[rt] <= MEM[R[rs] + sign_extend(offset)]

30 Datapath for Load/Store Instructions
sw rt, offset(rs)
MEM[R[rs] + sign_extend(offset)] <= R[rt]

31 Datapath for Branch Instructions
beq rs, rt, offset
if (R[rs] == R[rt]) then PC <= PC + 4 + (sign_extend(offset) << 2)

32 Putting It All Together …
Goal: merge datapaths for each function
- Instruction Fetch
- R-Type Instructions
- Load/Store Instructions
- Branch instructions
Add multiplexers to steer data as needed

33 Example: Combine R-Type and Load/Store Datapaths
Select an ALU input from either
- Register File output RD2 (for R-Type)
- Sign-extender output (for LW/SW)
Select Register File input WD1 from either
- ALU output (for R-Type)
- Memory output RD (for LW)

34 Combined Datapath: R-Type and Load/Store Instructions

35 Combined Datapath: Executing an R-Type Instruction (add rd,rs,rt)

36 Combined Datapath: Executing a load instruction (lw rt,offset(rs))

37 Combined Datapath: Executing a store instruction (sw rt,offset(rs))

38 Complete Single-Cycle Datapath

39 Complete Datapath Executing add (add rd, rs, rt)

40 Complete Datapath Executing load (lw rt,offset(rs))

41 Complete Datapath Executing store (sw rt,offset(rs))

42 Complete Datapath Executing branch (beq r1,r2,offset)

43 Refining the Complete Datapath
Depending on the instruction, register file input WN is fed by different fields of the instruction
- R-Type Instructions: rd field (bits 15:11)
- Load Instruction: rt field (bits 20:16)
Result: need an additional multiplexer on WN input
R-Format: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
I-Format: op (6 bits) | rs (5 bits) | rt (5 bits) | offset (16 bits)
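
The extra multiplexer on WN amounts to a two-way select between instruction fields. In this sketch the function name is hypothetical, and the select input is called `reg_dst` after the RegDst control signal that appears later in the lecture.

```python
def write_register_number(instr: int, reg_dst: int) -> int:
    """2-to-1 mux on WN: rd (bits 15:11) when reg_dst is 1, else rt (bits 20:16)."""
    rt = (instr >> 16) & 0x1F
    rd = (instr >> 11) & 0x1F
    return rd if reg_dst else rt

add_instr = 0x02328020                                     # add $s0, $s1, $s2
lw_instr = (0x23 << 26) | (16 << 21) | (17 << 16) | 4      # lw $s1, 4($s0)
print(write_register_number(add_instr, 1), write_register_number(lw_instr, 0))
```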

44 Complete Datapath (Refined)

45 Complete Single-Cycle Datapath
Control signals shown in blue

46 Outline - Processor Implementation
- Overview
- Single-Cycle Implementation
  1. Analyze instruction set; get datapath requirements
  2. Select datapath components and establish clocking methodology
  3. Assemble datapath that meets requirements
  4. Determine control signal values for each instruction (current)
  5. Assemble control logic to generate control signals
- Multi-Cycle Implementation
- Pipelined Implementation

47 Control Unit Design
Desired function:
- Given an instruction word …
- Generate control signals needed to execute instruction
Implemented as a combinational logic function:
- Inputs
  - Instruction word - op and funct fields
  - ALU status output - Zero
- Outputs - processor control points
  - ALU control signals
  - Multiplexer control signals
  - Register File & memory control signals

48 Determining Control Points
For each instruction type, determine proper value for each control point (control signal)
- 0
- 1
- X (don’t care - either 1 or 0)
Ultimately … use these values to build a truth table

49 Review: ALU Control Signals
Functions: Figure B.5.13 (also in Ch. 5 - p. 301)
ALU control input   Function
000                 AND
001                 OR
010                 add
110                 subtract
111                 set on less than
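
The function table above can be sketched behaviorally as follows. The names `alu` and `to_signed` are illustrative assumptions; the Zero output is included because the control unit uses it as an input.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x: int) -> int:
    """Reinterpret a 32-bit pattern as a signed two's-complement value."""
    return x - (1 << 32) if x & 0x80000000 else x

def alu(control: int, a: int, b: int) -> tuple:
    """Return (result, zero) for a 3-bit ALU control input."""
    if control == 0b000:
        result = a & b                                      # AND
    elif control == 0b001:
        result = a | b                                      # OR
    elif control == 0b010:
        result = (a + b) & MASK32                           # add
    elif control == 0b110:
        result = (a - b) & MASK32                           # subtract
    elif control == 0b111:
        result = 1 if to_signed(a) < to_signed(b) else 0    # set on less than
    else:
        raise ValueError(f"unused ALU control code {control:03b}")
    return result, result == 0

print(alu(0b110, 5, 5))            # subtract: equal operands raise Zero
print(alu(0b111, 0xFFFFFFFF, 1))   # slt: -1 < 1 as signed values
```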

50 Control Signals - R-Type Instruction
Control signals shown in blue
[Figure: datapath annotated with control signal values; the ALU operation value depends on funct]

51 Control Signals - lw Instruction
Control signals shown in blue
[Figure: datapath annotated with control signal values; ALU operation = 010]

52 Control Signals - sw Instruction
Control signals shown in blue
[Figure: datapath annotated with control signal values, some of them X (don’t care); ALU operation = 010]

53 Control Signals - beq Instruction
Control signals shown in blue
[Figure: datapath annotated with control signal values, some of them X (don’t care); ALU operation = 110; PC select is 1 if Zero=1]

54 Outline - Processor Implementation
- Overview
- Single-Cycle Implementation
  1. Analyze instruction set; get datapath requirements
  2. Select datapath components and establish clocking methodology
  3. Assemble datapath that meets requirements
  4. Determine control signal values for each instruction
  5. Assemble control logic to generate control signals (current)
- Multi-Cycle Implementation
- Pipelined Implementation

55 Control Unit Structure

56 More Notes About Control Unit Structure
Control unit as shown: one huge logic block
Idea: decompose into smaller logic blocks
- Smaller blocks can be faster
- Smaller blocks are easier to work with
Observation (rephrased):
- The only control signal that depends on the funct field is the ALU Operation signal
- Idea: separate logic for ALU control

57 Modified Control Unit Structure
This is called “derived control” or “local decoding”

58 Datapath with Modified Control Unit

59 Review from Ch. 4: ALU Function
Functions: Figure B.5.13 (also in Ch. 5 - p. 301)
ALU control input   Function
000                 AND
001                 OR
010                 add
110                 subtract
111                 set on less than

60 ALU Usage in Processor Design
Usage depends on instruction type
- Instruction type (specified by opcode)
- funct field (r-type instructions only)
Encode instruction type in ALUOp signal
Instr. type     ALUOp   Operation   funct    Desired Action     ALU Ctl.
data transfer   00      lw          XXXXXX   add                010
data transfer   00      sw          XXXXXX   add                010
branch          01      beq         XXXXXX   subtract           110
r-type          10      add         100000   add                010
r-type          10      sub         100010   subtract           110
r-type          10      and         100100   and                000
r-type          10      or          100101   or                 001
r-type          10      slt         101010   set on less than   111
XXXXXX means “don’t care”

61 ALU Control - Truth Table (Fig. 5-13)
Use don’t care values to minimize length
- Ignore F5, F4 (they are always “10”)
- Assume ALUOp never equals “11”
ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  Operation
0       0       X   X   X   X   X   X   010
X       1       X   X   X   X   X   X   110
1       X       X   X   0   0   0   0   010
1       X       X   X   0   0   1   0   110
1       X       X   X   0   1   0   0   000
1       X       X   X   0   1   0   1   001
1       X       X   X   1   0   1   0   111
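
One way to read this truth table is as a lookup that ignores F5 and F4 and assumes ALUOp is never 11. The sketch below follows that reading; the function name and the dictionary are illustrative and are not the gate-level implementation of Figure C.2.3.

```python
# F3..F0 of the funct field -> ALU operation, for ALUOp1 = 1 (r-type)
FUNCT_TO_OPERATION = {
    0b0000: 0b010,   # add
    0b0010: 0b110,   # subtract
    0b0100: 0b000,   # and
    0b0101: 0b001,   # or
    0b1010: 0b111,   # set on less than
}

def alu_control(alu_op: int, funct: int) -> int:
    """Map the 2-bit ALUOp and 6-bit funct to the 3-bit ALU operation."""
    if alu_op == 0b00:
        return 0b010                          # lw/sw: add
    if alu_op & 0b01:
        return 0b110                          # beq: subtract
    return FUNCT_TO_OPERATION[funct & 0xF]    # r-type: decode F3..F0 only

print(format(alu_control(0b10, 0b101010), "03b"))   # slt
```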

62 ALU Control - Implementation
Figure C.2.3, page C-6

63 One More Modification - for Branch
- BEQ instruction depends on Zero output of ALU
- No other instruction uses Zero output
Local decoding
- Implement with new "Branch" control signal
- Add AND gate to generate PCSelect
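
The branch modification reduces to one AND gate feeding the PC multiplexer. In this sketch `next_pc` and its parameter names are hypothetical; `pc_select` corresponds to the PCSelect signal named on the slide.

```python
MASK32 = 0xFFFFFFFF

def next_pc(pc: int, offset16: int, branch: int, zero: int) -> int:
    """Choose between PC + 4 and the branch target using PCSelect = Branch AND Zero."""
    signed_offset = offset16 - 0x10000 if offset16 & 0x8000 else offset16
    pc_plus_4 = (pc + 4) & MASK32
    branch_target = (pc_plus_4 + (signed_offset << 2)) & MASK32
    pc_select = branch & zero                 # the added AND gate
    return branch_target if pc_select else pc_plus_4

print(hex(next_pc(0x100, 3, branch=1, zero=1)))   # taken branch
print(hex(next_pc(0x100, 3, branch=1, zero=0)))   # not taken
```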

64 Processor Design - Branch Modification

65 Control Unit Implementation
Review: Opcodes for key instructions
Control Unit Truth Table: Fill in the blanks (or see Fig. 5-18, p. 308)
Implementation: Decoder + 2 Gates (Fig. C.2.5)
Input (Op5 Op4 Op3 Op2 Op1 Op0):
Instruction    Opcode
R-type (RT)    000000
lw             100011
sw             101011
beq            000100
Output: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0
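
The slide leaves the truth table as an exercise. As a hedged sketch, the table below uses the values usually given for this textbook-style single-cycle datapath; treat them as an assumption to check against Fig. 5-18 rather than as the lecture's answer key. `None` stands for X (don't care), and the names `CONTROL` and `control_unit` are illustrative.

```python
# Assumed control values per opcode (None = don't care); verify against the textbook figure.
CONTROL = {
    0b000000: dict(RegDst=1,    ALUSrc=0, MemtoReg=0,    RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),   # R-type
    0b100011: dict(RegDst=0,    ALUSrc=1, MemtoReg=1,    RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),   # lw
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),   # sw
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),   # beq
}

def control_unit(opcode: int) -> dict:
    """Look up the control signals for a 6-bit opcode."""
    if opcode not in CONTROL:
        raise ValueError(f"opcode {opcode:06b} is not in the subset")
    return CONTROL[opcode]

print(control_unit(0b100011))
```

A dictionary lookup plays the role of the decoder here: each opcode selects exactly one row of outputs.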

66 Control Unit Implementation

67 Final Extension: Implementing j (jump)
Instruction Format (J-Format): op = 000010 (6 bits) | address (26 bits)
Register Transfer: PC <= (PC + 4)[31:28] @ ( I[25:0] << 2 )
Remember, it’s unconditional
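
The jump register transfer can be written out with masks and shifts. The function name `jump_target` is an illustrative choice; `@` in the slide's notation is bit concatenation, which becomes a bitwise OR here because the two parts do not overlap.

```python
def jump_target(pc: int, instr: int) -> int:
    """PC <= (PC + 4)[31:28] @ (I[25:0] << 2)"""
    upper = (pc + 4) & 0xF0000000          # top 4 bits of PC + 4
    lower = (instr & 0x03FFFFFF) << 2      # 26-bit address field, word-aligned
    return upper | lower

j_instr = (0b000010 << 26) | 0x0100000     # j with address field 0x100000
print(hex(jump_target(0x00400000, j_instr)))
```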

68 Final Extension: Implementing jump

69 The Problem with Single-Cycle Processor Implementation: Performance
Performance is limited by the slowest instruction
Example: suppose we have the following delays
- Memory read/write: 200ps
- ALU and adders: 100ps
- Register File read/write: 50ps
What is the critical path for each instruction?
Instruction   Path delays (ps)             Total
R-format      200 + 50 + 100 + 0 + 50      400ps
Load word     200 + 50 + 100 + 200 + 50    600ps
Store word    200 + 50 + 100 + 200         550ps
Branch        200 + 50 + 100               350ps
Jump          200                          200ps
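
The critical-path arithmetic can be reproduced directly from the unit delays on this slide. The dictionary layout and names below are illustrative; the single-cycle clock period is then the largest total, since every instruction must fit in one cycle.

```python
# Unit delays from the slide, in picoseconds
DELAY_PS = {"mem": 200, "alu": 100, "reg": 50}

# Units on each instruction's critical path, in order of use
PATHS = {
    "R-format":   ["mem", "reg", "alu", "reg"],
    "Load word":  ["mem", "reg", "alu", "mem", "reg"],
    "Store word": ["mem", "reg", "alu", "mem"],
    "Branch":     ["mem", "reg", "alu"],
    "Jump":       ["mem"],
}

latency_ps = {name: sum(DELAY_PS[u] for u in units) for name, units in PATHS.items()}
clock_period_ps = max(latency_ps.values())   # set by the slowest instruction

for name, total in latency_ps.items():
    print(f"{name:<11} {total}ps")
print("single-cycle clock period:", clock_period_ps, "ps")
```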

70 Alternatives to Single-Cycle
Multicycle Processor Implementation
- Shorter clock cycle
- Multiple clock cycles per instruction
- Some instructions take more cycles than others
- Less hardware required
Pipelined Implementation
- Overlap execution of instructions
- Try to get short cycle times and low CPI
- More hardware required … but also more performance!
