Morgan Kaufmann Publishers, April 27, 2017
CprE 381 Computer Organization and Assembly Level Programming, Fall 2012
Chapter 4: The Processor
Revised from original slides provided by MKP

Introduction (§4.1)
- CPU performance factors
  - Instruction count: determined by ISA and compiler
  - CPI and cycle time: determined by CPU hardware
- We will examine two MIPS implementations
  - A simplified version
  - A more realistic pipelined version
- Simple subset, shows most aspects
  - Memory reference: lw, sw
  - Arithmetic/logical: add, sub, and, or, slt
  - Control transfer: beq, j

Instruction Execution
- PC → instruction memory, fetch instruction
- Register numbers → register file, read registers
- Depending on instruction class
  - Use ALU to calculate
    - Arithmetic result
    - Memory address for load/store
    - Branch target address
  - Access data memory for load/store
  - PC ← target address or PC + 4

CPU Overview

Multiplexers
- Can't just join wires together
  - Use multiplexers

Control

Logic Design Basics (§4.2 Logic Design Conventions)
- Information encoded in binary
  - Low voltage = 0, High voltage = 1
  - One wire per bit
  - Multi-bit data encoded on multi-wire buses
- Combinational element
  - Operate on data
  - Output is a function of input
- State (sequential) elements
  - Store information

Combinational Elements
- AND-gate: Y = A & B
- Adder: Y = A + B
- Arithmetic/Logic Unit: Y = F(A, B)
- Multiplexer: Y = S ? I1 : I0
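
The four combinational elements can be modeled as pure functions, since each output depends only on the current inputs. The following is a minimal sketch, not part of the original slides: the 32-bit bus width, the function names, and the string operation names passed to `alu` are all assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: model every bus as 32 bits wide


def and_gate(a: int, b: int) -> int:
    """Y = A & B (bitwise across the bus)."""
    return a & b & MASK32


def adder(a: int, b: int) -> int:
    """Y = A + B, wrapping at 32 bits (carry-out is discarded)."""
    return (a + b) & MASK32


def mux2(s: int, i0: int, i1: int) -> int:
    """Y = S ? I1 : I0."""
    return i1 if s else i0


def alu(f: str, a: int, b: int) -> int:
    """Y = F(A, B); the operation names are illustrative, not from the slides."""
    operations = {
        "and": a & b,
        "or": a | b,
        "add": a + b,
        "sub": a - b,
    }
    return operations[f] & MASK32
```

Because none of these functions keeps state, calling one twice with the same inputs always gives the same output, which is the defining property of a combinational element.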

Sequential Elements
- Register: stores data in a circuit
  - Uses a clock signal to determine when to update the stored value
  - Edge-triggered: update when Clk changes from 0 to 1

Sequential Elements
- Register with write control
  - Only updates on clock edge when write control input is 1
  - Used when stored value is required later
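
An edge-triggered register with write control can be sketched as a small class that remembers the previous clock level. This is only an illustrative model: the class name `Register` and the `tick` method are hypothetical, and real hardware samples D at the edge rather than through a method call.

```python
class Register:
    """Behavioral model of an edge-triggered register with a write enable."""

    def __init__(self, value: int = 0):
        self.q = value        # stored value, visible on output Q
        self._prev_clk = 0    # last clock level seen, used to detect edges

    def tick(self, clk: int, d: int, write: int = 1) -> int:
        """Present clk, D and Write; Q changes only on a 0 -> 1 edge with Write = 1."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write:
            self.q = d
        self._prev_clk = clk
        return self.q
```

With `write` left at 1 the model behaves like the plain register on the previous slide; holding `write` at 0 keeps the stored value across any number of clock edges.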

Clocking Methodology
- Combinational logic transforms data during clock cycles
  - Between clock edges
  - Input from state elements, output to state element
  - Longest delay determines clock period

Building a Datapath (§4.3)
- Datapath
  - Elements that process data and addresses in the CPU
  - Registers, ALUs, muxes, memories, ...
- We will build a MIPS datapath incrementally
  - Refining the overview design

Instruction Fetch
- The PC is a 32-bit register
- Increment by 4 for next instruction

R-Format Instructions
- Read two register operands
- Perform arithmetic/logical operation
- Write register result

Load/Store Instructions
- Read register operands
- Calculate address using 16-bit offset
  - Use ALU, but sign-extend offset
- Load: read memory and update register
- Store: write register value to memory
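
The address calculation for lw and sw can be sketched as follows. The helper names `sign_extend16` and `effective_address` are hypothetical, and the 32-bit wraparound is an assumption of this model.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit addresses that wrap around


def sign_extend16(imm16: int) -> int:
    """Extend a 16-bit field to 32 bits by replicating bit 15."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:               # sign bit set: fill the upper half with ones
        return imm16 | 0xFFFF0000
    return imm16


def effective_address(base: int, offset16: int) -> int:
    """Address = R[rs] + sign-extended offset, as computed by the ALU."""
    return (base + sign_extend16(offset16)) & MASK32
```

For example, a base of 0x1000 with the offset field 0xFFFC (that is, -4) gives address 0x0FFC, which is why the offset has to be sign-extended rather than zero-extended.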

Branch Instructions
- Read register operands
- Compare operands
  - Use ALU, subtract and check Zero output
- Calculate target address
  - Sign-extend displacement
  - Shift left 2 places (word displacement)
  - Add to PC + 4
    - Already calculated by instruction fetch

Branch Instructions
- Shift left 2: just re-routes wires
- Sign-extend: sign-bit wire replicated
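
The branch steps (compare by subtraction, sign-extend, shift left 2, add to PC + 4) can be sketched in a few lines. The function names are hypothetical and the model assumes 32-bit wraparound arithmetic.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit values that wrap around


def sign_extend16(imm16: int) -> int:
    """Replicate bit 15 into the upper 16 bits."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def branch_target(pc: int, offset16: int) -> int:
    """Target = (PC + 4) + (sign-extended word displacement << 2)."""
    pc_plus_4 = (pc + 4) & MASK32
    return (pc_plus_4 + (sign_extend16(offset16) << 2)) & MASK32


def next_pc_beq(pc: int, rs_val: int, rt_val: int, offset16: int) -> int:
    """beq: the ALU subtracts the operands; Zero = 1 means they are equal."""
    zero = ((rs_val - rt_val) & MASK32) == 0
    return branch_target(pc, offset16) if zero else (pc + 4) & MASK32
```

A displacement of -1 (field value 0xFFFF) branches back to the branch instruction itself, because the displacement is relative to PC + 4 rather than to PC.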

Composing the Elements
- First-cut datapath does an instruction in one clock cycle
  - Each datapath element can only do one function at a time
  - Hence, we need separate instruction and data memories
- Use multiplexers where alternate data sources are used for different instructions

R-Type/Load/Store Datapath

Full Datapath

ALU Control (§4.4 A Simple Implementation Scheme)
- ALU used for
  - Load/Store: F = add
  - Branch: F = subtract
  - R-type: F depends on funct field

ALU control  Function
0000         AND
0001         OR
0010         add
0110         subtract
0111         set-on-less-than
1100         NOR

Presenter note: jump to the Datapath With Control slide (slide 24) to show ALU control.

ALU Control
- Assume 2-bit ALUOp derived from opcode
  - Combinational logic derives ALU control

opcode  ALUOp  Operation         funct   ALU function      ALU control
lw      00     load word         XXXXXX  add               0010
sw      00     store word        XXXXXX  add               0010
beq     01     branch equal      XXXXXX  subtract          0110
R-type  10     add               100000  add               0010
R-type  10     subtract          100010  subtract          0110
R-type  10     AND               100100  AND               0000
R-type  10     OR                100101  OR                0001
R-type  10     set-on-less-than  101010  set-on-less-than  0111
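
The ALU control table translates directly into a lookup. The sketch below uses the ALUOp and funct encodings from the table; the function and dictionary names are hypothetical, and raising an error for unlisted inputs is a choice of this sketch rather than something the slides specify.

```python
# funct field -> 4-bit ALU control, for R-type instructions (ALUOp = 10)
R_TYPE_ALU_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Derive the 4-bit ALU control from the 2-bit ALUOp and the funct field."""
    if alu_op == 0b00:          # lw / sw: always add, funct is a don't-care
        return 0b0010
    if alu_op == 0b01:          # beq: always subtract, funct is a don't-care
        return 0b0110
    if alu_op == 0b10:          # R-type: operation selected by funct
        funct &= 0x3F
        if funct not in R_TYPE_ALU_CONTROL:
            raise ValueError(f"unsupported funct {funct:06b}")
        return R_TYPE_ALU_CONTROL[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")
```

Splitting the decode into a 2-bit ALUOp from the main control plus this small second-level decoder keeps the main control unit independent of the funct field.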

The Main Control Unit
- Control signals derived from instruction

Bits        31:26     25:21  20:16  15:11  10:6   5:0
R-type      0         rs     rt     rd     shamt  funct
Load/Store  35 or 43  rs     rt     address (15:0)
Branch      4         rs     rt     address (15:0)

- Bits 31:26: opcode
- rs (25:21): always read
- rt (20:16): read, except for load
- Destination register, rd (15:11) or rt (20:16): write for R-type and load
- address (15:0): sign-extend and add
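
Extracting these fields is a matter of shifting and masking the 32-bit instruction word. This is a small sketch with a hypothetical function name; it returns every field regardless of format, leaving it to the control logic to decide which ones are meaningful.

```python
def decode_fields(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its bit fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31:26
        "rs":     (instr >> 21) & 0x1F,  # bits 25:21
        "rt":     (instr >> 16) & 0x1F,  # bits 20:16
        "rd":     (instr >> 11) & 0x1F,  # bits 15:11 (R-type only)
        "shamt":  (instr >> 6) & 0x1F,   # bits 10:6  (R-type only)
        "funct":  instr & 0x3F,          # bits 5:0   (R-type only)
        "imm16":  instr & 0xFFFF,        # bits 15:0  (load/store/branch)
    }
```

For instance, the word 0x012A4020 (add $t0, $t1, $t2) decodes to opcode 0, rs 9, rt 10, rd 8 and funct 100000.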

Datapath With Control

R-Type Instruction
Presenter note: jump to the add-on slide Summary of Control Signals (slide 32) to show the meaning of each control signal.

Load Instruction

Branch-on-Equal Instruction

Implementing Jumps

Bits  31:26  25:0
Jump  2      address

- Jump uses word address
- Update PC with concatenation of
  - Top 4 bits of old PC
  - 26-bit jump address
  - 00
- Need an extra control signal decoded from opcode
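
The concatenation that forms the jump target can be sketched with a mask and a shift. The function name is hypothetical. The sketch takes the upper bits from the PC value passed in, as the slide words it; the textbook datapath takes them from PC + 4, which gives the same result unless the jump sits at the very end of a 256 MB region.

```python
def jump_address(pc: int, target26: int) -> int:
    """Concatenate PC[31:28], the 26-bit jump field, and two zero bits."""
    upper4 = pc & 0xF0000000                 # top 4 bits of the PC
    word_address = target26 & 0x03FFFFFF     # 26-bit field from the instruction
    return upper4 | (word_address << 2)      # shifting left 2 appends "00"
```

The two appended zero bits reflect that the field holds a word address: instructions are 4 bytes, so the low two bits of any instruction address are always 0.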

Datapath With Jumps Added

Performance Issues
- Longest delay determines clock period
  - Critical path: load instruction
  - Instruction memory → register file → ALU → data memory → register file
- Not feasible to vary period for different instructions
- Violates design principle
  - Making the common case fast
- We will improve performance by pipelining

Add-On

Summary of Control Signals
- RegDst: write to register rd or rt
- ALUSrc: whether the 2nd ALU operand comes from the immediate field
- MemtoReg: write Data Memory output or ALU output to the Registers?
- RegWrite: write to the Registers or not?
- MemRead: read from Data Memory or not?
- MemWrite: write to Data Memory or not?
- Branch: is the instruction a branch or not?
- ALUOp[1:0]: ALU control field

Control Signal Setting
- What are the control signal values for each instruction or instruction type?

Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-
lw
sw
beq

- Entries already shown on the slide: 1 in the R- row, X in the sw row
- Note: "R-" means R-format
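
As a check on the exercise, the sketch below encodes the settings given in the standard textbook control table for this single-cycle design. It is an illustration rather than the slides' own answer key: the dictionary layout and names are assumptions, ALUOp is stored as one 2-bit value, and don't-care entries (X) are written as None.

```python
# Control settings per instruction class; None stands for a don't-care (X).
CONTROL = {
    "R-format": dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                     MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    "lw":       dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                     MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    "sw":       dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                     MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    "beq":      dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                     MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

# Opcode values (decimal) for the four instruction classes covered so far.
OPCODE_TO_CLASS = {0: "R-format", 35: "lw", 43: "sw", 4: "beq"}


def main_control(opcode: int) -> dict:
    """Return the control signal settings for a 6-bit opcode."""
    return CONTROL[OPCODE_TO_CLASS[opcode]]
```

The don't-care entries follow from RegWrite = 0: when nothing is written to the register file, it does not matter which destination register or which write-data source is selected.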

Extend Single-Cycle MIPS
- Consider the following instructions
  - addi: add immediate
  - sll: shift left logical by a constant
  - bne: branch if not equal
  - jal: jump and link
  - jr: jump register

SCPv0: R-Format, LW/SW, BEQ

SCPv1: R-Format, LW/SW, BEQ, J

SCPv1: Control Signals
- What are the control signal values for each instruction or instruction type?

Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump
R-
lw
sw
beq
j

- Entries already shown on the slide: 1 in the R- row, X in the sw row
- Note: "R-" means R-format

Extend the Single-Cycle Processor
- For each instruction, do we need
  - Any new or revised datapath element(s)?
  - Any new control signal(s)?
- Then, if necessary,
  - Design new datapath elements or revise existing ones
  - Add new control signals or extend existing ones
  - Revise the main control and the ALU control

SCPv0 + ADDI
- addi rt, rs, immediate
- R[rt] = R[rs] + SignExtImm
- Read register operands (only one is used)
- Sign extend the immediate (in parallel)
- Perform arithmetic/logical operation
- Write register result

Bits   31:26   25:21  20:16  15:0
Field  001000  rs     rt     immediate
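
The addi steps can be traced in a short behavioral sketch. It models the register file as a plain list of 32 integers and uses hypothetical function names; it describes what the instruction computes, not how the control signals are set.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers that wrap around


def sign_extend16(imm16: int) -> int:
    """Replicate bit 15 into the upper 16 bits."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def execute_addi(regs: list, instr: int) -> list:
    """R[rt] = R[rs] + SignExtImm for an addi instruction word."""
    rs = (instr >> 21) & 0x1F            # source register number
    rt = (instr >> 16) & 0x1F            # destination register number
    imm = sign_extend16(instr & 0xFFFF)  # done in parallel with the register read
    result = (regs[rs] + imm) & MASK32   # ALU performs add
    if rt != 0:                          # register 0 is hardwired to zero in MIPS
        regs[rt] = result
    return regs
```

Note that the destination is rt rather than rd, and the second ALU operand is the sign-extended immediate rather than a register value, which is the same combination lw uses for its address calculation.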

SCPv0 + ADDI
- What changes to this baseline?

SCPv0 + ADDI
- Do we need new or revised datapath elements?

SCPv0 + ADDI
- Do we need new or revised datapath elements?
- Do we need new control signal(s)?

Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-
lw
sw
beq
addi

- Entries already shown on the slide: 1 in the R- row, X in the sw row
- Questions for the addi row
  - Write to R[rd] or R[rt]?
  - What's the 2nd ALU source?
  - Write memory or ALU output?
  - Read memory?
  - What ALUOp?
- Hints: some signals are set like lw, others like R-format arithmetic

SCPv0 + SLL
- sll rd, rs, shamt
- R[rd] = R[rs] << shamt
- Read register operands (only one is used)
- Perform shift operation
- Write register result
- Note: sllv rd, rt, rs for shift left logical variable

Bits   31:26   25:21  20:16  15:11  10:6   5:0
Field  000000  rs     rt     rd     shamt  funct

SCPv0 + SLL
- What changes to the datapath elements?

SCPv0 + SLL
- ALU needs to do the shift operation
- ALU 2nd input needs another source

SCPv0 + SLL
- Add a third source to the 2nd ALU input
  - Shamt: Instruction[10-6]
- Extend ALUSrc to two bits
  - 00: R[rt]
  - 01: SignExtImm
  - 10: Shamt
- Extend ALU control
  - Add an ALU control code for SLL
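
The widened mux on the 2nd ALU input can be sketched as a three-way selection. The function names are hypothetical. The sketch follows the slides in shifting the value on the first ALU input, R[rs]; standard MIPS assemblers take the sll source from rt, so treat the operand choice here as an assumption carried over from the slides.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit datapath


def alu_second_operand(alu_src: int, rt_val: int, sign_ext_imm: int, shamt: int) -> int:
    """3-way mux on the 2nd ALU input, selected by the 2-bit ALUSrc."""
    sources = {
        0b00: rt_val,        # R[rt], used by R-format and beq
        0b01: sign_ext_imm,  # SignExtImm, used by lw and sw
        0b10: shamt,         # Instruction[10:6], used by sll
    }
    return sources[alu_src]


def shift_left(a: int, b: int) -> int:
    """New ALU operation: shift the 1st input left by the 2nd, within 32 bits."""
    return (a << (b & 0x1F)) & MASK32
```

With ALUSrc = 10 the shamt field reaches the ALU without any change to the register file or the sign-extension unit; only the mux and the ALU itself grow.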

SCPv0 + SLL
- Extend ALU control: choose any unused code (kkkk shown in the table)

opcode  ALUOp  Operation           funct   ALU function        ALU control
lw      00     load word           XXXXXX  add                 0010
sw      00     store word          XXXXXX  add                 0010
beq     01     branch equal        XXXXXX  subtract            0110
R-type  10     add                 100000  add                 0010
R-type  10     subtract            100010  subtract            0110
R-type  10     AND                 100100  AND                 0000
R-type  10     OR                  100101  OR                  0001
R-type  10     set-on-less-than    101010  set-on-less-than    0111
R-type  10     shift-left-logical  000000  shift-left-logical  kkkk

SCPv0 + SLL
- Control signal settings with ALUSrc extended to two bits (table partially filled in on the slide)

Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp
R-
lw
sw
beq
sll

- Entries visible on the slide, in row order (column positions not recoverable): R-: 1 0 0 1 0; lw: 0 1; sw: X

SCPv0 + BNE
- bne rs, rt, label
- PC = (R[rs] != R[rt]) ? PC+4+(SignExtImm<<2) : PC+4
- Read register operands
- Compare operands
  - Use ALU, subtract and check Zero output (branch taken when Zero is 0)
- Calculate target address
  - Sign-extend displacement
  - Shift left 2 places (word displacement)
  - Add to PC + 4
    - Already calculated by instruction fetch

Bits   31:26   25:21  20:16  15:0
Field  000101  rs     rt     offset

SCPv0 + BNE
- What changes do we make to the datapath?

SCPv0 + BNE
- Extend Branch to two bits
  - 10: Branch-Equal
  - 11: Branch-Not-Equal
- Replace the AND gate with the following logic

Branch     Zero  Branch taken?
10         1     1
11         0     1
otherwise        0
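
The logic that replaces the AND gate can be written as a small function of the 2-bit Branch signal and the ALU Zero output. The function name is hypothetical; the encodings 10 and 11 are the ones chosen on the slide.

```python
def branch_taken(branch: int, zero: int) -> bool:
    """Decide whether the PC mux selects the branch target."""
    if branch == 0b10:       # beq: taken when the operands are equal (Zero = 1)
        return zero == 1
    if branch == 0b11:       # bne: taken when the operands differ (Zero = 0)
        return zero == 0
    return False             # not a branch instruction
```

Because the upper Branch bit is 1 for both branch types and the lower bit distinguishes them, the same logic reduces in hardware to Branch[1] AND (Branch[0] XOR Zero).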

SCPv0 + BNE
- Control signal settings with Branch extended to two bits (table partially filled in on the slide)

Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp
R-
lw
sw
beq
bne

- Entries visible on the slide, in row order (column positions not recoverable): R-: 1 0 0 1 0; sw: X; beq: 0 1; bne: 1 1

SCPv1 + JAL
- jal target
- PC = JumpAddr
- R[31] = PC+4
- Jump uses word address
- Update PC with JumpAddr: concatenation of top 4 bits of old PC, 26-bit jump address, and 00 (called pseudo-direct)
- Save PC+4 to $ra

Bits   31:26   25:0
Field  000011  address
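
The two effects of jal, writing the return address and redirecting the PC, can be sketched together. The function name is hypothetical, the register file is modeled as a list of 32 integers, and the upper PC bits are taken from the PC value passed in, as the slide words it.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit addresses


def execute_jal(regs: list, pc: int, instr: int) -> int:
    """Save PC + 4 in $ra (register 31) and return the new PC."""
    target26 = instr & 0x03FFFFFF          # bits 25:0 of the instruction
    regs[31] = (pc + 4) & MASK32           # return address for the callee
    return (pc & 0xF0000000) | (target26 << 2)
```

Compared with a plain j, the datapath needs a way to write PC + 4 into the register file and to force the destination register number to 31, since neither rd nor rt appears in the J-format encoding.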

SCPv1 + JAL
- What changes do we make to the datapath?