Chapter 4: The Processor
Morgan Kaufmann Publishers, 26 August, 2018

§4.1 Introduction

Introduction
- CPU performance factors
  - Instruction count: determined by ISA and compiler
  - CPI and cycle time: determined by CPU hardware
- We will examine two LEGv8 implementations
  - A simplified version
  - A more realistic pipelined version
- Simple subset, shows most aspects
  - Memory reference: LDUR, STUR
  - Arithmetic/logical: ADD, SUB, AND, ORR
  - Control transfer: compare and branch on zero (CBZ), branch (B)

Instruction Execution
- PC → instruction memory, fetch instruction
- Register numbers → register file, read registers
- Depending on instruction class
  - Use ALU to calculate
    - Arithmetic result
    - Memory address for load/store
    - Branch target address
  - Access data memory for load/store
  - PC ← target address or PC + 4

CPU Overview
[Figure: CPU overview]

Multiplexers
- Can’t just join wires together
  - Use multiplexers

Control
[Figure: control]

§4.2 Logic Design Conventions

Logic Design Basics
- Information encoded in binary
  - Low voltage = 0, high voltage = 1
  - One wire per bit
  - Multi-bit data encoded on multi-wire buses
- Combinational elements
  - Operate on data
  - Output is a function of input
- State (sequential) elements
  - Store information

Combinational Elements
- AND gate: Y = A & B
- Adder: Y = A + B
- Arithmetic/Logic Unit: Y = F(A, B)
- Multiplexer: Y = S ? I1 : I0
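
The four combinational elements can be modeled as pure functions, since each output depends only on the current inputs. This is a minimal sketch, not a hardware description: the 64-bit bus width (`MASK64`) and the function names are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1  # assumed bus width of 64 bits


def and_gate(a: int, b: int) -> int:
    """Y = A & B, applied bitwise across the bus."""
    return a & b


def adder(a: int, b: int) -> int:
    """Y = A + B, wrapping around at 64 bits like a fixed-width adder."""
    return (a + b) & MASK64


def mux(s: int, i0: int, i1: int) -> int:
    """Y = S ? I1 : I0, selecting one of two inputs."""
    return i1 if s else i0


def alu(f, a: int, b: int) -> int:
    """Y = F(A, B), where F is the selected two-input operation."""
    return f(a, b) & MASK64
```

For example, `mux(1, 5, 9)` selects the second input and `adder(MASK64, 1)` wraps around to zero.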

Sequential Elements
- Register: stores data in a circuit
  - Uses a clock signal to determine when to update the stored value
  - Edge-triggered: update when Clk changes from 0 to 1
[Figure: register with data input D, clock input Clk, and output Q]
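
Edge-triggered behavior can be illustrated with a small software model that remembers the previous clock level and updates only on a 0 to 1 transition. The class name `Register` and the `tick` method are hypothetical names chosen for this sketch.

```python
class Register:
    """Software model of a rising-edge-triggered register (illustrative only)."""

    def __init__(self) -> None:
        self.q = 0          # stored value, visible on output Q
        self._prev_clk = 0  # clock level seen on the previous call

    def tick(self, clk: int, d: int) -> int:
        """Present clock level `clk` and data input `d`; return output Q."""
        if self._prev_clk == 0 and clk == 1:  # rising edge: capture D
            self.q = d
        self._prev_clk = clk
        return self.q
```

Holding the clock high, or taking it low again, leaves the stored value unchanged; only the next rising edge captures a new value of D.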

Sequential Elements
- Register with write control
  - Only updates on clock edge when write control input is 1
  - Used when stored value is required later
[Figure: register with inputs D, Clk, and Write, and output Q]
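
A register with write control can be sketched in the same style: the stored value changes only when a rising clock edge coincides with Write = 1. The class name `WriteEnableRegister` and its method are assumptions for illustration.

```python
class WriteEnableRegister:
    """Rising-edge register that updates only when Write is 1 (illustrative)."""

    def __init__(self) -> None:
        self.q = 0          # stored value, visible on output Q
        self._prev_clk = 0  # clock level seen on the previous call

    def tick(self, clk: int, write: int, d: int) -> int:
        """Present clock, write control, and data; return output Q."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write == 1:
            self.q = d
        self._prev_clk = clk
        return self.q
```

With Write held at 0 the register keeps its value across any number of clock edges, which is what lets a value be kept until it is needed later.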

Clocking Methodology
- Combinational logic transforms data during clock cycles
  - Between clock edges
  - Input from state elements, output to state element
  - Longest delay determines clock period
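
The last point can be expressed as a one-line rule: the clock period must be at least as long as the slowest combinational path between state elements. In the sketch below, the path names and the delay values in picoseconds are hypothetical numbers chosen only to illustrate the rule; they do not come from the slides.

```python
def min_clock_period(path_delays_ps: dict) -> int:
    """Return the smallest usable clock period: the longest path delay."""
    return max(path_delays_ps.values())


# Hypothetical combinational path delays in picoseconds (illustration only).
example_paths = {
    "path A": 400,
    "path B": 800,
    "path C": 500,
}
period_ps = min_clock_period(example_paths)
```

Here the clock period is set by "path B", even though the other paths would finish sooner.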

§4.3 Building a Datapath

Building a Datapath
- Datapath
  - Elements that process data and addresses in the CPU
    - Registers, ALUs, muxes, memories, …
- We will build a LEGv8 datapath incrementally
  - Refining the overview design

Instruction Fetch
[Figure: instruction fetch datapath. The PC is a 64-bit register; an adder increments it by 4 for the next instruction.]
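
The fetch step can be sketched as reading the instruction at the current PC and computing PC + 4 in parallel. Modeling instruction memory as a dictionary from byte address to 32-bit word is an assumption of this sketch, as is the function name `fetch`.

```python
MASK64 = (1 << 64) - 1  # the PC is a 64-bit register


def fetch(pc: int, instruction_memory: dict) -> tuple:
    """Return (instruction at pc, incremented PC)."""
    instruction = instruction_memory[pc]  # read the instruction word
    next_pc = (pc + 4) & MASK64           # each instruction is 4 bytes
    return instruction, next_pc
```

Starting from `pc = 0`, repeated calls walk through addresses 0, 4, 8, and so on, until a branch supplies a different PC.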

R-Format Instructions
- Read two register operands
- Perform arithmetic/logical operation
- Write register result

Load/Store Instructions
- LDUR X1, [X2, offset_value] or STUR X1, [X2, offset_value]
- Read register operands
- Calculate the memory address by adding the 9-bit signed offset to the base register (X2)
  - Use the ALU, but sign-extend the 9-bit offset field in the instruction to a 64-bit signed value
- Load: read memory and write the value into the register file (register X1 here)
- Store: read the register file (X1) and write the value to memory
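
The address calculation for LDUR and STUR can be sketched as a sign extension followed by a 64-bit add. The helper names `sign_extend` and `effective_address` are hypothetical, and values are represented as unsigned 64-bit two's-complement integers.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide field to a 64-bit two's-complement value."""
    value &= (1 << bits) - 1           # keep only the field's bits
    sign_bit = 1 << (bits - 1)
    return ((value ^ sign_bit) - sign_bit) & MASK64


def effective_address(base: int, offset9: int) -> int:
    """Base register plus sign-extended 9-bit offset, as the ALU computes it."""
    return (base + sign_extend(offset9, 9)) & MASK64
```

A field value of `0x1FF` is the 9-bit encoding of -1, so `effective_address(1000, 0x1FF)` yields 999, while a field value of 8 yields 1008.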

Branch Instructions
- CBZ X1, offset
  - Register X1 is tested for zero, and a 19-bit offset is used to compute the branch target address relative to the branch instruction address
- Use the ALU: pass the register value through and check the Zero output
- Calculate target address
  - Sign-extend displacement
  - The base for the branch address calculation is the address of the branch instruction
  - Shift the offset field left by 2 bits, because it is a word offset and the PC holds a byte address
- If the operand (X1) is zero, the branch target address becomes the new PC
- If the operand is not zero, the incremented PC (PC + 4, computed during instruction fetch) replaces the current PC
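
The next-PC selection for CBZ can be sketched as follows: sign-extend the 19-bit field, shift it left by 2, add it to the address of the branch, and choose between that target and PC + 4. The function names are hypothetical, and values are kept as unsigned 64-bit two's-complement integers.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide field to a 64-bit two's-complement value."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return ((value ^ sign_bit) - sign_bit) & MASK64


def cbz_next_pc(pc: int, x1_value: int, offset19: int) -> int:
    """Return the PC that follows a CBZ located at address `pc`."""
    byte_offset = (sign_extend(offset19, 19) << 2) & MASK64  # words to bytes
    target = (pc + byte_offset) & MASK64    # relative to the branch itself
    incremented = (pc + 4) & MASK64         # fall-through address
    return target if x1_value == 0 else incremented
```

With the branch at address 100 and an offset field of 3, a zero operand gives a next PC of 112 and a nonzero operand gives 104.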

Datapath segment for branches
[Figure: datapath segment for branches. Shift left 2: just re-routes wires. Sign-extend: sign-bit wire replicated.]

Composing the Elements
- The simplest datapath executes all instructions in one clock cycle
  - Each datapath element can only do one function at a time
  - Hence, we need separate instruction and data memories
- Use multiplexers where alternate data sources are used for different instructions

R-Type/Load/Store Datapath
[Figure: R-type/load/store datapath]

Full Datapath
[Figure: full datapath]

§4.4 A Simple Implementation Scheme

ALU Control
- Load/store (LDUR/STUR): ALU computes the memory address by addition
- R-type instructions: ALU performs one of four actions (AND, OR, subtract, or add), depending on the value of the 11-bit opcode field in the instruction
- Compare and branch on zero (CBZ): ALU just passes the register input value
- Small control unit
  - Input: opcode field of the instruction and a 2-bit control field, called ALUOp, with the following values:
    - 00: add, for loads and stores
    - 01: pass input b, for CBZ
    - 10: determined by the operation encoded in the opcode field
  - Output: 4-bit signal that directly controls the ALU by generating one of the 6 combinations shown below

ALU control lines   Function
0000                AND
0001                OR
0010                add
0110                subtract
0111                pass input b
1100                NOR
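
The six ALU control encodings can be sketched as a function that maps the 4-bit control value to an operation on two 64-bit inputs and also produces the Zero output. The function name `alu` and the `(result, zero)` return shape are assumptions of this sketch.

```python
MASK64 = (1 << 64) - 1


def alu(control: int, a: int, b: int) -> tuple:
    """Apply the operation selected by the 4-bit control lines.

    Returns (result, zero), where zero is 1 when the result is all zeros.
    """
    if control == 0b0000:
        result = a & b           # AND
    elif control == 0b0001:
        result = a | b           # OR
    elif control == 0b0010:
        result = a + b           # add
    elif control == 0b0110:
        result = a - b           # subtract
    elif control == 0b0111:
        result = b               # pass input b
    elif control == 0b1100:
        result = ~(a | b)        # NOR
    else:
        raise ValueError(f"unsupported ALU control value: {control:04b}")
    result &= MASK64             # keep the result 64 bits wide
    return result, 1 if result == 0 else 0
```

For CBZ, the control value `0b0111` passes the tested register through on input b, so the Zero output directly reports whether that register is zero.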

ALU Control
- ALU control inputs are based on the 2-bit ALUOp control and the 11-bit opcode
- ALUOp bits are generated by the main control unit
- Multiple levels of decoding: a common implementation technique
  - Can reduce the size of the main control unit
  - Can potentially reduce the latency of the control unit

Opcode   ALUOp   Operation                    Opcode field   ALU function   ALU control
LDUR     00      load register                XXXXXXXXXXX    add            0010
STUR     00      store register               XXXXXXXXXXX    add            0010
CBZ      01      compare and branch on zero   XXXXXXXXXXX    pass input b   0111
R-type   10      add                          100000         add            0010
R-type   10      subtract                     100010         subtract       0110
R-type   10      AND                          100100         AND            0000
R-type   10      ORR                          100101         OR             0001
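
The two-level decoding can be sketched as a small function: ALUOp alone decides the ALU control value for loads, stores, and CBZ, and only the R-type case consults the instruction. As a simplification, this sketch looks up the R-type operation by mnemonic (`"ADD"`, `"SUB"`, `"AND"`, `"ORR"`) rather than by opcode bit pattern; the function and table names are hypothetical.

```python
# R-type operation to 4-bit ALU control value, keyed by mnemonic (assumption:
# a real control unit decodes the opcode field bits instead of a name).
R_TYPE_ALU_CONTROL = {
    "ADD": 0b0010,  # add
    "SUB": 0b0110,  # subtract
    "AND": 0b0000,  # AND
    "ORR": 0b0001,  # OR
}


def alu_control(alu_op: int, r_type_operation: str = None) -> int:
    """Second-level decoder: combine ALUOp with the R-type operation."""
    if alu_op == 0b00:   # LDUR/STUR: always add
        return 0b0010
    if alu_op == 0b01:   # CBZ: pass input b
        return 0b0111
    if alu_op == 0b10:   # R-type: depends on the instruction
        return R_TYPE_ALU_CONTROL[r_type_operation]
    raise ValueError(f"unsupported ALUOp value: {alu_op:02b}")
```

Because the main control unit only has to produce the 2-bit ALUOp, the instruction-specific detail stays inside this smaller decoder.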

The Main Control Unit
- Control signals derived from instruction
  - Opcode field: 6 to 11 bits wide, in bit positions 31:26 to 31:21
  - First register operand: bit positions 9:5 (Rn)
  - Other register operand: bit positions 20:16 (Rm) or 4:0 (Rt)
  - Another operand: 19-bit offset (CBZ) or 9-bit offset (load/store)
  - The destination register for R-type instructions (Rd) and for loads (Rt) is in bit positions 4:0
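
The field positions listed above can be extracted from a 32-bit instruction word with shifts and masks. This sketch covers only the fields whose bit positions are stated on the slide; the helper names `bits` and `decode_fields`, the dictionary keys, and the example word are assumptions for illustration.

```python
def bits(word: int, high: int, low: int) -> int:
    """Return bits high:low (inclusive) of a 32-bit instruction word."""
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)


def decode_fields(instruction: int) -> dict:
    """Split an instruction word into the fields named on the slide."""
    return {
        "opcode6": bits(instruction, 31, 26),   # shortest opcode form
        "opcode11": bits(instruction, 31, 21),  # longest opcode form
        "rm": bits(instruction, 20, 16),        # other register operand
        "rn": bits(instruction, 9, 5),          # first register operand
        "rt_rd": bits(instruction, 4, 0),       # Rt, or destination Rd
    }


# Arbitrary example word assembled from chosen field values.
example_word = (0x458 << 21) | (3 << 16) | (2 << 5) | 1
example_fields = decode_fields(example_word)
```

The same bit positions 4:0 serve as Rt or Rd depending on the instruction, which is why the sketch returns them under a single key.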