COSC 2021: Computer Organization Instructor: Dr. Amir Asif Department of Computer Science York University Handout # 10 Designing a MIPS Processor I: Control Topics: 1. A single cycle implementation 2. State Diagrams 3. A multiple cycle implementation Patterson: Section 4.4 and Appendix D.1 – D.4
Arithmetic and Logical Review Goal: Implement a subset of core instructions from the MIPS instruction set, given below Category Instruction Example Meaning Comments Arithmetic and Logical add add $s1,$s2,$s3 $s1 ← $s2+$s3 subtract sub $s1,$s2,$s3 $s1 ← $s2-$s3 and $s1 ← $s2&$s3 & => and or or $s1,$s2,$s3 $s1 ← $s2|$s3 | => or slt slt $s1,$s2,$s3 If $s1 < $s3, $s1←1 else $s1←0 Data Transfer load word lw $s1,100($s2) $s1 ← Mem[$s2+100] store word sw $s1,100($s2) Mem[$s2+100] ← $s1 Branch branch on equal beq $s1,$s2,L if($s1==$s2) go to L unconditional jump j 2500 go to 10000
Combined Datapath P C I n s t r u c i o m e y R a d 1 6 3 2 A L U l M x g W S h f 4 p Z D
Combined Datapath: Arithmetic Operations (add,sub,or,and,slt) add/sub/or/and/slt $s1,$s2,$s3 P C I n s t r u c i o m e y R a d 1 6 3 2 A L U l M x g W S h f 4 p Z D
Combined Datapath: Data Transfer (load) lw $s1, offset($s2) P C I n s t r u c i o m e y R a d 1 6 3 2 A L U l M x g W S h f 4 p Z D
Review: Data Transfer (store) sw $s1, offset($s2) P C S r c M A d u x 4 A d d A L U r e s u l t S h i f t l e f t 2 R e g i s t e r s A L U o p e r a t i o n R e a d 3 M e m W r i t e R e a d r e g i s t e r 1 A L U S r c P C R e a d a d d r e s s R e a d d a t a 1 M e m t o R e g r e g i s t e r 2 Z e r o I n s t r u c t i o n A L U A L U W r i t e R e a d R e a d A d d r e s s r e g i s t e r d a t a 2 M r e s u l t d a t a M I n s t r u c i o m e y u u W r i t e x D a t a x d a t a m e m o r y W r i t e R e g W r i t e d a t a 1 6 3 2 S i g n e x t d M e m R a d
Review: Data Transfer (branch) beq $s1, $s2, w_offset P C I n s t r u c i o m e y R a d 1 6 3 2 A L U l M x g W S h f 4 p Z D
Types of Datapath Datapaths can be divided into two categories Single cycle datapath: An instruction (e.g. lw) is performed in 1 clock cycle Multiple cycle datapath: An instruction is performed in multiple clock cycles Advantage of single datapath: Keeps the design simpler Disadvantage of single datapath: No datapath resource can be used more than once in a clock cycle Elements being accessed more than once in an instruction be duplicated or have multiple inputs and outputs. An instruction memory separate from data memory is required In section 4.4, a single cycle datapath is considered. We expand the discussion to multiple cycle datapaths later.
Control What needs to be added to complete the design of the single cycle datapath? Control Unit activates appropriate controls at ALU MUX’s (3) Read/write signals for register file Read/write signals for data memory Consider the design of the ALU Control Unit ALU Control Lines Result 0000 AND 0001 OR 0010 Add 0110 Subtract 0111 SLT 1100 NOR A L U R e s u l t Z r o O v f w a b p i n C y 4 1
Control: ALU Control Unit (1) Inputs to ALU Control: Function fields of instructions 2-bit control field (ALUOp) Output of ALU Control: 4-bit signal that controls the ALU 3 A L U Z e r o ALU result a b Control Instructions [5-0] ALUOp 5 4 2
Control: ALU Control Unit (2) I-format: op (31 – 26) rs (25 – 21) rt (20 – 16) Immediate / Offset (15 – 0) I-format instructions (lw/sw/beq) does not have a function field Use ALUOp to differentiate the operation from the rest e.g., ALUOp = 00 for lw/sw; ALUOp = 01 for beq R-format: op (31 – 26) rs (25 – 21) rt (20 – 16) rd (15 – 11) shamt (10 – 6) funct (5 – 0) R-format instructions (add/sub/and/or) does have a function field ALUOp = 10 for all instructions Function field is 32 for add, 34 for sub, 36 for AND, 37 for OR, and 42 for SLT
Control: ALU Control Unit (3) Instruction (opcode) Inputs Desired ALU action Outputs Operation (Op3 – Op0) ALUOp (ALUOp1 – ALUOp0) Function Field (F5 – F0) lw (I) 0 0 X X X X X X add 0 0 1 0 sw (I) beq (I) 0 1 sub 0 1 1 0 add (32) 1 0 1 0 0 0 0 0 sub (34) 1 0 0 0 1 0 and (36) 1 0 0 1 0 0 and 0 0 0 0 or (37) 1 0 0 1 0 1 or 0 0 0 1 slt (42) 1 0 1 0 1 0 slt 0 1 1 1 The first two rows of the truth table are the same. Number of inputs in the truth table are 6 while number of rows in the table are 8. Only 8 of the 26 = 32 possible combinations of input are being used. Some “DO NOT CARE” entries can be added to simplify the above truth table.
Control: ALU Control Unit (4) Instruction (opcode) Inputs Desired ALU action Outputs Operation (Op3 – Op0) ALUOp (ALUOp1 – ALUOp0) Function Field (F5 – F0) lw (I) 0 0 (0 0) X X X X X X add 0 0 1 0 sw (I) beq (I) 0 1 (0 1) sub 0 1 1 0 add (32) 1 0 (1 0) X X 0 0 0 0 sub (34) 1 X (1 0) X X 0 0 1 0 and (36) X X 0 1 0 0 and 0 0 0 0 or (37) X X 0 1 0 1 or 0 0 0 1 slt (42) 1 X (1 1) X X 1 0 1 0 slt 0 1 1 1 ALUOp does not use encoding 11, so, 10 and 11 may be replaced with 1X. The first two fields of the function fields are always 10, so, they are replaced by XX.
Control: ALU Control Unit (4) Truth Table for Operation2 = 1: ALUOp Function code fields ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0 1 X X(0) Simplified Expressions: The truth table above shows the input combinations for which the ALU Control should be 0010, 0001, 0110, or 0111 (the other combinations are not used).
Control: ALU Control Unit (4) Truth Table for Operation1 = 1: ALUOp Function code fields ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0 X X(1) X(0) Simplified Expressions:
Control: ALU Control Unit (4) Truth Table for Operation0 = 1: ALUOp Function code fields ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0 1 X Simplified Expressions:
Control: ALU Control Unit (5) Combinational Circuit for the ALU Control Unit O p e r a t i o n 2 1 A L U F 3 ( 5 – ) c l b k
Main Control (1) Opcode is contained in bits 31 – 26. R-format: op (31 – 26) rs (25 – 21) rt (20 – 16) rd (15 – 11) shamt (10 – 6) funct (5 – 0) I-format: op (31 – 26) rs (25 – 21) rt (20 – 16) Immediate / Offset (15 – 0) Opcode is contained in bits 31 – 26. Registers specified by rs (bits 25 – 21) and rt (bits 20 – 16) are always read Base register (w/ base address) for lw/sw instruction is specified by rs (bits 25–21) 16-bit offset for beq, lw, and sw is always specified in bits 15 – 0. Destination register is specified in one of the two places: For R-type instructions (add/sub/and/or), destination register is specified by bits (15 – 11) For lw instruction, destination register is specified by bits (20 – 16) Using information (1 – 5), we can add the instruction labels and additional MUX’s to the datapath that we have constructed.
Main Control (2) M e m t o R g a d W r i A L U O p S c D s P C I n u y [ 3 1 – ] 2 6 5 4 x l Z h f Number of control lines is 10 (4 for MUX’s, 3 for ALU control, 2 for data memory, 1 for register file)
Main Control (3) Control Input Effect when Deasserted (0) Effect when asserted (1) RegDst Destination register number comes from bits 20 – 16 of the instruction (sw) Destination register number comes from bits 15 – 11 of the instruction (add,sub,or,and,slt) Regwrite None Data on the “write data” input is written on the register specified on the “write register” input (lw, add,sub,or,and) ALUSrc Second operand to ALU comes from the second register file output (add,sub,or,and,beq,slt) Second operand to ALU is sign extended, lower 16 bits of instruction(lw,sw) MemRead Data from memory location specified by “address” input is placed on the “read data” output (lw) MemWrite Data from “write data” input replaces memory location specified by “address” input (sw) MemtoReg Data from the output of ALU is fed into “write data” input of the register file (add,sub,or,and,slt) Data from the “read data” output of data memory is fed into “write data” input of the register file (lw) PCSrc PC is replaced by the output of adder which adds 4 to the existing content of PC (except for beq) PC is replaced by the output of adder which computes branch target by adding existing content of PC with 2-bit right shifted offset (beq) Next step in the design of datapath is to add a control unit that generates the control inputs to MUX’s
Main Control (4) P C I n s t r u c i o m e y R a d [ 3 1 – ] 2 6 5 A M ] 2 6 5 A M g L U O p W B h D S 4 x l Z f
Main Control (5) Inputs of Control Unit: Outputs of Control Unit: Instruction Opcode in Decimal Opcode in Binary Op5 Op4 Op3 Op2 Op1 Op0 R-format 0ten lw 35ten 1 sw 43ten beq 4ten Outputs of Control Unit: Instruction RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0 R-format 1 lw sw X beq which constitutes the truth table
Main Control (6) R - f o r m a t I w s b e q O p 1 2 3 4 5 n u g D A L 1 2 3 4 5 n u g D A L U S c M W i d B h
Example: R-type Instruction (step1: fetch instruction & increment PC) d [ 3 1 – ] 2 6 5 A M g L U O p W B h D S 4 x l f Z Step 1 1. Fetch instruction 2. Add 4 to the value of PC
Example: R-type Instruction (step2: Read two source registers) [ 3 1 – ] 2 6 5 A M g L U O p W B h D S 4 x l f Z Step 2: 1. Source registers are read from the register file 2. Control unit uses the opcode to activate units. ALUOp = 10 set based on opcode
Example: R-type Instruction (step3: ALU operates on operands) [ 3 1 – ] 2 6 5 A M g L U O p W B h D S 4 x l Z f Step 3: 1. ALUSrc = 0 2. Control line values of ALU set by ALU-control 2. ALU performs desired operation on input data
Example: R-type Instruction (step 4: Write result in destination register) [ 3 1 – ] 2 6 5 A M g L U O p W B h D S 4 x l f Z Step 4: 1. MemtoReg = 0; RegDst = 1; 2. Result of ALU is written in specified register 3. PC is incremented by 4
Why single-cycle implementation is not used? Assuming no delay at adder, sign extension unit, shift left unit, PC, control unit, and MUX: Load cycle requires 5 functional units: instruction fetch, register access, ALU, data memory access, register access Store cycle requires 4 functional units: instruction fetch, register access, ALU, data memory access R-type instruction cycle requires 4 functional units: instruction fetch, register access, ALU, register access Path for a branch instruction requires 3 functional units: instruction fetch, register access, ALU Path for a jump instruction requires 1 functional unit: instruction fetch Using a clock cycle of equal duration for each instruction is a waste of resources.