Chapter 4 The Processor CprE 381 Computer Organization and Assembly Level Programming, Fall 2013 Zhao Zhang Iowa State University Revised from original.

Slides:



Advertisements
Similar presentations
The Processor: Datapath & Control
Advertisements

Lec 17 Nov 2 Chapter 4 – CPU design data path design control logic design single-cycle CPU performance limitations of single cycle CPU multi-cycle CPU.
331 Lec 14.1Fall 2002 Review: Abstract Implementation View  Split memory (Harvard) model - single cycle operation  Simplified to contain only the instructions:
Computer ArchitectureFall 2007 © October 3rd, 2007 Majd F. Sakr CS-447– Computer Architecture.
Computer Structure - Datapath and Control Goal: Design a Datapath  We will design the datapath of a processor that includes a subset of the MIPS instruction.
The Processor 2 Andreas Klappenecker CPSC321 Computer Architecture.
Shift Instructions (1/4)
Copyright 1998 Morgan Kaufmann Publishers, Inc. All rights reserved. Digital Architectures1 Machine instructions execution steps (1) FETCH = Read the instruction.
Morgan Kaufmann Publishers Dr. Zhao Zhang Iowa State University
The Processor: Datapath & Control. Implementing Instructions Simplified instruction set memory-reference instructions: lw, sw arithmetic-logical instructions:
Chapter 4 Sections 4.1 – 4.4 Appendix D.1 and D.2 Dr. Iyad F. Jafar Basic MIPS Architecture: Single-Cycle Datapath and Control.
Designing a Simple Datapath Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Revised 9/12/2013.
COSC 3430 L08 Basic MIPS Architecture.1 COSC 3430 Computer Architecture Lecture 08 Processors Single cycle Datapath PH 3: Sections
Computer Organization CS224 Fall 2012 Lesson 26. Summary of Control Signals addsuborilwswbeqj RegDst ALUSrc MemtoReg RegWrite MemWrite Branch Jump ExtOp.
Chapter 4 CSF 2009 The processor: Building the datapath.
Lecture 8: Processors, Introduction EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014,
Single Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Processor: Datapath and Control
Lec 15Systems Architecture1 Systems Architecture Lecture 15: A Simple Implementation of MIPS Jeremy R. Johnson Anatole D. Ruslanov William M. Mongan Some.
Gary MarsdenSlide 1University of Cape Town Chapter 5 - The Processor  Machine Performance factors –Instruction Count, Clock cycle time, Clock cycles per.
Computer Organization CS224 Fall 2012 Lesson 22. The Big Picture  The Five Classic Components of a Computer  Chapter 4 Topic: Processor Design Control.
ECE 445 – Computer Organization
CPS3340 COMPUTER ARCHITECTURE Fall Semester, /19/2013 Lecture 17: The Processor - Overview Instructor: Ashraf Yaseen DEPARTMENT OF MATH & COMPUTER.
Datapath Design Computer Organization I 1 August 2009 © McQuain, Feng & Ribbens Composing the Elements First-cut data path does an instruction.
CS2100 Computer Organisation The Processor: Datapath (AY2015/6) Semester 1.
Computer Architecture and Design – ECEN 350 Part 6 [Some slides adapted from A. Sprintson, M. Irwin, D. Paterson and others]
1 A single-cycle MIPS processor  An instruction set architecture is an interface that defines the hardware operations which are available to software.
MIPS processor continued. In Class Exercise Question Show the datapath of a processor that supports only R-type and jr reg instructions.
December 26, 2015©2003 Craig Zilles (derived from slides by Howard Huang) 1 A single-cycle MIPS processor  As previously discussed, an instruction set.
Designing a Single- Cycle Processor 國立清華大學資訊工程學系 黃婷婷教授.
Single Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
CPU Overview Computer Organization II 1 February 2009 © McQuain & Ribbens Introduction CPU performance factors – Instruction count n Determined.
Chapter 4 From: Dr. Iyad F. Jafar Basic MIPS Architecture: Single-Cycle Datapath and Control.
February 22, 2016©2003 Craig Zilles (derived from slides by Howard Huang) 1 A single-cycle MIPS processor  As previously discussed, an instruction set.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor.
CS4100: 計算機結構 Designing a Single-Cycle Processor 國立清華大學資訊工程學系 一零零學年度第二學期.
MIPS processor continued
Datapath and Control AddressInstruction Memory Write Data Reg Addr Register File ALU Data Memory Address Write Data Read Data PC Read Data Read Data.
COM181 Computer Hardware Lecture 6: The MIPs CPU.
1 Chapter 5: Datapath and Control (Part 2) CS 447 Jason Bakos.
Morgan Kaufmann Publishers The Processor
Computer Architecture Lecture 6.  Our implementation of the MIPS is simplified memory-reference instructions: lw, sw arithmetic-logical instructions:
CS161 – Design and Architecture of Computer Systems
CS 230: Computer Organization and Assembly Language
Morgan Kaufmann Publishers
Introduction CPU performance factors
Morgan Kaufmann Publishers The Processor
Morgan Kaufmann Publishers
Processor Architecture: Introduction to RISC Datapath (MIPS and Nios II) CSCE 230.
Processor (I).
CS/COE0447 Computer Organization & Assembly Language
MIPS processor continued
CSCI206 - Computer Organization & Programming
Single-Cycle CPU DataPath.
CSCI206 - Computer Organization & Programming
Morgan Kaufmann Publishers The Processor
Topic 5: Processor Architecture Implementation Methodology
Rocky K. C. Chang 6 November 2017
Composing the Elements
Composing the Elements
The Processor Lecture 3.2: Building a Datapath with Control
COSC 2021: Computer Organization Instructor: Dr. Amir Asif
Lecture 14: Single Cycle MIPS Processor
Single Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Computer Architecture Processor: Datapath
MIPS processor continued
The Processor: Datapath & Control.
COMS 361 Computer Organization
Processor: Datapath and Control
Presentation transcript:

Chapter 4 The Processor CprE 381 Computer Organization and Assembly Level Programming, Fall 2013 Zhao Zhang Iowa State University Revised from original slides provided by MKP

Week 8 Overview CPU design overview Datapath and Control Control Unit ALU Control Unit Chapter 1 — Computer Abstractions and Technology — 2

Announcements Mini-project B starts in week 9 Mini-projects B and C will be revised The grading scale will be discussed by Friday (week 8) Chapter 1 — Computer Abstractions and Technology — 3

Chapter 4 — The Processor — 4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified, single-cycle version A more realistic, pipelined version §4.1 Introduction

Nine-Instruction MIPS We will first use a MIPS subset of nine instructions, then extend the subset It’s enough to illustrate the most aspects of CPU design, particularly datapath and control design Memory reference: LW and SW Arithmetic/logic: ADD, SUB, AND, OR, SLT Branch: BEQ, BNE Chapter 1 — Computer Abstractions and Technology — 5

Chapter 4 — The Processor — 6 Instruction Execution PC  instruction memory, Fetch instruction Register numbers  register file, Read registers Then, depending on instruction class Execute: Use ALU to calculate Arithmetic result Memory address for load/store Branch target address Memory access: Access data memory for load/store Register writeback: Write data back to registers PC update (for all): PC  target address or PC + 4

Chapter 4 — The Processor — 7 CPU Overview Next Sequential PC = PC + 4 Branch Target = (PC+4)+offset A Sketchy view An instruction may change 1.PC (all instructions) 2.Some register (arithmetic/logic, load) 3.Some memory word/halfword/byte (store)

Chapter 4 — The Processor — 8 Multiplexers Can’t just join wires together Use multiplexers What would happen if you just join signals in VHDL?

Chapter 4 — The Processor — 9 Control Control signals: mux select, read/write enable, ALU opcode, etc.

Chapter 4 — The Processor — 10 Logic Design Basics §4.2 Logic Design Conventions Combinational element Operate on data Output is a function of input State (sequential) elements Store information Output is a function of internal state and input

Chapter 4 — The Processor — 11 Combinational Elements AND-gate Y = A & B A B Y I0 I1 Y MuxMux S Multiplexer Y = S ? I1 : I0 A B Y + A B Y ALU F Adder Y = A + B Arithmetic/Logic Unit Y = F(A, B)

Chapter 4 — The Processor — 12 Sequential Elements Register: stores data in a circuit Uses a clock signal to determine when to update the stored value Edge-triggered: update when Clk changes from 0 to 1 Data output Q is stable for a clock cycle D Clk Q D Q

Chapter 4 — The Processor — 13 Sequential Elements Register with write control Only updates on clock edge when write control input is 1 VHDL: rising_edge(Clk) AND Write Used when stored value is required later D Clk Q Write D Q Clk

Chapter 4 — The Processor — 14 Clocking Methodology Combinational logic transforms data during clock cycles Input from state elements Output must stabilize within one cycle Longest delay determines clock period Output to state element at the next rising edge

Clocking Methodology Processor is a big state machine Works like a Moore machine in non-I/O phase Output is a function of the state States include PC, all registers and memory contents Chapter 1 — Computer Abstractions and Technology — 15

Chapter 4 — The Processor — 16 Building a Datapath Datapath elements Elements that process data and addresses in the CPU Registers, ALUs, mux’s, memories, … We will build a MIPS datapath incrementally Refining the overview design §4.3 Building a Datapath

Chapter 4 — The Processor — 17 Instruction Fetch 32-bit register Increment by 4 for next instruction Datapath elements: PC register, instruction memory, 32-bit adder

Chapter 4 — The Processor — 18 R-Format Instructions Read two register operands Perform arithmetic/logical operation Write register result Datapath elements: Register file, ALU

Chapter 4 — The Processor — 19 Load/Store Instructions Read register operands Calculate address using 16-bit offset Use ALU, but sign-extend offset Load: Read memory and update register Store: Write register value to memory Datapath elements: Data memory, sign extender

Chapter 4 — The Processor — 20 Branch Instructions Read register operands Compare operands Use ALU, subtract and check Zero output Calculate target address Sign-extend displacement Shift left 2 places (word displacement) Add to PC + 4 Already calculated by instruction fetch

Chapter 4 — The Processor — 21 Branch Instructions Just re-routes wires Sign-bit wire replicated New: Shifter, 2 nd 32-bit Adder

Chapter 4 — The Processor — 22 Composing the Elements First-cut data path does an instruction in one clock cycle Each datapath element can only do one function at a time Hence, we need separate instruction and data memories Use multiplexers where alternate data sources are used for different instructions

Chapter 4 — The Processor — 23 R-Type/Load/Store Datapath

Chapter 4 — The Processor — 24 Full Datapath

Chapter 4 — The Processor — 25 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory  register file  ALU  data memory  register file Not every instruction requires the same time

Chapter 4 — The Processor — 26 Performance Issues Some instructions may take substantially longer time, e.g. multiply/division Not feasible to vary clock cycle for different instructions Must use the worst-case delay as the clock cycle Violates design principle making the common case fast We will improve performance by pipelining

Chapter 4 — The Processor — 27 ALU Control ALU used for Load/Store: F = add Branch: F = subtract R-type: F depends on funct field §4.4 A Simple Implementation Scheme ALU controlFunction 0000AND 0001OR 0010add 0110subtract 0111set-on-less-than 1100NOR

Chapter 4 — The Processor — 28 ALU Control Assume 2-bit ALUOp derived from opcode Combinational logic derives ALU control opcodeALUOpOperationfunctALU functionALU control lw00load wordXXXXXXadd0010 sw00store wordXXXXXXadd0010 beq01branch equalXXXXXXsubtract0110 R-type10add100000add0010 subtract100010subtract0110 AND100100AND0000 OR100101OR0001 set-on-less-than101010set-on-less-than0111

VHDL Notes How to program the ALU control? -- Behavior style process (alu_op, funct) begin case alu_op is when ‘00’ => alu_code <= ‘0010’; when ’01’ => … end case; end process; Chapter 1 — Computer Abstractions and Technology — 29

Chapter 4 — The Processor — 30 The Main Control Unit Control signals derived from instruction 0rsrtrdshamtfunct 31:265:025:2120:1615:1110:6 35 or 43rsrtaddress 31:2625:2120:1615:0 4rsrtaddress 31:2625:2120:1615:0 R-type Load/ Store Branch opcodealways read read, except for load write for R-type and load sign-extend and add

Chapter 4 — The Processor — 31 Datapath With Control

Summary of Control Signals RegDst: Write to register rt or rd? ALUSrc: Immediate to ALU? MemtoReg: Write memory or ALU output? RegWrite: Write to regfile at all? MemRead: Read from Data Memory? MemWrite: Write to the Data Memory? Branch: Is it a branch intruction? ALUOp[1:0]: ALU control field Chapter 1 — Computer Abstractions and Technology — 32

Chapter 4 — The Processor — 33 R-Type Instruction

R-Type: Control Signals RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp[1:0] 1 (write to rd) 0 (No immediate) 0 (wrote not from memory) 1 (does write regfile) 0 (no memory read) 0 (no memory write) 0 (does write regfile) 10 (R-type ALU op) Chapter 1 — Computer Abstractions and Technology — 34

Chapter 4 — The Processor — 35 Load Instruction

Load: Control Signals RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp[1:0] Chapter 1 — Computer Abstractions and Technology — 36

Store: Control Signals RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp[1:0] X 1 X Chapter 1 — Computer Abstractions and Technology — 37

Chapter 4 — The Processor — 38 Branch-on-Equal Instruction

BEQ: Control Signals RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp[1:0] X 0 X Chapter 1 — Computer Abstractions and Technology — 39

Control Signal Setting What’re the control signal values for each instruction or instruction type? InstReg- Dst ALU- Src Mem- toReg Reg- Write Mem- Read Mem- Write Branc h ALUO p1 ALUO p0 R lw swX1X beqX0X Chapter 1 — Computer Abstractions and Technology — 40 Note: “R-” means R-format

VHDL Notes How to program the control? entity control is port (op_code : in m32_6bits; reg_dst : out m32_1bit; alu_src : out m32_1bit; mem_to_reg : out m32_1bit; reg_write : out m32_1bit; mem_read : out m32_1bit; mem_write : out m32_1bit; branch : out m32_1bit; alu_op : out m32_2bits); end control; Chapter 1 — Computer Abstractions and Technology — 41

VHDL Notes architecture rom of control is subtype code_t is m32_vector(8 downto 0); type rom_t is array (0 to 63) of code_t; -- The ROM content for control signals signal rom : rom_t := ( 00 => " ", -- R-type 35 => " ", -- LW … -- More for other instructions others=>" "); begin (reg_dst, alu_src, mem_to_reg, reg_write, mem_read, mem_write, branch, alu_op(1), alu_op(0)) <= rom(to_integer(unsigned(op_code))); end rom; Chapter 1 — Computer Abstractions and Technology — 42

Chapter 4 — The Processor — 43 Implementing Jumps Jump uses word address Update PC with concatenation of Top 4 bits of old PC 26-bit jump address 00 Need an extra control signal decoded from opcode 2address 31:2625:0 Jump

Chapter 4 — The Processor — 44 Datapath With Jumps Added

Grading Scale Tentative grading scale A: 90, A-: 87 B+: 84, B: 80, B-: 75 C+: 70, C: 65, C-: 60 D: 50 There will be a bonus in lab projects Chapter 1 — Computer Abstractions and Technology — 45

Mini-Project B, Tentative Implement single-cycle processor (SCP). There will be three parts 1. Part 1, SCPv1: Implement the nine- instruction ISA plus the J instruction 2. Part 2, SCPv2a: Support all the instructions needed to run bubble sorting 3. Part 3, SCPv2b: Detailed modeling of data elements Chapter 1 — Computer Abstractions and Technology — 46

Mini-Project B Bonus part, SCPv3: Support all integer instructions on the green sheet, due in the last lab Some support files will be provided High-level modeling of Register File, ALU, Adder, to be used in Parts 1 and 2 Partial sample VHDL code will be provided Chapter 1 — Computer Abstractions and Technology — 47

Mini-Project B The CPU composition must be strongly structural Parts 1 and 2 may use behavior/dataflow modeling for data elements Part 3 must use detailed modeling for data elements – Reuse your VHDL code in the labs Chapter 1 — Computer Abstractions and Technology — 48

Extend Single-Cycle MIPS Consider the following instructions addi: add immediate sll: Shift left logic by a constant bne: branch if not equal jal: Jump and link jr: Jump register Chapter 1 — Computer Abstractions and Technology — 49

Chapter 4 — The Processor — 50 SCPv0: R-Format, LW/SW, BEQ

Chapter 4 — The Processor — 51 SCPv1: R-Format, LW/SW, BEQ, J

SCPv1: Control Signals What’re the control signal values for each instruction or instruction type? InstReg- Dst ALU- Src Mem- toReg Reg- Write Mem Read Mem Write Bran ch ALU Op1 ALU Op0 Jum p R lw swX1X beqX0X jXXX0000XX1 Chapter 1 — Computer Abstractions and Technology — 52 Note: “R-” means R-format

Extend the Single-Cycle Processor For each instruction, do we need 1. Any new or revised datapath element(s)? 2. Any new control signal(s)? Then revise, if necessary, 1. Datapath: Add new elements or revise existing ones, add new connections 2. Control Unit: Add/extend control signals, extend the truth table 3. ALU Control: Extend the truth table Chapter 1 — Computer Abstractions and Technology — 53

SCPv0 + ADDI addi rs, rt, immediate R[rt] = R[rs]+SignExtImm Read register operands (only one is used) Sign extend the immediate (in parallel) Perform arithmetic/logical operation Write register result Chapter 1 — Computer Abstractions and Technology — rsrtimmediate 31:2625:2120:1615:0

SCPv0 + ADDI Chapter 1 — Computer Abstractions and Technology — 55 What changes to this baseline?

Chapter 4 — The Processor — 56 SCPv0 + ADDI Do we need new or revised datapath elements?

SCPv0 + ADDI Do we need new or revised datapath elements? Do we need new control signal(s)? InstReg- Dst ALU- Src Mem- toReg Reg- Write Mem- Read Mem- Write Branc h ALUO p1 ALUO p0 R lw swX1X beqX0X addi Chapter 1 — Computer Abstractions and Technology — 57

SCPv0 + ADDI Like LW I-format instruction Write to register[rt] Use add operation InstReg- Dst ALU- Src Mem- toReg Reg- Write Mem- Read Mem- Write Branc h ALUO p1 ALUO p0 R lw swX1X beqX0X addi Chapter 1 — Computer Abstractions and Technology — 58 Like R-format arithmetic Write ALU result to register file

SCPv0 + SLL sll rd, rs, shamt R[rd] = R[rt]<<shamt Read register operands (only one is used) Perform shift operation Write register result Note: sllv rd, rt, rs for shift left logic variable Chapter 1 — Computer Abstractions and Technology — rsrtrdshamt :265:025:2120:1615:1110:6

SCPv0 + SLL Chapter 1 — Computer Abstractions and Technology — 60 What changes to the datapath elements?

SCPv0 + SLL Chapter 1 — Computer Abstractions and Technology — 61 ALU needs to do the shift operation ALU 1 st input needs another source: shamt extended to 32-bit

SCPv0 + SLL Add another source to the 1 st input of ALU Shamt: Instruction[10-6] Add a Mux and ALUSrc1 control line 0: R[rs] 1: Shamt (sign-extended) Rename ALUSrc to ALUSrc2 Extend ALU control Add an ALU control code for SLL Chapter 1 — Computer Abstractions and Technology — 62

Chapter 4 — The Processor — 63 SCPv0 + SLL Extend ALU control: Choose a code of your choice (kkkk shown in the table) opcodeALUOpOperationfunctALU functionALU control lw00load wordXXXXXXadd0010 sw00store wordXXXXXXadd0010 beq01branch equalXXXXXXsubtract0110 R-type10add100000add0010 subtract100010subtract0110 AND100100AND0000 OR100101OR0001 set-on-less-than101010set-on-less-than0111 shift-left-logic000000shift-left-logickkkk

SCPv0 + SLL InstReg- Dst ALU- Src Mem- toReg Reg- Write Mem- Read Mem- Write Branc h ALUO p R lw swX1X00100 beqX0X sll Chapter 1 — Computer Abstractions and Technology — 64 InstReg- Dst ALU Src1 ALU- Src2 Mem- toReg Reg- Write Mem - Read Mem - Write Bran ch ALU Op R lw swX01X00100 beqX00X sll

SCPv0 + BNE bne rs, rt, label PC = (R[Rs]!=R[rt]) ? PC+4+(SignExtImm<<2) : PC+4 Read register operands Compare operands Use ALU, subtract and check Zero output Calculate target address Sign-extend displacement Shift left 2 places (word displacement) Add to PC + 4 Already calculated by instruction fetch Chapter 1 — Computer Abstractions and Technology — rsrtoffset 31:2625:2120:1615:0

Chapter 4 — The Processor — 66 SCPv0 + BNE Make what changes to the datapath?

SCPv0 + BNE Extend Branch to two bits 10: Branch-Equal 11: Branch-Not-Equal Replace the AND gate with the following logic Can use a different truth table Chapter 1 — Computer Abstractions and Technology — 67 BranchZeroBranch taken? otherwise0

SCPv0 + BNE InstReg- Dst ALU- Src Mem- toReg Reg- Write Mem- Read Mem- Write Branc h ALUO p R lw swX1X00100 beqX0X bne Chapter 1 — Computer Abstractions and Technology — 68 InstReg- Dst ALU- Src Mem- toReg Reg- Write Mem- Read Mem- Write Branc h ALUO p R lw swX1X0010 beqX0X bneX0X

SCPv1 + JAL jal target PC = JumpAddr R[31] = PC+4 Jump uses word address Update PC with JumpAddr: concatenation of top 4 bits of old PC, 26-bit jump address, and 00 (called pseudo-direct) Save PC+4 to $ra Chapter 1 — Computer Abstractions and Technology — address 31:2625:0

Chapter 4 — The Processor — 70 SCPv1 + JAL Make what changes to the datapath?