Single Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili.

Presentation transcript:

Single Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili

(2) Reading Section Appendices C.7, C.8, C.11, D.2

(3) Introduction
We will examine two MIPS implementations:
- A simplified version (this module)
- A more realistic pipelined version
Simple subset, shows most aspects:
- Memory reference: lw, sw
- Arithmetic/logical: add, sub, and, or, slt
- Control transfer: beq, j

(4) Instruction Execution
- PC → instruction memory, fetch instruction
- Register numbers → register file, read registers
- Depending on instruction class:
  1. Use ALU to calculate
     - Arithmetic result
     - Memory address for load/store
     - Branch target address
  2. Access data memory for load/store
  3. PC ← an address or PC + 4
[Figure: an encoded program, shown as hexadecimal instruction words (e.g. 1520fffc, 000a082a, ...) stored at successive addresses]

(5) Basic Ingredients Include the functional units we need for each instruction – combinational and sequential

(6) Sequential Elements (4.2, C.7, C.11)
Register: stores data in a circuit
- Uses a clock signal to determine when to update the stored value
- Edge-triggered: update when Clk changes from 0 to 1
[Figure: D flip-flop with inputs D and Clk and output Q; timing diagram of Clk, D and Q marking the rising and falling edges]

(7) Sequential Elements
Register with write control
- Only updates on clock edge when write control input is 1
- Used when stored value is required later
[Figure: register with inputs D, Clk and Write and output Q; timing diagram of Clk, Write, D and Q with the cycle time marked]
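The write-controlled register described above can be sketched as a small behavioral model. This is only an illustration: the class name `Register` and the method `rising_edge` are hypothetical, and the model ignores setup and hold times.

```python
class Register:
    """Behavioral model of an edge-triggered register with write control.

    Hypothetical illustration; not taken from the lecture notes.
    """

    def __init__(self, value=0):
        self.q = value  # stored value, visible on output Q

    def rising_edge(self, d, write=True):
        # Called once per rising clock edge.
        # Q takes the value of D only when the write control input is 1.
        if write:
            self.q = d
        return self.q


reg = Register()
reg.rising_edge(d=7, write=True)    # write control is 1: Q becomes 7
reg.rising_edge(d=9, write=False)   # write control is 0: Q holds 7
print(reg.q)
```

The second call shows why the write control matters: the input changed, but the stored value is preserved for later use.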

(8) Clocking Methodology
Combinational logic transforms data during clock cycles
- Between clock edges
- Input from state elements, output to state element
- Longest delay determines clock period
Synchronous vs. asynchronous operation

(9) Register File (C.8)
Built using D flip-flops (remember ECE 2030!)

(10) Register File Note: we still use the real clock to determine when to write

(11) Building a Datapath (4.3)
Datapath
- Elements that process data and addresses in the CPU
  - Registers, ALUs, muxes, memories, ...
We will build a MIPS datapath incrementally
- Refining the overview design

(12) High Level Description
Single instruction, single data stream model of execution (remember Flynn’s Taxonomy)
- Serial execution model
[Figure: block diagram with blocks Fetch Instructions, Execute Instructions, Memory Operations and Control]

(13) Instruction Fetch
[Figure: instruction fetch datapath. The PC is a 32-bit register and is incremented by 4 for the next instruction. Timing diagram of clk marking the cycle time between the start and the completion of instruction fetch]

(14) R-Format Instructions
- Read two register operands
- Perform arithmetic/logical operation
- Write register result
Fields: op | rs | rt | rd | shamt | funct
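As a rough illustration of how the six R-format fields are carved out of a 32-bit word, the sketch below uses the standard MIPS field positions (op 31:26, rs 25:21, rt 20:16, rd 15:11, shamt 10:6, funct 5:0). The function name `decode_r_format` is hypothetical. The sample word 000a082a is one of the hexadecimal words in the encoded program shown earlier in these notes.

```python
def decode_r_format(word):
    """Split a 32-bit instruction word into the six R-format fields."""
    return {
        "op":    (word >> 26) & 0x3F,  # bits 31:26
        "rs":    (word >> 21) & 0x1F,  # bits 25:21, first source register
        "rt":    (word >> 16) & 0x1F,  # bits 20:16, second source register
        "rd":    (word >> 11) & 0x1F,  # bits 15:11, destination register
        "shamt": (word >> 6) & 0x1F,   # bits 10:6, shift amount
        "funct": word & 0x3F,          # bits 5:0, selects the ALU function
    }


fields = decode_r_format(0x000A082A)
print(fields)
# op = 0 marks an R-type instruction; funct = 42 (binary 101010) is set-on-less-than,
# so this word encodes slt with rd = 1, rs = 0, rt = 10.
```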

(15) Executing R-Format Instructions
[Figure: register file (inputs Read register 1, Read register 2, Write register, Write data, RegWrite; outputs Read data 1, Read data 2) feeding the ALU (input ALU control; outputs ALU result, Zero)]
Fields: op | rs | rt | rd | shamt | funct

(16) Load/Store Instructions
- Read register operands
- Calculate address using 16-bit offset
  - Use ALU, but sign-extend offset
- Load: read memory and update register
- Store: write register value to memory
Fields: op | rs | rt | 16-bit constant

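(17) Executing I-Format Instructions
[Figure: sign-extend unit (16 bits in, 32 bits out); data memory (inputs Address, Write data, MemRead, MemWrite; output Read data); register file (inputs Read register 1, Read register 2, Write register, RegWrite)]
Fields: op | rs | rt | 16-bit constant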

(18) Branch Instructions
- Read register operands
- Compare operands
  - Use ALU, subtract and check Zero output
- Calculate target address
  - Sign-extend displacement
  - Shift left 2 places (word displacement)
  - Add to PC + 4
    - Already calculated by instruction fetch
Fields: op | rs | rt | 16-bit constant
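The branch steps listed above (compare by subtraction, sign-extend, shift left 2, add to PC + 4) can be sketched as follows. The helper names `sign_extend16`, `branch_target` and `next_pc` are hypothetical, and the PC value in the example is arbitrary.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, imm16):
    """Target = (PC + 4) + (sign-extended displacement shifted left 2 places)."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & MASK32


def next_pc(pc, rs_val, rt_val, imm16):
    """beq: subtract the operands and take the branch when the Zero output is set."""
    zero = ((rs_val - rt_val) & MASK32) == 0
    return branch_target(pc, imm16) if zero else (pc + 4) & MASK32


# Displacement 0xFFFC is -4 words, i.e. 16 bytes back from PC + 4.
print(hex(branch_target(0x00400010, 0xFFFC)))
```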

(19) Branch Instructions
[Figure: branch datapath. Annotations: the shift-left-2 unit just re-routes wires; the sign-extend unit replicates the sign-bit wire]

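(20) Updating the Program Counter
[Figure: computation of the branch address. The PC drives the Read address of the instruction memory, which outputs Instruction [31–0] with fields Instruction [25–21], Instruction [20–16], Instruction [15–11] and Instruction [15–0]; an adder increments the PC; Instruction [15–0] is sign-extended and shifted, then a second adder (Add, ALU result) produces the branch address; a mux controlled by Branch selects the next PC]
Example program:
    loop: beq  $t0, $0, exit
          addi $t0, $t0, -1
          lw   $a0, arg1($t1)
          lw   $a1, arg2($t2)
          jal  func
          add  $t3, $t3, $v0
          addi $t1, $t1, 4
          addi $t2, $t2, 4
          j    loop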

(21) Composing the Elements
First-cut data path does an instruction in one clock cycle
- Each datapath element can only do one function at a time
- Hence, we need separate instruction and data memories
Use multiplexers where alternate data sources are used for different instructions

(22) Full Single Cycle Datapath

(23) ALU Control (4.4, D.2)
ALU used for
- Load/Store: Function = add
- Branch: Function = subtract
- R-type: Function depends on funct field
ALU control   Function
0000          AND
0001          OR
0010          add
0110          subtract
0111          set-on-less-than
1100          NOR
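A minimal software model of an ALU driven by the 4-bit control codes in the table above might look like this. It is a sketch: the names `alu` and `to_signed` are hypothetical, operands are treated as 32-bit values, and set-on-less-than is assumed to be a signed comparison.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (result, zero) for the given 4-bit ALU control code."""
    a &= MASK32
    b &= MASK32
    if control == 0b0000:      # AND
        result = a & b
    elif control == 0b0001:    # OR
        result = a | b
    elif control == 0b0010:    # add
        result = (a + b) & MASK32
    elif control == 0b0110:    # subtract
        result = (a - b) & MASK32
    elif control == 0b0111:    # set-on-less-than (signed compare assumed)
        result = 1 if to_signed(a) < to_signed(b) else 0
    elif control == 0b1100:    # NOR
        result = ~(a | b) & MASK32
    else:
        raise ValueError(f"unsupported ALU control code: {control:04b}")
    return result, result == 0  # Zero output is set when the result is 0


print(alu(0b0110, 5, 5))  # subtracting equal operands sets Zero, as beq relies on
```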

(24) The Main Control Unit
Control signals derived from instruction
R-type:      0        | rs    | rt    | rd    | shamt | funct
             31:26    | 25:21 | 20:16 | 15:11 | 10:6  | 5:0
Load/Store:  35 or 43 | rs    | rt    | address
             31:26    | 25:21 | 20:16 | 15:0
Branch:      4        | rs    | rt    | address
             31:26    | 25:21 | 20:16 | 15:0
Annotations:
- Bits 31:26: opcode
- rs: always read
- rt: read, except for load
- rd (R-type) or rt (load): write for R-type and load
- address: sign-extend and add

(25) ALU Control
Assume 2-bit ALUOp derived from opcode
- Combinational logic derives ALU control
opcode   ALUOp   Operation          funct    ALU function       ALU control
lw       00      load word          XXXXXX   add                0010
sw       00      store word         XXXXXX   add                0010
beq      01      branch equal       XXXXXX   subtract           0110
R-type   10      add                100000   add                0010
                 subtract           100010   subtract           0110
                 AND                100100   AND                0000
                 OR                 100101   OR                 0001
                 set-on-less-than   101010   set-on-less-than   0111
How do we turn this description into gates?
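Before reducing the table to gates, it can help to write it as a lookup. The sketch below encodes the rows of the table above; the names `FUNCT_TO_ALU_CONTROL` and `alu_control` are hypothetical.

```python
# funct field (inst[5:0]) -> 4-bit ALU control, used only when ALUOp = 10
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op, funct):
    """Derive the 4-bit ALU control from the 2-bit ALUOp and the funct field."""
    if alu_op == 0b00:   # lw / sw: add to form the address, funct is a don't-care
        return 0b0010
    if alu_op == 0b01:   # beq: subtract to compare, funct is a don't-care
        return 0b0110
    if alu_op == 0b10:   # R-type: operation depends on funct
        return FUNCT_TO_ALU_CONTROL[funct]
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")


print(f"{alu_control(0b10, 0b101010):04b}")  # R-type with the slt funct code
```

Because funct is ignored for ALUOp 00 and 01, the corresponding gate-level logic has many don't-care inputs, which is what keeps the truth table small.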

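(26) ALU Controller
[Figure: ALU controller. ALUOp is generated from decoding inst[31:26]; funct = inst[5:0]. The controller selects add for lw/sw, sub for beq, and one of add, sub, and, or, slt for arithmetic instructions, and drives the ALU control input of the ALU (outputs ALU result and Zero)]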

(27) ALU Control Simple combinational logic (truth tables)

(28) Datapath With Control (4.5)
Annotation: use rt, not rd
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1

(29) Commodity Processors ARM 7 Single Cycle Datapath

(30) Control Unit Signals
To harness the datapath
Input: Inst[31:26]
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1

(31) Controller Implementation
    LIBRARY IEEE;
    USE IEEE.STD_LOGIC_1164.ALL;
    USE IEEE.STD_LOGIC_ARITH.ALL;
    USE IEEE.STD_LOGIC_SIGNED.ALL;
    -- Main control unit: decodes the 6-bit opcode into the datapath control signals
    ENTITY control IS
        PORT(
            SIGNAL Opcode       : IN  STD_LOGIC_VECTOR( 5 DOWNTO 0 );
            SIGNAL RegDst       : OUT STD_LOGIC;
            SIGNAL ALUSrc       : OUT STD_LOGIC;
            SIGNAL MemtoReg     : OUT STD_LOGIC;
            SIGNAL RegWrite     : OUT STD_LOGIC;
            SIGNAL MemRead      : OUT STD_LOGIC;
            SIGNAL MemWrite     : OUT STD_LOGIC;
            SIGNAL Branch       : OUT STD_LOGIC;
            SIGNAL ALUop        : OUT STD_LOGIC_VECTOR( 1 DOWNTO 0 );
            SIGNAL clock, reset : IN  STD_LOGIC );
    END control;

(32) Controller Implementation (cont.)
    ARCHITECTURE behavior OF control IS
        SIGNAL R_format, Lw, Sw, Beq : STD_LOGIC;
    BEGIN
        -- Code to generate control signals using opcode bits
        R_format <= '1' WHEN Opcode = "000000" ELSE '0';
        Lw       <= '1' WHEN Opcode = "100011" ELSE '0';
        Sw       <= '1' WHEN Opcode = "101011" ELSE '0';
        Beq      <= '1' WHEN Opcode = "000100" ELSE '0';
        -- Each assignment below implements one column of the control table
        RegDst     <= R_format;
        ALUSrc     <= Lw OR Sw;
        MemtoReg   <= Lw;
        RegWrite   <= R_format OR Lw;
        MemRead    <= Lw;
        MemWrite   <= Sw;
        Branch     <= Beq;
        ALUOp( 1 ) <= R_format;
        ALUOp( 0 ) <= Beq;
    END behavior;
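For checking control-signal values by hand, the same decode logic can be mirrored in a few lines of Python. This is an informal cross-check of the VHDL above, not a replacement for it; the function name `control` and the dictionary layout are hypothetical. As in the VHDL, don't-care entries of the control table come out as 0.

```python
def control(opcode):
    """Mirror of the VHDL controller: 6-bit opcode -> control signal values."""
    r_format = int(opcode == 0b000000)
    lw = int(opcode == 0b100011)   # opcode 35
    sw = int(opcode == 0b101011)   # opcode 43
    beq = int(opcode == 0b000100)  # opcode 4
    return {
        "RegDst":   r_format,
        "ALUSrc":   lw | sw,
        "MemtoReg": lw,
        "RegWrite": r_format | lw,
        "MemRead":  lw,
        "MemWrite": sw,
        "Branch":   beq,
        "ALUOp":    (r_format << 1) | beq,  # bit 1 = R_format, bit 0 = Beq
    }


for name, opcode in [("R-format", 0), ("lw", 35), ("sw", 43), ("beq", 4)]:
    print(name, control(opcode))
```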

(33) R-Type Instruction

(34) Load Instruction

(35) Branch-on-Equal Instruction

(36) Implementing Jumps
- Jump uses word address
- Update PC with concatenation of
  - Top 4 bits of old PC
  - 26-bit jump address
  - 00
- Need an extra control signal decoded from opcode
Jump:  2     | address
       31:26 | 25:0
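The concatenation described above can be sketched with shifts and masks. The function name `jump_target` is hypothetical. The sketch follows the slide's wording and takes the top 4 bits from the PC passed in; in the MIPS architecture those bits come from the address of the instruction after the jump (PC + 4), which differs only when the jump sits at the very end of a 256 MB region.

```python
def jump_target(pc, address26):
    """New PC = top 4 bits of pc : 26-bit jump address : 00."""
    upper = pc & 0xF0000000                # top 4 bits of the old PC
    word = (address26 & 0x03FFFFFF) << 2   # word address -> byte address (appends 00)
    return upper | word


print(hex(jump_target(0x00400018, 0x0100000)))
```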

(37) Datapath With Jumps Added

(38) Energy Behavior
[Figure labels: combinational activity; storage read/write access]

(39) A Simple Architecture Energy Model
To a first order, we can use the per-access energy of each major component
- Obtain this for a technology generation
Use this per-access energy to compute the energy of each instruction
Note:
- This is a high-level approximation. The actual physics is more complicated.
- However, this is useful for several purposes
What components does each instruction exercise?
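One way to make the first-order model concrete is to sum per-access energies over the components each instruction class exercises. Everything numeric below is a made-up placeholder, not data from the lecture: the component names, the per-access energies (in pJ) and the component lists per instruction class are all assumptions chosen for illustration.

```python
# Hypothetical per-access energies in picojoules (placeholders, not measured values)
ENERGY_PJ = {
    "imem": 10,          # instruction memory read
    "pc_adder": 1,       # PC + 4 adder
    "regfile_read": 2,   # register file read
    "alu": 3,            # ALU operation
    "dmem": 15,          # data memory access
    "regfile_write": 2,  # register file write
}

# Assumed list of components exercised by each instruction class
COMPONENTS = {
    "r_type": ["imem", "pc_adder", "regfile_read", "alu", "regfile_write"],
    "lw":     ["imem", "pc_adder", "regfile_read", "alu", "dmem", "regfile_write"],
    "sw":     ["imem", "pc_adder", "regfile_read", "alu", "dmem"],
    "beq":    ["imem", "pc_adder", "regfile_read", "alu"],
}


def instruction_energy(kind):
    """First-order energy of one instruction: sum of its per-access energies (pJ)."""
    return sum(ENERGY_PJ[c] for c in COMPONENTS[kind])


for kind in COMPONENTS:
    print(kind, instruction_energy(kind), "pJ")
```

With these placeholder numbers the load is the most expensive instruction because it touches every component, including the data memory.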

(40) Example: Updating the PC
[Figure: full single-cycle datapath with control signals RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch and ALUOp]
What is the energy cost of this operation?

(41) Example: Register Instructions
[Figure: full single-cycle datapath with control signals RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch and ALUOp]
What is the energy cost of this operation?

(42) Example: I-type Instructions
[Figure: full single-cycle datapath with control signals RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch and ALUOp]
What is the energy cost of this operation?

(43) Example: I-Type for Branches What is the energy cost of this operation?

(44) Converting Energy to Power
For this data path, except for data memory, all components are active every cycle, and dissipating energy on every cycle
- Later we will see how data paths can be made more energy efficient
Computing power
- Compute the total energy consumed over all cycles (instructions)
- Divide energy by time to get power in watts
Example:
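The two steps under "Computing power" can be sketched as below. This is not the example from the original slide, which is not preserved in this transcript; the instruction mix, the per-instruction energies (in pJ) and the 100 MHz clock are hypothetical values, and the function name `average_power_watts` is an assumption. The sketch relies on the single-cycle property that every instruction takes exactly one clock cycle.

```python
def average_power_watts(counts, energy_pj, clock_hz):
    """Average power = total energy over all instructions / total execution time."""
    total_joules = sum(counts[k] * energy_pj[k] for k in counts) * 1e-12
    cycles = sum(counts.values())   # single-cycle datapath: one cycle per instruction
    seconds = cycles / clock_hz
    return total_joules / seconds


# Hypothetical program profile and per-instruction energies (pJ)
counts = {"r_type": 500, "lw": 300, "sw": 100, "beq": 100}
energy_pj = {"r_type": 18, "lw": 33, "sw": 31, "beq": 16}

power = average_power_watts(counts, energy_pj, clock_hz=100e6)
print(f"{power * 1e3:.2f} mW")
```

Here the program dissipates 23,600 pJ over 1,000 cycles at 10 ns each, which works out to 2.36 mW.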

(45) ITRS Roadmap for Logic Devices
From: “ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems,” P. Kogge et al., 2008

(46) Our Simple Control Structure
All of the logic is combinational
We wait for everything to settle down, and the right thing to be done
- ALU might not produce “right answer” right away
- We use write signals along with clock to determine when to write
Cycle time determined by length of the longest path
We are ignoring some details like setup and hold times

(47) Performance Issues
Longest delay determines clock period
- Critical path: load instruction
- Instruction memory → register file → ALU → data memory → register file
Not feasible to vary period for different instructions
Violates design principle
- Making the common case fast
We will improve performance by pipelining
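To see why the load sets the clock period, one can add up component delays along each instruction's path and take the maximum. The delays below (in picoseconds) are illustrative assumptions, as are the names `DELAY_PS`, `PATHS` and `path_delay`; only the load path itself is taken from the slide above.

```python
# Hypothetical component delays in picoseconds
DELAY_PS = {
    "imem": 200,       # instruction memory
    "reg_read": 100,   # register file read
    "alu": 200,        # ALU
    "dmem": 200,       # data memory
    "reg_write": 100,  # register file write
}

# Components on the path of each instruction class (lw path as listed on the slide;
# the other paths are assumptions)
PATHS = {
    "r_type": ["imem", "reg_read", "alu", "reg_write"],
    "lw":     ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw":     ["imem", "reg_read", "alu", "dmem"],
    "beq":    ["imem", "reg_read", "alu"],
}


def path_delay(kind):
    """Total delay along the path exercised by one instruction class (ps)."""
    return sum(DELAY_PS[c] for c in PATHS[kind])


# The single clock period must cover the slowest instruction.
cycle_time_ps = max(path_delay(kind) for kind in PATHS)
critical = max(PATHS, key=path_delay)
print(critical, cycle_time_ps, "ps")
```

With these numbers every instruction pays the 800 ps load delay, even a branch that needs only 500 ps, which is the inefficiency that pipelining addresses.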

(48) Summary
Single cycle datapath
- All instructions execute in one clock cycle
- Not all instructions take the same amount of time
- Software sees a simple interface
- Can memory operations really take one cycle?
Improve performance via pipelining, multi-cycle operation, parallelism or customization
We will address these next

(49) Study Guide
- Given an instruction, be able to specify the values of all control signals required to execute that instruction
- Add new instructions: modify the datapath and control to effect their execution
  - E.g., jal, jr, shift, etc.
  - Modify the VHDL controller
- Given delays of various components, determine the cycle time of the datapath
- Distinguish between those parts of the datapath that are unique to each instruction and those components that are shared across all instructions

(50) Study Guide (cont.)
- Given a set of control signal values, determine what operation the datapath performs
- Given the per-access energies of each component:
  - Compute the energy required by any instruction
  - Given a program and clock rate, compute the power dissipation of the datapath

(51) Glossary
- Asynchronous
- Clock
- Controller
- Critical path
- Flip Flop
- ITRS Roadmap
- Per-access energy
- Program counter
- Register
- Synchronous