Single Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili.


Reading
Sections 4.1-4.4
Appendices B.3, B.7, B.8, B.11, D.2
Note: Appendices A-E in the hardcopy text correspond to chapters 7-11 in the online text.
Practice Problems: 1, 4, 6, 9

Introduction
Morgan Kaufmann Publishers 4 April, 2019
Chapter 4: The Processor
We will examine two MIPS implementations
  A simplified version (this module)
  A more realistic pipelined version
Simple subset, shows most aspects
  Memory reference: lw, sw
  Arithmetic/logical: add, sub, and, or, slt
  Control transfer: beq, j

Instruction Execution
PC → instruction memory, fetch instruction
Register numbers → register file, read registers
Depending on instruction class
  Use ALU to calculate
    Arithmetic result
    Memory address for load/store
    Branch target address
  Access data memory for load/store
  PC ← a target address or PC + 4
[Figure: an encoded program held in memory and addressed by the PC: 8d0b0000, 014b5020, 21080004, 2129ffff, 1520fffc, 000a082a, ...]
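The encoded program words on this slide can be pulled apart into instruction fields with shifts and masks. The following is a minimal sketch in Python; the function name `decode`, the list name `PROGRAM`, and the starting address of 0 are assumptions made for illustration, while the bit positions are the standard MIPS field boundaries.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31:26
        "rs":     (word >> 21) & 0x1F,  # bits 25:21
        "rt":     (word >> 16) & 0x1F,  # bits 20:16
        "rd":     (word >> 11) & 0x1F,  # bits 15:11 (meaningful for R-format)
        "shamt":  (word >> 6) & 0x1F,   # bits 10:6  (meaningful for R-format)
        "funct":  word & 0x3F,          # bits 5:0   (meaningful for R-format)
        "imm16":  word & 0xFFFF,        # bits 15:0  (meaningful for I-format)
    }

# The encoded program from the slide; placing it at address 0 is an assumption.
PROGRAM = [0x8D0B0000, 0x014B5020, 0x21080004, 0x2129FFFF, 0x1520FFFC, 0x000A082A]

for index, word in enumerate(PROGRAM):
    f = decode(word)
    print(f"{4 * index:#06x}: {word:08x}  opcode={f['opcode']:2d} "
          f"rs={f['rs']:2d} rt={f['rt']:2d} rd={f['rd']:2d} funct={f['funct']:2d}")
```

The first word decodes to opcode 35 (lw) with rs = 8 and rt = 11, and the second to opcode 0 with funct 32 (add). Two of the words use opcodes 8 (addi) and 5 (bne), which lie outside the instruction subset implemented in these notes.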

Basic Ingredients
Include the functional units we need for each instruction, both combinational and sequential
[Figure: basic building blocks, including the ALU with its control input and Zero output and the register file with RegWrite]

Sequential Elements (4.2, B.7, B.11)
Register: stores data in a circuit
  Uses a clock signal to determine when to update the stored value
  Edge-triggered: update when Clk changes from 0 to 1
[Figure: D flip-flop with inputs D and Clk and output Q; timing diagram of Clk, D, and Q marking the rising and falling edges]

Sequential Elements
Register with write control
  Only updates on clock edge when write control input is 1
  Used when stored value is required later
[Figure: register with inputs D, Clk, and Write and output Q; timing diagram of Clk, Write, D, and Q over one cycle time]
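The edge-triggered behavior with write control can be modeled in a few lines. This is a behavioral sketch only, assuming a hypothetical `Register` class whose `tick` method is called once per observed clock level; it ignores setup and hold times.

```python
class Register:
    """Behavioral model of a rising-edge-triggered register with write control."""

    def __init__(self, value=0):
        self.q = value          # stored value, visible on output Q
        self._prev_clk = 0      # previous clock level, used to detect edges

    def tick(self, clk, d, write=1):
        """Present clock level, data input D, and Write; return output Q."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write == 1:
            self.q = d          # capture D only on a 0 -> 1 transition with Write = 1
        self._prev_clk = clk
        return self.q


reg = Register()
reg.tick(0, 5)                    # clock low: nothing captured
print(reg.tick(1, 5))             # rising edge, Write = 1: Q becomes 5
print(reg.tick(1, 7))             # clock stays high: no edge, Q stays 5
reg.tick(0, 7)
print(reg.tick(1, 7, write=0))    # rising edge but Write = 0: Q stays 5
```

Keeping the previous clock level inside the object is one simple way to distinguish an edge from a level; a level-sensitive latch would instead update whenever the clock is high.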

Clocking Methodology
Combinational logic transforms data during clock cycles
  Between clock edges
  Input from state elements, output to state element
  Longest delay determines clock period
Synchronous vs. asynchronous operation
Recall: critical path delay

Register File (B.8)
Built using D flip-flops (remember ECE 2020!)

Register File
Note: we still use the real clock to determine when to write

Building a Datapath (4.3)
Datapath
  Elements that process data and addresses in the CPU
  Registers, ALUs, muxes, memories, ...
We will build a MIPS datapath incrementally
  Refining the overview design

High Level Description
[Figure: control directing three steps: fetch instructions, execute instructions, memory operations]
Single instruction single data stream model of execution
  Serial execution model
Commonly known as the von Neumann execution model
  Stored program model
  Instructions and data share memory

Instruction Fetch
[Figure: the PC, a 32-bit register, addresses instruction memory while an adder increments it by 4 for the next instruction. Timing diagram: instruction fetch starts at one clk edge and completes within one cycle time]
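The fetch step can be sketched as a memory read plus an increment. In this illustration the instruction memory is assumed to be a Python dict keyed by byte address, and the names `fetch` and `IMEM` are hypothetical; the two words are taken from the encoded program shown earlier.

```python
MASK32 = 0xFFFFFFFF

def fetch(pc, imem):
    """Read the instruction addressed by the PC and compute PC + 4."""
    instruction = imem[pc]           # instruction memory read
    next_pc = (pc + 4) & MASK32      # 32-bit adder: next sequential instruction
    return instruction, next_pc

# Assumed layout: two words of the encoded program placed at addresses 0 and 4.
IMEM = {0x0: 0x8D0B0000, 0x4: 0x014B5020}

instruction, pc = fetch(0x0, IMEM)
print(f"{instruction:08x} next PC = {pc:#x}")
```

Both results are produced in the same cycle: the adder does not wait for the memory, which is why PC + 4 is already available when a branch target is computed.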

R-Format Instructions
Read two register operands
Perform arithmetic/logical operation
Write register result
Instruction format: op | rs | rt | rd | shamt | funct

Executing R-Format Instructions
[Figure: register file (5-bit Read register 1, Read register 2, and Write register inputs; Write data input; Read data 1 and Read data 2 outputs; RegWrite control) feeding the ALU (3-bit ALU control input; Zero and ALU result outputs)]
Instruction format: op | rs | rt | rd | shamt | funct

Load/Store Instructions
Read register operands
Calculate address using 16-bit offset
  Use ALU, but sign-extend offset
Load: Read memory and update register
Store: Write register value to memory
Instruction format: op | rs | rt | 16-bit constant
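The address calculation for lw and sw is a sign extension followed by a 32-bit add. A minimal sketch, assuming hypothetical helper names `sign_extend16` and `effective_address` and an arbitrary example base address:

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def effective_address(base, imm16):
    """Address = register value + sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(imm16)) & MASK32

print(sign_extend16(0xFFFC))                          # offset field 0xFFFC means -4
print(hex(effective_address(0x10010000, 0xFFFC)))     # base - 4
print(hex(effective_address(0x10010000, 0x0008)))     # base + 8
```

Without the sign extension a negative offset such as 0xFFFC would be added as 65532, so the ALU would compute an address far above the base instead of 4 bytes below it.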

Executing I-Format Instructions
[Figure: register file (Read register 1, Read register 2, Write register, Write data, RegWrite), sign-extend unit (16 bits in, 32 bits out), and data memory (Address, Write data, Read data, MemRead, MemWrite)]
Instruction format: op | rs | rt | 16-bit constant

Branch Instructions
Read register operands
Compare operands
  Use ALU, subtract and check Zero output
Calculate target address
  Sign-extend displacement
  Shift left 2 places (word displacement)
  Add to PC + 4
    Already calculated by instruction fetch
Instruction format: op | rs | rt | 16-bit constant

Branch Instructions
[Figure: branch datapath. Annotations: the shift-left-2 unit just re-routes wires; the sign-extend unit replicates the sign-bit wire]
Instruction format: op | rs | rt | 16-bit constant
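The branch target steps (sign-extend, shift left 2, add to PC + 4) and the selection of the next PC can be sketched as follows. The function names are hypothetical, and the example assumes the word 1520fffc from the encoded program sits at address 0x10, which follows only if that program starts at address 0.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def branch_target(pc, imm16):
    """Target = (PC + 4) + (sign-extended displacement << 2)."""
    pc_plus_4 = (pc + 4) & MASK32
    offset = sign_extend16(imm16) << 2    # word displacement to byte displacement
    return (pc_plus_4 + offset) & MASK32

def next_pc(pc, imm16, branch, zero):
    """Mux on the PC input: take the target only if Branch and Zero are both 1."""
    if branch and zero:
        return branch_target(pc, imm16)
    return (pc + 4) & MASK32

print(hex(branch_target(0x10, 0xFFFC)))   # displacement -4 words from PC + 4
print(hex(next_pc(0x10, 0xFFFC, 1, 0)))   # comparison failed: fall through
```

With a displacement of -4 words, the target is 0x14 - 16 = 0x4, the second instruction of the program.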

Updating the Program Counter
[Figure: computation of the branch address. The PC addresses instruction memory; one adder forms PC + 4; a second adder adds the sign-extended (16 to 32 bits), shifted-left-2 displacement; a mux controlled by Branch selects the next PC. Instruction fields shown: Instruction[31-26], Instruction[25-21], Instruction[20-16], Instruction[15-11], Instruction[15-0]]
Example program:
loop: beq  $t0, $0, exit
      addi $t0, $t0, -1
      lw   $a0, arg1($t1)
      lw   $a1, arg2($t2)
      jal  func
      add  $t3, $t3, $v0
      addi $t1, $t1, 4
      addi $t2, $t2, 4
      j    loop

Composing the Elements
First-cut data path does an instruction in one clock cycle
  Each datapath element can only do one function at a time
  Hence, we need separate instruction and data memories
Use multiplexers where alternate data sources are used for different instructions
[Figure: an encoded program held in memory and addressed by the PC: 014b5020, 21080004, 2129ffff, 1520fffc, 000a082a, ...]

Full Single Cycle Datapath
Destination register is “instruction-specific”
  lw $t0, 0($t4) vs. add $t0, $t1, $t2
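The two instructions above both write $t0, but the register number sits in different fields: rt for lw and rd for add. A sketch of the multiplexer that picks the write register follows; the function name `write_register` is hypothetical, and the two encodings were assembled by hand from the standard MIPS formats and register numbers ($t0 = 8, $t1 = 9, $t2 = 10, $t4 = 12).

```python
def write_register(instruction, reg_dst):
    """Mux on the register file's Write register input.

    reg_dst = 0 selects rt (bits 20:16), as a load needs;
    reg_dst = 1 selects rd (bits 15:11), as an R-type instruction needs.
    """
    rt = (instruction >> 16) & 0x1F
    rd = (instruction >> 11) & 0x1F
    return rd if reg_dst else rt

LW_T0 = 0x8D880000    # lw  $t0, 0($t4): opcode 35, rs = 12, rt = 8
ADD_T0 = 0x012A4020   # add $t0, $t1, $t2: rs = 9, rt = 10, rd = 8, funct 32

print(write_register(LW_T0, reg_dst=0))    # 8, i.e. $t0
print(write_register(ADD_T0, reg_dst=1))   # 8, i.e. $t0
print(write_register(ADD_T0, reg_dst=0))   # 10: wrong select would overwrite $t2
```

The last line shows why the select signal must come from the control unit: with the wrong setting the add would write its second source register instead of its destination.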

ALU Control (4.4, D.2)
ALU used for
  Load/Store: Function = add
  Branch: Function = subtract
  R-type: Function depends on funct field

ALU control   Function
000           AND
001           OR
010           add
110           subtract
111           set-on-less-than
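The five ALU functions listed above can be modeled behaviorally on 32-bit values. This is a sketch, not a gate-level design; the names `alu` and `to_signed` are hypothetical, and the Zero output is modeled as 1 when the result is zero.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def alu(control, a, b):
    """Return (result, zero) for the 3-bit ALU control codes in the table."""
    a &= MASK32
    b &= MASK32
    if control == 0b000:
        result = a & b                                    # AND
    elif control == 0b001:
        result = a | b                                    # OR
    elif control == 0b010:
        result = (a + b) & MASK32                         # add
    elif control == 0b110:
        result = (a - b) & MASK32                         # subtract
    elif control == 0b111:
        result = 1 if to_signed(a) < to_signed(b) else 0  # set-on-less-than
    else:
        raise ValueError(f"unsupported ALU control code {control:03b}")
    return result, int(result == 0)

print(alu(0b010, 2, 3))            # add
print(alu(0b110, 7, 7))            # subtract equal operands: Zero = 1, as beq checks
print(alu(0b111, 0xFFFFFFFF, 1))   # -1 < 1 under a signed comparison
```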

ALU Control
Assume 2-bit ALUOp derived from opcode
  Combinational logic derives ALU control

opcode   ALUOp   Operation          funct                 ALU function       ALU control
lw       00      load word          XXXXXX (don’t care)   add                010
sw       00      store word         XXXXXX (don’t care)   add                010
beq      01      branch equal       XXXXXX (don’t care)   subtract           110
R-type   10      add                100000                add                010
R-type   10      subtract           100010                subtract           110
R-type   10      AND                100100                AND                000
R-type   10      OR                 100101                OR                 001
R-type   10      set-on-less-than   101010                set-on-less-than   111

How do we turn this description into gates?

ALU Controller
[Figure: ALU controller. Inputs: the 2-bit ALUOp, generated by decoding inst[31:26], and funct = inst[5:0]. ALUOp selects add for lw/sw, sub for beq, and the funct-determined arithmetic operation (add, sub, and, or, slt) for R-type instructions. Output: the 3-bit ALU control. Truth table columns: ALUOp1, ALUOp0, funct field F5 to F0, ALU control]

ALU Control
Simple combinational logic (truth tables)
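Before reducing the ALU control description to gates, it can help to write it as a lookup. The sketch below follows the ALUOp and funct encodings listed in these notes; the function name `alu_control` and the dictionary name are hypothetical, and unlike the hardware truth table it treats ALUOp as an exact 2-bit value rather than exploiting don't-care bits.

```python
# funct field -> 3-bit ALU control, used only when ALUOp = 10 (R-type)
FUNCT_TO_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # subtract
    0b100100: 0b000,  # AND
    0b100101: 0b001,  # OR
    0b101010: 0b111,  # set-on-less-than
}

def alu_control(alu_op, funct):
    """Derive the 3-bit ALU control from the 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:      # lw/sw: add base and offset; funct is a don't care
        return 0b010
    if alu_op == 0b01:      # beq: subtract and check Zero; funct is a don't care
        return 0b110
    if alu_op == 0b10:      # R-type: operation chosen by the funct field
        return FUNCT_TO_CONTROL[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")

print(f"{alu_control(0b00, 0b111111):03b}")   # lw/sw ignore funct
print(f"{alu_control(0b10, 0b101010):03b}")   # R-type slt
```

Each output bit of this function is a small Boolean function of the eight input bits, which is what the truth-table minimization on the slide produces.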

The Main Control Unit
Control signals derived from instruction

Format       Fields (bit positions)
R-type       0 (31:26) | rs (25:21) | rt (20:16) | rd (15:11) | shamt (10:6) | funct (5:0)
Load/Store   35 or 43 (31:26) | rs (25:21) | rt (20:16) | address (15:0)
Branch       4 (31:26) | rs (25:21) | rt (20:16) | address (15:0)

Annotations
  Bits 31:26: opcode
  rs: always read
  rt: read, except for load
  rd (R-type) or rt (load): write for R-type and load
  address: sign-extend and add

Datapath With Control

Instruction   RegDst   ALUSrc   MemtoReg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
R-format      1        0        0          1          0         0          0        1        0
lw            0        1        1          1          1         0          0        0        0
sw            X        1        X          0          0         1          0        0        0
beq           X        0        X          0          0         0          1        0        1

Note: lw uses rt, not rd, as its destination register (RegDst = 0)
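The control table can be exercised in software before committing it to hardware. The sketch below mirrors the signal equations of the VHDL controller that appears later in these notes, so don't-care entries come out as 0; the function name `main_control` and the dict-based return value are assumptions for illustration.

```python
def main_control(opcode):
    """Decode the 6-bit opcode (inst[31:26]) into the datapath control signals."""
    r_format = int(opcode == 0b000000)
    lw = int(opcode == 0b100011)       # 35
    sw = int(opcode == 0b101011)       # 43
    beq = int(opcode == 0b000100)      # 4
    return {
        "RegDst":   r_format,          # write register comes from rd
        "ALUSrc":   lw | sw,           # second ALU operand is the sign-extended offset
        "MemtoReg": lw,                # register write data comes from memory
        "RegWrite": r_format | lw,
        "MemRead":  lw,
        "MemWrite": sw,
        "Branch":   beq,
        "ALUOp":    (r_format << 1) | beq,   # ALUOp1 = R-format, ALUOp0 = beq
    }

for name, opcode in [("R-format", 0), ("lw", 35), ("sw", 43), ("beq", 4)]:
    print(name, main_control(opcode))
```

Adding a new instruction to this model means adding one opcode comparison and deciding its value in every signal, which is the same exercise as adding a row to the table.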

Commodity Processors
[Figure: ARM 7 single cycle datapath]

Control Unit Signals
To harness the datapath, the control unit decodes Inst[31:26] into RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, and ALUOp0, with one table row per instruction class (R-format, lw, sw, beq)
Adding a new instruction?
Programmable logic array (PLA) implementation (B.3)

Controller Implementation

LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.STD_LOGIC_ARITH.ALL;
USE IEEE.STD_LOGIC_SIGNED.ALL;

ENTITY control IS
   PORT(
      SIGNAL Opcode       : IN  STD_LOGIC_VECTOR( 5 DOWNTO 0 );
      SIGNAL RegDst       : OUT STD_LOGIC;
      SIGNAL ALUSrc       : OUT STD_LOGIC;
      SIGNAL MemtoReg     : OUT STD_LOGIC;
      SIGNAL RegWrite     : OUT STD_LOGIC;
      SIGNAL MemRead      : OUT STD_LOGIC;
      SIGNAL MemWrite     : OUT STD_LOGIC;
      SIGNAL Branch       : OUT STD_LOGIC;
      SIGNAL ALUop        : OUT STD_LOGIC_VECTOR( 1 DOWNTO 0 );
      SIGNAL clock, reset : IN  STD_LOGIC );
END control;

Controller Implementation (cont.)

ARCHITECTURE behavior OF control IS
   SIGNAL R_format, Lw, Sw, Beq : STD_LOGIC;
BEGIN
   -- Code to generate control signals using opcode bits
   R_format   <= '1' WHEN Opcode = "000000" ELSE '0';
   Lw         <= '1' WHEN Opcode = "100011" ELSE '0';
   Sw         <= '1' WHEN Opcode = "101011" ELSE '0';
   Beq        <= '1' WHEN Opcode = "000100" ELSE '0';
   RegDst     <= R_format;
   ALUSrc     <= Lw OR Sw;
   MemtoReg   <= Lw;
   RegWrite   <= R_format OR Lw;
   MemRead    <= Lw;
   MemWrite   <= Sw;
   Branch     <= Beq;
   ALUOp( 1 ) <= R_format;
   ALUOp( 0 ) <= Beq;
END behavior;

Implementation of each table column: each signal assignment produces one column of the control signal table (rows R-format, lw, sw, beq)

R-Type Instruction
[Figure: the datapath in operation for an R-type instruction]

Load Instruction
[Figure: the datapath in operation for a load instruction]

Branch-on-Equal Instruction
[Figure: the datapath in operation for a branch-on-equal instruction]

Implementing Jumps
Jump format: 2 (31:26) | address (25:0)
Jump uses word address
Update PC with concatenation of
  Top 4 bits of old PC
  26-bit jump address
  00
Need an extra control signal decoded from opcode
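The concatenation that forms the jump target can be written with masks and a shift. In this sketch the function name `jump_target` and the example addresses are assumptions, and the "old PC" is taken to be PC + 4, the already-incremented value, as in the standard MIPS definition of j.

```python
MASK32 = 0xFFFFFFFF

def jump_target(pc, instruction):
    """Target = top 4 bits of (PC + 4) : 26-bit address field : 00."""
    pc_plus_4 = (pc + 4) & MASK32
    upper = pc_plus_4 & 0xF0000000                 # top 4 bits of the incremented PC
    address26 = instruction & 0x03FFFFFF           # bits 25:0 of the instruction
    return upper | (address26 << 2)                # word address to byte address

# Assumed example: opcode 2 (j) with address field 0x0100000, fetched at 0x00400010.
J_EXAMPLE = (2 << 26) | 0x0100000

print(hex(J_EXAMPLE))                              # the encoded jump instruction
print(hex(jump_target(0x00400010, J_EXAMPLE)))     # jump destination
```

Because only 26 bits come from the instruction, a jump can reach any word inside the current 256 MB region selected by the top 4 PC bits, but not outside it.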

Datapath With Jumps Added
[Figure: full single cycle datapath with the jump path and clk added]

Example: ARM Cortex M3
[Figure: Fitbit Flex circuit board photos labeling the ARM processor and the Bluetooth IC. Image sources: zembedded.com, www.ifixit.com]

Our Simple Control Structure
All of the logic is combinational
We wait for everything to settle down, and the right thing to be done
  The ALU might not produce the “right answer” right away
  We use write signals along with the clock to determine when to write
Cycle time is determined by the length of the longest path
We are ignoring some details like setup and hold times

Performance Issues
Longest delay determines clock period
  Critical path: load instruction
  Instruction memory → register file → ALU → data memory → register file
Not feasible to vary period for different instructions
Violates design principle
  Making the common case fast
We will improve performance by pipelining
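The claim that the load instruction sets the clock period can be checked by summing component delays along each instruction's path. The delays below are assumed illustrative values (200 ps for memories and the ALU, 100 ps for a register file access), not measurements from these notes, and the model ignores multiplexers, control logic, the PC adders, and setup and hold times.

```python
# Assumed component delays in picoseconds (illustrative values only).
DELAY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Functional units each instruction class passes through, in order.
PATHS = {
    "R-type": ["instr_mem", "reg_read", "alu", "reg_write"],
    "lw":     ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw":     ["instr_mem", "reg_read", "alu", "data_mem"],
    "beq":    ["instr_mem", "reg_read", "alu"],
}

def path_delay(instruction):
    """Sum the delays of the units on one instruction's path."""
    return sum(DELAY_PS[unit] for unit in PATHS[instruction])

def cycle_time():
    """The single clock period must cover the slowest instruction."""
    return max(path_delay(instruction) for instruction in PATHS)

for instruction in PATHS:
    print(f"{instruction:7s} {path_delay(instruction)} ps")
print("cycle time:", cycle_time(), "ps")
```

Under these assumptions lw needs 800 ps while beq needs only 500 ps, yet every instruction is charged the full 800 ps cycle, which is the inefficiency pipelining addresses.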

Summary
Single cycle datapath
  All instructions execute in one clock cycle
  Not all instructions take the same amount of time
  Software sees a simple interface
  Can memory operations really take one cycle?
Improve performance via pipelining, multi-cycle operation, parallelism or customization
We will address these next

Study Guide
Given an instruction, be able to specify the values of all control signals required to execute that instruction
Add new instructions: modify the datapath and control to effect their execution
  Modify the dataflow in support of, e.g., jal, jr, shift, etc.
  Modify the VHDL controller
Given delays of various components, determine the cycle time of the datapath
Distinguish between those parts of the datapath that are unique to each instruction and those components that are shared across all instructions

Study Guide (cont.)
Given a set of control signal values, determine what operation the datapath performs
Know the bit width of each signal in the datapath
Add support for procedure calls: the jal instruction

Glossary
Asynchronous
Clock
Controller
Critical path
Cycle time
Dataflow
Flip-flop
Program counter
Register file
Sign extension
Synchronous