Single-Cycle DataPath

Slides:



Advertisements
Similar presentations
CS1104: Computer Organisation School of Computing National University of Singapore.
Advertisements

CMPT 334 Computer Organization
The Processor: Datapath & Control
CS61C L24 Introduction to CPU Design (1) Garcia, Spring 2007 © UCB Cell pic to web site  A new MS app lets people search the web based on a digital cell.
CS61C L18 Introduction to CPU Design (1) Beamer, Summer 2007 © UCB Scott Beamer, Instructor inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture.
Lec 17 Nov 2 Chapter 4 – CPU design data path design control logic design single-cycle CPU performance limitations of single cycle CPU multi-cycle CPU.
Computer Structure - Datapath and Control Goal: Design a Datapath  We will design the datapath of a processor that includes a subset of the MIPS instruction.
The Processor 2 Andreas Klappenecker CPSC321 Computer Architecture.
CS61C L24 Introduction to CPU Design (1) Garcia, Fall 2006 © UCB Fedora Core 6 (FC6) just out  The latest version of the distro has been released; they.
Inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 24 Introduction to CPU design Stanford researchers developing 3D camera.
Shift Instructions (1/4)
Inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 25 CPU design (of a single-cycle CPU) Intel is prototyping circuits that.
Processor I CPSC 321 Andreas Klappenecker. Midterm 1 Thursday, October 7, during the regular class time Covers all material up to that point History MIPS.
CS61C L20 Datapath © UC Regents 1 CS61C - Machine Structures Lecture 20 - Datapath November 8, 2000 David Patterson
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 27: Single-Cycle CPU Datapath Design Instructor: Sr Lecturer SOE Dan Garcia
CS3350B Computer Architecture Winter 2015 Lecture 5.6: Single-Cycle CPU: Datapath Control (Part 1) Marc Moreno Maza [Adapted.
COSC 3430 L08 Basic MIPS Architecture.1 COSC 3430 Computer Architecture Lecture 08 Processors Single cycle Datapath PH 3: Sections
Chapter 4 CSF 2009 The processor: Building the datapath.
Lecture 8: Processors, Introduction EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014,
Processor: Datapath and Control
Lec 15Systems Architecture1 Systems Architecture Lecture 15: A Simple Implementation of MIPS Jeremy R. Johnson Anatole D. Ruslanov William M. Mongan Some.
Computer Organization CS224 Fall 2012 Lesson 22. The Big Picture  The Five Classic Components of a Computer  Chapter 4 Topic: Processor Design Control.
ECE 445 – Computer Organization
CPS3340 COMPUTER ARCHITECTURE Fall Semester, /19/2013 Lecture 17: The Processor - Overview Instructor: Ashraf Yaseen DEPARTMENT OF MATH & COMPUTER.
CS2100 Computer Organisation The Processor: Datapath (AY2015/6) Semester 1.
CS61C L20 Datapath © UC Regents 1 Microprocessor James Tan Adapted from D. Patterson’s CS61C Copyright 2000.
CPU Overview Computer Organization II 1 February 2009 © McQuain & Ribbens Introduction CPU performance factors – Instruction count n Determined.
COM181 Computer Hardware Lecture 6: The MIPs CPU.
New-School Machine Structures Parallel Requests Assigned to computer e.g., Search “Katz” Parallel Threads Assigned to core e.g., Lookup, Ads Parallel Instructions.
CS 61C: Great Ideas in Computer Architecture MIPS Datapath 1 Instructors: Nicholas Weaver & Vladimir Stojanovic
CPU Design - Datapath. Review Use muxes to select among input S input bits selects 2 S inputs Each input can be n-bits wide, indep of S Can implement.
Computer Architecture Lecture 6.  Our implementation of the MIPS is simplified memory-reference instructions: lw, sw arithmetic-logical instructions:
CS161 – Design and Architecture of Computer Systems
Single-Cycle Datapath and Control
Morgan Kaufmann Publishers
IT 251 Computer Organization and Architecture
Introduction CPU performance factors
/ Computer Architecture and Design
Morgan Kaufmann Publishers The Processor
Morgan Kaufmann Publishers
Processor Architecture: Introduction to RISC Datapath (MIPS and Nios II) CSCE 230.
Morgan Kaufmann Publishers The Processor
Processor (I).
CS/COE0447 Computer Organization & Assembly Language
Morgan Kaufmann Publishers The Processor
CSCI206 - Computer Organization & Programming
Single-Cycle CPU DataPath.
Morgan Kaufmann Publishers The Processor
Topic 5: Processor Architecture Implementation Methodology
Rocky K. C. Chang 6 November 2017
Composing the Elements
Composing the Elements
The Processor Lecture 3.2: Building a Datapath with Control
The Processor Lecture 3.1: Introduction & Logic Design Conventions
Guest Lecturer TA: Shreyas Chand
Topic 5: Processor Architecture
COMS 361 Computer Organization
COSC 2021: Computer Organization Instructor: Dr. Amir Asif
Instructor Paul Pearce
Lecture 14: Single Cycle MIPS Processor
Processor: Multi-Cycle Datapath & Control
Single Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Computer Architecture Processor: Datapath
Chapter 7 Microarchitecture
Chapter 7 Microarchitecture
The Processor: Datapath & Control.
COMS 361 Computer Organization
Processor: Datapath and Control
Presentation transcript:

Single-Cycle DataPath Lecture 15 CDA 3103 07-09-2014

Review of Virtual Memory Next level in the memory hierarchy Provides illusion of very large main memory Working set of “pages” residing in main memory (subset of all pages residing on disk) Main goal: Avoid reaching all the way back to disk as much as possible Additional goals: Let OS share memory among many programs and protect them from each other Each process thinks it has all the memory to itself

Review: Paging Terminology Programs use virtual addresses (VAs) Space of all virtual addresses called virtual memory (VM) Divided into pages indexed by virtual page number (VPN) Main memory indexed by physical addresses (PAs) Space of all physical addresses called physical memory (PM) Divided into pages indexed by physical page number (PPN)

Review: Translation Look-Aside Buffers (TLBs) TLBs usually small, typically 128 - 256 entries Like any other cache, the TLB can be direct mapped, set associative, or fully associative hit VA PA TLB Lookup Cache Main Memory Processor miss miss hit data Trans- lation On TLB miss, get page table entry from main memory

Review: Memory Hierarchy Regs Upper Level Instr. Operands Faster Cache Blocks L2 Cache Blocks { Last Week: Virtual Memory Memory Pages Disk Files Larger Tape Lower Level

Review Example 1 A set-associative cache consists of 64 lines, or slots, divided into four-line sets. Main memory contains 4K blocks of 128 words each. Show the format of main memory addresses.

Solution The cache is divided into 16 sets of 4 lines each. Therefore, 4 bits are needed to identify the set number. Main memory consists of 4K = 212 blocks. Therefore, the set plus tag lengths must be 12 bits and therefore the tag length is 8 bits. Each block contains 128 words. Therefore, 7 bits are needed to specify the word.

Review Example 2 A two-way set-associative cache has lines of 16 bytes and a total size of 8 kbytes. The 64-Mbyte main memory is byte addressable. Show the format of main memory addresses.

Solution There are a total of 8 kbytes/16 bytes = 512 lines in the cache. Thus the cache consists of 256 sets of 2 lines each. Therefore 8 bits are needed to identify the set number. For the 64-Mbyte main memory, a 26-bit address is needed. Main memory consists of 64-Mbyte/16 bytes = 222 blocks. Therefore, the set plus tag lengths must be 22 bits, so the tag length is 14 bits and the word field length is 4 bits.

Agenda Stages of the Datapath Datapath Instruction Walkthroughs Datapath Design Dr Dan Garcia

Five Components of a Computer Keyboard, Mouse Computer Devices Memory (passive) (where programs, data live when running) Processor Disk (where programs, data live when not running) Input Control Output Datapath Display, Printer Dr Dan Garcia

The CPU Processor (CPU): the active part of the computer that does all the work (data manipulation and decision-making) Datapath: portion of the processor that contains hardware necessary to perform operations required by the processor (the brawn) Control: portion of the processor (also in hardware) that tells the datapath what needs to be done (the brain) Dr Dan Garcia

Stages of the Datapath : Overview Problem: a single, atomic block that “executes an instruction” (performs all necessary operations beginning with fetching the instruction) would be too bulky and inefficient Solution: break up the process of “executing an instruction” into stages, and then connect the stages to create the whole datapath smaller stages are easier to design easy to optimize (change) one stage without touching the others Dr Dan Garcia

Five Stages of the Datapath Stage 1: Instruction Fetch Stage 2: Instruction Decode Stage 3: ALU (Arithmetic-Logic Unit) Stage 4: Memory Access Stage 5: Register Write Dr Dan Garcia

Stages of the Datapath (1/5) There is a wide variety of MIPS instructions: so what general steps do they have in common? Stage 1: Instruction Fetch no matter what the instruction, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy) also, this is where we Increment PC (that is, PC = PC + 4, to point to the next instruction: byte addressing so + 4) Dr Dan Garcia

Stages of the Datapath (2/5) Stage 2: Instruction Decode upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data) first, read the opcode to determine instruction type and field lengths second, read in data from all necessary registers for add, read two registers for addi, read one register for jal, no reads necessary Dr Dan Garcia

Stages of the Datapath (3/5) Stage 3: ALU (Arithmetic-Logic Unit) the real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt) what about loads and stores? lw $t0, 40($t1) the address we are accessing in memory = the value in $t1 PLUS the value 40 so we do this addition in this stage Dr Dan Garcia

Stages of the Datapath (4/5) Stage 4: Memory Access actually only the load and store instructions do anything during this stage; the others remain idle during this stage or skip it all together since these instructions have a unique step, we need this extra stage to account for them as a result of the cache system, this stage is expected to be fast Dr Dan Garcia

Stages of the Datapath (5/5) Stage 5: Register Write most instructions write the result of some computation into a register examples: arithmetic, logical, shifts, loads, slt what about stores, branches, jumps? don’t write anything into a register at the end these remain idle during this fifth stage or skip it all together Dr Dan Garcia

Morgan Kaufmann Publishers 19 September, 2018 Logic Design Basics Information encoded in binary Low voltage = 0, High voltage = 1 One wire per bit Multi-bit data encoded on multi-wire buses Combinational element Operate on data Output is a function of input State (sequential) elements Store information §4.2 Logic Design Conventions Chapter 4 — The Processor — 20 Chapter 4 — The Processor

Combinational Elements Morgan Kaufmann Publishers 19 September, 2018 Combinational Elements AND-gate Y = A & B Adder Y = A + B A B Y + A B Y Arithmetic/Logic Unit Y = F(A, B) Multiplexer Y = S ? I1 : I0 A B Y ALU F I0 I1 Y M u x S Chapter 4 — The Processor — 21 Chapter 4 — The Processor

Morgan Kaufmann Publishers 19 September, 2018 ALU Control ALU used for Load/Store: F = add Branch: F = subtract R-type: F depends on funct field §4.4 A Simple Implementation Scheme ALU control Function 0000 AND 0001 OR 0010 add 0110 subtract 0111 set-on-less-than 1100 NOR Chapter 4 — The Processor — 22 Chapter 4 — The Processor

Morgan Kaufmann Publishers 19 September, 2018 Sequential Elements Register: stores data in a circuit Uses a clock signal to determine when to update the stored value Edge-triggered: update when Clk changes from 0 to 1 Clk D Q D Clk Q Chapter 4 — The Processor — 23 Chapter 4 — The Processor

Morgan Kaufmann Publishers 19 September, 2018 Sequential Elements Register with write control Only updates on clock edge when write control input is 1 Used when stored value is required later Write D Q Clk D Clk Q Write Chapter 4 — The Processor — 24 Chapter 4 — The Processor

Morgan Kaufmann Publishers 19 September, 2018 Clocking Methodology Combinational logic transforms data during clock cycles Between clock edges Input from state elements, output to state element Longest delay determines clock period Chapter 4 — The Processor — 25 Chapter 4 — The Processor

Morgan Kaufmann Publishers 19 September, 2018 Building a Datapath Datapath Elements that process data and addresses in the CPU Registers, ALUs, mux’s, memories, … We will build a MIPS datapath incrementally Refining the overview design §4.3 Building a Datapath Chapter 4 — The Processor — 26 Chapter 4 — The Processor

Morgan Kaufmann Publishers 19 September, 2018 Instruction Fetch Increment by 4 for next instruction 32-bit register Chapter 4 — The Processor — 27 Chapter 4 — The Processor

R-Format Instructions Morgan Kaufmann Publishers 19 September, 2018 R-Format Instructions Read two register operands Perform arithmetic/logical operation Write register result Chapter 4 — The Processor — 28 Chapter 4 — The Processor

Load/Store Instructions Morgan Kaufmann Publishers 19 September, 2018 Load/Store Instructions Read register operands Calculate address using 16-bit offset Use ALU, but sign-extend offset Load: Read memory and update register Store: Write register value to memory Chapter 4 — The Processor — 29 Chapter 4 — The Processor

Morgan Kaufmann Publishers 19 September, 2018 Branch Instructions Read register operands Compare operands Use ALU, subtract and check Zero output Calculate target address Sign-extend displacement Shift left 2 places (word displacement) Add to PC + 4 Already calculated by instruction fetch Chapter 4 — The Processor — 30 Chapter 4 — The Processor

Morgan Kaufmann Publishers 19 September, 2018 Branch Instructions Just re-routes wires Sign-bit wire replicated Chapter 4 — The Processor — 31 Chapter 4 — The Processor

Composing the Elements Morgan Kaufmann Publishers 19 September, 2018 Composing the Elements First-cut data path does an instruction in one clock cycle Each datapath element can only do one function at a time Hence, we need separate instruction and data memories Use multiplexers where alternate data sources are used for different instructions Chapter 4 — The Processor — 32 Chapter 4 — The Processor

R-Type/Load/Store Datapath Morgan Kaufmann Publishers 19 September, 2018 R-Type/Load/Store Datapath Chapter 4 — The Processor — 33 Chapter 4 — The Processor

Morgan Kaufmann Publishers 19 September, 2018 Full Datapath Chapter 4 — The Processor — 34 Chapter 4 — The Processor

Generic Steps of Datapath rd ALU instruction memory PC registers rs memory Data rt +4 imm missing: multiplexors or “data selectors” – where should they be in this picture and why? also missing – opcode for control of what operations to perform state elements vs combinational ones – combinational given the same input will always produce the same output – out depends only on the current input 2. Decode/ Register Read 1. Instruction Fetch 3. Execute 4. Memory 5. Register Write Dr Dan Garcia

Datapath Walkthroughs (1/3) add $r3,$r1,$r2 # r3 = r1+r2 Dr Dan Garcia

Datapath Walkthroughs (1/3) add $r3,$r1,$r2 # r3 = r1+r2 Stage 1: fetch this instruction, increment PC Stage 2: decode to determine it is an add, then read registers $r1 and $r2 Stage 3: add the two values retrieved in Stage 2 Stage 4: idle (nothing to write to memory) Stage 5: write result of Stage 3 into register $r3 9/19/2018 Dr Dan Garcia

Example: add Instruction reg[1]+ reg[2] reg[2] reg[1] 2 1 3 add r3, r1, r2 ALU instruction memory PC registers memory Data imm +4 Dr Dan Garcia

Datapath Walkthroughs (2/3) slti $r3,$r1,17 # if (r1 <17 )r3 = 1 else r3 = 0 Dr Dan Garcia

Datapath Walkthroughs (2/3) slti $r3,$r1,17 # if (r1 <17 )r3 = 1 else r3 = 0 Stage 1: fetch this instruction, increment PC Stage 2: decode to determine it is an slti, then read register $r1 Stage 3: compare value retrieved in Stage 2 with the integer 17 Stage 4: idle Stage 5: write the result of Stage 3 (1 if reg source was less than signed immediate, 0 otherwise) into register $r3 9/19/2018 Dr Dan Garcia

Example: slti Instruction reg[1] <17? 17 reg[1] 3 1 x slti r3, r1, 17 ALU instruction memory PC registers memory Data imm +4 Dr Dan Garcia

Datapath Walkthroughs (3/3) sw $r3,17($r1) # Mem[r1+17]=r3 Dr Dan Garcia

Datapath Walkthroughs (3/3) sw $r3,17($r1) # Mem[r1+17]=r3 Stage 1: fetch this instruction, increment PC Stage 2: decode to determine it is a sw, then read registers $r1 and $r3 Stage 3: add 17 to value in register $r1 (retrieved in Stage 2) to compute address Stage 4: write the value contained in register $r3 (retrieved in Stage 2) into memory address computed in Stage 3 Stage 5: idle (nothing to write into a register) Dr Dan Garcia

Example: sw Instruction reg[1] +17 17 reg[1] 3 1 x SW r3, 17(r1) ALU instruction memory PC registers memory Data MEM[r1+17]<=r3 reg[3] imm +4 Dr Dan Garcia

Why Five Stages? (1/2) Could we have a different number of stages? Yes, and other architectures do So why does MIPS have five if instructions tend to idle for at least one stage? Five stages are the union of all the operations needed by all the instructions. One instruction uses all five stages: the load Dr Dan Garcia

Why Five Stages? (2/2) lw $r3,17($r1) # r3=Mem[r1+17] Stage 1: fetch this instruction, increment PC Stage 2: decode to determine it is a lw, then read register $r1 Stage 3: add 17 to value in register $r1 (retrieved in Stage 2) Stage 4: read value from memory address computed in Stage 3 Stage 5: write value read in Stage 4 into register $r3 Dr Dan Garcia

Example: lw Instruction reg[1] +17 17 reg[1] 3 1 x LW r3, 17(r1) ALU instruction memory PC registers memory Data MEM[r1+17] imm +4 Dr Dan Garcia

Peer Instruction How many places in this diagram will need a multiplexor to select one from multiple inputs? a) 0 b) 1 c) 2 d) 3 e) 4 or more Dr Dan Garcia

Morgan Kaufmann Publishers 19 September, 2018 Peer Instruction Can’t just join wires together Use multiplexers Dr Dan Garcia Chapter 4 — The Processor

Datapath and Control Controller Datapath based on data transfers required to perform instructions Controller causes the right transfers to happen PC instruction memory +4 rt rs rd registers Data imm ALU Controller opcode, funct Dr Dan Garcia

What Hardware Is Needed? (1/2) PC: a register that keeps track of address of the next instruction to be fetched General Purpose Registers Used in Stages 2 (Read) and 5 (Write) MIPS has 32 of these Memory Used in Stages 1 (Fetch) and 4 (R/W) Caches makes these stages as fast as the others (on average, otherwise multicycle stall) Dr Dan Garcia

What Hardware Is Needed? (2/2) ALU Used in Stage 3 Performs all necessary functions: arithmetic, logicals, etc. Miscellaneous Registers One stage per clock cycle: Registers inserted between stages to hold intermediate data and control signals as they travel from stage to stage Note: Register is a general purpose term meaning something that stores bits. Realize that not all registers are in the “register file” Dr Dan Garcia

CPU Clocking (1/2) For each instruction, how do we control the flow of information though the datapath? Single Cycle CPU: All stages of an instruction completed within one long clock cycle Clock cycle sufficiently long to allow each instruction to complete all stages without interruption within one cycle 2. Decode/ Register Read 1. Instruction Fetch 3. Execute 4. Memory 5. Reg. Write Dr Dan Garcia

CPU Clocking (2/2) Alternative multiple-cycle CPU: only one stage of instruction per clock cycle Clock is made as long as the slowest stage Several significant advantages over single cycle execution: Unused stages in a particular instruction can be skipped OR instructions can be pipelined (overlapped) 2. Decode/ Register Read 1. Instruction Fetch 3. Execute 4. Memory 5. Register Write Dr Dan Garcia

Processor Design Analyze instruction set architecture (ISA) to determine datapath requirements Meaning of each instruction is given by register transfers Datapath must include storage element for ISA registers Datapath must support each register transfer Select set of datapath components and establish clocking methodology Assemble datapath components to meet requirements Analyze each instruction to determine sequence of control point settings to implement the register transfer Assemble the control logic to perform this sequencing Dr Dan Garcia

Instruction Level Parallelism IF ID ALU MEM WR Instr 1 IF ID ALU MEM WR Instr 2 Instr 2 IF ID ALU MEM WR Instr 3 IF ID ALU MEM WR IF ID ALU MEM WR Instr 4 IF ID ALU MEM WR Instr 5 IF ID ALU MEM WR Instr 6 IF ID ALU MEM WR Instr 7 IF ID ALU MEM WR Instr 8 Dr Dan Garcia

Summary CPU design involves Datapath, Control 5 Stages for MIPS Instructions Instruction Fetch Instruction Decode & Register Read ALU (Execute) Memory Register Write Datapath timing: single long clock cycle or one short clock cycle per stage Dr Dan Garcia