331 Lec18.1Fall 2003 14:332:331 Computer Architecture and Assembly Language Fall 2003 Lecture 18 Introduction to Pipelined Datapath [Adapted from Dave.

Presentation transcript:

331 Lec18.1 Fall 2003
14:332:331 Computer Architecture and Assembly Language, Fall 2003
Lecture 18: Introduction to Pipelined Datapath
[Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane Irwin’s PSU CSE331 slides]

Heads Up
- This week's material
  - Introduction to pipelining
    - Reading assignment: PH 6.1
- Reminders
  - HW#6 deadline???
- Next week's material
  - I/O, exceptions, and interrupts
    - Reading assignment: PH 5.6, 8.5, and A.7 through A.8

Review: Multicycle Data and Control Path
[Figure: multicycle datapath. A single memory (instructions or data), PC, register file, sign extend, two shift-left-2 units, ALU, and ALU control, with internal registers IR, MDR, A, B, and ALUout. The control FSM takes Instr[31-26] and drives IRWrite, MemtoReg, MemWrite, MemRead, IorD, PCWrite, PCWriteCond, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp, and PCSource. Other labeled fields: Instr[5-0], Instr[15-0], Instr[25-0], PC[31-28], and the ALU zero output.]

Review: RTL Summary
Instr fetch (all instructions):
    IR = Memory[PC];
    PC = PC + 4;
Decode (all instructions):
    A = Reg[IR[25-21]];
    B = Reg[IR[20-16]];
    ALUOut = PC + (sign-extend(IR[15-0]) << 2);
Execute:
    R-type:  ALUOut = A op B;
    Mem Ref: ALUOut = A + sign-extend(IR[15-0]);
    Branch:  if (A==B) PC = ALUOut;
    Jump:    PC = PC[31-28] || (IR[25-0] << 2);
Memory access:
    R-type:  Reg[IR[15-11]] = ALUOut;
    Mem Ref: MDR = Memory[ALUOut]; or Memory[ALUOut] = B;
Write-back:
    Mem Ref: Reg[IR[20-16]] = MDR;
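The RTL summary also fixes how many clock cycles each instruction class needs in the multicycle design. The sketch below counts steps per class and computes an average CPI. The instruction mix is a made-up example, not data from the lecture, and the names `STEPS`, `cycles`, and `average_cpi` are illustrative only.

```python
# Steps each instruction class passes through, read off the RTL summary.
STEPS = {
    "R-type": ["fetch", "decode", "execute", "memory access"],
    "lw":     ["fetch", "decode", "execute", "memory access", "write-back"],
    "sw":     ["fetch", "decode", "execute", "memory access"],
    "beq":    ["fetch", "decode", "execute"],
    "j":      ["fetch", "decode", "execute"],
}

def cycles(op):
    """Clock cycles the multicycle datapath spends on one instruction of class op."""
    return len(STEPS[op])

def average_cpi(mix):
    """mix maps instruction class -> fraction of executed instructions (sums to 1)."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9
    return sum(fraction * cycles(op) for op, fraction in mix.items())

# Hypothetical instruction mix, for illustration only.
example_mix = {"R-type": 0.5, "lw": 0.2, "sw": 0.1, "beq": 0.15, "j": 0.05}
print(average_cpi(example_mix))
```

A load is the only class that uses all five steps, which is why the pipeline stages later in the lecture are introduced through lw.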

Review: Multicycle Datapath FSM
Unless otherwise assigned: PCWrite, IRWrite, MemWrite, RegWrite = 0; others = X.
States and the control values set in each:
- Instr Fetch (entered from Start): IorD=0, MemRead, IRWrite, ALUSrcA=0, ALUSrcB=01, PCSource=00, ALUOp=00, PCWrite
- Decode: ALUSrcA=0, ALUSrcB=11, ALUOp=00, PCWriteCond=0
- Execute (Op = lw or sw): ALUSrcA=1, ALUSrcB=10, ALUOp=00, PCWriteCond=0
- Execute (Op = R-type): ALUSrcA=1, ALUSrcB=00, ALUOp=10, PCWriteCond=0
- Execute (Op = beq): ALUSrcA=1, ALUSrcB=00, ALUOp=01, PCSource=01, PCWriteCond
- Execute (Op = j): PCSource=10, PCWrite
- Memory Access (Op = lw): MemRead, IorD=1, PCWriteCond=0
- Memory Access (Op = sw): MemWrite, IorD=1, PCWriteCond=0
- Memory Access (R-type result write): RegDst=1, RegWrite, MemtoReg=0, PCWriteCond=0
- Write Back (lw): RegDst=0, RegWrite, MemtoReg=1, PCWriteCond=0

Review: FSM Implementation
[Figure: combinational control logic plus a state register driven by the system clock. Inputs: opcode bits Op0 through Op5 from Inst[31-26] and the current state. Outputs: the next state and the control signals PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst.]

Single Cycle Disadvantages & Advantages
- Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
- Is wasteful of area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle
but
- Is simple and easy to understand
[Figure: single cycle implementation timing. lw occupies all of Cycle 1; sw finishes before the end of Cycle 2 and the remainder of that cycle is waste.]

Multicycle Advantages & Disadvantages
- Uses the clock cycle efficiently: the clock cycle is timed to accommodate the slowest instruction step
  - balance the amount of work to be done in each step
  - restrict each step to use only one major functional unit
- Multicycle implementations allow
  - functional units to be used more than once per instruction, as long as they are used on different clock cycles
  - faster clock rates
  - different instructions to take a different number of clock cycles
but
- Requires additional internal state registers, muxes, and more complicated (FSM) control

The Five Stages of the Load Instruction
- IFetch: instruction fetch and PC update
- Dec: register fetch and instruction decode
- Exec: execute R-type; calculate memory address
- Mem: read/write the data from/to the data memory
- WB: write the data back to the register file
[Figure: lw passing through IFetch, Dec, Exec, Mem, WB in Cycles 1 through 5.]

Single Cycle vs. Multiple Cycle Timing
[Figure: two timing diagrams.
 Single cycle implementation: lw fills Cycle 1; sw runs in Cycle 2, with the unused end of the cycle marked as waste.
 Multiple cycle implementation: lw takes Cycles 1-5 (IFetch, Dec, Exec, Mem, WB), sw takes Cycles 6-9 (IFetch, Dec, Exec, Mem), and an R-type instruction begins its IFetch in Cycle 10.]
Note: the multicycle clock period is longer than 1/5 of the single cycle clock period because of the flip-flop overhead of the registers between steps.
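To make the timing comparison concrete, here is a small sketch that computes the run time of the lw, sw, R-type sequence under both designs. All latencies are assumed values chosen for illustration (they do not come from the slides), as is the register overhead.

```python
# Assumed per-step latencies in picoseconds (illustrative, not from the lecture).
STEP_PS = {"IFetch": 200, "Dec": 100, "Exec": 200, "Mem": 200, "WB": 100}
REG_OVERHEAD_PS = 20  # assumed flip-flop overhead added to every multicycle step

# Number of steps each instruction class uses in the multicycle design.
STEPS_USED = {"lw": 5, "sw": 4, "R-type": 4}

# Single cycle: one long clock that fits the slowest instruction (lw, all steps).
single_cycle_clock = sum(STEP_PS.values())
# Multicycle: a short clock that fits the slowest step plus register overhead.
multicycle_clock = max(STEP_PS.values()) + REG_OVERHEAD_PS

def single_cycle_time(program):
    return len(program) * single_cycle_clock

def multicycle_time(program):
    return sum(STEPS_USED[op] for op in program) * multicycle_clock

program = ["lw", "sw", "R-type"]
print(single_cycle_clock, multicycle_clock)
print(single_cycle_time(program), multicycle_time(program))
```

With these particular numbers the steps are unbalanced, so the multicycle clock is well above one fifth of the single cycle clock and the multicycle total is not automatically better. That is the reason the multicycle slide asks for balanced steps.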

Pipelined MIPS Processor
- Start the next instruction while still working on the current one
  - improves throughput: the total amount of work done in a given time
  - instruction latency (execution time, delay time, response time) is not reduced: the time from the start of an instruction to its completion
[Figure: lw in Cycles 1-5, sw in Cycles 2-6, and an R-type instruction in Cycles 3-7, each passing through IFetch, Dec, Exec, Mem, WB.]
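The overlap can be sketched in a few lines of code. This is a minimal model of an ideal five-stage pipeline with no hazards; the function names are illustrative.

```python
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]

def pipeline_schedule(program):
    """Map each clock cycle to the (instruction, stage) pairs active in it,
    assuming one instruction enters the pipeline per cycle and nothing stalls."""
    schedule = {}
    for i, instr in enumerate(program):
        for s, stage in enumerate(STAGES):
            schedule.setdefault(i + s + 1, []).append((instr, stage))
    return schedule

def total_cycles(n_instructions):
    """Cycles to finish n instructions: fill the pipeline once, then one per cycle."""
    return len(STAGES) + n_instructions - 1

program = ["lw", "sw", "R-type"]
for cycle, work in sorted(pipeline_schedule(program).items()):
    print(f"Cycle {cycle}: " + ", ".join(f"{instr}:{stage}" for instr, stage in work))
```

Each instruction still takes five cycles from start to finish, so latency is unchanged; only the rate at which instructions complete improves.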

Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: three timing diagrams for the same instruction sequence.
 Single cycle implementation: Load in Cycle 1, Store in Cycle 2, with part of the cycle marked as waste.
 Multiple cycle implementation: lw in Cycles 1-5 (IFetch, Dec, Exec, Mem, WB), sw in Cycles 6-9 (IFetch, Dec, Exec, Mem), R-type starting with IFetch in Cycle 10.
 Pipeline implementation: lw, sw, and R-type each pass through IFetch, Dec, Exec, Mem, WB, starting one cycle apart; one stage slot is labeled "wasted cycle".]

Pipelining the MIPS ISA
- What makes it easy
  - all instructions are the same length (32 bits)
  - few instruction formats (three), with symmetry across formats
  - memory operations can occur only in loads and stores
  - operands must be aligned in memory, so a single data transfer requires only one memory access
- What makes it hard
  - structural hazards: what if we had only one memory?
  - control hazards: what about branches?
  - data hazards: what if an instruction's input operands depend on the output of a previous instruction?

MIPS Pipeline Datapath Modifications
- What do we need to add/modify in our MIPS datapath?
  - State registers between pipeline stages to isolate them
[Figure: pipelined datapath with PC, instruction memory, PC adder, register file, sign extend, shift left 2, branch adder, ALU, data memory, and muxes. The state registers IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB separate the stages IFetch, Dec, Exec, Mem, WB and are driven by the system clock.]

MIPS Pipeline Control Path Modifications
- All control signals are determined during Decode
  - and held in the state registers between pipeline stages
[Figure: the pipelined datapath from the previous slide with a Control block added in the Dec stage; its outputs travel through the Dec/Exec, Exec/Mem, and Mem/WB state registers.]

Graphically Representing MIPS Pipeline
- Can help with answering questions like:
  - how many cycles does it take to execute this code?
  - what is the ALU doing during cycle 4?
  - is there a hazard, why does it occur, and how can it be fixed?
[Figure: one instruction drawn as the stage sequence IM, Reg, ALU, DM, Reg.]

Why Pipeline? For Throughput!
[Figure: instruction order (vertical) against time in clock cycles (horizontal). Inst 0 through Inst 4 each pass through IM, Reg, ALU, DM, Reg, starting one cycle apart. The first cycles are labeled "Time to fill the pipeline".]
Once the pipeline is full, one instruction is completed every cycle.

Can pipelining get us into trouble?
- Yes: pipeline hazards
  - structural hazards: attempt to use the same resource by two different instructions at the same time
  - data hazards: attempt to use an item before it is ready
    - instruction depends on the result of a prior instruction still in the pipeline
  - control hazards: attempt to make a decision before the condition is evaluated
    - branch instructions
- Can always resolve hazards by waiting
  - pipeline control must detect the hazard
  - take action (or delay action) to resolve hazards

A Unified Memory Would Be a Structural Hazard
[Figure: instruction order against time in clock cycles for lw, Inst 1, Inst 2, Inst 3, Inst 4, each drawn as Mem, Reg, ALU, Mem, Reg with a single shared memory. In one cycle lw is reading data from memory while a later instruction is reading its instruction from the same memory.]
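The conflict can be located mechanically. The sketch below assumes an ideal five-stage pipeline in which instruction i (counting from 0) is fetched in cycle i + 1 and reaches its memory stage in cycle i + 4; `memory_conflicts` is an illustrative name, not something from the lecture.

```python
def memory_conflicts(program):
    """Return {cycle: [(instruction index, kind of access), ...]} for every cycle
    in which a single shared memory would be needed by more than one instruction."""
    uses = {}
    for i, op in enumerate(program):
        # Every instruction reads the memory when it is fetched.
        uses.setdefault(i + 1, []).append((i, "instruction fetch"))
        # Only loads and stores touch memory again, three cycles later.
        if op in ("lw", "sw"):
            uses.setdefault(i + 4, []).append((i, "data access"))
    return {cycle: u for cycle, u in uses.items() if len(u) > 1}

program = ["lw", "inst1", "inst2", "inst3", "inst4"]
print(memory_conflicts(program))
```

For this sequence the only clash is in cycle 4, between the lw data read and the fetch of the fourth instruction. Separate instruction and data memories remove the clash without stalling.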

How About Register File Access?
[Figure: instruction order against time in clock cycles; rows labeled add, Inst 1, Inst 2, Inst 4, and add, each drawn as IM, Reg, ALU, DM, Reg. One instruction writes the register file in the same cycle in which a later one reads it.]
Can fix the register file access hazard by doing reads in the second half of the cycle and writes in the first half.

Branch Instructions Cause Control Hazards
[Figure: instruction order for add, beq, lw, Inst 3, Inst 4, each drawn as IM, Reg, ALU, DM, Reg.]
- Dependencies backward in time cause hazards

One Way to “Fix” a Control Hazard
[Figure: instruction order for add, beq, lw, Inst 3, each drawn as IM, Reg, ALU, DM, Reg, with a stall inserted after beq.]
Can fix the branch hazard by waiting (stall), but this affects throughput.
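The throughput cost of stalling on branches can be estimated with a one-line model. The branch frequency and the number of stall cycles per branch below are assumed inputs; the slides do not give values for either.

```python
def effective_cpi(branch_fraction, stall_cycles_per_branch, base_cpi=1.0):
    """Average cycles per instruction when every branch stalls the pipeline.

    base_cpi is the ideal pipelined CPI; each branch adds its stall cycles."""
    return base_cpi + branch_fraction * stall_cycles_per_branch

# Hypothetical workload: 20% branches, one stall cycle per branch.
cpi = effective_cpi(branch_fraction=0.2, stall_cycles_per_branch=1)
print(cpi, 1.0 / cpi)  # CPI and throughput relative to the ideal pipeline
```

The model ignores data hazards and assumes every branch pays the full stall.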

Register Usage Can Cause Data Hazards
Instruction order:
    add r1,r2,r3
    sub r4,r1,r5
    and r6,r1,r7
    xor r4,r1,r5
    or  r8,r1,r9
[Figure: each instruction drawn as IM, Reg, ALU, DM, Reg against time in clock cycles.]
- Dependencies backward in time cause hazards
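Read-after-write dependences like these can be found by scanning the instruction list. The sketch below is a simplified detector, not the pipeline's actual hazard logic. It assumes a five-stage pipeline with no forwarding and a register file that writes in the first half of a cycle and reads in the second, so only the two instructions immediately after a producer are at risk (`window=2`).

```python
# (opcode, destination register, source registers)
program = [
    ("add", "r1", ["r2", "r3"]),
    ("sub", "r4", ["r1", "r5"]),
    ("and", "r6", ["r1", "r7"]),
    ("xor", "r4", ["r1", "r5"]),
    ("or",  "r8", ["r1", "r9"]),
]

def raw_hazards(program, window=2):
    """Return (producer index, consumer index, register) for each read-after-write
    dependence whose consumer follows the producer by at most `window` slots."""
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if dest in program[j][2]:
                hazards.append((i, j, dest))
            if program[j][1] == dest:
                break  # a later write to dest hides the earlier producer
    return hazards

print(raw_hazards(program))
```

Under these assumptions sub and and read r1 too early, while xor and or read it after add has written it back.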

One Way to “Fix” a Data Hazard
Instruction order:
    add r1,r2,r3
    (stall)
    sub r4,r1,r5
    and r6,r1,r7
[Figure: each instruction drawn as IM, Reg, ALU, DM, Reg, with the dependent instructions delayed.]
Can fix the data hazard by waiting (stall), but this affects throughput.
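How long the dependent instruction has to wait can be worked out from the stage timing. The sketch below assumes no forwarding and a register file that is written in the first half of a cycle and read in the second, so a consumer's Dec stage may share a cycle with the producer's WB stage. It models a stall as a delay in the cycle at which an instruction effectively enters the pipeline; the function name is illustrative.

```python
# (opcode, destination register, source registers)
program = [
    ("add", "r1", ["r2", "r3"]),
    ("sub", "r4", ["r1", "r5"]),
    ("and", "r6", ["r1", "r7"]),
]

def start_cycles_with_stalls(program):
    """Return the cycle in which each instruction effectively starts (IFetch).

    A producer started in cycle f writes back in cycle f + 4; a consumer started
    in cycle g reads registers in cycle g + 1, so it needs g >= f + 3."""
    start = []
    for j, (_, _, sources) in enumerate(program):
        cycle = start[-1] + 1 if start else 1  # in-order: follow the previous one
        for i in range(j):
            if program[i][1] in sources:
                cycle = max(cycle, start[i] + 3)
        start.append(cycle)
    return start

starts = start_cycles_with_stalls(program)
stall_cycles = starts[-1] - len(program)
print(starts, stall_cycles)
```

Under these assumptions sub waits two cycles for r1, and and, following directly behind it, needs no extra delay of its own.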

Loads Can Cause Data Hazards
Instruction order:
    lw  r1,100(r2)
    sub r4,r1,r5
    and r6,r1,r7
    xor r4,r1,r5
    or  r8,r1,r9
[Figure: each instruction drawn as IM, Reg, ALU, DM, Reg against time in clock cycles.]
- Dependencies backward in time cause hazards

Stores Can Cause Data Hazards
Instruction order:
    add r1,r2,r3
    sw  r1,100(r5)
    and r6,r1,r7
    xor r4,r1,r5
    or  r8,r1,r9
[Figure: each instruction drawn as IM, Reg, ALU, DM, Reg against time in clock cycles.]
- Dependencies backward in time cause hazards

Other Pipeline Structures Are Possible
- What about the (slow) multiply operation?
  - let it take two cycles
- What if the data memory access is twice as slow as the instruction memory?
  - make the clock twice as slow, or
  - let data memory access take two cycles (and keep the same clock rate)
[Figure: two pipeline diagrams. One adds a MUL unit to the IM, Reg, ALU, DM, Reg pipeline; the other splits the data memory access into DM1 and DM2.]

Sample Pipeline Alternatives
- ARM7
- StrongARM-1
- XScale
[Figure: pipeline diagrams for the three processors.
 Three-stage pipeline (IM, Reg, EX): PC update and IM access; decode and reg access; ALU op, DM access, shift/rotate, and commit result (write back).
 Longer pipelines with stage labels including IM, Reg, SHFT, ALU, DM, Reg and IM1, IM2, Reg, ALU, DM1, DM2. Their stage annotations read: PC update, BTB access, start IM access; IM access; decode, reg 1 access; shift/rotate, reg 2 access; ALU op; start DM access, exception; DM write, reg write.]

Summary
- All modern day processors use pipelining
- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
  - Multiple tasks operate simultaneously using different resources
- Potential speedup = number of pipe stages
- Pipeline rate is limited by the slowest pipeline stage
  - Unbalanced lengths of pipe stages reduce speedup
  - Time to “fill” the pipeline and time to “drain” it reduce speedup
- Must detect and resolve hazards
  - Stalling negatively affects throughput
- To learn (much) more take CSE 431
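The claims about potential speedup and fill time can be tied together with a short calculation. This is a sketch under idealized assumptions: $k$ perfectly balanced stages of delay $t$ each, $n$ instructions, no hazards, and no pipeline register overhead. The symbols $k$, $n$, and $t$ are introduced here for illustration.

```latex
\begin{align*}
T_{\text{unpipelined}} &= n \cdot k \cdot t \\
T_{\text{pipelined}}   &= (k + n - 1) \cdot t
   \qquad \text{($k$ cycles for the first instruction, then one per cycle)} \\
\text{Speedup} &= \frac{T_{\text{unpipelined}}}{T_{\text{pipelined}}}
   = \frac{n k}{k + n - 1}
   \;\longrightarrow\; k \quad \text{as } n \to \infty
\end{align*}
```

For $k = 5$, three instructions give a speedup of $15/7 \approx 2.1$, while 1000 instructions give $5000/1004 \approx 4.98$. The gap from 5 is the fill and drain cost the summary mentions.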