Computer Organization

Presentation transcript:

Computer Organization Lecture Set – 06 Chapter 6 Huei-Yung Lin Lectures 1 & 2

Roadmap for the Term: Major Topics
- Overview / Abstractions and Technology
- Performance
- Instruction sets
- Logic & arithmetic
- Processor Implementation
  - Single-cycle implementation
  - Multicycle implementation
  - Pipelined implementation (current topic)
- Memory systems
- Input/Output
H.Y. Lin, CCUEE Computer Organization

Pipelining Outline
- Introduction
  - Defining Pipelining (current topic)
  - Pipelining Instructions
  - Hazards
- Pipelined Processor Design
  - Datapath
  - Control
- Advanced Pipelining
  - Superscalar
  - Dynamic Pipelining
  - Examples

What is Pipelining?
- A way of speeding up execution of instructions
- Key idea: overlap execution of multiple instructions
- Analogy: doing your laundry
  1. Run load through washer
  2. Run load through dryer
  3. Fold clothes
  4. Put away clothes
  5. Go to 1
- Observation: we can start another load as soon as we finish step 1!

The Laundry Analogy
- Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
- Washer takes 30 minutes
- Dryer takes 30 minutes
- "Folder" takes 30 minutes
- "Stasher" takes 30 minutes to put clothes into drawers

If we do laundry sequentially...
[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D in task order, each taking four 30-minute steps, one load after another]
Time required: 8 hours for 4 loads

To Pipeline, We Overlap Tasks
[Figure: timeline starting at 6 PM; loads A, B, C, D in task order, each starting 30 minutes after the previous one]
- Time required: 3.5 hours for 4 loads
- Latency remains 2 hours per load
- Throughput improves by a factor of about 2.3 (8 hours / 3.5 hours); the factor grows toward 4 as the number of loads increases
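The laundry numbers can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are made up for illustration, and it assumes every step takes exactly 30 minutes and at least one load is washed.

```python
# Laundry analogy: 4 steps (wash, dry, fold, stash), 30 minutes each.
STEP_MINUTES = 30
NUM_STEPS = 4

def sequential_minutes(loads):
    """Each load finishes all four steps before the next load starts."""
    return loads * NUM_STEPS * STEP_MINUTES

def pipelined_minutes(loads):
    """A new load starts every 30 minutes; the last load starts (loads - 1) steps late."""
    return (NUM_STEPS + loads - 1) * STEP_MINUTES

def speedup(loads):
    """Ratio of sequential time to pipelined time."""
    return sequential_minutes(loads) / pipelined_minutes(loads)

if __name__ == "__main__":
    for loads in (4, 20, 1000):
        print(loads, sequential_minutes(loads), pipelined_minutes(loads),
              round(speedup(loads), 2))
```

For 4 loads this gives 480 minutes against 210 minutes. As the number of loads grows, the ratio approaches the number of steps (4).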

Pipelining a Digital System
- Key idea: break big computation up into pieces
- Separate each piece with a pipeline register
[Figure: a 1 ns block of logic split into 200 ps pieces separated by pipeline registers]

Pipelining a Digital System
- Why do this? Because it's faster for repeated computations
- Non-pipelined (1 ns): 1 operation finishes every 1 ns
- Pipelined (200 ps stages): 1 operation finishes every 200 ps

Comments about Pipelining
- Pipelining increases throughput but does not reduce latency
  - An answer is available every 200 ps, BUT
  - A single computation still takes 1 ns
- Limitations:
  - Computations must be divisible into stages of similar size
  - Pipeline registers add overhead
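The throughput and latency figures, and the effect of register overhead, can be sketched as follows. The function name and the 20 ps register delay used in the usage note are assumptions for illustration; the slides give only the 1 ns and 200 ps figures.

```python
def pipeline_timing(total_logic_ps, stages, register_overhead_ps=0.0):
    """Timing for logic split evenly into `stages` pieces.

    Returns (cycle_time_ps, latency_ps, throughput_ops_per_ns).
    register_overhead_ps is the extra delay each pipeline register adds
    to a stage (a hypothetical parameter, not a value from the slides).
    """
    cycle = total_logic_ps / stages + register_overhead_ps
    latency = cycle * stages      # one computation passes through every stage
    throughput = 1000.0 / cycle   # results per nanosecond once the pipe is full
    return cycle, latency, throughput

if __name__ == "__main__":
    print(pipeline_timing(1000, 1))       # non-pipelined
    print(pipeline_timing(1000, 5))       # ideal 5-stage split
    print(pipeline_timing(1000, 5, 20))   # with an assumed 20 ps register delay
```

With no overhead, the 5-stage split gives a 200 ps cycle, an unchanged 1 ns latency, and five results per nanosecond. With the assumed 20 ps overhead the cycle becomes 220 ps, so latency rises to 1.1 ns and throughput falls below five per nanosecond.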

Pipelining a Processor
- Recall the 5 steps in instruction execution:
  1. Instruction Fetch
  2. Instruction Decode and Register Read
  3. Execute operation or calculate address
  4. Memory access
  5. Write result into register
- Review: Single-Cycle Processor
  - All 5 steps done in a single clock cycle
  - Dedicated hardware required for each step
- What happens if we break execution into multiple cycles, but keep the extra hardware?

Review - Single-Cycle Processor
[Figure: single-cycle datapath divided into IF (Instruction Fetch), ID (Instruction Decode), EX (Execute / Address Calc.), MEM (Memory Access), WB (Write Back)]

Pipelining - Key Idea
- Question: What happens if we break execution into multiple cycles, but keep the extra hardware?
- Answer: In the best case, we can start executing a new instruction on each clock cycle; this is pipelining
- Pipelining stages:
  - IF - Instruction Fetch
  - ID - Instruction Decode
  - EX - Execute / Address Calculation
  - MEM - Memory Access (read / write)
  - WB - Write Back (results into register file)

Basic Pipelined Processor
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]

Single-Cycle vs. Pipelined Execution
[Figure: timing of non-pipelined execution compared with pipelined execution]

Comments about Pipelining
- The good news
  - Multiple instructions are being processed at the same time
  - This works because stages are isolated by registers
  - Best-case speedup of N for an N-stage pipeline
- The bad news
  - Instructions interfere with each other: hazards
  - Example: different instructions may need the same piece of hardware (e.g., memory) in the same clock cycle
  - Example: an instruction may require a result produced by an earlier instruction that is not yet complete
  - Worst case: must suspend execution (stall)
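The best-case speedup and the cost of stalls can be illustrated with a short calculation. This is a sketch under stated assumptions: the unpipelined machine is taken to spend one cycle per stage on each instruction with no overlap, all cycles have the same length, and the function names are invented for this example.

```python
def pipelined_cycles(instructions, stages, stall_cycles=0):
    """Cycles to finish: fill the pipe once, then one instruction per cycle, plus stalls."""
    return stages + instructions - 1 + stall_cycles

def speedup_vs_unpipelined(instructions, stages, stall_cycles=0):
    """Speedup over a machine that spends `stages` cycles per instruction without overlap."""
    unpipelined = instructions * stages
    return unpipelined / pipelined_cycles(instructions, stages, stall_cycles)

if __name__ == "__main__":
    print(pipelined_cycles(4, 5))                        # 4 instructions, 5 stages
    print(speedup_vs_unpipelined(1_000_000, 5))          # approaches 5 with no stalls
    print(speedup_vs_unpipelined(1_000_000, 5, 500_000)) # one stall per two instructions
```

Four instructions on five stages take 8 cycles. For long instruction streams without stalls the speedup approaches the number of stages, and every stall cycle pulls it back down.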

Pipelined Example – Executing Multiple Instructions
Consider the following instruction sequence:
    lw  $r0, 10($r1)
    sw  $r3, 20($r4)
    add $r5, $r6, $r7
    sub $r8, $r9, $r10

Executing Multiple Instructions: Clock Cycles 1-8
(the lw, sw, add, sub sequence from the previous slide; each row shows which instruction occupies each stage)

Cycle  IF   ID   EX   MEM  WB
1      LW
2      SW   LW
3      ADD  SW   LW
4      SUB  ADD  SW   LW
5           SUB  ADD  SW   LW
6                SUB  ADD  SW
7                     SUB  ADD
8                          SUB

Alternative View – Multicycle Diagram
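The figure for this slide did not survive in the transcript. A multicycle diagram (one row per instruction, one column per clock cycle) can be regenerated with the sketch below. It assumes no stalls, uses invented helper names, and writes the store's register as `$r3` on the assumption that `$sr3` in the slide text is a typo.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw  $r0, 10($r1)",
    "sw  $r3, 20($r4)",
    "add $r5, $r6, $r7",
    "sub $r8, $r9, $r10",
]

def stage_at(instr_index, cycle):
    """Stage held by instruction `instr_index` (0-based) in clock `cycle` (1-based), or None."""
    stage_index = cycle - 1 - instr_index
    if 0 <= stage_index < len(STAGES):
        return STAGES[stage_index]
    return None

def multicycle_diagram(instructions):
    """Text diagram: rows are instructions, columns are clock cycles (no stalls assumed)."""
    total_cycles = len(instructions) + len(STAGES) - 1
    width = max(len(text) for text in instructions)
    header = "".ljust(width) + "".join(f"{c:>5}" for c in range(1, total_cycles + 1))
    lines = [header]
    for i, text in enumerate(instructions):
        row = text.ljust(width)
        for cycle in range(1, total_cycles + 1):
            cell = stage_at(i, cycle) or ""
            row += f"{cell:>5}"
        lines.append(row)
    return "\n".join(lines)

if __name__ == "__main__":
    print(multicycle_diagram(PROGRAM))
```

Each row is the previous row shifted right by one cycle, which is the same information as the cycle-by-cycle slides viewed per instruction instead of per stage.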

Pipeline Hazards
- Where one instruction cannot immediately follow another
- Types of hazards
  - Structural hazards – attempt to use same resource twice
  - Control hazards – attempt to make decision before condition is evaluated
  - Data hazards – attempt to use data before it is ready
- Can always resolve hazards by waiting

Structural Hazards
- Attempt to use same resource twice at same time
- Example: single memory for instructions and data
  - Accessed by IF stage
  - Accessed at same time by MEM stage
- Solutions
  - Delay second access by one clock cycle, OR
  - Provide separate memories for instructions and data
    - This is what the book does
    - This is called a "Harvard Architecture"
    - Real pipelined processors have separate caches

Example Structural Hazard – Single Memory
[Figure: multicycle diagram with the memory conflict marked where one instruction's MEM stage and another's IF stage fall in the same cycle]
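The single-memory conflict can be located mechanically. The sketch below assumes a 5-stage pipeline with one instruction fetched per cycle and no stalls, treats only `lw` and `sw` as using memory in MEM, and uses an invented function name.

```python
def memory_conflicts(ops):
    """Find cycles where MEM and IF would both use a single shared memory.

    ops: opcodes in program order, one fetched per cycle starting at cycle 1.
    Returns a list of (cycle, index_of_instr_in_MEM, index_of_instr_in_IF).
    """
    conflicts = []
    for i, op in enumerate(ops):
        if op in ("lw", "sw"):
            mem_cycle = i + 4            # fetched in cycle i + 1, MEM is the 4th stage
            fetch_index = mem_cycle - 1  # instruction being fetched in that same cycle
            if fetch_index < len(ops):
                conflicts.append((mem_cycle, i, fetch_index))
    return conflicts

if __name__ == "__main__":
    print(memory_conflicts(["lw", "sw", "add", "sub"]))
```

For the lw, sw, add, sub sequence the only conflict is in cycle 4, where `lw` is in MEM while `sub` is being fetched. The `sw` reaches MEM in cycle 5, when no further instruction is fetched in this short example.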

Control Hazards
- Attempt to make a decision before condition is evaluated
- Example: beq $s0, $s1, offset
- Assume we add hardware to second stage to:
  - Compare fetched registers for equality
  - Compute branch target
- This allows branch to be taken at end of second clock cycle
- But this still means the result is not ready when we want to fetch the next instruction!

Control Hazard Solutions
- Stall – stop loading instructions until result is available
- Predict – assume an outcome and continue fetching (undo if prediction is wrong)
- Delayed branch – specify in architecture that following instruction is always executed

Control Hazard – Stall
[Figure labels: "beq writes PC here", "new PC used here"]

Control Hazard – Correct Prediction
[Figure label: "Fetch assuming branch taken"]

Control Hazard – Incorrect Prediction
[Figure label: "Squashed" instruction]

Control Hazard – Delayed Branch
[Figure labels: "always executes", "correct PC avail. here"]

Summary – Control Hazard Solutions
- Stall – stop fetching instr. until result is available
  - Significant performance penalty
  - Hardware required to stall
- Predict – assume an outcome and continue fetching (undo if prediction is wrong)
  - Performance penalty only when guess wrong
  - Hardware required to "squash" instructions
- Delayed branch – specify in architecture that following instruction is always executed
  - Compiler re-orders instructions into delay slot
  - Insert "NOP" (no-op) operations when can't use (~50%)
  - This is how original MIPS worked
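The three approaches can be compared with a rough cycles-per-instruction estimate. This is a sketch, not a measurement: the 20% branch frequency and 40% misprediction rate are hypothetical numbers chosen for illustration, the one-cycle penalty follows the slides' assumption that the branch is resolved in the second stage, and the 50% figure for unfilled delay slots is the one quoted above.

```python
def effective_cpi(branch_fraction, penalty_cycles, wasted_fraction=1.0, base_cpi=1.0):
    """Average cycles per instruction when some branches waste `penalty_cycles`.

    wasted_fraction is the share of branches that actually pay the penalty:
    1.0 for always stalling, the misprediction rate for prediction, or the
    share of delay slots filled with a NOP for delayed branches.
    """
    return base_cpi + branch_fraction * wasted_fraction * penalty_cycles

if __name__ == "__main__":
    BRANCHES = 0.20  # hypothetical: one instruction in five is a branch
    PENALTY = 1      # branch resolved in ID, so one fetch slot is at risk
    print("stall:  ", effective_cpi(BRANCHES, PENALTY))
    print("predict:", effective_cpi(BRANCHES, PENALTY, wasted_fraction=0.40))
    print("delayed:", effective_cpi(BRANCHES, PENALTY, wasted_fraction=0.50))
```

Under these assumptions stalling costs the most, because every branch pays the penalty, while prediction and delayed branches pay only for the wasted fraction.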

Data Hazards
- Attempt to use data before it is ready
- Solutions
  - Stalling – wait until result is available
  - Forwarding – make data available inside datapath
  - Reordering instructions – use compiler to avoid hazards
- Examples:
    add $s0, $t0, $t1    # $s0 = $t0 + $t1
    sub $t2, $s0, $t3    # $t2 = $s0 - $t3

    lw  $s0, 0($t0)      # $s0 = MEM[$t0]
    sub $t2, $s0, $t3    # $t2 = $s0 - $t3
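Both example pairs are read-after-write dependences, which can be detected by comparing each instruction's destination with the sources of the instructions that follow it. The sketch below uses an invented `Instr` record and function name. Its default window of two following instructions is an assumption that matches a 5-stage pipeline without forwarding whose register file can be written and read in the same cycle.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    op: str
    dest: Optional[str]     # register written, or None (e.g. for sw)
    srcs: Tuple[str, ...]   # registers read

def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for each close RAW dependence."""
    found = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window), j):
            dest = program[i].dest
            if dest is not None and dest in consumer.srcs:
                found.append((i, j, dest))
    return found

if __name__ == "__main__":
    alu_pair = [Instr("add", "$s0", ("$t0", "$t1")),
                Instr("sub", "$t2", ("$s0", "$t3"))]
    load_pair = [Instr("lw", "$s0", ("$t0",)),
                 Instr("sub", "$t2", ("$s0", "$t3"))]
    print(raw_hazards(alu_pair))
    print(raw_hazards(load_pair))
```

Both pairs report a hazard on `$s0`. Whether that hazard costs cycles depends on which of the three solutions the hardware and compiler apply.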

Data Hazard – Stalling

Data Hazards – Forwarding
- Key idea: connect new value directly to next stage
- $s0 is still read from the register file, but the stale value is ignored in favor of the new result
- Problem: what about load instructions?

Data Hazards – Forwarding
- A stall is still required for a load: its data is available only after MEM
- The MIPS architecture calls this a delayed load; initial implementations required the compiler to deal with it
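Why forwarding removes the stall for ALU results but not for loads can be worked out from stage positions. This sketch assumes the consumer needs its operand at the start of its EX stage, an ALU result is ready at the end of EX, and load data is ready at the end of MEM. The function name is invented for this example.

```python
def stalls_with_forwarding(producer_op, distance):
    """Stall cycles needed when a consumer follows its producer by `distance` instructions.

    distance = 1 means the consumer is the very next instruction.
    Stages are numbered IF=1, ID=2, EX=3, MEM=4, WB=5.
    """
    ready_after_stage = 4 if producer_op == "lw" else 3  # MEM for loads, EX otherwise
    # With the producer fetched in cycle p, its value can be forwarded from the
    # start of cycle p + ready_after_stage. The consumer's EX is in cycle
    # p + distance + 2, so it must wait for any shortfall between the two.
    return max(0, ready_after_stage - 2 - distance)

if __name__ == "__main__":
    print(stalls_with_forwarding("add", 1))  # ALU result forwarded in time
    print(stalls_with_forwarding("lw", 1))   # load-use: one stall remains
    print(stalls_with_forwarding("lw", 2))   # one instruction in between is enough
```

An ALU result reaches the next instruction in time, a load followed immediately by its consumer still costs one cycle, and placing one unrelated instruction between them hides that cycle.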

Data Hazards – Reordering Instructions
Assuming we have data forwarding, what are the hazards in this code?
    lw $t0, 0($t1)
    lw $t2, 4($t1)
    sw $t2, 0($t1)
    sw $t0, 4($t1)
Reorder instructions to remove hazard:
    lw $t0, 0($t1)
    lw $t2, 4($t1)
    sw $t0, 4($t1)
    sw $t2, 0($t1)
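The effect of the reordering can be checked by counting load-use pairs in each version. The sketch below represents each instruction as a hypothetical `(op, dest, sources)` tuple and conservatively treats the register being stored by `sw` as needed as early as any other source, which is how the slide treats it.

```python
def load_use_stalls(program):
    """Count adjacent pairs where a lw result is read by the very next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "lw" and prev_dest in cur_srcs:
            stalls += 1
    return stalls

# (op, destination register or None, source registers)
ORIGINAL = [
    ("lw", "$t0", ("$t1",)),
    ("lw", "$t2", ("$t1",)),
    ("sw", None, ("$t2", "$t1")),
    ("sw", None, ("$t0", "$t1")),
]
REORDERED = [
    ("lw", "$t0", ("$t1",)),
    ("lw", "$t2", ("$t1",)),
    ("sw", None, ("$t0", "$t1")),
    ("sw", None, ("$t2", "$t1")),
]

if __name__ == "__main__":
    print(load_use_stalls(ORIGINAL), load_use_stalls(REORDERED))
```

The original order has one load-use pair (`lw $t2` followed by `sw $t2`). Swapping the two stores removes it, and the swap is safe because the stores write different addresses.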

Summary - Pipelining Overview
- Pipelining increases throughput (but does not reduce latency)
- Hazards limit performance
  - Structural hazards
  - Control hazards
  - Data hazards

Pipelining Outline
- Introduction
- Pipelined Processor Design (current topic)
  - Datapath
  - Control
  - Dealing with Hazards & Forwarding
  - Branch Prediction
  - Exceptions
  - Performance
- Advanced Pipelining
  - Superscalar
  - Dynamic Pipelining
  - Examples

Pipelining in MIPS
- MIPS architecture was designed to be pipelined
  - Simple instruction format (makes IF, ID easy)
    - Single-word instructions
    - Small number of instruction formats
    - Common fields in same place (e.g., rs, rt) in different formats
  - Memory operations only in lw, sw instructions (simplifies EX)
  - Memory operands aligned in memory (simplifies MEM)
  - Single value for writeback (limits forwarding)
- Pipelining is harder in CISC architectures

Pipelined Datapath with Control Signals

Next Step: Adding Control
- Basic approach: build on single-cycle control
  - Place control unit in ID stage
  - Pass control signals to following stages
- Later: extra features to deal with:
  - Data forwarding
  - Stalls
  - Exceptions

Control for Pipelined Datapath
[Figure: control lines grouped by the stage that uses them]
- EX stage: RegDst, ALUOp[1:0], ALUSrc
- MEM stage: MemRead, MemWrite, Branch
- WB stage: RegWrite, MemtoReg
Source: Book Fig. 6.29, p 469
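Passing control signals down the pipeline can be sketched as a table lookup in ID followed by dropping each group of signals once its stage has used it. The signal values below follow the usual textbook control table for the single-cycle MIPS subset as best recalled, so treat them as an assumption to check against the book figure. `None` marks a don't-care, and the function name is invented for this example.

```python
# Control values generated in ID, grouped by the stage that consumes them.
CONTROL = {
    "R-type": {"EX": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
               "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
               "WB": {"RegWrite": 1, "MemtoReg": 0}},
    "lw":     {"EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
               "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
               "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "sw":     {"EX": {"RegDst": None, "ALUOp": 0b00, "ALUSrc": 1},
               "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
               "WB": {"RegWrite": 0, "MemtoReg": None}},
    "beq":    {"EX": {"RegDst": None, "ALUOp": 0b01, "ALUSrc": 0},
               "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
               "WB": {"RegWrite": 0, "MemtoReg": None}},
}

def control_in_flight(instr_class):
    """List (pipeline_register, signals_it_carries) as the instruction moves along."""
    signals = CONTROL[instr_class]
    return [
        ("ID/EX", {**signals["EX"], **signals["MEM"], **signals["WB"]}),
        ("EX/MEM", {**signals["MEM"], **signals["WB"]}),
        ("MEM/WB", dict(signals["WB"])),
    ]

if __name__ == "__main__":
    for register, carried in control_in_flight("lw"):
        print(register, carried)
```

Each pipeline register carries only the signals still needed downstream, so the control state shrinks from eight fields in ID/EX to two in MEM/WB.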

Control for Pipelined Datapath
Source: Book Fig. 6.25, p 401

Datapath and Control Unit

Tracking Control Signals - Cycles 1-5
[Figures: pipelined datapath annotated with the control values carried by each instruction; the individual values are not recoverable from this transcript]

Cycle  IF   ID   EX   MEM  WB
1      LW
2      SW   LW
3      ADD  SW   LW
4      SUB  ADD  SW   LW
5           SUB  ADD  SW   LW

References
Portions of these slides are derived from:
- Textbook figures © 1998 Morgan Kaufmann Publishers, all rights reserved
- Tod Amon's COD2e Slides © 1998 Morgan Kaufmann Publishers, all rights reserved
- Dave Patterson's CS 152 Slides – Fall 1997 © UCB
- Rob Rutenbar's 18-347 Slides – Fall 1999 CMU
- John Nestor's ECE 313 Slides – Fall 2004 LC
- T.S. Chang's DEE 1050 Slides – Fall 2004 NCTU
- Other sources as noted