Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.

Lecture 9. MIPS Processor Design – Pipelined Processor Design #1
Prof. Taeweon Suh
Computer Science Education, Korea University
2010 R&E Computer System Education & Research

Korea Univ
Processor Performance
Single-cycle processor performance is limited by the long critical path delay
  - The critical path delay limits the operating clock frequency
Can we do better?
  - New semiconductor technology reduces transistor delay, which shortens the critical path delay
      Core 2 Duo is manufactured with 65nm technology
      Core i7 is manufactured with 45nm technology
      The next semiconductor technology is 32nm
  - Can we increase processor performance with a different CPU architecture?
      Yes! Pipelining

Pipelining: It’s Natural!
Laundry Example
  - Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold
  - Washer takes 30 minutes
  - Dryer takes 40 minutes
  - “Folder” takes 20 minutes

Sequential Laundry
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?
[Figure: sequential laundry timeline for loads A, B, C, D; axes: Time, Task Order]

Pipelined Laundry: Why Wait?
Pipelined laundry takes 3.5 hours for 4 loads
[Figure: pipelined laundry timeline for loads A, B, C, D; axes: Time (6 PM to Midnight), Task Order]
Pipelining Lessons
  - Pipelining does not reduce the latency of a single task; it improves the throughput of the entire workload
  - Multiple tasks operate simultaneously
  - Pipeline efficiency is limited by the slowest pipeline stage
  - Potential speedup = number of pipeline stages
  - Unbalanced pipe stage lengths reduce speedup
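The 6-hour and 3.5-hour figures can be checked with a small simulation. This is only a sketch: the function name `finish_time` is made up for illustration, and it assumes a load may wait between machines (for example, wet clothes sitting in a basket until the dryer is free).

```python
def finish_time(stage_minutes, loads):
    """Minutes needed to push `loads` tasks through the stages, in order."""
    stage_free = [0] * len(stage_minutes)    # when each machine is next available
    done = 0
    for _ in range(loads):
        t = 0                                # every load is ready at time 0
        for s, minutes in enumerate(stage_minutes):
            start = max(t, stage_free[s])    # wait for the load and for the machine
            t = start + minutes
            stage_free[s] = t
        done = t                             # loads finish in order, so keep the last
    return done

STAGES = [30, 40, 20]                        # washer, dryer, "folder" (minutes)
sequential = 4 * sum(STAGES)                 # one load at a time, no overlap
pipelined = finish_time(STAGES, 4)
print(sequential / 60, pipelined / 60)       # hours
```

The 40-minute dryer is the bottleneck: once the pipeline is full, one load finishes every 40 minutes regardless of how fast the washer and folder are.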

Pipelining
Improve performance by increasing instruction throughput

Instruction Class | Instruction Fetch | Register Read | ALU Operation | Data Access | Register Write | Total Time
Load word         | 2ns               | 1ns           | 2ns           | 2ns         | 1ns            | 8ns
Store word        | 2ns               | 1ns           | 2ns           | 2ns         |                | 7ns
R-format          | 2ns               | 1ns           | 2ns           |             | 1ns            | 6ns
Branch            | 2ns               | 1ns           | 2ns           |             |                | 5ns

[Figure: timing diagrams comparing Sequential Execution and Pipelined Execution]
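The table can be turned into clock periods for the two designs. The sketch below assumes a 2 ns data access (inferred from the 8 ns and 7 ns totals) and uses hypothetical short names for the stages; the three-instruction run at the end is an illustrative workload, not one taken from the slide.

```python
# Stage latencies in ns, read from the table above. The 2 ns data-access value
# is inferred from the totals (8 ns and 7 ns).
STAGE_NS = {"IF": 2, "RegRead": 1, "ALU": 2, "DataAccess": 2, "RegWrite": 1}

# Stages each instruction class uses (hypothetical short names).
USES = {
    "lw": ["IF", "RegRead", "ALU", "DataAccess", "RegWrite"],
    "sw": ["IF", "RegRead", "ALU", "DataAccess"],
    "R-format": ["IF", "RegRead", "ALU", "RegWrite"],
    "beq": ["IF", "RegRead", "ALU"],
}

def total_ns(instr_class):
    return sum(STAGE_NS[s] for s in USES[instr_class])

# Single-cycle: the clock must fit the slowest instruction (lw).
single_cycle_clock = max(total_ns(c) for c in USES)
# Pipelined: the clock must fit the slowest stage.
pipelined_clock = max(STAGE_NS.values())

def sequential_time(n):
    return n * single_cycle_clock

def pipelined_time(n):
    # The first instruction fills the pipeline, then one completes per cycle.
    return (len(STAGE_NS) + n - 1) * pipelined_clock

print(sequential_time(3), pipelined_time(3))   # three back-to-back instructions
```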

Pipelining (Cont.)
Multiple instructions are being executed simultaneously
Pipeline Speedup
  - If all stages are balanced (meaning that each stage takes the same time):

    $$\text{Time to execute an instruction}_{\text{pipeline}} = \frac{\text{Time to execute an instruction}_{\text{sequential}}}{\text{Number of stages}}$$

    Here the pipelined time is the time between successive instruction completions.
  - If not balanced, speedup is less
  - Speedup comes from increased throughput, but the latency of an instruction (the time to execute each instruction) does not decrease
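A rough way to see both claims (speedup approaches the number of stages when balanced, and falls short when not) is to compute the ratio for growing instruction counts. This sketch assumes every instruction passes through all stages and that there are no stalls; the function name `speedup` and the two stage lists are illustrative.

```python
def speedup(n, stage_ns):
    """Sequential time divided by pipelined time for n instructions."""
    k = len(stage_ns)
    sequential = n * sum(stage_ns)             # each instruction runs alone, start to finish
    pipelined = (k + n - 1) * max(stage_ns)    # clock period is set by the slowest stage
    return sequential / pipelined

balanced = [2, 2, 2, 2, 2]      # five equal stages (ns)
unbalanced = [2, 1, 2, 2, 1]    # same stage count, unequal lengths (ns)

for n in (1, 10, 1_000_000):
    print(n, round(speedup(n, balanced), 2), round(speedup(n, unbalanced), 2))
```

For a single instruction the pipeline gives no gain at all, which matches the point that latency does not decrease.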

Pipelining and ISA Design
MIPS ISA is designed for pipelining
  - All instructions are 32 bits (4 bytes)
      Compared with x86 (CISC): 1- to 15-byte instructions
  - Regular instruction formats
      Can decode and read registers in one cycle
  - Load/store addressing
      Calculate address in 3rd stage
      Access memory in 4th stage
  - Alignment of memory operands in memory
      Memory access takes only one cycle
      For example, 32-bit data (word) is aligned at word addresses: 0x…0, 0x…4, 0x…8, 0x…C
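Two of these properties are easy to illustrate. The sketch below relies on the standard MIPS32 field layout (opcode in bits 31..26, rs in 25..21, rt in 20..16, rd in 15..11, funct in 5..0), which the slide itself does not spell out; the function names are hypothetical.

```python
def decode_fields(instr):
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,     # always in the same place, so the register
        "rt": (instr >> 16) & 0x1F,     # file can be read while decoding continues
        "rd": (instr >> 11) & 0x1F,
        "funct": instr & 0x3F,
    }

def is_word_aligned(addr):
    """A 32-bit word is aligned when its address is a multiple of 4."""
    return addr % 4 == 0

fields = decode_fields(0x01098020)      # encoding of: add $s0, $t0, $t1
print(fields)
print(is_word_aligned(0x10000004), is_word_aligned(0x10000006))
```

Because rs and rt sit at fixed bit positions in every format, the hardware does not need to know the instruction type before it starts reading registers.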

Basic Idea
What do we have to add to actually split the datapath into stages?

Basic Idea
[Figure: datapath diagram; labels: F/F, clock]

Graphically Representing Pipelines
Shading indicates the unit is being used by the instruction
  - Shading on the right half of the register file (ID or WB) or memory means the element is being read in that stage
  - Shading on the left half means the element is being written in that stage
[Figure: pipeline diagram over Time for lw followed by add, each passing through IF, ID, EX, MEM, WB]

Hazards
It would be nice if we could split the datapath into stages and the CPU just worked
  - But things are not as simple as you might expect
  - There are hazards!
Situations that prevent starting the next instruction in the next cycle
  - Structural hazards
      Conflict over the use of a resource at the same time
  - Data hazards
      Data is not ready for the subsequent dependent instruction
  - Control hazards
      Fetching the next instruction depends on the previous branch outcome

Structural Hazards
Conflict over the use of a resource at the same time
Suppose the MIPS CPU has a single memory
  - Load/store requires data access in the MEM stage
  - Instruction fetch requires instruction access from the same memory
      Instruction fetch would have to stall for that cycle
      This would cause a pipeline “bubble”
Hence, pipelined datapaths require separate instruction and data memories
  - Or separate instruction and data caches
[Figure: MIPS CPU connected to a Unified Memory over one Address Bus and Data Bus, versus MIPS CPU connected to an Instruction Memory and a Data Memory, each with its own Address Bus and Data Bus]

Structural Hazards (Cont.)
[Figure: pipeline diagram over Time for the sequence lw, add, sub, add, each passing through IF, ID, EX, MEM, WB]
Need to separate instruction and data memory
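The conflict in the lw, add, sub, add sequence can be located mechanically. This is a simplified sketch that assumes one instruction enters the pipeline per cycle with no stalls, and a single-ported unified memory; `memory_conflicts` is a hypothetical helper name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
MEM_OPS = {"lw", "sw"}                      # instructions that access data memory

def memory_conflicts(program):
    """Return (cycle, mem_instr_index, fetch_instr_index) for each clash on one memory."""
    conflicts = []
    for i, op in enumerate(program):
        if op in MEM_OPS:
            cycle = i + STAGES.index("MEM") + 1   # 1-based cycle when instruction i is in MEM
            j = cycle - 1                         # instruction that is in IF during that cycle
            if j < len(program):
                conflicts.append((cycle, i, j))
    return conflicts

print(memory_conflicts(["lw", "add", "sub", "add"]))
```

With these assumptions, the lw (index 0) reaches MEM in the same cycle in which the fourth instruction (index 3) is being fetched, so both need the memory at once.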

Data Hazards
Data is not ready for the subsequent dependent instruction

    add $s0,$t0,$t1
    (Bubble)
    (Bubble)
    sub $t2,$s0,$t3

[Figure: pipeline diagram (IF, ID, EX, MEM, WB) with two bubbles between add and the dependent sub]
To solve the data hazard problem, the pipeline needs to be stalled (a stall cycle is typically referred to as a “bubble”)
  - Performance is then penalized
A better solution? Forwarding (or bypassing)

Reducing Data Hazard - Forwarding

    add $s0,$t0,$t1
    sub $t2,$s0,$t3

[Figure: pipeline diagram (stages IF, ID, EX, MEM, WB; one slot labeled Bubble) for the add followed by the dependent sub]

Data Hazard: Load-Use Case
Can’t always avoid stalls by forwarding
  - Can’t forward backward in time!

    lw  $s0, 8($t1)
    (Bubble)
    sub $t2,$s0,$t3

[Figure: pipeline diagram (IF, ID, EX, MEM, WB) with one bubble between lw and the dependent sub]
This bubble can be hidden by proper instruction scheduling
A hardware interlock is needed to stall the pipeline
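The bubble counts on the last three slides follow one rule. The sketch below is a simplified model, assuming the classic 5-stage pipeline in which the register file is written in the first half of a cycle and read in the second half; `bubbles` is a hypothetical helper, and `distance` counts how many instructions after the producer the consumer sits (1 means immediately after).

```python
def bubbles(producer, distance, forwarding):
    """Stall cycles needed before a consumer that is `distance` instructions after `producer`."""
    if forwarding:
        # ALU results can be forwarded to the very next instruction;
        # loaded data only exists after the MEM stage, one cycle later.
        ready_gap = 2 if producer == "lw" else 1
    else:
        # Without forwarding the consumer must wait until the producer's WB cycle.
        ready_gap = 3
    return max(0, ready_gap - distance)

print(bubbles("add", 1, forwarding=False))   # add followed by dependent sub, no forwarding
print(bubbles("add", 1, forwarding=True))    # same pair with forwarding
print(bubbles("lw", 1, forwarding=True))     # load-use case
```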

Code Scheduling to Avoid Stalls
Reorder code to avoid use of a load result in the next instruction
C code: A = B + E; C = B + F;
  - B is loaded to $t1
  - E is loaded to $t2
  - F is loaded to $t4

Original order (13 cycles):
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    add $t3, $t1, $t2    # stall: uses $t2 right after its load
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    add $t5, $t1, $t4    # stall: uses $t4 right after its load
    sw  $t5, 16($t0)

Reordered (11 cycles):
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
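The 13-cycle and 11-cycle counts can be reproduced by counting load-use pairs. This sketch assumes a 5-stage pipeline with forwarding, where the only stall is one bubble when an instruction uses a register loaded by the instruction directly before it; the parser handles only the lw, sw, and add forms used in this example, and all function names are hypothetical.

```python
def parse(line):
    """Return (op, dest, sources) for the lw/sw/add lines used in this example."""
    op, rest = line.split(None, 1)
    args = [a.strip() for a in rest.split(",")]
    if op == "lw":                        # lw rt, offset(base): writes rt, reads base
        return op, args[0], [args[1][args[1].index("(") + 1:-1]]
    if op == "sw":                        # sw rt, offset(base): reads rt and base
        return op, None, [args[0], args[1][args[1].index("(") + 1:-1]]
    return op, args[0], args[1:]          # add rd, rs, rt

def cycles(program, stages=5):
    """Total cycles: pipeline fill + one per instruction + load-use bubbles."""
    stalls = 0
    prev_op, prev_dest = None, None
    for line in program:
        op, dest, srcs = parse(line)
        if prev_op == "lw" and prev_dest in srcs:
            stalls += 1                   # load-use: one bubble even with forwarding
        prev_op, prev_dest = op, dest
    return stages + len(program) - 1 + stalls

original = ["lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2", "sw $t3, 12($t0)",
            "lw $t4, 8($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)"]
reordered = ["lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)", "add $t3, $t1, $t2",
             "sw $t3, 12($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)"]
print(cycles(original), cycles(reordered))
```

Moving the third lw up separates each load from its first use by at least one instruction, which removes both bubbles.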

Control Hazard
Branch determines the flow of instructions
Fetching the next instruction depends on the branch outcome
  - The pipeline can’t always fetch the correct instruction
  - The branch instruction is still in the ID stage when the next instruction is fetched

    beq $1,$2,L1
    add $1,$2,$3
    sw  $1, 4($2)
    …
    L1: sub $1,$2,$3

[Figure: pipeline diagram (IF, ID, EX, MEM, WB) for the code above, with Bubble slots after beq; annotations: "Taken target address is known here", "Actual condition is generated here", "Fetch instruction based on the comparison result"]

Reducing Control Hazard
In the MIPS pipeline, compare registers and compute the target early in the pipeline
  - Add hardware to do it in the ID stage

    beq $1,$2,L1
    add $1,$2,$3
    …
    L1: sub $1,$2,$3

[Figure: pipeline diagram (IF, ID, EX, MEM, WB) for the code above, with one Bubble slot after beq; annotations: "Taken target address is known here", "Actual condition is generated here", "Fetch instruction based on the comparison result"]
This reduces the penalty from 2 or more bubbles to 1 bubble
But it implies additional forwarding and hazard detection hardware. Why?

Delay Slot
Branch instructions entail a “delay slot”
  - A delayed branch always executes the next sequential instruction, with the branch taking place after that one-instruction delay
  - The delay slot is the slot right after a delayed branch instruction

    beq $1,$2,L1
    add $1,$2,$3         (delay slot)
    …
    L1: sub $1,$2,$3

[Figure: pipeline diagram (IF, ID, EX, MEM, WB) for the code above; annotations: "Taken target address is known here", "Actual condition is generated here", "Fetch instruction based on the comparison result"]

Delay Slot (Cont.)
The compiler needs to schedule a useful instruction in the delay slot, or fill it with a nop (no operation)

C code:
    // $s1 = a, $s2 = b, $s3 = c
    // $t0 = d, $t1 = f
    a = b + c;
    if (d == 0) { f = f + 1; }
    f = f + 2;

Delay slot filled with nop:
    add  $s1, $s2, $s3
    bne  $t0, $zero, L1
    nop                      # delay slot
    addi $t1, $t1, 1
    L1: addi $t1, $t1, 2

Can we do better? Fill the delay slot with a useful and valid instruction:
    bne  $t0, $zero, L1
    add  $s1, $s2, $s3       # delay slot
    addi $t1, $t1, 1
    L1: addi $t1, $t1, 2

Branch Prediction
Longer pipelines (implemented in Core 2 Duo, for example) can’t readily determine the branch outcome early
  - The stall penalty becomes unacceptable since branch instructions are used so frequently in programs
Solution: Branch Prediction
  - Predict the branch outcome in hardware
  - Flush the instructions (that shouldn’t have been executed) from the pipeline if the prediction turns out to be wrong
  - Modern processors use sophisticated branch predictors
In the MIPS pipeline, hardware can predict branches as not taken and fetch the instruction after the branch with no delay
  - If the prediction turns out to be wrong, flush the instruction being executed

MIPS with Predict-Not-Taken
[Figure: two pipeline diagrams, one labeled "Prediction correct" and one labeled "Prediction incorrect"; in the incorrect case, flush the instruction that shouldn’t be executed]
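The benefit of predict-not-taken over always stalling can be estimated with a simple average. All numbers below are hypothetical placeholders chosen for illustration (they do not come from the lecture), and the model assumes a 1-cycle penalty, as in the MIPS pipeline that resolves branches in the ID stage.

```python
def effective_cpi(branch_fraction, wrong_fraction, penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction when some branches cost extra cycles."""
    return base_cpi + branch_fraction * wrong_fraction * penalty_cycles

BRANCH_FRACTION = 0.20    # hypothetical: 1 in 5 instructions is a branch
TAKEN_FRACTION = 0.60     # hypothetical: share of branches that are taken

# Always stall: every branch pays the penalty.
always_stall = effective_cpi(BRANCH_FRACTION, 1.0, penalty_cycles=1)
# Predict not taken: only taken branches (the wrong predictions) pay it.
predict_not_taken = effective_cpi(BRANCH_FRACTION, TAKEN_FRACTION, penalty_cycles=1)
print(always_stall, predict_not_taken)
```

With a longer pipeline the penalty per wrong prediction grows, which is why the accuracy of the predictor matters more there.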

Pipeline Summary
Pipelining improves performance by increasing instruction throughput
  - Executes multiple instructions in parallel
Pipelining is subject to hazards
  - Structural, data, and control hazards
Instruction set design affects the complexity of the pipeline implementation