CS203 – Advanced Computer Architecture

CS203 – Advanced Computer Architecture Pipelining Review

Pipelining Analogy
Laundry steps: Wash, Dry, Fold, Put it away (closet / dresser / neat pile on floor).

Pipelining Analogy
Assuming each step takes 1 hour, 4 loads done one after another would take 16 hours.

Pipelining Analogy
To speed things up, overlap the steps: 4 loads of laundry now take only 7 hours.

Speedup of Pipelining
For a k-stage pipeline with time t per stage and n jobs:
  Non-pipelined time = n * k * t
  Pipelined time = (k + n - 1) * t
This is an ideal case: no job depends on a previous job, and all jobs behave exactly the same. Not realistic.
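
A minimal sketch of these formulas in Python (the function names are illustrative, not from the slides); applied to the laundry example above with k = 4 stages, t = 1 hour, and n = 4 loads it reproduces the 16-hour and 7-hour figures:

    def non_pipelined_time(n, k, t):
        # every job runs all k stages back to back
        return n * k * t

    def pipelined_time(n, k, t):
        # k cycles to fill the pipeline, then one job completes per cycle
        return (k + n - 1) * t

    n, k, t = 4, 4, 1                                  # 4 laundry loads, 4 steps, 1 hour per step
    print(non_pipelined_time(n, k, t))                 # 16 hours
    print(pipelined_time(n, k, t))                     # 7 hours
    print(non_pipelined_time(n, k, t) / pipelined_time(n, k, t))   # speedup ~2.29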

Simple 5-stage pipeline

MIPS Pipeline
Five stages, one step per stage:
  IF: instruction fetch from memory
  ID: instruction decode & register read
  EX: execute operation or calculate address
  MEM: access memory operand
  WB: write result back to register

Pipeline Performance
Single-cycle datapath: Tc = 800 ps. Pipelined datapath: Tc = 200 ps (set by the slowest stage).
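
A rough sketch of why these numbers give less than a 5x speedup, assuming the textbook's stage latencies of 200/100/200/200/100 ps (an assumption; the individual latencies are not listed on this slide):

    stage_ps = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}   # assumed latencies

    single_cycle_clock = sum(stage_ps.values())   # 800 ps: one instruction per 800 ps
    pipelined_clock = max(stage_ps.values())      # 200 ps: limited by the slowest stage

    # Throughput speedup is 4x, not 5x, because the stages are unbalanced.
    print(single_cycle_clock / pipelined_clock)   # 4.0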

Pipeline Registers
Need registers between stages to hold information produced in the previous cycle.
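
One way to picture the four pipeline registers of the 5-stage pipeline is as simple records carried from stage to stage; the field names below are illustrative guesses, not the exact signals of the textbook figure:

    from dataclasses import dataclass

    @dataclass
    class IF_ID:            # between fetch and decode
        instruction: int
        pc_plus_4: int

    @dataclass
    class ID_EX:            # between decode and execute
        reg_val_a: int
        reg_val_b: int
        immediate: int
        control: dict       # control signals for the remaining stages

    @dataclass
    class EX_MEM:           # between execute and memory access
        alu_result: int
        store_data: int
        dest_reg: int
        control: dict

    @dataclass
    class MEM_WB:           # between memory access and write-back
        mem_data: int
        alu_result: int
        dest_reg: int
        control: dict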

Pipelined Control (Simplified)

Datapath with Hazard Detection

Multi-Cycle Pipeline Diagram
Form showing resource usage.

Multi-Cycle Pipeline Diagram
Traditional form.
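
A small sketch that prints this kind of diagram for a stall-free instruction sequence (purely illustrative; it assumes one new instruction enters the pipeline per cycle and ignores hazards):

    STAGES = ["IF", "ID", "EX", "MEM", "WB"]

    def pipeline_diagram(instructions):
        # instruction i occupies stage s during cycle i + s (0-based)
        n_cycles = len(instructions) + len(STAGES) - 1
        print("cycle:   " + " ".join(f"{c+1:>4}" for c in range(n_cycles)))
        for i, name in enumerate(instructions):
            row = ["    "] * n_cycles
            for s, stage in enumerate(STAGES):
                row[i + s] = f"{stage:>4}"
            print(f"{name:<8} " + " ".join(row))

    pipeline_diagram(["lw", "sub", "add", "lw2", "add2"])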

Pipelining
What makes it easy:
  all instructions are the same length
  just a few instruction formats
  memory operands appear only in loads and stores
What makes it hard:
  structural hazards: suppose we had only one memory
  control hazards: need to worry about branch instructions
  data hazards: an instruction depends on a previous instruction
What makes it really hard:
  exception handling
  trying to improve performance with out-of-order execution, etc.

Hazards

Hazards
Situations that prevent starting the next instruction in the next cycle:
  Structural hazards: a required resource is busy.
  Data hazards: need to wait for a previous instruction to complete its data read/write.
  Control hazards: deciding on a control action depends on a previous instruction.

Structural Hazards
Conflict for use of a resource. In a MIPS pipeline with a single memory, a load/store requires a data access, so the instruction fetch would have to stall for that cycle, causing a pipeline "bubble". Hence, pipelined datapaths require separate instruction/data memories, or separate instruction/data caches.

Structural hazard: two memory accesses in clock cycle 4. Use a Harvard architecture: separate data and code memories.
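
A toy check of where that cycle-4 conflict comes from, assuming a lw issued first in a 5-stage pipeline with a single shared memory and one instruction entering per cycle (illustrative, not from the slides):

    STAGES = ["IF", "ID", "EX", "MEM", "WB"]

    # With one instruction entering per cycle, instruction i is in stage s at cycle i + s + 1.
    lw_mem_cycle = 0 + STAGES.index("MEM") + 1      # lw issued first: MEM in cycle 4
    i3_if_cycle  = 3 + STAGES.index("IF") + 1       # 4th instruction: IF in cycle 4

    # Both want the single memory in the same cycle -> structural hazard.
    print(lw_mem_cycle == i3_if_cycle)              # True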

Data Hazards

Data Hazards
An instruction depends on completion of a data access by a previous instruction:
  add $s0, $t0, $t1
  sub $t2, $s0, $t3
Assuming the register file can read and write in the same cycle.
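
A back-of-the-envelope check of how many bubbles this dependence costs without forwarding, a sketch under the slide's assumption that a register written early in a cycle can be read later in the same cycle:

    STAGES = ["IF", "ID", "EX", "MEM", "WB"]

    # add issues in cycle 1, sub in cycle 2 (one instruction per cycle).
    add_wb_cycle = 1 + STAGES.index("WB")    # cycle 5: $s0 is written
    sub_id_cycle = 2 + STAGES.index("ID")    # cycle 3: sub wants to read $s0

    # With same-cycle write/read, sub's ID may happen no earlier than add's WB.
    bubbles = add_wb_cycle - sub_id_cycle    # 2 stall cycles without forwarding
    print(bubbles)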

Types of data hazards
With i1: add r1, r2, r3 as the first instruction:
  Read After Write (RAW), true or dataflow dependence:
    i2: add r4, r1, r5   (i2 reads r1, which i1 writes)
  Write After Read (WAR), anti-dependence:
    i2: add r2, r4, r5   (i2 writes r2, which i1 reads)
  Write After Write (WAW), output dependence:
    i2: add r1, r4, r5   (i2 writes r1, which i1 also writes)
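
A small classifier for these three cases, given just the destination and source registers of two instructions (the tuple layout is an assumption for illustration, not part of the slides):

    def classify_hazards(i1, i2):
        # each instruction is (dest_reg, [source_regs])
        d1, srcs1 = i1
        d2, srcs2 = i2
        hazards = []
        if d1 in srcs2:
            hazards.append("RAW (true dependence)")
        if d2 in srcs1:
            hazards.append("WAR (anti-dependence)")
        if d1 == d2:
            hazards.append("WAW (output dependence)")
        return hazards

    i1 = ("r1", ["r2", "r3"])                            # add r1, r2, r3
    print(classify_hazards(i1, ("r4", ["r1", "r5"])))    # RAW
    print(classify_hazards(i1, ("r2", ["r4", "r5"])))    # WAR
    print(classify_hazards(i1, ("r1", ["r4", "r5"])))    # WAW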

WAR & WAW
WAR & WAW are name dependences: the dependence is on the container's name, not on the value contained. They can be eliminated by renaming, either static (in software) or dynamic (in hardware).
WAW & WAR hazards cannot occur in the 5-stage MIPS pipeline: registers are read in ID and all register writes happen in the WB stage, in instruction issue order.

Forwarding (aka Bypassing)
Use the result as soon as it is computed; don't wait for it to be stored in a register. Requires extra connections in the datapath.
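
A sketch of the usual forwarding decision for the ALU's first operand, in the spirit of the textbook's forwarding unit; the field names follow the illustrative pipeline-register sketch above and are not the exact signal names:

    def forward_a(id_ex_rs, ex_mem, mem_wb):
        """Select the ALU's first operand source for the instruction in EX."""
        # Forward from the EX/MEM register (instruction immediately ahead of us).
        if ex_mem["reg_write"] and ex_mem["dest_reg"] != 0 and ex_mem["dest_reg"] == id_ex_rs:
            return "EX/MEM.alu_result"
        # Otherwise forward from MEM/WB (instruction two ahead of us).
        if mem_wb["reg_write"] and mem_wb["dest_reg"] != 0 and mem_wb["dest_reg"] == id_ex_rs:
            return "MEM/WB.value"
        # No hazard: use the value read from the register file in ID.
        return "ID/EX.reg_val_a"

    # add $s0,$t0,$t1 followed by sub $t2,$s0,$t3: sub's rs is $s0 (reg 16), add is in EX/MEM
    print(forward_a(id_ex_rs=16,
                    ex_mem={"reg_write": True, "dest_reg": 16},
                    mem_wb={"reg_write": False, "dest_reg": 0}))   # EX/MEM.alu_result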

Load-Use Data Hazard
Can't always avoid stalls by forwarding: if the value has not been computed when it is needed, we can't forward backward in time.
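
A sketch of the corresponding hazard-detection condition: stall (insert one bubble) when the instruction in EX is a load whose destination matches a source register of the instruction in ID. The field names are the same illustrative ones used above.

    def must_stall(id_ex, if_id):
        """True if the instruction now in ID must be stalled for one cycle."""
        load_in_ex = id_ex["mem_read"]                 # only loads read data memory here
        uses_loaded_reg = id_ex["dest_reg"] in (if_id["rs"], if_id["rt"])
        return load_in_ex and uses_loaded_reg

    # lw $t1, 0($t0) immediately followed by add $t3, $t1, $t2  -> stall
    print(must_stall({"mem_read": True, "dest_reg": 9},           # $t1
                     {"rs": 9, "rt": 10}))                        # True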

Examples of Dependencies

Dependencies & Forwarding

Code Scheduling to Avoid Stalls
Reorder code to avoid using a load result in the next instruction. Reordering is commonly done by modern compilers. C code: A = B + E; C = B + F;

Original order (13 cycles; one stall after each load whose result is used immediately):
  lw  $t1, 0($t0)
  lw  $t2, 4($t0)
  add $t3, $t1, $t2    # stall
  sw  $t3, 12($t0)
  lw  $t4, 8($t0)
  add $t5, $t1, $t4    # stall
  sw  $t5, 16($t0)

Reordered (11 cycles; no stalls):
  lw  $t1, 0($t0)
  lw  $t2, 4($t0)
  lw  $t4, 8($t0)
  add $t3, $t1, $t2
  sw  $t3, 12($t0)
  add $t5, $t1, $t4
  sw  $t5, 16($t0)
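
A rough cycle counter that reproduces the 13- and 11-cycle figures, assuming a 5-stage pipeline with full forwarding and a one-cycle load-use stall (a sketch; the tuple encoding below is a simplification, not real MIPS parsing):

    def count_cycles(program):
        # program: list of (op, dest, sources); one bubble whenever a load result
        # is used by the very next instruction.
        stalls = 0
        for prev, curr in zip(program, program[1:]):
            if prev[0] == "lw" and prev[1] in curr[2]:
                stalls += 1
        return len(program) + 4 + stalls     # 4 extra cycles to drain the 5-stage pipeline

    original = [("lw", "$t1", []), ("lw", "$t2", []), ("add", "$t3", ["$t1", "$t2"]),
                ("sw", None, ["$t3"]), ("lw", "$t4", []), ("add", "$t5", ["$t1", "$t4"]),
                ("sw", None, ["$t5"])]
    reordered = [original[0], original[1], original[4], original[2],
                 original[3], original[5], original[6]]
    print(count_cycles(original))    # 13
    print(count_cycles(reordered))   # 11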

Control Hazards

Control Hazards
Branch determines the flow of control:
  Fetching the next instruction depends on the branch outcome.
  The pipeline can't always fetch the correct instruction: it is still working on the ID stage of the branch.
In the MIPS pipeline: compare registers and compute the target early in the pipeline, by adding hardware to do it in the ID stage.

Stall on Branch / Predict Branch
Stall: the simplest way to handle a control hazard. Wait until the branch outcome is determined before fetching the next instruction.
Predict: instead of stalling and waiting for the branch outcome, predict the branch and keep executing. If the prediction is incorrect, flush the pipeline and take the correct path.

Pipeline (Simplified)

Branch Hazards
If the branch outcome is resolved in MEM, the instructions fetched after the branch must be flushed (set their control values to 0).

Branches resolved in the EX stage
(Pipeline timing diagrams; B = branch instruction, i = instruction after the branch, j = instruction at the branch target.)
Branch not taken: instruction i cannot proceed past IF until B reaches EX, so 2 cycles are lost before i continues.
Branch taken: the fetch of j must wait until B resolves in EX, again costing 2 cycles.
Either way, resolving branches in EX gives a 2-cycle branch penalty.

Control Hazards
Branch problem: branches are resolved in the EX stage, giving a 2-cycle penalty on taken branches. The ideal CPI is 1; assuming 2 cycles for all branches and 32% branch instructions, the new CPI = 1 + 0.32 * 2 = 1.64 (see the sketch below).
Solutions:
  Reduce the branch penalty: change the datapath (a new adder is needed in the ID stage).
  Fill branch delay slot(s) with useful instruction(s).
  Predict the branch (taken / not taken):
    Static branch prediction: the same prediction for every instance of that branch.
    Dynamic branch prediction: prediction based on the path leading to that branch.
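
A small sketch of that CPI calculation, kept general so other penalties and branch frequencies can be plugged in:

    def effective_cpi(ideal_cpi, branch_fraction, branch_penalty_cycles):
        # every branch adds branch_penalty_cycles of stalls on top of the ideal CPI
        return ideal_cpi + branch_fraction * branch_penalty_cycles

    print(effective_cpi(1.0, 0.32, 2))   # 1.64, matching the slide
    print(effective_cpi(1.0, 0.32, 1))   # 1.32 if branches resolve in ID instead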

Pipeline (w/ early branch)

Branches resolved in the ID stage (early branch)
(Pipeline timing diagrams; B = branch instruction, i = instruction after the branch, j = instruction at the branch target.)
Branch not taken: i waits only until B finishes ID, so 1 cycle is lost.
Branch taken: the fetch of j starts right after B's ID stage, again a 1-cycle penalty.
Resolving branches in ID cuts the branch penalty from 2 cycles to 1.

Filling Branch Delay Slots
Branch delay slot filling: move a useful instruction into the slot right after the branch, hoping that its execution is needed regardless of the branch outcome.
Limitations: restrictions on which instructions can be rescheduled; reliance on compile-time prediction of taken vs. untaken branches; and a serious impact on program semantics and on future implementations of the architecture.

Scheduling Branch Delay Slots
A. From before the branch:
     add $1,$2,$3
     if $2=0 then
         delay slot
   becomes
     if $2=0 then
         add $1,$2,$3

B. From the branch target:
     sub $4,$5,$6
     ...
     add $1,$2,$3
     if $1=0 then
         delay slot
   becomes
     add $1,$2,$3
     if $1=0 then
         sub $4,$5,$6

C. From the fall-through path:
     add $1,$2,$3
     if $1=0 then
         delay slot
     sub $4,$5,$6
   becomes
     add $1,$2,$3
     if $1=0 then
         sub $4,$5,$6

Limitations on delayed-branch scheduling come from 1) restrictions on the instructions that can be moved or copied into the delay slot and 2) limited ability to predict at compile time whether a branch is likely to be taken or not.
In B and C, the use of $1 prevents the add instruction from being moved to the delay slot. In B the sub may need to be copied because it could be reached by another path. B is preferred when the branch is taken with high probability (such as loop branches).

Branch Prediction
Predict the outcome of a branch in the IF stage. Idea: doing something is better than waiting around doing nothing; the gains might outweigh the losses. Heavily researched area over the last 20 years.
Fixed branch prediction: applied to all branch instructions indiscriminately.
  Predict not-taken (47% of branches are actually not taken): continue to fetch instructions without stalling; do not change any state (no register writes); if the branch is taken, turn the fetched instruction into a no-op and restart the fetch at the target address, for a 1-cycle penalty. Assumes branch detection in the ID stage.
  Predict taken (53% of branches are taken): more difficult, since the target must be known before the branch is decoded; no advantage in our simple 5-stage pipeline even if we move branch resolution to the ID stage.
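
A sketch of the expected cost of fixed predict-not-taken under the slide's numbers (53% of branches taken, 1-cycle penalty when wrong, branches resolved in ID); the function name is illustrative:

    def predict_not_taken_cpi(ideal_cpi, branch_fraction, taken_fraction, mispredict_penalty):
        # only taken branches are mispredicted and pay the penalty
        return ideal_cpi + branch_fraction * taken_fraction * mispredict_penalty

    # 32% branches (from the earlier slide), 53% of them taken, 1-cycle penalty
    print(predict_not_taken_cpi(1.0, 0.32, 0.53, 1))   # ~1.17, vs 1.32 for always stalling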

Branch Prediction
Static branch prediction:
  Opcode-based: prediction based on the opcode itself and the related condition. Examples: MC 88110, PowerPC 601/603.
  Displacement-based: if the displacement d < 0 (a backward branch, e.g. a loop), predict taken; if d >= 0, predict not taken. Examples: Alpha 21064 (as an option), PowerPC 601/603 for regular conditional branches.
  Compiler-directed: the compiler sets or clears a predict bit in the instruction itself. Examples: AT&T 9210 Hobbit, PowerPC 601/603 (the predict bit reverses the opcode or displacement prediction), HP PA 8000 (as an option).
Dynamic branch prediction: later in this course.
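
A minimal sketch of the displacement-based rule (backward taken, forward not taken); the function name and encoding are illustrative, not from any particular machine:

    def predict_taken(branch_displacement):
        # backward branches (negative displacement) are usually loop branches -> predict taken
        return branch_displacement < 0

    print(predict_taken(-16))   # True: backward branch, e.g. the bottom of a loop
    print(predict_taken(+8))    # False: forward branch, e.g. skipping an error handler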