Pipeline Control Hazards and Instruction Variations

Slides:



Advertisements
Similar presentations
Morgan Kaufmann Publishers The Processor
Advertisements

CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
ELEN 468 Advanced Logic Design
1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Pipeline Hazards See: P&H Chapter 4.7.
Pipeline Control Hazards and Instruction Variations Hakim Weatherspoon CS 3410, Spring 2011 Computer Science Cornell University See P&H Appendix 4.8 &
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University RISC Pipeline See: P&H Chapter 4.6.
Pipeline Hazards Hakim Weatherspoon CS 3410, Spring 2011 Computer Science Cornell University See P&H Appendix 4.7.
Pipeline Control Hazards and Instruction Variations Hakim Weatherspoon CS 3410, Spring 2011 Computer Science Cornell University See P&H Appendix 4.8 &
1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.
Lec 8: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University.
Lec 9: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University.
MIPS Pipeline Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H Chapter 4.6.
CECS 440 Pipelining.1(c) 2014 – R. W. Allison [slides adapted from D. Patterson slides with additional credits to M.J. Irwin]
Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University MIPS Pipeline See P&H Chapter 4.6.
Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University MIPS Pipeline See P&H Chapter 4.6.
CSIE30300 Computer Architecture Unit 05: Overcoming Data Hazards Hsin-Chou Chi [Adapted from material by and
Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.
Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.
EECS 322 April 10, 2000 EECS 322 Computer Architecture Pipeline Control, Data Hazards and Branch Hazards.
CSE 340 Computer Architecture Spring 2016 Overcoming Data Hazards.
Pipeline Control Hazards Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University See P&H Appendix 4.8.
Computer Organization
Stalling delays the entire pipeline
CDA 3101 Spring 2016 Introduction to Computer Organization
Pipelining Chapter 6.
CSCI206 - Computer Organization & Programming
Pipeline Hazards Hakim Weatherspoon CS 3410, Spring 2012
Morgan Kaufmann Publishers
Pipeline Control Hazards and Instruction Variations
Pipeline Control Hazards
ELEN 468 Advanced Logic Design
5 Steps of MIPS Datapath Figure A.2, Page A-8
Single Clock Datapath With Control
Appendix C Pipeline implementation
ECS 154B Computer Architecture II Spring 2009
CDA 3101 Spring 2016 Introduction to Computer Organization
Hakim Weatherspoon CS 3410 Computer Science Cornell University
Hakim Weatherspoon CS 3410 Computer Science Cornell University
Forwarding Now, we’ll introduce some problems that data hazards can cause for our pipelined processor, and show how to handle them with forwarding.
Chapter 4 The Processor Part 3
Review: MIPS Pipeline Data and Control Paths
Morgan Kaufmann Publishers The Processor
Pipelining review.
Pipeline Hazards Hakim Weatherspoon CS 3410, Spring 2013
Pipelining Chapter 6.
Pipelining and Hazards
Data and Control Hazards
Pipelining in more detail
CSCI206 - Computer Organization & Programming
Data Hazards Data Hazard
Pipeline control unit (highly abstracted)
The Processor Lecture 3.6: Control Hazards
Control unit extension for data hazards
The Processor Lecture 3.5: Data Hazards
Pipeline control unit (highly abstracted)
Pipeline Control unit (highly abstracted)
Pipelining (II).
Control unit extension for data hazards
Morgan Kaufmann Publishers The Processor
Introduction to Computer Organization and Architecture
Control unit extension for data hazards
Pipelining Hakim Weatherspoon CS 3410 Computer Science
Pipelining Hakim Weatherspoon CS 3410 Computer Science
Pipelining - 1.
Stalls and flushes Last time, we discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.
MIPS Pipeline Hakim Weatherspoon CS 3410, Spring 2013 Computer Science
MIPS Pipelined Datapath
©2003 Craig Zilles (derived from slides by Howard Huang)
ELEC / Computer Architecture and Design Spring 2015 Pipeline Control and Performance (Chapter 6) Vishwani D. Agrawal James J. Danaher.
Presentation transcript:

Pipeline Control Hazards and Instruction Variations Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University See P&H Appendix 4.8

Goals for Today Recap: Data Hazards Control Hazards What is the next instruction to execute if a branch is taken? Not taken? How to resolve control hazards Optimizations Next time: Instruction Variations Instruction Set Architecture Variations ARM X86 RISC vs CISC The Assembler

Recall: MIPS instruction formats All MIPS instructions are 32 bits long, has 3 formats R-type I-type J-type op rs rt rd shamt func 6 bits 5 bits op rs rt immediate 6 bits 5 bits 16 bits op immediate (target address) 6 bits 26 bits

Recall: MIPS Instruction Types Arithmetic/Logical R-type: result and two source registers, shift amount I-type: 16-bit immediate with sign/zero extension Memory Access load/store between registers and memory word, half-word and byte operations Control flow conditional branches: pc-relative addresses jumps: fixed offsets, register absolute

Data Hazards inst mem inst OP B A D M Rd mem PC IF/ID ID/EX EX/MEM Ra Rb D B A inst PC+4 OP B A Rt D M imm Rd addr din dout +4 mem detect hazard PC Rd forward unit IF/ID ID/EX EX/MEM MEM/WB

Resolving Data Hazards What to do if data hazard detected Stall Reorder instructions in SW Forward/Bypass prev slide: Q: what happens if branch in EX? A: bad stuff: we miss the branch So: lets try not to stall

Stalling add r3, r1, r2 sub r5, r3, r5 or r6, r3, r4 add r6, r3, r8 Clock cycle 1 2 3 4 5 6 7 8 add r3, r1, r2 sub r5, r3, r5 or r6, r3, r4 add r6, r3, r8 IF ID 11=r1 22=r2 EX D=33 MEM D=33 WB r3=33 ID ?=r3 33=r3 EX MEM M find hazards first, then draw

Stalling nop /stall A A D D inst mem D rD B B data mem rA rB B M inst +4 Rd Rd Rd WE PC WE WE nop Op Op Op draw control signals for /stall Q: what happens if branch in EX? A: bad stuff: we miss the branch So: lets try not to stall /stall

Forwarding add r3, r1, r2 sub r5, r3, r5 or r6, r3, r4 add r6, r3, r8 Clock cycle 1 2 3 4 5 6 7 8 add r3, r1, r2 sub r5, r3, r5 or r6, r3, r4 add r6, r3, r8 IF ID 11=r1 22=r2 EX D=33 MEM D=33 WB r3=33 ID 33=r3 EX MEM ID 33=r3

Forwarding Datapath D B A A D D inst mem B data mem B M Rd Rd Rb WE WE Ra draw control signals MC MC

“earlier” = started earlier Forwarding Datapath MEM to EX Bypass EX needs ALU result that is still in MEM stage Resolve: Add a bypass from EX/MEM.D to start of EX How to detect? Logic in Ex Stage: forward = (Ex/M.WE && EX/M.Rd != 0 && ID/Ex.Ra == Ex/M.Rd) || (same for rB) destination reg of earlier instruction == source reg of current stage left “earlier” = started earlier = stage right assumes (WE=0 implies rD=0) everywhere, similar for rA needed, rB needed (can ensure this in ID stage)

“earlier” = started earlier Forwarding Datapath WB to EX Bypass EX needs value being written by WB Resolve: Add bypass from WB final value to start of EX How to detect? Logic in Ex Stage: forward = (M/WB.WE && M/WB.Rd != 0 && ID/Ex.Ra == M/WB.Rd && not (ID/Ex.WE && Ex/M.Rd != 0 && ID/Ex.Ra == Ex/M.Rd) || (same for rB) destination reg of earlier instruction == source reg of current stage left “earlier” = started earlier = stage right assumes (WE=0 implies rD=0) everywhere, similar for rA needed, rB needed (can ensure this in ID stage)

“earlier” = started earlier Forwarding Datapath Register File Bypass Reading a value that is currently being written Detect: ((Ra == MEM/WB.Rd) or (Rb == MEM/WB.Rd)) and (WB is writing a register) Resolve: Add a bypass around register file (WB to ID) Better Soln: (Hack) just negate register file clock writes happen at end of first half of each clock cycle reads happen during second half of each clock cycle destination reg of earlier instruction == source reg of current stage left “earlier” = started earlier = stage right assumes (WE=0 implies rD=0) everywhere, similar for rA needed, rB needed (can ensure this in ID stage)

Quiz 2 add r3, r1, r2 nand r5, r3, r4 add r2, r6, r3 lw r6, 24(r3) sw r6, 12(r2) Find all hazards, and say how they are resolved

Memory Load Data Hazard B A inst mem data mem lw r4, 20(r8) sub r6, r4, r1 requires one bubble

Resolving Memory Load Hazard Load Data Hazard Value not available until WB stage So: next instruction can’t proceed if hazard detected Resolution: MIPS 2000/3000: one delay slot ISA says results of loads are not available until one cycle later Assembler inserts nop, or reorders to fill delay slot MIPS 4000 onwards: stall But really, programmer/compiler reorders to avoid stalling in the load delay slot For stall, how to detect? Logic in ID Stage Stall = ID/Ex.MemRead && (IF/ID.Ra == ID/Ex.Rd || IF/ID.Rb == ID/Ex.Rd) often see lw …; nop; … we will stall for our project 2

Data Hazard Recap Delay Slot(s) Stall Forward/Bypass Modify ISA to match implementation Stall Pause current and all subsequent instructions Forward/Bypass Try to steal correct value from elsewhere in pipeline Otherwise, fall back to stalling or require a delay slot 1: ISA tied to impl; messy asm; impl easy & cheap; perf depends on asm. 2: ISA correctness not tied to impl; clean+slow asm, or messy+fast asm; impl easy and cheap; same perf as 1. 3: ISA perf not tied to impl; clean+fast asm; impl is tricky; best perf

Administrivia Prelim1: today Tuesday, February 28th in evening Location: GSH132: Goldwin Smith Hall room 132 Time: We will start at 7:30pm sharp, so come early Closed Book: NO NOTES, BOOK, CALCULATOR, CELL PHONE Cannot use electronic device or outside material Practice prelims are online in CMS Material covered everything up to end of last week Appendix C (logic, gates, FSMs, memory, ALUs) Chapter 4 (pipelined [and non-pipeline] MIPS processor with hazards) Chapters 2 (Numbers / Arithmetic, simple MIPS instructions) Chapter 1 (Performance) HW1, HW2, Lab0, Lab1, Lab2

Administrivia Online Survey results More chairs in lab sections Better synchronization between lecture and homework Lab and lecture may be a bit out of sync at times Project1 (PA1) due next Monday, March 5th Continue working diligently. Use design doc momentum Save your work! Save often. Verify file is non-zero. Periodically save to Dropbox, email. Beware of MacOSX 10.5 (leopard) and 10.6 (snow-leopard) Use your resources Lab Section, Piazza.com, Office Hours, Homework Help Session, Class notes, book, Sections, CSUGLab

Control Hazards What about branches? Can we forward/bypass values for branches? We can move branch calc from EX to ID will require new bypasses into ID stage; or can just zap the second instruction What happens to instructions following a branch, if branch taken? Need to zap/flush instructions Is there still a performance penalty for branches Yes, need to stall, then may need to zap (flush) subsequent instuctions that have already been fetched. Q: what happens with branch/jump in branch delay slot? A: bad stuff, so forbidden Q: why one, not two? A: can move branch calc from EX to ID; will require new bypasses into ID stage; or can just zap the second instruction Q: performance? A: stall is bad

Control Hazards inst mem A D beq r1, r2, L B data mem add r3, r0, r3 +4 data mem PC beq r1, r2, L add r3, r0, r3 sub r5, r4, r6 L: or r3, r2, r4 branch decision in EX need 2 stalls, and maybe kill

Control Hazards inst mem A D beq r1, r2, L B data mem add r3, r0, r3 +4 data mem PC beq r1, r2, L add r3, r0, r3 sub r5, r4, r6 L: or r3, r2, r4 branch decision in EX need 2 stalls, and maybe kill

Control Hazards Control Hazards instructions are fetched in stage 1 (IF) branch and jump decisions occur in stage 3 (EX) i.e. next PC is not known until 2 cycles after branch/jump Q: what happens with branch/jump in branch delay slot? A: bad stuff, so forbidden Q: why one, not two? A: can move branch calc from EX to ID; will require new bypasses into ID stage; or can just zap the second instruction Q: performance? A: stall is bad

Control Hazards Control Hazards Delay Slot Stall (+ Zap) instructions are fetched in stage 1 (IF) branch and jump decisions occur in stage 3 (EX) i.e. next PC is not known until 2 cycles after branch/jump Delay Slot ISA says N instructions after branch/jump always executed MIPS has 1 branch delay slot Stall (+ Zap) prevent PC update clear IF/ID pipeline register instruction just fetched might be wrong one, so convert to nop allow branch to continue into EX stage Q: what happens with branch/jump in branch delay slot? A: bad stuff, so forbidden Q: why one, not two? A: can move branch calc from EX to ID; will require new bypasses into ID stage; or can just zap the second instruction Q: performance? A: stall is bad

Delay Slot inst mem A D beq r1, r2, L B data mem ori r2, r0, 1 +4 data mem PC branch calc decide branch beq r1, r2, L ori r2, r0, 1 L: or r3, r1, r4

Control Hazards Control Hazards Stall Delay Slot Speculative Execution instructions are fetched in stage 1 (IF) branch and jump decisions occur in stage 3 (EX) i.e. next PC not known until 2 cycles after branch/jump Stall Delay Slot Speculative Execution “Guess” direction of the branch Allow instructions to move through pipeline Zap them later if wrong guess Useful for long pipelines Q: How to guess? A: constant; hint; branch prediction; random; …

Loops

Branch Prediction

Branch Prediction

Pipelining: What Could Possibly Go Wrong? Data hazards register file reads occur in stage 2 (IF) register file writes occur in stage 5 (WB) next instructions may read values soon to be written Control hazards branch instruction may change the PC in stage 3 (EX) next instructions have already started executing Structural hazards resource contention so far: impossible because of ISA and pipeline design Q: what if we had a “copy 0(r3), 4(r3)” instruction, as the x86 does, or “add r4, r4, 0(r3)”? A: structural hazard