Lec 9: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University.


© Kavita Bala, Computer Science, Cornell University

Basic Pipelining
Five-stage "RISC" load-store architecture
1. Instruction fetch (IF): get instruction from memory
2. Instruction decode (ID): translate opcode into control signals and read registers
3. Execute (EX): perform ALU operation
4. Memory (MEM): access memory if load/store
5. Writeback (WB): update register file
Following slides thanks to Sally McKee

Sample Code (Simple)
Assume an eight-register machine. Run the following code on a pipelined datapath:
  add          ; reg 3 = reg 1 + reg 2
  nand         ; reg 6 = ~(reg 4 & reg 5)
  lw 4 20(2)   ; reg 4 = Mem[reg2+20]
  add          ; reg 5 = reg 2 + reg 5
  sw 7 12(3)   ; Mem[reg3+12] = reg 7

[Figure: five-stage pipelined datapath. PC, instruction memory, register file (R0 to R7), ALU, and data memory, separated by the IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers. Values carried between stages: instruction, PC+1, op, dest, offset, valA, valB, target, ALU result, mdata.]

Time Graphs
[Figure: timing diagram with time on the horizontal axis; rows for add, nand, lw, add, sw, each passing through fetch, decode, execute, memory, writeback.]
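A time graph like the one on this slide can be generated mechanically. The sketch below is only an illustration, assuming an ideal pipeline with no stalls; the function name `time_graph` and the stage abbreviations are choices made here, not taken from the slides.

```python
# Stage abbreviations for the five-stage pipeline (an illustrative choice).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def time_graph(instrs):
    """Return one text row per instruction for an ideal (hazard-free) pipeline."""
    total_cycles = len(instrs) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instrs):
        cells = [""] * total_cycles
        # Instruction i enters fetch in cycle i and advances one stage per cycle.
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        rows.append(f"{name:<5}" + "".join(f"{c:<4}" for c in cells).rstrip())
    return rows

program = ["add", "nand", "lw", "add", "sw"]
for row in time_graph(program):
    print(row)
```

Each row is shifted one column to the right of the previous one, so five instructions finish in nine cycles instead of twenty-five.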

Pipelining Recap
Powerful technique for masking latencies
– Logically, instructions execute one at a time
– Physically, instructions execute in parallel
  → Instruction-level parallelism
Decouples the processor model from the implementation
– Interface vs. implementation
BUT dependencies between instructions complicate the implementation

What Can Go Wrong?
Data hazards
– register reads occur in stage 2
– register writes occur in stage 5
– an instruction could read the wrong value if the register is about to be written
Control hazards
– branch instruction may change the PC in stage 4
– what do we fetch before that?
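The data-hazard window follows from the two stage numbers above. The sketch below is a rough model, assuming no stalls and no forwarding, with instruction k occupying stage s in cycle k + s. The `split_cycle_regfile` flag models a register file that writes in the first half of a cycle and reads in the second half; the function name and flag are illustrative.

```python
READ_STAGE = 2   # register reads occur in stage 2 (decode)
WRITE_STAGE = 5  # register writes occur in stage 5 (writeback)

def stale_read(distance, split_cycle_regfile=True):
    """True if a consumer `distance` instructions after a producer reads too early.

    Assumes instruction k (0-based) is in stage s during cycle k + s,
    with no stalls and no forwarding.
    """
    write_cycle = 0 + WRITE_STAGE          # producer is instruction 0
    read_cycle = distance + READ_STAGE     # consumer is instruction `distance`
    if split_cycle_regfile:
        # Write in first half, read in second half: a same-cycle read is safe.
        return read_cycle < write_cycle
    return read_cycle <= write_cycle

for d in range(1, 5):
    print(d, stale_read(d), stale_read(d, split_cycle_regfile=False))
```

Under these assumptions the next two instructions after a producer would see a stale value, and a third one would too if the register file could not write and read in the same cycle.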

Handling Data Hazards
Detect and Stall
– If hazards exist, stall the processor until they go away
– Safe, but not great for performance
Detect and Forward
– If hazards exist, fix up the pipeline to get the correct value (if possible)
– Most common solution for high performance

Handling Data Hazards III
Detect:
– dest of earlier instrs == source of current
Forward:
– New bypass datapaths route computed data to where it is needed
– New MUX and control to pick the right data
Beware: stalling may still be required even in the presence of forwarding

Different Types of Hazards
Use register value in subsequent instruction
  add 3 1 2
  or
  nand
  sub

Data Hazards: AL ops
  add 3 1 2
  nand
[Figure: timing diagram (fetch, decode, execute, memory, writeback) with rows for add and nand.]
If not careful, you read the wrong value of R3


Handling Data Hazards: Forwarding
– No point forwarding to decode
– Forward to EX stage
– From output of ALU and MEM stages

Data Hazards: AL ops
  add 3 1 2
  nand
[Figure: timing diagram (fetch, decode, execute, memory, writeback) with rows for add and nand, shown in two animation steps.]

Forwarding Illustration
[Figure: instructions in program order against time, each drawn as IM, Reg, ALU, DM, Reg. Labels: add $3, add, nand $5 $3.]

Data Hazards: AL ops
  add 3 1 2
[Figure: timing diagram (fetch, decode, execute, memory, writeback) with rows for add, nand, and add.]
Write in first half of cycle, read in second half of cycle

Data Hazards: lw
  lw 3 12(zero)
  nand
[Figure: timing diagram (fetch, decode, execute, memory, writeback) with rows for lw and nand.]


A tricky case
  add
  add
  add
[Figure: three add instructions in program order against time, each drawn as IM, Reg, ALU, DM, Reg.]

Another tricky case
  add 0 4 1
  add 3 0 4

Handling Data Hazards: Summary
Forward:
– New bypass datapaths route computed data to where it is needed
Beware: stalling may still be required even in the presence of forwarding
For lw
– MIPS 2000/3000: a delay slot
– MIPS 4000 onwards: stall
  → But really rely on compiler to treat it like delay slot
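The lw case can be counted directly. The sketch below is a simplified model, assuming forwarding into the EX stage only, so a loaded value is one cycle late for the instruction immediately after the lw. The `Instr` tuple and `load_use_stalls` are hypothetical helpers introduced for illustration.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    """Hypothetical instruction record: opcode, destination, source registers."""
    op: str
    dest: Optional[int]
    srcs: Tuple[int, ...]

def load_use_stalls(program):
    """Count bubbles needed when a lw result is used by the very next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        # Loaded data leaves the MEM stage too late for the next instruction's EX.
        if prev.op == "lw" and prev.dest in cur.srcs:
            stalls += 1
    return stalls

program = [
    Instr("lw", 6, (3,)),        # lw 6 24(3): reg 6 = Mem[reg3+24]
    Instr("sw", None, (6, 2)),   # sw 6 12(2): Mem[reg2+12] = reg 6
]
print(load_use_stalls(program))
```

A compiler that treats the slot after a lw like a delay slot is, in effect, trying to reorder code so that this count is zero.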

Detection
– Compare regA with previous DestRegs
– Compare regB with previous DestRegs
– regA and regB are going to the ALU
– Previous DestRegs
  → Previously in ALU, Memory and Writeback
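The comparisons above amount to a small priority selection. The sketch below is one possible reading, not the slides' actual control logic: it assumes the destination registers of the three previous instructions are available, most recent first, and that the most recent match wins. The name `forward_source` is illustrative.

```python
def forward_source(src_reg, prev_dests):
    """Pick where an ALU operand should come from.

    prev_dests lists the destination registers of the three previous
    instructions, most recent first (those now in ALU, Memory, Writeback);
    None means that instruction writes no register.
    Returns the index of the most recent match, or None to use the value
    already read from the register file.
    """
    for age, dest in enumerate(prev_dests):
        if dest is not None and dest == src_reg:
            return age
    return None

# regA = 3 and regB = 4, with earlier instructions writing 3, nothing, and 3.
print(forward_source(3, [3, None, 3]))   # most recent writer of reg 3 wins
print(forward_source(4, [3, None, 3]))   # no match: use the register file
```

Preferring the most recent match matters when several in-flight instructions write the same register, which is the situation in a sequence of back-to-back adds.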

[Figure: the five-stage pipelined datapath again (PC, instruction memory, register file R0 to R7, ALU, data memory, and the IF/ID, ID/EX, EX/Mem, Mem/WB pipeline registers), with regA and regB marked at the register file.]

Sample Code
Which data hazards do you see?
  add
  nand
  add
  lw 6 24(3)
  sw 6 12(2)


[Figures: cycle-by-cycle walk-through of the sample code (add, nand, add, lw 6 24(3), sw 6 12(2)) on the pipelined datapath with added forwarding MUXes. Frames:
– First half of cycle 3: hazard on register 3, value forwarded
– End of cycle 3
– First half of cycle 4: new hazard
– End of cycle 4
– First half of cycle 5
– End of cycle 5
– First half of cycle 6: hazard on register 6
– End of cycle 6: nop inserted
– First half of cycle 7: hazard on register 6
– End of cycle 7
– First half of cycle 8
– End of cycle 8]

Control Hazards
  beqz 3 goToZero
  nand
  add
[Figure: timing diagram (fetch, decode, execute, memory, writeback) with rows for beqz and nand.]

Control Hazards for 4-stage pipeline
  beqz 3 goToZero
  nand
  add
[Figure: timing diagram (F/D, execute, memory, writeback) with rows for beqz and nand.]

Control Hazards
Stall
– Inject NOPs into the pipeline when the next instruction is not known
– Pros: simple, clean; Cons: slow
Delay Slots
– Tell the programmer that the N instructions after a jump will always be executed, no matter what the outcome of the branch
– Pros: the compiler may be able to fill the slots with useful instructions; Cons: breaks abstraction boundary
Speculative Execution
– Insert instructions into the pipeline
– Replace instructions with NOPs if the branch comes out opposite of what the processor expected
– Pros: clean model, fast; Cons: complex
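The performance trade-off among these options can be estimated with a simple cycles-per-instruction model. The sketch below is a back-of-the-envelope illustration: the branch fraction, misprediction rate, and the three-cycle penalty (a branch that changes the PC in stage 4 of the five-stage pipeline leaves three fetches in doubt) are assumptions chosen here, not figures from the slides.

```python
def effective_cpi(branch_fraction, penalty_cycles, penalized_fraction=1.0):
    """Ideal CPI of 1 plus the average branch cost per instruction.

    penalized_fraction is the share of branches that actually pay the
    penalty: 1.0 when always stalling, the misprediction rate when
    speculating, or the share of unfilled slots with delay slots.
    """
    return 1.0 + branch_fraction * penalized_fraction * penalty_cycles

BRANCH_FRACTION = 0.2   # assumed: one instruction in five is a branch
PENALTY = 3             # assumed: three fetches in doubt per branch

print(effective_cpi(BRANCH_FRACTION, PENALTY))        # always stall
print(effective_cpi(BRANCH_FRACTION, PENALTY, 0.1))   # speculate, 10% wrong
```

With these made-up numbers, stalling on every branch costs far more than speculating, which is the sense in which stalling is "slow" and speculation is "fast" in the list above.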

Control Hazards for 4-stage pipeline
  beqz 3 goToZero
  nand
  add
  goToZero: lui 2 1
[Figure: timing diagram (F/D, execute, memory, writeback) with rows for beqz, nand, and lui.]
Delay slot!

Hazard summary
Data hazards
– Forward
– Stall for lw/sw
Control hazard
– Delay slot

Pipelining for Other Circuits
Pipelining can be applied to any combinational logic circuit
– What's the circuit latency at 2 ns per gate?
– What's the throughput?
– What is the stage breakdown?
– Where would you place latches?
– What's the throughput after pipelining?
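These questions reduce to a little arithmetic once a circuit is fixed. The slide's circuit is not reproduced in the transcript, so the sketch below uses a hypothetical circuit six gates deep and ignores latch overhead; only the 2 ns gate delay comes from the slide.

```python
GATE_DELAY_NS = 2.0  # from the slide: 2 ns per gate

def unpipelined(depth_gates):
    """Return (latency in ns, throughput in results per ns) with no latches."""
    latency = depth_gates * GATE_DELAY_NS
    return latency, 1.0 / latency

def pipelined(stage_depths):
    """Return (latency in ns, throughput in results per ns) after adding latches.

    stage_depths gives the gate depth of each stage; the slowest stage sets
    the clock period. Latch delay is ignored in this sketch.
    """
    clock = max(stage_depths) * GATE_DELAY_NS
    latency = clock * len(stage_depths)
    return latency, 1.0 / clock

# Hypothetical circuit: 6 gates deep, split into three stages of 2 gates each.
print(unpipelined(6))
print(pipelined([2, 2, 2]))
```

In this balanced example latency is unchanged while throughput triples; an uneven split such as `[1, 2, 3]` would be limited by its three-gate stage.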