ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 6 Pipelining – Part 1 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15.

Slides:



Advertisements
Similar presentations
Morgan Kaufmann Publishers The Processor
Advertisements

COMP381 by M. Hamdi 1 (Recap) Pipeline Hazards. COMP381 by M. Hamdi 2 I n s t r. O r d e r add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11.
CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Mehmet Can Vuran, Instructor University of Nebraska-Lincoln Acknowledgement: Overheads adapted from those provided by the authors of the textbook.
ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 8 Instruction-Level Parallelism – Part 1 Benjamin Lee Electrical and Computer Engineering Duke.
Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.
Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.
Review: Pipelining. Pipelining Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer.
Pipelining - Hazards.
Instruction-Level Parallelism (ILP)
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Pipeline Hazards See: P&H Chapter 4.7.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Pipelining III Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture MIPS Review & Pipelining I Steve Ko Computer Sciences and Engineering University at Buffalo.
ECE 445 – Computer Organization
CS 152 Computer Architecture and Engineering Lecture 4 - Pipelining Krste Asanovic Electrical Engineering and Computer Sciences University of California.
Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan
Computer ArchitectureFall 2007 © October 24nd, 2007 Majd F. Sakr CS-447– Computer Architecture.
January 31, 2011CS152, Spring 2011 CS 152 Computer Architecture and Engineering Lecture 4 - Pipelining Krste Asanovic Electrical Engineering and Computer.
Computer ArchitectureFall 2007 © October 22nd, 2007 Majd F. Sakr CS-447– Computer Architecture.
Pipelining - II Adapted from CS 152C (UC Berkeley) lectures notes of Spring 2002.
CS 152 Computer Architecture and Engineering Lecture 4 - Pipelining Krste Asanovic Electrical Engineering and Computer Sciences University of California.
ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 17 - Pipelined.
Lecture 15: Pipelining and Hazards CS 2011 Fall 2014, Dr. Rozier.
Asanovic/Devadas Spring Simple Instruction Pipelining Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology.
Pipeline Hazard CT101 – Computing Systems. Content Introduction to pipeline hazard Structural Hazard Data Hazard Control Hazard.
Pipelining. 10/19/ Outline 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and Interrupts Conclusion.
ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 7 Pipelining – Part 2 Benjamin Lee Electrical and Computer Engineering Duke University
Chapter 2 Summary Classification of architectures Features that are relatively independent of instruction sets “Different” Processors –DSP and media processors.
EEL5708 Lotzi Bölöni EEL 5708 High Performance Computer Architecture Pipelining.
Comp Sci pipelining 1 Ch. 13 Pipelining. Comp Sci pipelining 2 Pipelining.
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
CMPE 421 Parallel Computer Architecture
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
EECE 476: Computer Architecture Slide Set #5: Implementing Pipelining Tor Aamodt Slide background: Die photo of the MIPS R2000 (first commercial MIPS microprocessor)
CECS 440 Pipelining.1(c) 2014 – R. W. Allison [slides adapted from D. Patterson slides with additional credits to M.J. Irwin]
CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and
Oct. 18, 2000Machine Organization1 Machine Organization (CS 570) Lecture 4: Pipelining * Jeremy R. Johnson Wed. Oct. 18, 2000 *This lecture was derived.
Instructor: Senior Lecturer SOE Dan Garcia CS 61C: Great Ideas in Computer Architecture Pipelining Hazards 1.
CSE431 L06 Basic MIPS Pipelining.1Irwin, PSU, 2005 MIPS Pipeline Datapath Modifications  What do we need to add/modify in our MIPS datapath? l State registers.
Introduction to Computer Organization Pipelining.
CMPE 421 REVIEW: MIDTERM 1. A MODIFIED FIVE-Stage Pipeline PC A Y R MD1 addr inst Inst Memory Imm Ext add rd1 GPRs rs1 rs2 ws wd rd2 we wdata addr wdata.
ECE/CS 552: Pipeline Hazards © Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim.
Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.
EECS 322 April 10, 2000 EECS 322 Computer Architecture Pipeline Control, Data Hazards and Branch Hazards.
Computer Architecture and Parallel Computing 并行结构与计算 Lecture 2 - Pipelining Peng Liu Dept. of Info. Sci. & Elec. Engg. Zhejiang University
5 Steps of MIPS Datapath Figure A.2, Page A-8
Single Clock Datapath With Control
Pipeline Implementation (4.6)
Peng Liu Lecture 7 Pipelining Peng Liu
CDA 3101 Spring 2016 Introduction to Computer Organization
\course\cpeg323-08F\Topic6b-323
CS 152 Computer Architecture and Engineering Lecture 4 - Pipelining
CS 152 Computer Architecture and Engineering Lecture 4 - Pipelining
Morgan Kaufmann Publishers The Processor
Electrical and Computer Engineering
\course\cpeg323-05F\Topic6b-323
Krste Asanovic Electrical Engineering and Computer Sciences
Pipeline control unit (highly abstracted)
The Processor Lecture 3.6: Control Hazards
The Processor Lecture 3.5: Data Hazards
Instruction Execution Cycle
Pipeline control unit (highly abstracted)
Pipeline Control unit (highly abstracted)
Introduction to Computer Organization and Architecture
Guest Lecturer: Justin Hsia
Presentation transcript:

ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 6 Pipelining – Part 1 Benjamin Lee Electrical and Computer Engineering Duke University

ECE 552 / CPS 5502 ECE552 Administrivia 27 September – Homework #2 Due Assignment on web page. Teams of 2-3. Submit soft copies to Sakai. Use Piazza for questions. 2 October – Class Discussion Roughly one reading per class. Do not wait until the day before! 1.Srinivasan et al. “Optimizing pipelines for power and performance” 2.Mahlke et al. “A comparison of full and partial predicated execution support for ILP processors” 3.Palacharla et al. “Complexity-effective superscalar processors” 4.Yeh et al. “Two-level adaptive training branch prediction”

ECE 552 / CPS 5503 Pipelining Latency = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle) Performance Enhancement - Increases number of cycles per instruction - Reduces number of seconds per cycle Instruction-Level Parallelism - Begin with multi-cycle design - When one instruction advances from stage-1 to stage=2, allow next instruction to enter stage-1. - Individual instructions require the same number of stages - Multiple instructions in-flight, entering and leaving at faster rate insn0.decinsn0.fetch insn1.decinsn1.fetch Multi-cycle Pipelined insn0.exec insn1.exec insn0.decinsn0.fetch insn1.decinsn1.fetch insn0.exec insn1.exec

ECE 552 / CPS 5504 Ideal Pipelining -All objects go through the same stages -No resources shared between any two stages -Equal propagation delay through all pipeline stages -An object entering the pipeline is not affected by objects in other stages -These conditions generally hold for industrial assembly lines -But can an instruction pipeline satisfy the last condition? Technology Assumptions -Small, very fast memory (caches) backed by large, slower memory -Multi-ported register file, which is slower than a single-ported one -Consider 5-stage pipelined Harvard architecture stage 1 stage 2 stage 3 stage 4

ECE 552 / CPS 5505 Practical Pipelining Pipeline Overheads -Each stage requires registers, which hold state/data communicated from one stage to next, incurring hardware and delay overheads -Each stage requires partitioning logic into “equal” lengths -Introduces diminishing marginal returns from deeper pipelines Pipeline Hazards -Instructions do not execute independently -Instructions entering the pipeline depend on in-flight instructions or contend for shared hardware resources stage 1 stage 2 stage 3 stage 4

ECE 552 / CPS 5506 Pipelining MIPS First, build MIPS without pipelining - Single-cycle MIPS datapath Then, pipeline into multiple stages - Multi-cycle MIPS datapath - Add pipeline registers to separate logic into stages - MIPS partitions into 5 stages - 1: Instruction Fetch (IF) - 2: Instruction Decode (ID) - 3: Execute (EX) - 4: Memory (MEM ) - 5: Write Back (WB)

ECE 552 / CPS Stage Pipelined Datapath (MIPS) IF: IR  mem[PC]; PC  PC + 4; ID: A  Reg[IR rs ]; B  Reg[IR rt ]; IF/ID ID/EX EX/MEMMEM/WB

ECE 552 / CPS Stage Pipelined Datapath (MIPS) EX: Result  A op IRop B; MEM: WB  Result; WB: Reg[IR rd ]  WB IF/ID ID/EX EX/MEMMEM/WB

ECE 552 / CPS 5509 Visualizing the Pipeline

ECE 552 / CPS Hazards and Limits to Pipelining Hazards prevent next instruction from executing during its designated clock cycle Structural Hazards -Hardware cannot support this combination of instructions. -Example: Limited resources required by multiple instructions (e.g. FPU) Data Hazards -Instruction depends on result of prior instruction still in pipeline -Example: An integer operation is waiting for value loaded from memory Control Hazards -Instruction fetch depends on decision about control flow -Example: Branches and jumps change PC

ECE 552 / CPS Structural Hazards A single memory port causes structural hazard during data load, instr fetch

ECE 552 / CPS Structural Hazards Stall the pipeline, creating bubbles, by freezing earlier stages  interlocks Use Harvard Architecture (separate instruction, data memories)

ECE 552 / CPS Data Hazards Instruction depends on result of prior instruction still in pipeline

ECE 552 / CPS Data Hazards Read After Write (RAW) -Caused by a dependence, need for communication -Instr-j tries to read operand before Instr-I writes it i: add r1, r2, r3 j: sub r4, r1, 43 Write After Read (WAR) -Caused by an anti-dependence and the re-use of the name “r1” -Instr-j tries to write operand (r1) before Instr-I reads it i: add r4, r1, r3 j: add r1, r2, r3 k: mul r6, r1, r7 Write After Write (WAW) -Caused by an output dependence and the re-use of the name “r1” -Instr-j tries to write operand (r1) before Instr-I writes it i: sub r1, r4, r3 j: add r1, r2, r3 k: mul r6, r1, r7

ECE 552 / CPS Resolving Data Hazards Strategy 1 – Interlocks and Pipeline Stalls -Later stages provide dependence information to earlier stages, which can stall or kill instructions -Works as long as instruction at stage i+1 can complete without any interference from instructions in stages 1 through i (otherwise, deadlocks may occur) FB 1 stage 1 stage 2 stage 3 stage 4 FB 2 FB 3 FB 4

ECE 552 / CPS Interlocks & Pipeline Stalls stalled stages time t0t1t2t3t4t5t6t7.... IFI 1 I 2 I 3 I 3 I 3 I 3 I 4 I 5 IDI 1 I 2 I 2 I 2 I 2 I 3 I 4 I 5 EX I 1 nopnopnopI 2 I 3 I 4 I 5 MA I 1 nopnopnopI 2 I 3 I 4 I 5 WB I 1 nopnopnopI 2 I 3 I 4 I 5 time t0t1t2t3t4t5t6t7.... (I 1 ) r1  (r0) + 10IF 1 ID 1 EX 1 MA 1 WB 1 (I 2 ) r4  (r1) + 17IF 2 ID 2 ID 2 ID 2 ID 2 EX 2 MA 2 WB 2 (I 3 )IF 3 IF 3 IF 3 IF 3 ID 3 EX 3 MA 3 WB 3 (I 4 ) IF 4 ID 4 EX 4 MA 4 WB 4 (I 5 ) IF 5 ID 5 EX 5 MA 5 WB 5 Resource Usage

ECE 552 / CPS Interlocks & Pipeline Stalls IR 31 PC A B Y R MD1 MD2 addr inst Inst Memory 0x4 Add IR Imm Ext ALU rd1 GPRs rs1 rs2 ws wd rd2 we wdata addr wdata rdata Data Memory we nop Example Dependence r1  r r4  r Stall Condition

ECE 552 / CPS Interlock Control Logic -Compare the source registers of instruction in decode stage with the destination registers of uncommitted instructions -Stall if a source register in decode matches some destination register? -No, not every instruction writes to a register -No, not every instruction reads from a register -Derive stall signal from conditions in the pipeline

ECE 552 / CPS Interlock Control Logic IR 31 PC A B Y R MD1 MD2 addr inst Inst Memory 0x4 Add IR Imm Ext ALU rd1 GPRs rs1 rs2 ws wd rd2 we wdata addr wdata rdata Data Memory we nop Compare the source registers of the instruction in the decode stage (rs, rt) with the destination register of the uncommitted instructions (ws). stall C stall ws rs rt ?

ECE 552 / CPS Interlock Control Logic Should we always stall if RS/RT matches some WS? No, because not every instruction writes/reads a register. Introduce write/read enable signals (we/re) C dest IR PC A B Y R MD1 MD2 addr inst Inst Memory 0x4 Add IR Imm Ext ALU rd1 GPRs rs1 rs2 ws wd rd2 we wdata addr wdata rdata Data Memory we 31 nop stall C stall ws rs rt ? we re1re2 C re wswe ws C dest we

ECE 552 / CPS Source and Destination Registers instructionsource(s) destination ALUrd  (rs) func (rt)rs, rtrd ALUirt  (rs) op immrsrt LWrt  M[(rs) + imm]rsrt SWM [(rs) + imm]  (rt)rs, rt BZcond (rs) true:PC  (PC) + immrs false: PC  (PC) + 4rs JPC  (PC) + imm JALr31  (PC), PC  (PC) + immR31 JRPC  (rs)rs JALRr31  (PC), PC  (rs)rsR31 R-type: op rs rt rd func I-type: op rs rt immediate16 J-type: op immediate26

ECE 552 / CPS Interlock Control Logic Should we always stall if RS/RT matches some RD? No, because not every instruction writes/reads a register. Introduce write/read enable signals (we/re) C dest IR PC A B Y R MD1 MD2 addr inst Inst Memory 0x4 Add IR Imm Ext ALU rd1 GPRs rs1 rs2 ws wd rd2 we wdata addr wdata rdata Data Memory we 31 nop stall C stall ws rs rt ? we re1re2 C re wswe ws C dest we

ECE 552 / CPS Deriving the Stall Signal CdestwsCase(opcode) ALU: ws  rd ALUi: ws  rt JAL, JALR: ws  R31 weCase(opcode) ALU, ALUi, LW we  (ws != 0) JAL, JALRwe  1 otherwisewe  0 Crere1Case(opcode) ALU, ALUire1  1 LW, SW, BZre1  1 JR, JALRre1  1 J, JALre1  0 re2Case(opcode) >

ECE 552 / CPS Deriving the Stall Signal Notation: [pipeline-stage][signal] E.g., Drs – rs signal from decode stage E.g., Ewe – we signal from execute stage Cstallstall-1  ((Drs == Ews) & Ewe | (Drs == Mws) & Mwe | (Drs == Wws) & Wwe ) & Dre1 stall-2  ((Drt == Ews) & Ewe | (Drt == Mws) & Mwe | (Drt == Wws) & Wwe ) & Dre2 stall  stall-1 | stall-2

ECE 552 / CPS Load/Store Data Hazards M[(r1)+7]  (r2) r4  M[(r3)+5] What is the problem here? What if (r1)+7 == (r3)+5? Load/Store hazards may be resolved in the pipeline or may be resolved in the memory system. More later.

ECE 552 / CPS Resolving Data Hazards Strategy 2 – Forwarding (aka Bypasses) -Route data as soon as possible to earlier stages in the pipeline -Example: forward ALU output to its input t0t1t2t3t4t5t6t7.... (I 1 ) r1  r0 + 10IF 1 ID 1 EX 1 MA 1 WB 1 (I 2 ) r4  r1 + 17IF 2 ID 2 ID 2 ID 2 ID 2 EX 2 MA 2 WB 2 (I 3 )IF 3 IF 3 IF 3 IF 3 ID 3 EX 3 MA 3 (I 4 ) stalled stagesIF 4 ID 4 EX 4 (I 5 ) IF 5 ID 5 timet0t1t2t3t4t5t6t7.... (I 1 ) r1  r0 + 10IF 1 ID 1 EX 1 MA 1 WB 1 (I 2 ) r4  r1 + 17IF 2 ID 2 EX 2 MA 2 WB 2 (I 3 )IF 3 ID 3 EX 3 MA 3 WB 3 (I 4 ) IF 4 ID 4 EX 4 MA 4 WB 4 (I 5 ) IF 5 ID 5 EX 5 MA 5 WB 5

ECE 552 / CPS Example Forwarding Path ASrc IR PC A B Y R MD1 MD2 addr inst Inst Memory 0x4 Add IR Imm Ext ALU rd1 GPRs rs1 rs2 ws wd rd2 we wdata addr wdata rdata Data Memory we 31 nop stall D EMW

ECE 552 / CPS Deriving Forwarding Signals This forwarding path only applies to the ALU operations… EforwardCase(Eopcode) ALU, ALUiEforward  (ws != 0) otherwiseEforward  0 …and all other operations will need to stall as before EstallCase(Eopcode) LWEstall  (ws != 0) JAL, JALREstall  1 otherwiseEstall  0 Asrc  (Drs == Ews) & Dre1 & Eforward Remember to update stall signal, removing case covered by this forwarding path

ECE 552 / CPS Multiple Forwarding Paths

ECE 552 / CPS Multiple Forwarding Paths ASrc IR PC A B Y R MD1 MD2 addr inst Inst Memory 0x4 Add IR ALU Imm Ext rd1 GPRs rs1 rs2 ws wd rd2 we wdata addr wdata rdata Data Memory we 31 nop stall D EMW PC for JAL,... BSrc

ECE 552 / CPS Forwarding Hardware

ECE 552 / CPS Forwarding Loads/Stores

ECE 552 / CPS Data Hazard Despite Forwarding LD cannot forward (backwards in time) to DSUB. What is the solution?

ECE 552 / CPS Data Hazards and Scheduling Try producing faster code for -A = B + C; D = E – F; -Assume A, B, C, D, E, and F are in memory -Assume pipelined processor Slow Code LW Rb, b LW Rc, c ADD Ra, Rb, Rc SW a, Ra LW Re e LW Rf, f SUBRd, Re, Rf SW d, RD Fast Code LW Rb, b LW Rc, c LW Re, e ADD Ra, Rb, Rc LWRf, f SWa, Ra SUB Rd, Re, Rf SW d, RD

ECE 552 / CPS Acknowledgements These slides contain material developed and copyright by -Arvind (MIT) -Krste Asanovic (MIT/UCB) -Joel Emer (Intel/MIT) -James Hoe (CMU) -John Kubiatowicz (UCB) -Alvin Lebeck (Duke) -David Patterson (UCB) -Daniel Sorin (Duke)