CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

Slides:



Advertisements
Similar presentations
CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Advertisements

CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor.
CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Review: Pipelining. Pipelining Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer.
Pipeline Computer Architecture Lecture 12: Designing a Pipeline Processor.
1 Z3, built by German scientist Konrad Zuse (pictured) and demonstrated in Z3 used mechanical relays and the program was on a punched tape. It used.
Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Savio Chau Single Cycle Controller Design Last Time: Discussed the Designing of a Single Cycle Datapath Control Datapath Memory Processor (CPU) Input Output.
ECE 361 Computer Architecture Lecture 13: Designing a Pipeline Processor Start X:40.
Chapter 5 Pipelining and Hazards
©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan
1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.
Computer ArchitectureFall 2007 © October 24nd, 2007 Majd F. Sakr CS-447– Computer Architecture.
Computer ArchitectureFall 2007 © October 22nd, 2007 Majd F. Sakr CS-447– Computer Architecture.
1 CSE SUNY New Paltz Chapter Six Enhancing Performance with Pipelining.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Sep 9, 2002 Topic: Pipelining Basics.
Pipelining - II Adapted from CS 152C (UC Berkeley) lectures notes of Spring 2002.
Ceg3420 L1 4.1 DAP Fa97,  U.CB CEG3420 Computer Design Introduction to Pipelining.
Computer ArchitectureFall 2008 © October 6th, 2008 Majd F. Sakr CS-447– Computer Architecture.
ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.
Pipelining - II Rabi Mahapatra Adapted from CS 152C (UC Berkeley) lectures notes of Spring 2002.
CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor.
1 第三章 Instruction-Level Parallelism and Its Dynamic Exploitation 陈文智 浙江大学计算机学院 2011 年 09 月.
Pipelining. 10/19/ Outline 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and Interrupts Conclusion.
CPE 731 Advanced Computer Architecture Pipelining Review Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of California,
B 0000 Pipelining ENGR xD52 Eric VanWyk Fall
EEL5708 Lotzi Bölöni EEL 5708 High Performance Computer Architecture Pipelining.
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
Computer Organization CS224 Chapter 4 Part b The Processor Spring 2010 With thanks to M.J. Irwin, T. Fountain, D. Patterson, and J. Hennessy for some lecture.
CMPE 421 Parallel Computer Architecture
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
ECE 232 L18.Pipeline.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 18 Pipelining.
CMPE 421 Parallel Computer Architecture Part 2: Hardware Solution: Forwarding.
CECS 440 Pipelining.1(c) 2014 – R. W. Allison [slides adapted from D. Patterson slides with additional credits to M.J. Irwin]

Cs 152 L1 3.1 DAP Fa97,  U.CB Pipelining Lessons °Pipelining doesn’t help latency of single task, it helps throughput of entire workload °Multiple tasks.
Chap 6.1 Computer Architecture Chapter 6 Enhancing Performance with Pipelining.
CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and
HazardsCS510 Computer Architectures Lecture Lecture 7 Pipeline Hazards.
Pipelining CS365 Lecture 9. D. Barbara Pipeline CS465 2 Outline  Today’s topic  Pipelining is an implementation technique in which multiple instructions.
CS252/Patterson Lec 1.1 1/17/01 معماري کامپيوتر - درس نهم pipeline برگرفته از درس : Prof. David A. Patterson.
CSIE30300 Computer Architecture Unit 05: Overcoming Data Hazards Hsin-Chou Chi [Adapted from material by and
HazardsCS510 Computer Architectures Lecture Lecture 7 Pipeline Hazards.
CSE431 L06 Basic MIPS Pipelining.1Irwin, PSU, 2005 MIPS Pipeline Datapath Modifications  What do we need to add/modify in our MIPS datapath? l State registers.
Single Cycle Controller Design
EEL-4713 Ann Gordon-Ross 1 EEL-4713C Computer Architecture Designing a Pipelined Processor.
Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.
CSE 340 Computer Architecture Spring 2016 Overcoming Data Hazards.
EEL4713C Ann Gordon-Ross.1 EEL-4713C Computer Architecture Pipelined Processor - Hazards.
Computer Organization
Stalling delays the entire pipeline
Review: Instruction Set Evolution
Lecture 4: Pipeline Complications: Data and Control Hazards
Pipelining: Hazards Ver. Jan 14, 2014
CDA 3101 Spring 2016 Introduction to Computer Organization
5 Steps of MIPS Datapath Figure A.2, Page A-8
Appendix A - Pipelining
CpE 442 Designing a Pipeline Processor (lect. II)
Chapter 3: Pipelining 순천향대학교 컴퓨터학부 이 상 정 Adapted from
Chapter 4 The Processor Part 3
Review: MIPS Pipeline Data and Control Paths
CMSC 611: Advanced Computer Architecture
The Processor Lecture 3.6: Control Hazards
Overview What are pipeline hazards? Types of hazards
Introduction to Computer Organization and Architecture
Throughput = #instructions per unit time (seconds/cycles etc.)
Presentation transcript:

CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.2 Introduction to Computer Architecture Outline of Today’s Lecture °Recap and Introduction (5 minutes) °Introduction to Hazards (15 minutes) °Forwarding (25 minutes) °1 cycle Load Delay (5 minutes) °1 cycle Branch Delay (15 minutes) °What makes pipelining hard °Summary (5 minutes)

CPE 442 hazards.3 Introduction to Computer Architecture Review: Single Cycle, Multiple Cycle, vs. Pipeline Clk Cycle 1 Multiple Cycle Implementation: IfetchRegExecMemWr Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9Cycle 10 LoadIfetchRegExecMemWr IfetchRegExecMem LoadStore Pipeline Implementation: IfetchRegExecMemWrStore Clk Single Cycle Implementation: LoadStoreWaste Ifetch R-type IfetchRegExecMemWrR-type Cycle 1Cycle 2

CPE 442 hazards.4 Introduction to Computer Architecture Review: A Pipelined Datapath IF/ID Register ID/Ex Register Ex/Mem Register Mem/Wr Register PC Data Me m WA Di RADo IUnit A I RFile Di Ra Rb Rw MemWr RegWr ExtOp Exec Unit busA busB Imm16 ALUOp ALUSrc Mux 1 0 MemtoReg 1 0 RegDst Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch 1 0 Clk IfetchReg/DecExecMemWr

CPE 442 hazards.5 Introduction to Computer Architecture Review: Pipeline Control “Data Stationary Control” °The Main Control generates the control signals during Reg/Dec Control signals for Exec (ExtOp, ALUSrc,...) are used 1 cycle later Control signals for Mem (MemWr Branch) are used 2 cycles later Control signals for Wr (MemtoReg MemWr) are used 3 cycles later IF/ID Register ID/Ex Register Ex/Mem Register Mem/Wr Register Reg/DecExecMem ExtOp ALUOp RegDst ALUSrc Branch MemWr MemtoReg RegWr Main Control ExtOp ALUOp RegDst ALUSrc MemtoReg RegWr MemtoReg RegWr MemtoReg RegWr Branch MemWr Branch MemWr Wr

CPE 442 hazards.6 Introduction to Computer Architecture Review: Pipeline Summary °Pipeline Processor: Natural enhancement of the multiple clock cycle processor Each functional unit can only be used once per instruction If a instruction is going to use a functional unit: -it must use it at the same stage as all other instructions Pipeline Control: -Each stage’s control signal depends ONLY on the instruction that is currently in that stage

CPE 442 hazards.7 Introduction to Computer Architecture Outline of Today’s Lecture °Recap and Introduction (5 minutes) °Introduction to Hazards °Forwarding (25 minutes) °1 cycle Load Delay (5 minutes) °1 cycle Branch Delay (15 minutes) °What makes pipelining hard °Summary (5 minutes)

CPE 442 hazards.8 Introduction to Computer Architecture Introduction to Hazards °Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle structural hazards: HW cannot support this combination of instructions data hazards: instruction depends on result of prior instruction still in the pipeline control hazards: pipelining of branches & other instructionsCommon solution is to stall the pipeline until the hazardbubbles” in the pipeline

CPE 442 hazards.9 Introduction to Computer Architecture Mem A Single Memory is a Structural Hazard I n s t r. O r d e r Time (clock cycles) Load Instr 1 Instr 2 Instr 3 Instr 4 ALU Mem Reg MemReg ALU Mem Reg MemReg ALU Mem Reg MemReg ALU Reg MemReg ALU Mem Reg MemReg

CPE 442 hazards.10 Introduction to Computer Architecture Option 1: Stall to resolve Memory Structural Hazard I n s t r. O r d e r Time (clock cycles) Load Instr 1 Instr 2 Instr 3(stall) Instr 4 ALU Mem Reg MemReg ALU Mem Reg MemReg ALU Mem Reg MemReg bubble ALU Mem Reg MemReg ALU Mem Reg MemReg

CPE 442 hazards.11 Introduction to Computer Architecture Option 2: Duplicate to Resolve Structural Hazard I n s t r. O r d e r Time (clock cycles) Load Instr 1 Instr 2 Instr 3 Instr 4 ALU Im Reg Dm Reg ALU Im Reg DmReg ALU Im Reg DmReg Im ALU Reg DmReg ALU Im Reg DmReg Separate Instruction Cache (Im) & Data Cache (Dm)

CPE 442 hazards.12 Introduction to Computer Architecture Data Hazard on r1 add r1,r2,r3 sub r4, r1,r3 and r6, r1,r7 or r8, r1,r9 xor r10, r1,r11

CPE 442 hazards.13 Introduction to Computer Architecture Data Hazard on r1: (Figure 6.30, page 397, P&H) I n s t r. O r d e r Time (clock cycles) add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11 IFIF ID/R F EXEX ME M WBWB ALU Im Reg Dm Reg ALU Im Reg DmReg ALU Im Reg DmReg Im ALU Reg DmReg ALU Im Reg DmReg Dependencies backwards in time are hazards

CPE 442 hazards.14 Introduction to Computer Architecture sub r4, r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11 Option1: HW Stalls to Resolve Data Hazard I n s t r. O r d e r Time (clock cycles) add r1,r2,r3 IFIF ID/R F EXEX ME M WBWB ALU Im Reg Dm Reg Dependencies backwards in time are hazards ALU Im Reg Dm Im bubble ALU Reg Dm Reg ALU Im Reg Im Reg

CPE 442 hazards.15 Introduction to Computer Architecture But recall use of “Data Stationary Control” °The Main Control generates the control signals during Reg/Dec Control signals for Exec (ExtOp, ALUSrc,...) are used 1 cycle later Control signals for Mem (MemWr Branch) are used 2 cycles later Control signals for Wr (MemtoReg MemWr) are used 3 cycles later IF/ID Register ID/Ex Register Ex/Mem Register Mem/Wr Register Reg/DecExecMem ExtOp ALUOp RegDst ALUSrc Branch MemWr MemtoReg RegWr Main Control ExtOp ALUOp RegDst ALUSrc MemtoReg RegWr MemtoReg RegWr MemtoReg RegWr Branch MemWr Branch MemWr Wr

CPE 442 hazards.16 Introduction to Computer Architecture Option 1: How HW really stalls pipeline I n s t r. O r d e r Time (clock cycles) add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 IFIF ID/R F EXEX ME M WBWB ALU Im Reg Dm Reg ALU Im Reg DmReg HW doesn’t change PC => keeps fetching same instruction & sets control signals to to benign values (0) stall ALU Im Reg Dm bubble Im bubble Im bubble Im

CPE 442 hazards.17 Introduction to Computer Architecture Option 2: SW inserts indepdendent instructions I n s t r. O r d e r Time (clock cycles) add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 IFIF ID/R F EXEX ME M WBWB ALU Im Reg Dm Reg ALU Im Reg DmReg Worst case inserts NOP instructions nop ALU Im Reg Dm ALU Im Reg DmReg ALU Im Reg DmReg Im ALU Reg DmReg

CPE 442 hazards.18 Introduction to Computer Architecture Outline of Today’s Lecture °Recap and Introduction (5 minutes) °Introduction to Hazards (15 minutes) °Forwarding °1 cycle Load Delay (5 minutes) °1 cycle Branch Delay (15 minutes) °What makes pipelining hard °Summary (5 minutes)

CPE 442 hazards.19 Introduction to Computer Architecture Option 3 Insight: Data is available! (Figure 6.35, page 415, P&H) I n s t r. O r d e r Time (clock cycles) add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11 IFIF ID/R F EXEX ME M WBWB ALU Im Reg Dm Reg ALU Im Reg DmReg ALU Im Reg DmReg Im ALU Reg DmReg ALU Im Reg DmReg Pipeline registers already contain needed data

CPE 442 hazards.20 Introduction to Computer Architecture HW Change for “Forwarding” (Bypassing): Increase multiplexors to add paths from pipeline registers Assumes register read during write gets new value (otherwise more results to be forwarded)

CPE 442 hazards.21 Introduction to Computer Architecture Complete data Path with Hazard detection and Forwarding Figure 6.41 in the text

CPE 442 hazards.22 Introduction to Computer Architecture Outline of Today’s Lecture °Recap and Introduction (5 minutes) °Introduction to Hazards (15 minutes) °Forwarding (25 minutes) °1 cycle Load Delay °1 cycle Branch Delay (15 minutes) °What makes pipelining hard °Summary (5 minutes)

CPE 442 hazards.23 Introduction to Computer Architecture From Last Lecture: The Delay Load Phenomenon °Although Load is fetched during Cycle 1: The data is NOT written into the Reg File until the end of Cycle 5 We cannot read this value from the Reg File until Cycle 6 3-instruction delay before the load take effect Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8 IfetchReg/DecExecMemWrI0: Load IfetchReg/DecExecMemWrPlus 1 IfetchReg/DecExecMemWrPlus 2 IfetchReg/DecExecMemWrPlus 3 IfetchReg/DecExecMemWrPlus 4

CPE 442 hazards.24 Introduction to Computer Architecture Forwarding reduces Data Hazard to 1 cycle: (Figure 6.47, page 420 P&H) I n s t r. O r d e r Time (clock cycles) lw r1, 0(r2) sub r4,r1,r6 and r6,r1,r7 or r8,r1,r9 IFIF ID/R F EXEX ME M WBWB ALU Im Reg Dm Reg ALU Im Reg DmReg ALU Im Reg DmReg Im ALU Reg DmReg

CPE 442 hazards.25 Introduction to Computer Architecture Option1: HW Stalls to Resolve Data Hazard I n s t r. O r d e r Time (clock cycles) lw r1, 0(r2) sub r4,r1,r3 IFIF ID/R F EXEX ME M WBWB ALU Im Reg Dm Reg “Interlock”: checks for hazard & stalls stall bubble Im and r6,r1,r7 or r8,r1,r9 ALU Im Reg DmReg ALU Im Reg DmReg Im ALU Reg DmReg

CPE 442 hazards.26 Introduction to Computer Architecture Option 2: SW inserts independent instructions I n s t r. O r d e r Time (clock cycles) lw r1, 0(r2) sub r4,r1,r3 IFIF ID/R F EXEX ME M WBWB Worst case inserts NOP instructions MIPS I solution: No HW checking nop and r6,r1,r7 or r8,r1,r9 ALU Im Reg DmReg ALU Im Reg DmReg Im ALU Reg DmReg ALU Im Reg Dm Reg ALU Im Reg Dm Reg

CPE 442 hazards.27 Introduction to Computer Architecture Try producing fast code for a = b + c; d = e – f; assuming a, b, c, d,e, and f in memory. Slow code: LW Rb,b LW Rc,c ADD Ra,Rb,Rc SW a,Ra LW Re,e LW Rf,f SUB Rd,Re,Rf SWd,Rd Software Scheduling to Avoid Load Hazards

CPE 442 hazards.28 Introduction to Computer Architecture Try producing fast code for a = b + c; d = e – f; assuming a, b, c, d,e, and f in memory. Slow code: LW Rb,b LW Rc,c ADD Ra,Rb,Rc SW a,Ra LW Re,e LW Rf,f SUB Rd,Re,Rf SWd,Rd Software Scheduling to Avoid Load Hazards Fast code: LW Rb,b LW Rc,c LW Re,e ADD Ra,Rb,Rc LW Rf,f SW a,Ra SUB Rd,Re,Rf SWd,Rd

CPE 442 hazards.29 Introduction to Computer Architecture Slow code: LW Rb,b LW Rc,c ADD Ra,Rb,Rc SW a,Ra LW Re,e LW Rf,f SUB Rd,Re,Rf SWd,Rd Fast code: LW Rb,b LW Rc,c LW Re,e ADD Ra,Rb,Rc LW Rf,f SW a,Ra SUB Rd,Re,Rf SWd,Rd

CPE 442 hazards.30 Introduction to Computer Architecture Outline of Today’s Lecture °Recap and Introduction (5 minutes) °Introduction to Hazards (15 minutes) °Forwarding (25 minutes) °1 cycle Load Delay (5 minutes) °1 cycle Branch Delay °What makes pipelining hard °Summary (5 minutes)

CPE 442 hazards.31 Introduction to Computer Architecture From Last Lecture: The Delay Branch Phenomenon °Although Beq is fetched during Cycle 4: Target address is NOT written into the PC until the end of Cycle 7 Branch’s target is NOT fetched until Cycle 8 3-instruction delay before the branch take effect Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9Cycle 10Cycle 11 IfetchReg/DecExecMemWr IfetchReg/DecExecMemWr 16: R-type IfetchReg/DecExecMemWr IfetchReg/DecExecMemWr24: R-type 12: Beq (target is 1000) 20: R-type Clk IfetchReg/DecExecMemWr1000: Target of Br

CPE 442 hazards.32 Introduction to Computer Architecture Control Hazard on Branches: 3 stage stall

CPE 442 hazards.33 Introduction to Computer Architecture Branch Stall Impact °If CPI = 1, 30% branch, Stall 3 cycles => new CPI = 1.9! °2 part solution: Determine branch taken or not sooner, AND Compute taken branch address earlier °Solution Option 1: Move Zero test to ID/RF stage Adder to calculate new PC in ID/RF stage 1 clock cycle penalty for branch vs. 3

CPE 442 hazards.34 Introduction to Computer Architecture Option 1: move HW forward to reduce branch delay Data Path before change Memor y Access Write Back Instruction Fetch Instr. Decode Reg. Fetch Execute Addr. Calc.

CPE 442 hazards.35 Introduction to Computer Architecture Branch Delay now 1 clock cycle Data Path after change Memory Access Write Back Instruction Fetch Instr. Decode Reg. Fetch Execute Addr. Calc.

CPE 442 hazards.36 Introduction to Computer Architecture Option 2: No Stalls, Define Branch as Delayed, insert instruction after the branch and allow it to execute always, °Worst case, SW inserts NOP into branch delay if no instruction can be found °Where to get instructions to fill branch delay slot? Before branch instruction, example sw r1,0(r2); beqd r0,r2,T change to, beqd r0,r2,T; sw r1,0(r2) From the target address: only valuable when branch From fall through: only valuable when don’t branch °Compiler effectiveness for single branch delay slot: Fills about 60% of branch delay slots About 80% of instructions executed in branch delay slots useful in computation about 50% (60% x 80%) of slots usefully filled

CPE 442 hazards.37 Introduction to Computer Architecture Complete data Path with Hazard detection and Forwarding Figure 6.41 in the text

CPE 442 hazards.38 Introduction to Computer Architecture Example Text Figure 6.52

CPE 442 hazards.39 Introduction to Computer Architecture Outline of Today’s Lecture °Recap and Introduction (5 minutes) °Introduction to Hazards (15 minutes) °Forwarding (25 minutes) °1 cycle Load Delay (5 minutes) °1 cycle Branch Delay (15 minutes) °What makes pipelining hard °Summary (5 minutes)

CPE 442 hazards.40 Introduction to Computer Architecture When is pipelining hard? °Interrupts: 5 instructions executing in 5 stage pipeline How to stop the pipeline? Restrart? Who caused the interrupt? StageProblem interrupts occurring IFPage fault on instruction fetch; misaligned memory access; memory-protection violation IDUndefined or illegal opcode EXArithmetic interrupt MEMPage fault on data fetch; misaligned memory access; memory-protection violation °Load with data page fault, Add with instruction page fault? °Solution 1: interrupt vector/instruction 2: interrupt ASAP, restart everything incomplete

CPE 442 hazards.41 Introduction to Computer Architecture Data path with Exception Handling, Text Figure 6.55, add a Cause register, an Exception PC, and constant addr. of Exception Handeling routine

CPE 442 hazards.42 Introduction to Computer Architecture Review: Summary of Pipelining Basics °Speed Up Š Pipeline Depth (number of stages); if ideal CPI is 1, then: °Hazards limit performance on computers: structural: need more HW resources data: need forwarding, compiler scheduling control: early evaluation & PC, delayed branch, prediction °Increasing length of pipe increases impact of hazards since pipelining helps instruction bandwidth, not latency °Compilers key to reducing cost of data and control hazards load delay slots branch delay slots °Exceptions, Instruction Set, FP makes pipelining harder °Longer pipelines => Branch prediction, more instruction parallelism?