EECS 322 April 10, 2000 EECS 322 Computer Architecture Pipeline Control, Data Hazards and Branch Hazards.

Slides:



Advertisements
Similar presentations
Morgan Kaufmann Publishers The Processor
Advertisements

CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
ELEN 468 Advanced Logic Design
Instruction-Level Parallelism (ILP)
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Pipeline Hazards See: P&H Chapter 4.7.
Pipeline Hazards Hakim Weatherspoon CS 3410, Spring 2011 Computer Science Cornell University See P&H Appendix 4.7.
Pipeline Hazards CS365 Lecture 10. D. Barbara Pipeline Hazards CS465 2 Review  Pipelined CPU  Overlapped execution of multiple instructions  Each on.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Pipelined Processor.
Review: MIPS Pipeline Data and Control Paths
©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan
1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.
 The actual result $1 - $3 is computed in clock cycle 3, before it’s needed in cycles 4 and 5  We forward that value to later instructions, to prevent.
DLX Instruction Format
Computer Organization Lecture Set – 06 Chapter 6 Huei-Yung Lin.
Appendix A Pipelining: Basic and Intermediate Concepts
Pipelined Datapath and Control (Lecture #15) ECE 445 – Computer Organization The slides included herein were taken from the materials accompanying Computer.
Computer Architecture - A Pipelined Datapath A Pipelined Datapath  Resisters are used to save data between stages. 1/14.
1  1998 Morgan Kaufmann Publishers Chapter Six Enhancing Performance with Pipelining.
1 Stalls and flushes  So far, we have discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.
55:035 Computer Architecture and Organization Lecture 10.
Pipeline Data Hazards: Detection and Circumvention Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly.
Comp Sci pipelining 1 Ch. 13 Pipelining. Comp Sci pipelining 2 Pipelining.
11/13/2015 8:57 AM 1 of 86 Pipelining Chapter 6. 11/13/2015 8:57 AM 2 of 86 Overview of Pipelining Pipelining is an implementation technique in which.
CMPE 421 Parallel Computer Architecture
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
Electrical and Computer Engineering University of Cyprus LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.
EECE 476: Computer Architecture Slide Set #5: Implementing Pipelining Tor Aamodt Slide background: Die photo of the MIPS R2000 (first commercial MIPS microprocessor)
CMPE 421 Parallel Computer Architecture Part 2: Hardware Solution: Forwarding.
CECS 440 Pipelining.1(c) 2014 – R. W. Allison [slides adapted from D. Patterson slides with additional credits to M.J. Irwin]
Winter 2002CSE Topic Branch Hazards in the Pipelined Processor.
1 (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann,
CSE431 L07 Overcoming Data Hazards.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 07: Overcoming Data Hazards Mary Jane Irwin (
Oct. 18, 2000Machine Organization1 Machine Organization (CS 570) Lecture 4: Pipelining * Jeremy R. Johnson Wed. Oct. 18, 2000 *This lecture was derived.
1/24/ :00 PM 1 of 86 Pipelining Chapter 6. 1/24/ :00 PM 2 of 86 Overview of Pipelining Pipelining is an implementation technique in which.
Instructor: Senior Lecturer SOE Dan Garcia CS 61C: Great Ideas in Computer Architecture Pipelining Hazards 1.
CSIE30300 Computer Architecture Unit 05: Overcoming Data Hazards Hsin-Chou Chi [Adapted from material by and
1. Convert the RISCEE 1 Architecture into a pipeline Architecture (like Figure 6.30) (showing the number data and control bits). 2. Build the control line.
CSE431 L06 Basic MIPS Pipelining.1Irwin, PSU, 2005 MIPS Pipeline Datapath Modifications  What do we need to add/modify in our MIPS datapath? l State registers.
CMPE 421 Parallel Computer Architecture Part 3: Hardware Solution: Control Hazard and Prediction.
Introduction to Computer Organization Pipelining.
Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.
Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.
CSE 340 Computer Architecture Spring 2016 Overcoming Data Hazards.
Exceptions Another form of control hazard Could be caused by…
Stalling delays the entire pipeline
Pipelining Chapter 6.
CSCI206 - Computer Organization & Programming
ELEN 468 Advanced Logic Design
Single Clock Datapath With Control
ECS 154B Computer Architecture II Spring 2009
CDA 3101 Spring 2016 Introduction to Computer Organization
ECE232: Hardware Organization and Design
Chapter 4 The Processor Part 3
Review: MIPS Pipeline Data and Control Paths
Pipelining review.
Pipelining Chapter 6.
Lecture 9. MIPS Processor Design – Pipelined Processor Design #2
Pipelining in more detail
CSCI206 - Computer Organization & Programming
CSCI206 - Computer Organization & Programming
Data Hazards Data Hazard
The Processor Lecture 3.6: Control Hazards
The Processor Lecture 3.5: Data Hazards
CSC3050 – Computer Architecture
Introduction to Computer Organization and Architecture
Systems Architecture II
Guest Lecturer: Justin Hsia
Pipelining - 1.
Stalls and flushes Last time, we discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.
ELEC / Computer Architecture and Design Spring 2015 Pipeline Control and Performance (Chapter 6) Vishwani D. Agrawal James J. Danaher.
Presentation transcript:

EECS 322 April 10, 2000 EECS 322 Computer Architecture Pipeline Control, Data Hazards and Branch Hazards

EECS 322 April 10, 2000 Models Single-cycle model (non-overlapping) Each instruction executes in a single cycle Every instruction and clock-cycle must be stretched to the slowest instruction (p.438) Pipeline model (overlapping) Each instruction executes in several cycles The clock-cycle must be stretched to the slowest step Gains efficiency by overlapping the execution of multiple instructions, increasing hardware utilization. (p. 377) Multi-cycle model (non-overlapping) Each instruction executes in variable number of cycles The clock-cycle must be stretched to the slowest step Ability to share functional units within the execution of a single instruction

EECS 322 April 10, 2000 Overhead Single-cycle model 8 ns Clock (125 MHz), (non-overlapping) 1 ALU + 2 adders 0 Muxes 0 Datapath Register bits (Flip-Flops) Multi-cycle model 2 ns Clock (500 MHz), (non-overlapping) 1 ALU + Controller 5 Muxes 160 Datapath Register bits (Flip-Flops) Pipeline model 2 ns Clock (500 MHz), (overlapping) 2 ALU + Controller 4 Muxes 373 Datapath + 16 Controlpath Register bits (Flip-Flops) Chip AreaSpeed

EECS 322 April 10, 2000 Figure 6.25 PC 32 bits PC 32 bits IR 32 bits IR 32 bits PC3 2 A 32 B 32 Si 32 RT 5 RD 5 PC 32 PC 32 Z1Z1 Z1Z1 ALU Out 32 B 32 D5D5 D5D5 M D R 32 M D R 32 ALU Out 32 D5D5 D5D5 Datapath Registers FFs 160 FFs 2 W 3 M 4 EX 2 W 3 M 2 W + 16 FFs

EECS 322 April 10, 2000 Figure 6.29 Pipeline Control: Controlpath Register bits 9 control bits 5 control bits 2 control bits

EECS 322 April 10, 2000 Figure 5.20, Single Cycle Pipeline Control: Controlpath Register bits Instruction R-format lw sw beq Reg Dst 1 1 X X ALU Src Mem Reg 0 1 X X Reg Wrt Mem Red Mem Wrt Bra- nch ALU op ALU op Figure 6.28 Instruction R-format lw sw beq Reg Dst 1 1 X X ALU Op ALU Op ALU Src Bra- nch Mem Red Mem Wrt Reg Wrt Mem Reg 0 1 X X ID / EX control lines EX / MEM control lines MEM / WB cntrl lines

EECS 322 April 10, 2000 Pipeline Hazards Pipeline hazards Solution #1 always works: stall, delay & procrastinate! Structural Hazards (i.e. fetching same memory bank) Solution #2: partition architecture Control Hazards (i.e. branching) Solution #1: stall! but decreases throughput Solution #2: guess and back-track Solution #3: delayed decision: delay branch & fill slot Data Hazards (i.e. register dependencies) Worst case situation Solution #2: re-order instructions Solution #3: forwarding or bypassing: delayed load

EECS 322 April 10, 2000 Figure 6.30 Pipeline Datapath and Controlpath

EECS 322 April 10, 2000 Figure 6.30 load inst. PC=4 IR= lw $10,20($1) Clock 1 PC=4+20<<2 MDR=Mem[20+C$1] D=$10 WB=11 Aluout Clock 3 PC=0 Clock 0 PC=4 A=C$1 B=X S=20 T=$10 D=0 WB=11, M=010 EX=0001 Clock 2 PC=4+20<<2 ALU=20+C$1 D=$10 Clock 3 WB=11 M=010

EECS 322 April 10, 2000 Figure 6.30 load inst. PC=4 IR= lw $10,20($1) Clock 1 PC=4+20<<2 MDR=Mem[20+C$1] D=$10 WB=11 Aluout Clock 3 PC=0 Clock 0 PC=4 A=C$1 B=X S=20 T=$10 D=0 WB=11, M=010 EX=0001 Clock 2 PC=4+20<<2 ALU=20+C$1 D=$10 Clock 3 WB=11 M=010

EECS 322 April 10, 2000 Pipeline Datapath and Controlpath Clock Contents of Register 1 = C$1 = 3; C$2=4; C$3=4; C$4=6; C$5=7; C$10=8; … Memory[23]=9; Formats:add $rd,$rs=A,$rt=B; lw 4 5

EECS 322 April 10, 2000 Clock 1: Figure 6.31a PC=4 IR=lw $10,20($1) PC=0

EECS 322 April 10, 2000 PC=4 IR=lw $10,20($1) Figure 6.31b PC=4 A=C$1 B=X S=20 T=$10 D=0 PC=8 IR=sub $11, $2, $3 PC=4 C C

EECS 322 April 10, 2000 PC=8 IR=sub $11, $2, $3 PC=4 A=C$1 B=X S=20 T=$10 D=0 C C Figure 6.32a PC=8 PC=12 IR=and $12,$4,$5 PC=8 A=C$2 B=C$3 S=X T=$3 D=$11 C C PC=4+20<<2 ALU=20+C$1 D=$10 C C

EECS 322 April 10, 2000 PC=12 IR=and $12,$4,$5 PC=8 A=C$2 B=C$3 S=X T=$3 D=$11 C C PC=4+20<<2 ALU=20+C$1 D=$10 C C Clock 4: Figure 6.32b PC=4+20<<2 MDR=Mem[20+C$1] D=$10 C C ALU PC=X ALU=C$2-C$3 D=$11 C C PC=12 A=C$4 B=C$5 S=X T=$3 D=$12 C C PC=16 IR=or $13,$6,$7 PC=20

EECS 322 April 10, 2000 Figure 6.36 Data Dependencies: that can be resolved by forwarding Data Hazards Resolved by forwarding At same time: Not a hazard Forward in time: Not a hazard Data Dependencies

EECS 322 April 10, 2000 Data Hazards: arithmetic Forwards in time: Can be resolved At same time: Not a hazard Figure 6.37

EECS 322 April 10, 2000 Data Dependencies: no forwarding sub $2,$1,$3 and $12,$2,$5 Clock IF EX 7 7 M 8 8 WB ID IF 2 2 MIPS = Clock = 500 Mhz = 167 MIPS CPI 3 WB 5 5 ID Write 1st Half Read 2nd Half 3 3 EX Stall ID M 4 4 Stall ID Suppose every instruction is dependant = stalls = 3 clocks

EECS 322 April 10, 2000 Data Dependencies: no forwarding MIPS = Clock = 500 Mhz = 417 MIPS(10% dependency) CPI 1.2 A dependant instruction will take= stalls = 3 clocks An independent instruction will take= stalls = 1 clocks Suppose 10% of the time the instructions are dependant? Averge instruction time = 10%*3 + 90%*1 = 0.10* *1 = 1.2 clocks MIPS = Clock = 500 Mhz = 167 MIPS(100% dependency) CPI 3 MIPS = Clock = 500 Mhz = 500 MIPS(0% dependency) CPI 1

EECS 322 April 10, 2000 Data Dependencies: with forwarding sub $2,$1,$3 and $12,$2,$5 Clock IF WB M 5 5 Detected Data Hazard 1a ID/EX.$rs = EX/M.$rd ID 3 3 EX M 4 4 ID IF 2 2 MIPS = Clock = 500 Mhz = 500 MIPS CPI 1 Suppose every instruction is dependant = stalls = 1 clock

EECS 322 April 10, 2000 Data Dependencies: Hazard Conditions ID/EX.$rs ID/EX.$rt EX/MEM.$rdest = { 1a. 1b. Hazard TypeSourceDestination Data Hazard Condition occurs whenever a data source needs a previous unavailable result due to a data destination. Data Hazard Detection is always comparing a destination with a source. Example sub $2, $1, $3sub $rd, $rs, $rt and $12, $2, $5and $rd, $rs, $rt ID/EX.$rs ID/EX.$rt MEM/WB.$rdest = 2a. 2b. {

EECS 322 April 10, 2000 Data Dependencies: Hazard Conditions 1a Data Hazard:EX/MEM.$rd = ID/EX.$rs sub $2, $1, $3sub $rd, $rs, $rt and $12, $2, $5and $rd, $rs, $rt 1b Data Hazard:EX/MEM.$rd = ID/EX.$rt sub $2, $1, $3sub $rd, $rs, $rt and $12, $1, $2and $rd, $rs, $rt 2a Data Hazard:MEM/WB.$rd = ID/EX.$rs sub $2, $1, $3sub $rd, $rs, $rt and $12, $1, $5sub $rd, $rs, $rt or $13, $2, $1and $rd, $rs, $rt 2b Data Hazard:MEM/WB.$rd = ID/EX.$rt sub $2, $1, $3sub $rd, $rs, $rt and $12, $1, $5sub $rd, $rs, $rt or $13, $6, $2and $rd, $rs, $rt

EECS 322 April 10, 2000 Data Dependencies: Worst case Data Hazard: sub $2, $1, $3sub $rd, $rs, $rt and $12, $2, $2and $rd, $rs, $rt or $13, $2, $2and $rd, $rs, $rt Data Hazard 1a: EX/MEM.$rd = ID/EX.$rs Data Hazard 1b: EX/MEM.$rd = ID/EX.$rt Data Hazard 2a: MEM/WB.$rd = ID/EX.$rs Data Hazard 2b: MEM/WB.$rd = ID/EX.$rt

EECS 322 April 10, 2000 Data Dependencies: Hazard Conditions ID/EX.$rs ID/EX.$rt = EX/MEM.$rdest } 1a. 1b. ID/EX.$rs ID/EX.$rt = MEM/WB.$rdest } 2a. 2b. Hazard Type $rs $rt $rd ID/EXEX/MEM SourceDestination MEM/WB $rd Pipeline Registers

EECS 322 April 10, 2000 Figure 6.38

EECS 322 April 10, 2000 Data Hazards: Loads Figure 6.44 Backwards in time: Cannot be resolved Forwards in time: Can be resolved At same time: Not a hazard

EECS 322 April 10, 2000 Data Hazards: load stalling Figure 6.45 Stall

EECS 322 April 10, 2000 Data Hazards: Hazard detection unit (page 490) IF/ID.$rs IF/ID.$rt = ID/EX.$rt  ID/EX.MemRead=1 } SourceDestination Stall Condition Stall Example lw $2, 20($1)lw $rt, addr($rs) and $4, $2, $5and $rd, $rs, $rt No Stall Example: (only need to look at next instruction) lw $2, 20($1)lw $rt, addr($rs) and $4, $1, $5and $rd, $rs, $rt or $8, $2, $6or $rd, $rs, $rt

EECS 322 April 10, 2000 No Stall Example: (only need to look at next instruction) lw $2, 20($1)lw $rt, addr($rs) and $4, $1, $5and $rd, $rs, $rt or $8, $2, $6or $rd, $rs, $rt Data Hazards: Hazard detection unit (page 490) Example load: assume half of the instructions are immediately followed by an instruction that uses it. load instruction time: 50%*(1 clock) + 50%*(2 clocks)=1.5 What is the average number of clocks for the load?

EECS 322 April 10, 2000 Hazard Detection Unit: when to stall Figure 6.46

EECS 322 April 10, 2000 IF/ID.$rs IF/ID.$rt = ID/EX.$rt  ID/EX.MemRead=1 } SourceDestination Stall Condition ID/EX.$rs ID/EX.$rt = EX/MEM.$rd } ID/EX.$rs ID/EX.$rt = MEM/WB.$rd } SourceDestination Forwarding Condition Data Dependency Units

EECS 322 April 10, 2000 $rs $rt $rd ID/EX Pipeline Registers Data Dependency Units $rs $rt $rd IF/ID IF/ID.$rs IF/ID.$rt = ID/EX.$rt  ID/EX.MemRead=1 } SourceDestination Stall Condition $rd EX/MEMMEM/WB $rd Forwarding ComparisonsStalling Comparisons

EECS 322 April 10, 2000 Branch Hazards: Soln #1, Stall until Decision made (fig. 6.4) Decision made in ID stage: do $5, beq $1, $3, and$12, $2, or$13, $6, add$14, $2, lw$4, 50($7) Soln #1: Stall until Decision is made Stall

EECS 322 April 10, 2000 Branch Hazards: Soln #2, Predict until Decision made beq $1,$3,7 and $12, $2, $5 Clock IF EX 7 7 M 8 8 WB ID IF 2 2 WB 5 5 ID 3 3 EXM 4 4 lw $4, 50($7) EXM WB IFID Predict false branch Decision made in ID stage: discard & branch discard “and $12,$2,$5” instruction

EECS 322 April 10, 2000 beq $1,$3,7 add $4,$6,$6 Clock IF EX 7 7 M 8 8 WB ID IF 2 2 WB 5 5 ID 3 3 EXM 4 4 lw $4, 50($7) EXM WB IFID Move instruction before branch Decision made in ID stage: branch Do not need to discard instruction Branch Hazards: Soln #3, Delayed Decision

EECS 322 April 10, 2000 Branch Hazards: Soln #3, Delayed Decision beq $1,$3,7 and $12, $2, $5 Clock IF EX 7 7 M 8 8 WB ID IF 2 2 WB 5 5 ID 3 3 EXM 4 4 lw $4, 50($7) EXM WB IFID Decision made in ID stage: do branch

EECS 322 April 10, 2000 Branch Hazards: Decision made in the ID stage (figure 6.4) beq $1,$3,7 nop Clock IF EX 7 7 M 8 8 WB ID IF 2 2 WB 5 5 ID 3 3 EXM 4 4 lw $4, 50($7) EXM WB IFID No decision yet: insert a nop Decision: do load

EECS 322 April 10, 2000 Branch Hazards: Soln #2, Predict until Decision made Figure 6.50 Branch Decision made in MEM stage: Discard values when wrong prediction Same effect as 3 stalls Predict false branch

EECS 322 April 10, 2000 Figure 6.51 Early branch comparison Flush: if wrong prediciton, add nops

EECS 322 April 10, 2000 Performance load: assume half of the instructions are immediately followed by an instruction that uses it (i.e. data dependency) load instruction time = 50%*(1 clock) + 50%*(2 clocks)=1.5 Branch: the branch delay of misprediction is 1 clock cycle that 25% of the branches are mispredicted. branch time = 75%*(1 clocks) + 25%*(2 clocks) = 1.25 Jump: assume that jumps always pay 1 full clock cycle delay (stall). Jump instruction time = 2

EECS 322 April 10, 2000 InstructionPipeline Cycles Instruction Mix loads arithmetic branches stores % 13% 43% 19% Clock speed 500 Mhz 2 ns CPI1.18 MIPS424 MIPS= Clock/CPI jumps22% Single- Cycle Mhz 8 ns MIPS 1 Multi-Cycle Clocks Mhz 2 ns MIPS 3 =  Cycles*Mix Performance, page 504

EECS 322 April 10, Convert the RISCEE 1 Architecture into a pipeline Architecture (like Figure 6.30) (showing the number data and control bits). 2. Build the control line table (like Figure 6.28) for the RISCEE3 pipeline architecture for RISCEE1 instructions: (addi, subi, load, store, beq, jmp, jal). 3. Homework 6.23, p Homework 6.24, p534 Homework for Wednesday April 5, 2000