CMPE 421 Parallel Computer Architecture

Presentation transcript:

CMPE 421 Parallel Computer Architecture Part 1 Pipeline: HAZARD

Pipelining MIPS
Let us examine why the pipeline cannot run at full speed. There are cases where the next instruction cannot begin executing immediately. These limits to pipelining are known as hazards.
What makes it hard?
- Structural hazards: different instructions, at different stages in the pipeline, want to use the same hardware resource (resource conflict).
- Control hazards: the succeeding instruction to put into the pipeline depends on the outcome of a previous branch instruction that is already in the pipeline. The control decision determines the execution path, such as when the instruction changes the PC.
- Data hazards: an instruction in the pipeline requires data to be computed by a previous instruction still in the pipeline.
Before actually building the pipelined datapath and control, we first briefly examine these potential hazards individually.

Structural Hazards
Structural hazard: inadequate hardware to simultaneously support all instructions in the pipeline in the same clock cycle.
E.g., suppose the pipeline has a single memory with one read port instead of separate instruction and data memories. Then there is a structural hazard between the first and fourth lw instructions: the first lw accesses data in its MEM stage in the same cycle in which the fourth lw is being fetched.
MIPS was designed to be pipelined: structural hazards are easy to avoid!
[Figure: pipelined execution of four lw instructions; structural hazard if there is a single memory.]
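
The conflict described above can be checked mechanically. The following Python sketch assumes an ideal five-stage pipeline that issues one instruction per cycle with no stalls; the helper name `memory_conflicts` is hypothetical and not part of the slides. It lists the cycles in which one instruction's IF and another instruction's MEM would both need a single shared memory.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def memory_conflicts(instrs):
    """Return (cycle, fetching_index, mem_index) tuples for a unified memory.

    instrs: list of opcode strings issued one per cycle, starting at cycle 1.
    Assumption: only lw/sw touch memory in MEM; every instruction uses it in IF.
    """
    conflicts = []
    for m, op in enumerate(instrs):
        if op not in ("lw", "sw"):
            continue
        # Instruction m is fetched in cycle m + 1 and reaches MEM 3 cycles later.
        mem_cycle = (m + 1) + STAGES.index("MEM")
        # The instruction fetched in cycle c has index c - 1.
        f = mem_cycle - 1
        if f < len(instrs):
            conflicts.append((mem_cycle, f, m))
    return conflicts

print(memory_conflicts(["lw", "lw", "lw", "lw"]))  # [(4, 3, 0)]
```

Indices are zero-based, so the single reported conflict is in cycle 4 between the fourth lw (index 3, being fetched) and the first lw (index 0, in MEM).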

Structural Hazard
Ex 1: Suppose we have one memory unit instead of separate instruction and data memories.
Stages: Inst Fetch, Reg Read, ALU, Data Access, Reg Write.
When a load word or store word instruction is in the pipeline, its Data Access (MEM) stage tries to access memory in the same cycle as the Inst Fetch of a later instruction. Because there is only a single memory, a conflict occurs.

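Structural Hazard
Consider a load followed immediately by an R-type instruction, in a pipeline where R-type instructions use four stages (IF, RF/ID, EX, WB) and loads use five (IF, RF/ID, EX, MEM, WB). The register file has only a single write port, so both instructions try to write back in the same clock cycle.
[Figure: pipeline timing over clock cycles 1 to 9 for a Load and R-type instructions; a bubble marks the conflict.]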

Structural Hazard Solutions
- Delay the instruction until the functional unit is ready. The hardware inserts a pipeline stall, or bubble, that delays execution of all instructions that follow (previous instructions continue). This increases CPI from the ideal value of 1.
- Build more sophisticated functional units so that all combinations of instructions can be accommodated. Example: allow two simultaneous writes to the register file.

Structural Hazard Solution
Write-back stall solution: delay the R-type register write by one cycle.
[Figure: pipeline timing over clock cycles 1 to 9; the R-type sequence becomes IF, RF/ID, EX, MEM, WB, the same length as the Load.]

Control Hazards
Control hazard: need to make a decision based on the result of a previous instruction still executing in the pipeline.
Solution 1: Stall the pipeline.
Note that the branch outcome is computed in the ID stage with added hardware (later…).
[Figure: pipeline timing diagram, program execution order against time, for an add, a beq, and a lw; a bubble (pipeline stall) delays the lw that follows the beq.]

Control Hazards
Solution 2: Predict the branch outcome, e.g., predict branch-not-taken.
- Prediction success: execution continues with the instruction that was already fetched.
- Prediction failure: undo (= flush) the lw.

Control Hazards
Solution 3: Delayed branch. Always execute the sequentially next statement, with the branch taking effect after a one-instruction delay. It is the compiler’s job to find a statement that is independent of the branch outcome and can be put in the slot.
MIPS does this, but it is an option in SPIM (Simulator -> Settings).
[Figure: pipeline timing diagram, program execution order against time, for a beq, an add in the delayed branch slot, and a lw. The beq is followed by an add that is independent of the branch outcome.]

Review: Pipelining Multiple Instructions
The instructions in Figures 6-19, 6-20 and 6-21 were independent: none of them used the results calculated by any of the others (the register numbers are different).

Review: Pipelining Multiple Instructions

Review: Pipelining Multiple Instructions

Data Hazards
The problem with starting the next instruction before the first has finished: dependencies that “go backward in time” are data hazards.

Solution to Data Hazards
Data hazard: an instruction needs data from the result of a previous instruction still executing in the pipeline. Data hazards occur when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing instructions. They are caused by several different types of dependencies.
- Solution 1: forward data if possible.
- Solution 2: change the relative timing of instructions (insert stalls).
[Figure: instruction pipeline diagram (IF, ID, EX, MEM, WB) for an add followed by a dependent sub; shading indicates use, left = write, right = read. Without forwarding (blue line) data has to go back in time; with forwarding (red line) data is available in time.]

Data Hazards
SOLUTION 1: Don’t wait for the instruction to complete before trying to resolve the data hazard. As soon as the ALU creates the sum for the add, we can supply it as an input for the instruction that follows (the sub). Adding extra hardware to retrieve the missing item early from the internal resources is called forwarding or bypassing.
Remark: a forwarding path from the output of the memory access stage of the first instruction to the input of the execution stage of the immediately following instruction is invalid (backward in time).

Data Dependency Types
- Three classifications of data dependencies for instruction j following instruction i:
- Read after Write (RAW): instr. j tries to read an operand before instr. i writes it.
- Write after Write (WAW): instr. j tries to write an operand before instr. i writes its value. Since register writes only occur in WB, the pipeline we have been discussing does not have this type of hazard.
- Write after Read (WAR): instr. j tries to write a destination before it is read by instr. i. This also does not occur in the pipeline we have been discussing, since all reads happen early in the ID/RF stage and all writes are late in the WB stage.
- WAW and WAR hazards appear in later, more complicated pipelines.
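
The three classifications above can be expressed as a small check on register names. This is a minimal Python sketch; the dictionary layout and the function name `classify_dependencies` are assumptions made for illustration, and the two example instructions are hypothetical.

```python
def classify_dependencies(i, j):
    """Classify the dependencies of a later instruction j on an earlier i.

    Each instruction is a dict with 'dest' (register name or None)
    and 'srcs' (a list of register names).
    """
    kinds = []
    if i["dest"] is not None and i["dest"] in j["srcs"]:
        kinds.append("RAW")   # j reads what i writes
    if i["dest"] is not None and i["dest"] == j["dest"]:
        kinds.append("WAW")   # both write the same register
    if j["dest"] is not None and j["dest"] in i["srcs"]:
        kinds.append("WAR")   # j overwrites something i reads
    return kinds

# Hypothetical pair: add $1, $2, $3 followed by sub $4, $1, $5
add = {"dest": "$1", "srcs": ["$2", "$3"]}
sub = {"dest": "$4", "srcs": ["$1", "$5"]}
print(classify_dependencies(add, sub))  # ['RAW']
```

The function only reports dependencies between register names. Whether a dependency becomes a hazard depends on the pipeline, as the slide notes for WAW and WAR.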

Data Hazards
Forwarding may not be enough (a hybrid solution is required), e.g., if an R-type instruction following a load uses the result of the load. This is called a load-use data hazard.
Without a stall it is impossible to provide the input to the sub instruction in time. With a one-stage stall (solution 2), forwarding (solution 1) can get the data to the sub instruction in time.
[Figure: pipeline diagram (IF, ID, EX, MEM, WB) for a lw followed by a dependent sub, with a one-cycle bubble before the sub.]

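Reordering Code to Avoid Pipeline Stall (Alternative Software Solution)
Example:
    lw $t0, 0($t1)
    lw $t2, 4($t1)
    sw $t2, 0($t1)
    sw $t0, 4($t1)
Data hazard: the first sw uses $t2 immediately after the second lw loads it.
Reordered code: two instructions are interchanged (for example, the two sw instructions) so that the store of $t2 no longer immediately follows its load.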
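
The effect of reordering can be illustrated by counting load-use stalls before and after. This Python sketch assumes full forwarding, so only an instruction that uses a register loaded by the immediately preceding lw costs one stall, and it treats a store's data register like any other source. The function name and tuple layout are hypothetical. Swapping the two sw instructions is one possible reordering, chosen here as an assumption because the slide only says the instructions are interchanged.

```python
def load_use_stalls(program):
    """Count one-cycle load-use stalls, assuming full forwarding.

    program: list of (opcode, dest, srcs) tuples in program order.
    A stall is counted when an instruction reads the register that the
    immediately preceding lw loads.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in cur[2]:
            stalls += 1
    return stalls

original = [
    ("lw", "$t0", ["$t1"]),         # lw $t0, 0($t1)
    ("lw", "$t2", ["$t1"]),         # lw $t2, 4($t1)
    ("sw", None, ["$t2", "$t1"]),   # sw $t2, 0($t1)
    ("sw", None, ["$t0", "$t1"]),   # sw $t0, 4($t1)
]
# Assumed reordering: interchange the two stores.
reordered = [original[0], original[1], original[3], original[2]]
print(load_use_stalls(original), load_use_stalls(reordered))  # 1 0
```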

Revisiting Hazards
So far our datapath and control have ignored hazards. We shall revisit data hazards and control hazards and enhance our datapath and control to handle them in hardware.

Data Hazards and Forwarding
Problem with starting an instruction before the previous ones are finished: data dependencies that go backward in time, called data hazards.
$2 = 10 before sub; $2 = -20 after sub
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
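
Which of the following instructions would see the old value of $2 can be worked out from stage timing. This Python sketch assumes a five-stage pipeline with no forwarding and no stalls, and a register file that is written in the first half of a cycle and read in the second half. The function name `stale_readers` and the tuple layout are hypothetical.

```python
def stale_readers(program, producer_index=0):
    """Indices of later instructions that read the producer's old value.

    program: list of (dest, srcs) tuples in program order.
    The producer at index k writes in WB (cycle k + 5); a consumer at
    index j reads in ID (cycle j + 2), so it sees the old value whenever
    j + 2 < k + 5. Reading in the same cycle as the write is assumed safe.
    """
    k = producer_index
    dest = program[k][0]
    return [j for j in range(k + 1, len(program))
            if dest in program[j][1] and j + 2 < k + 5]

program = [
    ("$2", ["$1", "$3"]),    # sub $2, $1, $3
    ("$12", ["$2", "$5"]),   # and $12, $2, $5
    ("$13", ["$6", "$2"]),   # or  $13, $6, $2
    ("$14", ["$2", "$2"]),   # add $14, $2, $2
    (None, ["$15", "$2"]),   # sw  $15, 100($2)
]
print(stale_readers(program))  # [1, 2]
```

Under these assumptions only the and and the or read $2 too early; the add reads it in the same cycle in which the sub writes it, and the sw reads it one cycle later.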

Software Solution
Have the compiler guarantee that there are never any data hazards, by rearranging instructions to insert independent instructions between instructions that would otherwise have a data hazard between them, or, if such rearrangement is not possible, by inserting nops.
Such compiler solutions may not always be possible, and nops slow the machine down.
With nops:
    sub $2, $1, $3
    nop
    nop
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
or, with other instructions in place of the nops:
    sub $2, $1, $3
    lw  $10, 40($3)
    slt $5, $6, $7
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
Note that the inserted instructions must themselves be independent of their neighbors: here slt writes $5, which the and that follows reads.
MIPS: nop = “no operation” = 00…0 (32 bits) = sll $0, $0, 0
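
The nop-insertion fallback can be sketched as a single pass over the program. This is an illustrative Python sketch, not the compiler algorithm from the course. It assumes no forwarding and a register file that writes in the first half of a cycle and reads in the second half, so two intervening instructions are enough. The function name `insert_nops` and the tuple layout are hypothetical.

```python
def insert_nops(program, gap=2):
    """Insert nops so that at least `gap` instructions separate each
    producer from any consumer of its result.

    program: list of (text, dest, srcs) tuples in program order.
    """
    out = []
    for text, dest, srcs in program:
        needed = 0
        # Examine the last `gap` emitted instructions, nearest first.
        recent = out[max(0, len(out) - gap):]
        for back, (_, prev_dest, _) in enumerate(reversed(recent), start=1):
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, gap - back + 1)
        out.extend([("nop", None, [])] * needed)
        out.append((text, dest, srcs))
    return out

program = [
    ("sub $2, $1, $3", "$2", ["$1", "$3"]),
    ("and $12, $2, $5", "$12", ["$2", "$5"]),
    ("or $13, $6, $2", "$13", ["$6", "$2"]),
    ("add $14, $2, $2", "$14", ["$2", "$2"]),
    ("sw $15, 100($2)", None, ["$15", "$2"]),
]
for text, _, _ in insert_nops(program):
    print(text)
```

For this program the sketch emits two nops between the sub and the and, which matches the first sequence on the slide.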

REVIEW: Solution to HAZARDS

How About Register File Access?
Fix the register file access hazard by doing register writes in the first half of the cycle and register reads in the second half.
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) over time in clock cycles for an add that writes $1, Inst 1, Inst 2, and an add that reads $1 and writes $2. One clock edge controls register writing; the other controls loading of the pipeline state registers.]

Register Usage Can Cause Data Hazards
Dependencies backward in time cause hazards.
Instruction order: an add that writes $1, then sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5.
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) for each instruction; read-before-write data hazard on $1.]

Loads Can Cause Data Hazards
Dependencies backward in time cause hazards.
Instruction order: lw $1,4($2); sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5.
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) for each instruction; load-use data hazard on $1.]

One Way to “Fix” a Data Hazard
Can fix a data hazard by waiting (stall), but this impacts CPI.
[Figure: pipeline diagram for an add that writes $1, followed by two stall cycles, then sub $4,$1,$5 and and $6,$1,$7.]

Another Way to “Fix” a Data Hazard
Fix data hazards by forwarding results as soon as they are available to where they are needed.
Instruction order: an add that writes $1, then sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5.
Forwarding paths are valid only if the destination stage is later in time than the source stage. Forwarding is harder if there are multiple results to forward per instruction or if they need to write a result early in the pipeline.
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) for each instruction, with forwarding paths from the add to the instructions that read $1.]

Forwarding with Load-use Data Hazards
Instruction order: lw $1,4($2); sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5.
Will still need one stall cycle even with forwarding.
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) for each instruction, with forwarding paths from the lw.]
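
Why a load still needs one stall cycle while an ALU instruction needs none can be seen from when each result becomes available. This Python sketch assumes a five-stage pipeline with full forwarding into the EX stage; the function name `stalls_with_forwarding` is hypothetical.

```python
def stalls_with_forwarding(producer_op, distance):
    """Stall cycles for a consumer `distance` instructions after a producer.

    Assumption: the producer is fetched in cycle 1, so stage s occupies
    cycle s. An ALU result is ready at the end of EX (cycle 3); a loaded
    value is ready only at the end of MEM (cycle 4).
    """
    ready_cycle = 4 if producer_op == "lw" else 3
    # Without stalls the consumer enters EX in cycle distance + 3, and it
    # needs the value at the start of that cycle.
    consumer_ex_cycle = distance + 3
    return max(0, ready_cycle + 1 - consumer_ex_cycle)

print(stalls_with_forwarding("add", 1))  # 0
print(stalls_with_forwarding("lw", 1))   # 1
print(stalls_with_forwarding("lw", 2))   # 0
```

With these assumptions, only the instruction immediately after the lw pays a stall; an instruction two slots later can receive the loaded value by forwarding.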

Branch Instructions Cause Control Hazards
Dependencies backward in time cause hazards.
Instruction order: beq, lw, Inst 3, Inst 4.
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) for each instruction.]

One Way to “Fix” a Control Hazard
Another “solution” is to put in enough extra hardware so that we can test registers, calculate the branch address, and update the PC during the second stage of the pipeline. That would reduce the number of stalls to only one.
A third approach is to use prediction to handle branches, e.g., always predict that branches will be untaken. When the prediction is right, the pipeline proceeds at full speed. When it is wrong, the pipeline has to stall (and make sure that nothing that shouldn’t have completed changes the machine state).

One Way to “Fix” a Control Hazard
Fix the branch hazard by waiting (stall), but this affects CPI.
[Figure: pipeline diagram for a beq followed by three stall cycles, then lw and Inst 3.]
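
The CPI cost of branch stalls can be estimated with simple arithmetic. This Python sketch assumes an otherwise ideal pipeline (CPI of 1) and uses a branch frequency of 20% purely as a hypothetical figure; neither the number nor the function name `effective_cpi` comes from the slides.

```python
def effective_cpi(branch_fraction, penalty_cycles, mispredict_rate=1.0):
    """CPI of an otherwise ideal (CPI = 1) pipeline with branch stalls.

    mispredict_rate=1.0 models "always stall"; a smaller value models
    prediction, where only mispredicted branches pay the penalty.
    """
    return 1.0 + branch_fraction * mispredict_rate * penalty_cycles

BRANCH_FRACTION = 0.2  # hypothetical: one instruction in five is a branch

# Three stall cycles per branch versus one (branch resolved in stage 2).
print(effective_cpi(BRANCH_FRACTION, 3))
print(effective_cpi(BRANCH_FRACTION, 1))
# Predict-not-taken, assuming (hypothetically) half the branches are taken.
print(effective_cpi(BRANCH_FRACTION, 1, mispredict_rate=0.5))
```

The sketch shows the trend the slides describe: fewer stall cycles per branch, or fewer branches that pay the penalty, bring CPI back toward 1.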