EECE476: Computer Architecture
Lecture 23: Speculative Execution, Dynamic Superscalar (text 6.8 plus more)
The University of British Columbia, EECE 476 ©

2 Going Faster… (Pentium)
Superpipelining
–Do less work each cycle
–More instructions in flight
    Parallelism = number of in-flight instructions (on average)
–Much faster clock rate
Superscalar
–Do more work each cycle
–More instructions in flight
    Parallelism = number of in-flight instructions (on average)
–Slightly slower clock rate
Do both!

3 …and Faster?
Recall: Superpipeline
–Loads: 2-cycle penalty before the result can be used
–Causes pipeline stalls
    Lost performance
–Why?
    Must wait for the load to finish
–Avoidance?
    Can re-order code (compiler)
    Called static scheduling
Consider the code on the next slide…
    LD  $s1, 0($s2)
    ADD $t2, $s1, $s1
    SUB $t3, $t4, $t5

4 Static Code Scheduling
Consider code:
    LD  $s1, 0($s2)
    ADD $t2, $s1, $s1
    SUB $t3, $t4, $t5
Best case with a 2-cycle load-use delay
–LD gets its result from the data cache
–ADD waits 2 cycles, then starts
–SUB waits for ADD, then starts (in-order issue, although SUB does not depend on LD or ADD)

5 Static Code Scheduling
Consider code:
    LD  $s1, 0($s2)
    ADD $t2, $s1, $s1
    SUB $t3, $t4, $t5
Better to do SUB first:
    LD  $s1, 0($s2)
    SUB $t3, $t4, $t5
    ADD $t2, $s1, $s1
Compiler does this!
–Static scheduling
After scheduling, the ADD still stalls for 1 cycle
–Should find one more instruction to “move earlier”
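The stall counts on the two scheduling slides can be checked with a small model. This is only a sketch under assumptions that are not in the slides: a single-issue, in-order pipeline, ALU results forwarded with no delay, and the hypothetical names `Instr` and `count_stalls`.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str                 # mnemonic, e.g. "LD", "ADD", "SUB"
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers


def count_stalls(program, load_use_delay=2):
    """Count stall cycles for an in-order, single-issue pipeline (assumed model)."""
    ready = {}   # register -> earliest cycle at which a consumer may issue
    cycle = 0    # cycle in which the next instruction would issue with no stall
    stalls = 0
    for ins in program:
        earliest = max([cycle] + [ready.get(r, 0) for r in ins.srcs])
        stalls += earliest - cycle
        # Only loads add extra delay before their result is usable.
        latency = load_use_delay if ins.op == "LD" else 0
        ready[ins.dest] = earliest + 1 + latency
        cycle = earliest + 1
    return stalls


LD = Instr("LD", "$s1", ("$s2",))
ADD = Instr("ADD", "$t2", ("$s1", "$s1"))
SUB = Instr("SUB", "$t3", ("$t4", "$t5"))

ORIGINAL = [LD, ADD, SUB]    # program order from the slide
SCHEDULED = [LD, SUB, ADD]   # after the compiler moves SUB earlier

print(count_stalls(ORIGINAL))    # 2
print(count_stalls(SCHEDULED))   # 1
```

In this model, moving SUB between LD and ADD hides one of the two load-use cycles, which matches the single remaining stall noted on the slide.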

6 Code Scheduling
What if:
–LD looks in the data cache, but the answer is not there
–Must wait longer
    Say 10 cycles, to get it from main memory
But:
–Compiler doesn’t know the LD delay
    Will it be 2 or 10 cycles?
–Compiler can’t always find 10 instructions to place between “LD” and “ADD”

7 Dynamic Scheduling
New concept
–Let the CPU do the scheduling
–Dynamic Scheduling or Out-of-Order (OoO) Execution
Compiler can do a first pass
–Compiler: “Best guess”
CPU knows about actual events
–e.g., data is not in cache
CPU looks AHEAD for new instructions to execute
–e.g., find the next instruction with no dependence on LD

8 Dynamic Scheduling: Instruction Queue
Procedure
–CPU scans ahead of current PC
–Looks for first independent instruction
    Does not depend on a result still in the pipeline
    –Or the result is in the pipeline, but can be forwarded
    Will not cause a stall
How to check dependence?
–Similar to “Forwarding Unit”!
–Check which registers the candidate instruction needs to read
–Compare to destination registers of instructions already executing
–If OK, “dispatch” the independent instruction
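The dependence check described above can be sketched in a few lines. The function name `can_dispatch` and the `pending` table are hypothetical; the table maps each destination register of an executing instruction to a flag saying whether its result can already be forwarded.

```python
def can_dispatch(source_regs, pending):
    """Return True if the candidate instruction would not stall.

    source_regs: registers the candidate needs to read.
    pending: dict mapping destination registers of instructions already
             executing to True if the result can be forwarded now.
    """
    for reg in source_regs:
        # A source that matches a pending destination is only a problem
        # while the result cannot yet be forwarded.
        if reg in pending and not pending[reg]:
            return False
    return True


# LD $s1, 0($s2) is executing and its result is not available yet.
pending = {"$s1": False}
print(can_dispatch(("$s1", "$s1"), pending))   # False: ADD must wait
print(can_dispatch(("$t4", "$t5"), pending))   # True: SUB is independent
```

Real hardware does these comparisons in parallel with comparators, much like a forwarding unit; the loop here is only a sequential model of that logic.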

9 Dynamic Scheduling: Instruction Queue
Procedure
–CPU scans ahead of current PC
–Dispatches next independent instruction
How to scan ahead?
–Can only fetch one at a time
Method
–Use an Instruction Queue
    Fixed size, perhaps 4 entries
    Always know “next 4 instructions”
    Can check all 4 in parallel in “I” stage
    –Find first one that is independent
Limitation: can only scan ahead 4 instructions
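A minimal sketch of scanning a 4-entry Instruction Queue for the first independent instruction follows. The names `QUEUE_SIZE` and `pick_next` are hypothetical, and so is the `OR` instruction in the second example. The sketch checks read-after-write dependences only and assumes, beyond what the slide states, that an entry must also wait for older entries that are still stuck in the queue.

```python
QUEUE_SIZE = 4  # the slide's example size


def pick_next(queue, busy_dests):
    """Return the index of the first independent queue entry, or None.

    queue: list of (text, dest, srcs) tuples in program order.
    busy_dests: destination registers of instructions already executing
                whose results are not yet available.
    """
    blocked = set(busy_dests)
    for i, (_text, dest, srcs) in enumerate(queue[:QUEUE_SIZE]):
        if not any(src in blocked for src in srcs):
            return i
        # This entry is stalled, so younger entries must not read its result.
        blocked.add(dest)
    return None


busy = {"$s1"}  # LD $s1, 0($s2) is still waiting for memory

queue_a = [
    ("ADD $t2, $s1, $s1", "$t2", ("$s1", "$s1")),
    ("SUB $t3, $t4, $t5", "$t3", ("$t4", "$t5")),
]
print(pick_next(queue_a, busy))   # 1: SUB can go ahead of ADD

queue_b = [
    ("ADD $t2, $s1, $s1", "$t2", ("$s1", "$s1")),
    ("OR  $t6, $t2, $t4", "$t6", ("$t2", "$t4")),   # depends on the stalled ADD
    ("SUB $t3, $t4, $t5", "$t3", ("$t4", "$t5")),
]
print(pick_next(queue_b, busy))   # 2: OR must wait for ADD, SUB is free
```

The loop visits entries one after another for readability; the slide's hardware checks all 4 entries in parallel and then selects the first one that passes.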

10 Dynamic Scheduling: Instruction Queue
Instruction Queue
–Good for steady-state (straight-line code)
    Always see 4 instructions ahead
–Branch instruction?
    OK if branch not taken
    Queue may have to be emptied if branch taken
    Want to avoid this
–Use branch prediction to keep Queue full
    Hmm, things are getting interesting!

11 Dynamic Scheduling + Branch Prediction = Speculative Execution
Usually DS + BP are combined
    Pentium Pro, Pentium II, III, 4, etc.
Called Speculative Execution!
–Fill Instruction Queue from predicted direction
–Scan ahead
    Find next independent instruction
–Dispatch instructions out-of-order
If branch mispredicts?
–Must cancel everything fetched down the predicted path!
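The cancel step on a misprediction can be sketched as follows. This is a simplified model, not a description of any particular CPU: instructions carry a hypothetical sequence number, `resolve_branch` is an invented name, the `BEQ` in the example is made up, and the sketch only drops in-flight entries without modelling how already-computed results are undone.

```python
def resolve_branch(in_flight, branch_seq, predicted_taken, actual_taken):
    """Return the in-flight instructions that survive a branch resolution.

    in_flight: list of (seq, text) tuples, seq increasing in fetch order.
    branch_seq: sequence number of the branch being resolved.
    """
    if predicted_taken == actual_taken:
        return list(in_flight)   # prediction was right, keep everything
    # Misprediction: everything fetched after the branch is cancelled.
    return [(seq, text) for seq, text in in_flight if seq <= branch_seq]


in_flight = [
    (0, "LD  $s1, 0($s2)"),
    (1, "BEQ $t0, $t1, target"),   # hypothetical branch, predicted taken
    (2, "SUB $t3, $t4, $t5"),      # fetched from the predicted direction
    (3, "ADD $t2, $s1, $s1"),      # fetched from the predicted direction
]

kept = resolve_branch(in_flight, branch_seq=1,
                      predicted_taken=True, actual_taken=False)
print([seq for seq, _ in kept])   # [0, 1]
```

Instructions older than the branch are unaffected, which is why the LD and the branch itself survive in this example.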

12 Speculative Execution + Superscalar = ? Dynamic Superscalar?

13 Dynamic Superscalar Pipeline
[Figure label: Out-of-order dispatch]

14 PowerPC 604, Pentium Pro
[Figure labels: Out-of-order dispatch; In-order issue]

15 Final Words
One last problem… Hazards!
You’ve only seen Read-After-Write (RAW)
–True dependence
Now you have two new ones:
–Write-After-Read (WAR): false dependence (also called anti-dependence)
–Write-After-Write (WAW): output dependence

16 Out-of-Order Hazards
Write-After-Write
–Example:
    A = 10
    A = 13
    Which one to keep? 13 of course!
    Out of order? (The final value must still be 13)
Write-After-Read
–Example:
    A = 9
    B = A + 1
    A = 10
    What value for B? (10)
    Out of order? Make sure it’s not 11!
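The two examples can be replayed in different orders to see what goes wrong. This is a toy sketch: statements are modelled as hypothetical (destination, function) pairs and `run` simply executes them in whatever order it is given, with no hazard protection.

```python
def run(statements, order):
    """Execute statements in the given order and return the final values."""
    env = {"A": 0, "B": 0}
    for i in order:
        dest, compute = statements[i]
        env[dest] = compute(env)
    return env


# Write-After-Write: A = 10; A = 13
waw = [
    ("A", lambda env: 10),
    ("A", lambda env: 13),
]

# Write-After-Read: A = 9; B = A + 1; A = 10
war = [
    ("A", lambda env: 9),
    ("B", lambda env: env["A"] + 1),
    ("A", lambda env: 10),
]

print(run(waw, [0, 1])["A"])      # 13, program order
print(run(waw, [1, 0])["A"])      # 10, writes swapped: wrong final value
print(run(war, [0, 1, 2])["B"])   # 10, program order
print(run(war, [0, 2, 1])["B"])   # 11, A overwritten before B read it: wrong
```

In both cases no data actually flows between the reordered statements; the conflict comes only from reusing the name A, which is why these are not true dependences.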
