Caltech CS184 Spring2003 -- DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 4: April 7, 2003 Instruction Level Parallelism ILP.


Presentation transcript:


Today
ILP: beyond 1 instruction per cycle
– Parallelism
– Dynamic exploitation
   Scoreboarding
   Register renaming
– Control flow
   Prediction
   Reducing

Real Issue
The sequential ISA model adds an artificial constraint to the computational problem.
The original problem (the real computation) is not one long, sequentially dependent critical path.
– Path length != # of instructions

Dataflow Graph
The real problem is a graph.

Task Has Parallelism
MPY R3,R2,R2
MPY R4,R2,R5
MPY R3,R6,R3
ADD R4,R4,R7
ADD R4,R3,R4
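
A minimal sketch of how the parallelism in this sequence can be measured. It assumes unit latency for every operation and a destination-first operand order (as the renaming example later in the lecture suggests); `dataflow_levels` and the tuple encoding are illustrative names, not taken from the slides. It tracks only true (read-after-write) dependencies and reports the earliest step at which each instruction could run.

```python
# Each instruction is (op, dest, src1, src2); destination first is an assumption.
PROGRAM = [
    ("MPY", "R3", "R2", "R2"),
    ("MPY", "R4", "R2", "R5"),
    ("MPY", "R3", "R6", "R3"),
    ("ADD", "R4", "R4", "R7"),
    ("ADD", "R4", "R3", "R4"),
]


def dataflow_levels(program):
    """Earliest step (1-based) for each instruction, honoring only RAW dependencies."""
    last_writer = {}  # register name -> index of its most recent producer
    levels = []
    for i, (_op, dest, *srcs) in enumerate(program):
        producers = {last_writer[s] for s in srcs if s in last_writer}
        level = 1 + max((levels[p] for p in producers), default=0)
        levels.append(level)
        last_writer[dest] = i
    return levels


levels = dataflow_levels(PROGRAM)
critical_path = max(levels)  # dataflow depth, compared with len(PROGRAM) instructions
print(levels, critical_path)
```

Under these assumptions the dataflow depth is smaller than the instruction count, which is the point of "Path length != # of instructions".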

More when pipelined
Working on a stream (loop), we may be able to perform all ops at once
– …appropriately staggered in time.

Problem
For a sequential ISA:
– must linearize the graph
– create false dependencies
One linearization:
MPY R3,R2,R2
MPY R4,R2,R5
MPY R3,R6,R3
ADD R4,R4,R7
ADD R4,R3,R4
Another linearization:
MPY R3,R2,R2
MPY R3,R6,R3
MPY R4,R2,R5
ADD R4,R4,R7
ADD R4,R3,R4

ILP
The original problem had parallelism.
Can we exploit it?
Can we rediscover it after
– linearizing
– scheduling
– assigning resources?

If we can find the parallelism...
…and will spend the silicon area, we can execute multiple instructions simultaneously:
MPY R3,R2,R2; MPY R4,R2,R5
MPY R3,R6,R3; ADD R4,R4,R7
ADD R4,R3,R4
[Figure: pipeline IF, ID, EX with RF, ALU1, ALU2]

First Challenge: Multi-issue, maintain dependencies
Like pipelining
Let instructions go if there is no hazard
Detect (potential) hazards
– stall until data is available

Scoreboarding
Easy conceptual model:
– Each register has a valid bit
– At issue, read registers
– If all registers have valid data
   mark the result register invalid (stale)
   forward into execute
– else stall until all are valid
– When done
   write to the register
   set the result to valid

Scoreboard (worked example)
MPY R3,R2,R2
MPY R4,R2,R5
MPY R3,R6,R3
ADD R4,R4,R7
ADD R4,R3,R4
[Figure: pipeline IF, ID, EX with RF, ALU1, ALU2]

Scoreboard: MPY R3,R2,R2
Valid bits before: R2=1 R3=1 R4=1 R5=1 R6=1 R7=1
R2.valid=1: issue, set R3.valid=0
Valid bits after: R2=1 R3=0 R4=1 R5=1 R6=1 R7=1

Scoreboard: MPY R4,R2,R5
Valid bits before: R2=1 R3=0 R4=1 R5=1 R6=1 R7=1
R2.valid=1, R5.valid=1: issue, set R4.valid=0
Valid bits after: R2=1 R3=0 R4=0 R5=1 R6=1 R7=1

Scoreboard: MPY R3,R6,R3
Valid bits: R2=1 R3=0 R4=0 R5=1 R6=1 R7=1
R3.valid=0, R6.valid=1: stall

Scoreboard: MPY R3 (first instruction) completes
Valid bits before: R2=1 R3=0 R4=0 R5=1 R6=1 R7=1
Set R3.valid=1
Valid bits after: R2=1 R3=1 R4=0 R5=1 R6=1 R7=1

Scoreboard: MPY R3,R6,R3
Valid bits before: R2=1 R3=1 R4=0 R5=1 R6=1 R7=1
R3.valid=1, R6.valid=1: issue, set R3.valid=0
Valid bits after: R2=1 R3=0 R4=0 R5=1 R6=1 R7=1

Scoreboard
Of course, bypass
– bypass as we did in the pipeline
– incorporate into the stall checks so we can continue as soon as the result shows up
Also, be careful not to issue
– when the result register is invalid (WAW)
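
The valid-bit model can be sketched in a few lines. This is an illustration under stated assumptions: one valid bit per register, no bypass, destination-first operands, and hypothetical helper names (`scoreboard_issue`, `scoreboard_complete`). It replays the first steps of the example sequence and includes the WAW check on the result register.

```python
def scoreboard_issue(instr, valid):
    """Try to issue one instruction; return True if it issued, False if it must stall."""
    _op, dest, *srcs = instr
    sources_ready = all(valid[r] for r in srcs)
    dest_free = valid[dest]  # WAW guard: a pending write to dest blocks issue
    if sources_ready and dest_free:
        valid[dest] = False  # result is now stale until the operation completes
        return True
    return False


def scoreboard_complete(instr, valid):
    """Write-back: the destination register holds valid data again."""
    valid[instr[1]] = True


valid = {f"R{i}": True for i in range(2, 8)}
i1 = ("MPY", "R3", "R2", "R2")
i2 = ("MPY", "R4", "R2", "R5")
i3 = ("MPY", "R3", "R6", "R3")

trace = [
    scoreboard_issue(i1, valid),  # sources valid, R3 marked invalid
    scoreboard_issue(i2, valid),  # sources valid, R4 marked invalid
    scoreboard_issue(i3, valid),  # R3 is invalid, so this one stalls
]
scoreboard_complete(i1, valid)    # first MPY finishes, R3 valid again
trace.append(scoreboard_issue(i3, valid))
print(trace, valid)
```

A real scoreboard would fold bypass into the readiness check so a waiting instruction can go as soon as the result appears; that is left out here to keep the sketch short.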

Ordering
As shown
– issue instructions in order
– stall on the first dependent instruction
   get head-of-line blocking
Alternative
– out-of-order issue

Example
Original order:
MPY R3,R2,R2
MPY R4,R2,R5
MPY R3,R6,R3
ADD R4,R4,R7
ADD R4,R3,R4
Alternate order:
MPY R3,R2,R2
MPY R3,R6,R3
MPY R4,R2,R5
ADD R4,R4,R7
ADD R4,R3,R4

Example
This sequence blocks on in-order issue
– the second instruction depends on the first
But the 3rd instruction does not depend on the first 2.
MPY R3,R2,R2
MPY R3,R6,R3
MPY R4,R2,R5
ADD R4,R4,R7
ADD R4,R3,R4

Example
Out of order
– look beyond the head pointer for enabled instructions
– issue and scoreboard the next one found
MPY R3,R2,R2
MPY R3,R6,R3
MPY R4,R2,R5
ADD R4,R4,R7
ADD R4,R3,R4
MPY R3,R6,R3 stalls until R3 is computed.
MPY R4,R2,R5 can be issued while R3 is waiting.
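
A sketch of the "look beyond the head pointer" step, assuming the same valid-bit scoreboard and destination-first operands. The function name `pick_next` and the two blocking sets are illustrative. Besides checking valid bits, it refuses to jump over an older waiting instruction when that would break a dependency on it (reading a register it will write, or overwriting a register it still reads or writes).

```python
def pick_next(window, valid):
    """Index of the oldest issuable instruction in `window`, or None if all must wait."""
    blocked_reads = set()   # registers an older waiting instruction will write
    blocked_writes = set()  # registers an older waiting instruction reads or writes
    for idx, (_op, dest, *srcs) in enumerate(window):
        ready = (
            valid[dest]
            and all(valid[r] for r in srcs)
            and not blocked_reads.intersection(srcs)
            and dest not in blocked_writes
        )
        if ready:
            return idx
        # This instruction waits; younger ones must respect its registers.
        blocked_reads.add(dest)
        blocked_writes.update(srcs)
        blocked_writes.add(dest)
    return None


# State after MPY R3,R2,R2 has issued but not completed: R3 is invalid.
valid = {f"R{i}": True for i in range(2, 8)}
valid["R3"] = False
window = [
    ("MPY", "R3", "R6", "R3"),  # waits for R3
    ("MPY", "R4", "R2", "R5"),  # independent of the waiting instruction
    ("ADD", "R4", "R4", "R7"),
    ("ADD", "R4", "R3", "R4"),
]
chosen = pick_next(window, valid)
print(chosen)
```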

False Sequentialization on Register Names
Problem: reuse of a small set of register names may introduce false sequentialization
ADD R2,R3,R4
SW R2,(R1)
ADD R1,1,R1
ADD R2,R5,R6
SW R2,(R1)

False Sequentialization
Recognize:
– register names are just a way of describing local dataflow
ADD R2,R3,R4
SW R2,(R1)
ADD R1,1,R1
ADD R2,R5,R6
SW R2,(R1)
The last two instructions say: the result of adding R5 and R6 gets stored at the address pointed to by R1.
R2 only describes the dataflow.

Renaming
Trick:
– separate ISA (“architectural”) register names from functional/physical registers
– allocate a new register on each definition (compare def-use chains in cs134b?)
– keep track of all uses (until the next definition)
– assign all uses the new register name at issue
– use the new register name to track dependencies, bypass, scoreboarding...

Example
ADD R2,R3,R4
SW R2,(R1)
ADD R1,1,R1
ADD R2,R5,R6
SW R2,(R1)
Initial state
Rename Table: R1: P2 R2: P6 R3: P7 R4: P8 R5: P9 R6: P10
Free Table: P1 P3 P4 P11

Example: ADD R2,R3,R4
Allocate P1 for R2
Issue: ADD P1,P7,P8
Rename Table: R1: P2 R2: P1 R3: P7 R4: P8 R5: P9 R6: P10
Free Table: P3 P4 P11

Example: SW R2,(R1)
Issue: SW P1,(P2)
Rename Table (unchanged): R1: P2 R2: P1 R3: P7 R4: P8 R5: P9 R6: P10
Free Table: P3 P4 P11

Example: ADD R1,1,R1
Allocate P3 for R1
Issue: ADD P3,1,P2
Rename Table: R1: P3 R2: P1 R3: P7 R4: P8 R5: P9 R6: P10
Free Table: P4 P11

Example: ADD R2,R5,R6
Allocate P4 for R2
Issue: ADD P4,P9,P10
Rename Table: R1: P3 R2: P4 R3: P7 R4: P8 R5: P9 R6: P10
Free Table: P11

Example: SW R2,(R1)
Issue: SW P4,(P3)
Rename Table (unchanged): R1: P3 R2: P4 R3: P7 R4: P8 R5: P9 R6: P10
Free Table: P11
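
The renaming walk-through can be reproduced with a rename table and a free list. This is a sketch, not the mechanism of any particular machine: the `(op, dest, sources)` encoding, the function name `rename`, and taking free registers from the front of the list are assumptions chosen to match the example. Sources are looked up before the destination is reallocated, so `ADD R1,1,R1` reads the old R1.

```python
from collections import deque


def rename(program, rename_table, free_list):
    """Map architectural registers to physical ones, allocating on each definition."""
    table = dict(rename_table)
    free = deque(free_list)
    renamed = []
    for op, dest, srcs in program:
        # Rename sources first; anything not in the table (an immediate) passes through.
        phys_srcs = tuple(table.get(s, s) for s in srcs)
        phys_dest = None
        if dest is not None:
            phys_dest = free.popleft()  # IndexError here means no free physical register
            table[dest] = phys_dest
        renamed.append((op, phys_dest, phys_srcs))
    return renamed, table, list(free)


program = [
    ("ADD", "R2", ("R3", "R4")),
    ("SW", None, ("R2", "R1")),   # store: value register, address register
    ("ADD", "R1", ("1", "R1")),
    ("ADD", "R2", ("R5", "R6")),
    ("SW", None, ("R2", "R1")),
]
initial_table = {"R1": "P2", "R2": "P6", "R3": "P7", "R4": "P8", "R5": "P9", "R6": "P10"}
renamed, final_table, free_after = rename(program, initial_table, ["P1", "P3", "P4", "P11"])
for instr in renamed:
    print(instr)
```

After renaming, the two ADD/SW pairs use disjoint physical registers (P1 and P4), so the reuse of R2 no longer orders them. Freeing physical registers is not modeled here.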

Free Physical Register
Free after the last use completes
Identify the last use by the next def?
– Wait for all users to complete…
Or, allocate in order (LRU)
– interlock if there is a re-assignment conflict
– (should correspond to having no free physical registers)

Tomasulo
Register renaming
Scoreboarding
Bypassing
IBM 1967
…what’s keeping the x86 ISA alive today
– compensates for the small number of architectural registers
– dusty deck code

Parallelism and Control
We have seen that we can turn a basic block
– (the code between branches)
into an executing dataflow graph
– i.e. once issued, only dataflow dependencies limit parallelism
…all the more reason to want large basic blocks (minimize branches, branch effects)

Avoiding Control Flow Limits

Control Flow
Previously saw that data hazards on control force stalls
– for multiple cycles
With superscalar, we may be issuing multiple instructions per cycle
This makes stalls even more expensive
– wasting n slots per cycle
– e.g. with 7 instructions per branch: issue 7 instructions, hit a branch, stall for the instructions to complete...

Control/Branches
Cannot afford to stall on branches for resolution
Stalling would limit parallelism to the basic block
– average run length between branches
– …which is typically 5 to 7

Idea
Predict the branch direction
Execute as if that’s correct
Commit/discard results once the branch direction is known
Use ideas from precise exceptions to separate
– working values
– architecture state

Goal
Correctly predicted branches run as if they weren’t there (noops)
Maximize the expected run length between mispredicted branches

Expected Run Length
$E(l) = L_1 + L_2 P_1 + L_3 P_1 P_2 + L_4 P_1 P_2 P_3 + \dots$
With $L_i = l$ and $P_i = p$:
$E(l) = l(1 + p + p^2 + p^3 + \dots)$
$E(l) = l/(1-p)$
$E(l) = l/(\text{probability of mispredict})$

Expected Run Length
p = 0.9: 10 [remaining table rows missing]
Halving the mispredict rate doubles the run length
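
A small sketch of the run-length formula, useful for checking the "halving doubles" claim numerically. Here `p` is the probability that a branch is predicted correctly and `l` is the run length between branches; the function names and the sample values of `p` and `l` are illustrative choices, not the entries of the slide's table.

```python
def expected_run_length(l, p):
    """Closed form of l * (1 + p + p**2 + ...) = l / (1 - p), for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return l / (1.0 - p)


def partial_sum(l, p, terms):
    """Truncated geometric series, to compare against the closed form."""
    return sum(l * p**k for k in range(terms))


# Illustrative values only: l = 6 lies in the typical 5 to 7 range quoted earlier.
for p in (0.9, 0.95, 0.975):
    print(f"p={p}: mispredict rate={1 - p:.3f}, E(l)={expected_run_length(6, p):.1f}")
```

Each step in the loop halves the mispredict rate (1 - p), so each printed run length is twice the previous one.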

IPC
Run for E(l) instructions
Then mispredict
– waste ~ pipeline delay cycles (and all work)
Pipe delay: d
Base IPC: n
E(l)/n cycles issue n
d cycles issue nothing useful
$IPC = E(l)/(E(l)/n + d) = n/(1 + dn/E(l))$

Example
$IPC = E(l)/(E(l)/n + d) = n/(1 + dn/E(l))$
Say E(l)=100, n=7, d=10
$7/(1 + 70/100) = 7/1.7 \approx 4.1$
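
The IPC estimate is easy to evaluate directly. A minimal sketch, with `effective_ipc` as an illustrative name; it uses the same simplification as the formula, namely that each mispredict costs d cycles in which nothing useful issues.

```python
def effective_ipc(run_length, n, d):
    """IPC = E(l) / (E(l)/n + d): E(l) instructions take E(l)/n issue cycles plus d wasted."""
    cycles = run_length / n + d
    return run_length / cycles


def effective_ipc_closed_form(run_length, n, d):
    """Equivalent form n / (1 + d*n/E(l))."""
    return n / (1 + d * n / run_length)


ipc = effective_ipc(100, 7, 10)  # the example: E(l)=100, n=7, d=10
print(round(ipc, 2))
```

With these numbers, a 7-wide machine sustains only a little over 4 instructions per cycle, even with 100 instructions between mispredicts.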

Branch Prediction
Previous runs
(Dynamic) history
Correlated

Previous Run
Hypothesis: branch behavior is largely a characteristic of the program.
– Can use data from previous runs (different input data set)
– to predict branch behavior
Fisher: Instructions/mispredict:
– even with different data sets

Data Prediction
Example shows the value (and validity) of feedback
– run program
– measure behavior
– use to inform the compiler, generate better code
Static/procedural analysis
– often cannot yield enough information
– most behavioral properties are undecidable

Branch History Table
Hypothesis: we can predict the behavior of a branch based on its recent past behavior.
– If a branch has been taken, we’ll predict it’s taken this time.
To exploit dynamic strength, we would like to be responsive to changing program behavior.

Branch History Table
Implementation
– Saturating counter
   count up on branch taken; down on branch not taken
– Predict direction based on the majority (which side of the mid-point) of past branches
Saturation
– keeps the counter small (cheap)
   typically 2b
– limits the amount of history kept
   time to “learn” new behavior
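
A sketch of a branch history table built from saturating counters. The table size, the PC-modulo indexing, and the initial counter value (at the mid-point, so "weakly taken") are assumptions for illustration; the class and method names are hypothetical.

```python
class BranchHistoryTable:
    """Table of small saturating counters, indexed by low bits of the branch PC."""

    def __init__(self, entries=1024, bits=2):
        self.max_count = (1 << bits) - 1
        self.midpoint = 1 << (bits - 1)
        self.counters = [self.midpoint] * entries  # assumed starting state

    def _index(self, pc):
        return pc % len(self.counters)  # untagged, so different branches may alias

    def predict(self, pc):
        """Predict taken when the counter sits at or above the mid-point."""
        return self.counters[self._index(pc)] >= self.midpoint

    def update(self, pc, taken):
        """Count up on taken, down on not taken, saturating at both ends."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.max_count, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bht = BranchHistoryTable()
pc = 0x40
for _ in range(10):
    bht.update(pc, True)            # a loop branch taken ten times saturates the counter
bht.update(pc, False)               # a single not-taken (loop exit) outcome
after_one_miss = bht.predict(pc)
bht.update(pc, False)
after_two_misses = bht.predict(pc)
print(after_one_miss, after_two_misses)
```

With 2 bits, one contrary outcome does not flip a saturated prediction but two in a row do, which is the trade-off between stability and time to learn new behavior.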

Correlated Branch Prediction
Hypothesis: branch directions are highly correlated
– a branch is likely to depend on branches which occurred before it.
Implementation
– look at the last m branches
   shift register on branch directions
– use a separate counter to track each of the $2^m$ cases
– contain cost: only keep a small number of entries and allow aliasing
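
A simplified sketch of the correlated scheme: an m-bit shift register of recent branch directions selects one of 2^m two-bit saturating counters. It keeps a single global history and does not index by branch PC, so it illustrates the idea rather than any specific published predictor; the class name and initial counter values are assumptions.

```python
class CorrelatedPredictor:
    """Global-history predictor: the last m outcomes select a 2-bit counter."""

    def __init__(self, m=2):
        self.m = m
        self.history = 0                 # last m outcomes, newest in bit 0
        self.counters = [2] * (1 << m)   # one counter per history pattern

    def predict(self):
        return self.counters[self.history] >= 2

    def update(self, taken):
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the new outcome into the history register, keeping m bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


# A strictly alternating branch defeats a single counter but not the history-indexed one.
predictor = CorrelatedPredictor(m=2)
for outcome in [True, False] * 10:       # training phase
    predictor.update(outcome)

correct = 0
for outcome in [True, False] * 5:        # measurement phase
    correct += predictor.predict() == outcome
    predictor.update(outcome)
print(correct, "of 10 predicted correctly")
```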

Branch Prediction
…whole host of schemes and variations proposed in the literature

Prediction worked for Direction...
Note:
– have to decode the instruction to figure out if it’s a branch
– then have to perform arithmetic to get the branch target
So, we don’t know where the branch is going until a cycle (or several) after fetch
– IF ID EX

Branch-Target Buffer
Take it one step back and predict the target address
Cache
– accessed in parallel with memory fetch (IF)
– stores the predicted target PC and branch prediction
– tagged with the PC to avoid aliasing
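
A sketch of a direct-mapped branch-target buffer. The entry layout, the table size, and the policy of storing a single taken/not-taken bit are assumptions made for illustration; the class and method names are hypothetical. The lookup needs only the fetch PC, which is what lets it run alongside instruction fetch.

```python
class BranchTargetBuffer:
    """Direct-mapped cache from branch PC to (predicted target, predicted direction)."""

    def __init__(self, entries=256):
        self.entries = entries
        self.table = [None] * entries   # each slot: (tag_pc, target_pc, predict_taken)

    def lookup(self, pc, fallthrough):
        """Predicted next fetch address for `pc`; `fallthrough` on a miss or not-taken."""
        slot = self.table[pc % self.entries]
        if slot is not None and slot[0] == pc and slot[2]:
            return slot[1]
        return fallthrough

    def update(self, pc, target, taken):
        """Record the resolved branch, replacing whatever shared the slot."""
        self.table[pc % self.entries] = (pc, target, taken)


btb = BranchTargetBuffer(entries=4)
miss = btb.lookup(0x10, fallthrough=0x14)       # nothing recorded yet
btb.update(0x10, target=0x80, taken=True)
hit = btb.lookup(0x10, fallthrough=0x14)        # same PC: predicted target
alias = btb.lookup(0x20, fallthrough=0x24)      # same slot, different tag
print(hex(miss), hex(hit), hex(alias))
```

The full-PC tag is what keeps a different instruction that maps to the same slot from being redirected to the wrong target.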

Reducing the Number of Branches
A mispredicted branch costs more than a few cycles in these wide-issue machines
– potentially n*d
Especially in cases of reconvergent flow and even branch probabilities

Conditional Operations
Idea: create guarded operations
– only change the register if some result holds
e.g.
– 8b saturating add
   c=a+b
   if (c>255) c=255
   if (c<0) c=0
ADD R1,R2,R3
SUB R4,R1,#255
CMOVP R1,#255,R4
CMOVN R1,#0,R1
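
The four-instruction sequence can be mirrored step by step. This is a model only: it assumes CMOVP moves when its guard register is positive, CMOVN moves when its guard is negative, and the operand order is (destination, value, guard), all inferred from the example rather than stated in the slides. The helper `cmov` is hypothetical; in hardware the guarded move always issues and only the register write is conditional.

```python
def cmov(guard_holds, new_value, old_value):
    """Model of a guarded move: the instruction always executes, the write is conditional."""
    return new_value if guard_holds else old_value


def saturating_add_8b(a, b):
    """Clamp a + b into 0..255 with straight-line code, one line per instruction."""
    r1 = a + b                    # ADD   R1,R2,R3
    r4 = r1 - 255                 # SUB   R4,R1,#255
    r1 = cmov(r4 > 0, 255, r1)    # CMOVP R1,#255,R4  (R4 positive: overflow)
    r1 = cmov(r1 < 0, 0, r1)      # CMOVN R1,#0,R1    (R1 negative: underflow)
    return r1


results = [saturating_add_8b(200, 100), saturating_add_8b(10, 20), saturating_add_8b(-5, -10)]
print(results)
```

There is no branch in the instruction stream, so there is nothing to mispredict; the cost is that all four operations issue every time.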

Conditional Operation Prospect
For an unpredictable branch (p~=0.5)
– E(wasted issue slots) = p*n*d (n*d/2)
With conditional move
– assume l cycles inside the conditional clauses
– one-sided if: E(wasted) = p*l, e.g. (l/2)
– two-sided (both of length l): E(wasted) = l
Net benefit for short guarded blocks
– on wide-issue machines
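
A quick numeric comparison using the expressions above. The function names and the sample values (n=7, d=10, l=4) are illustrative assumptions; the accounting follows the slide as written, without reconciling issue slots and cycles.

```python
def wasted_on_branch(p, n, d):
    """Expected waste when a branch mispredicts with probability p: p * n * d."""
    return p * n * d


def wasted_one_sided(p, l):
    """Guarded one-sided if: the clause of length l is wasted with probability p."""
    return p * l


def wasted_two_sided(l):
    """Guarded two-sided if with both clauses of length l: one clause is always wasted."""
    return l


branch_cost = wasted_on_branch(0.5, n=7, d=10)
one_sided_cost = wasted_one_sided(0.5, l=4)
two_sided_cost = wasted_two_sided(l=4)
print(branch_cost, one_sided_cost, two_sided_cost)
```

For short clauses the guarded versions waste far less than the expected mispredict penalty; the advantage shrinks as l approaches n*d.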

Speculation
Branch prediction allows us to continue executing
– still have to deal with the branch being wrong
In a simple pipelined ISA
– outstanding branch resolved before writeback

Speculation
Wide-issue ISA?
– likely to have more instructions in flight than the mean latency between branches (nd>l)
– to exploit parallelism, need to continue computing assuming the chosen path is correct
   means making result values visible to subsequent instructions, which may be wrong if control flow goes another way

Old Problem
Mostly the same problem as precise exceptions
– want to continue computing forward with tentative values
– want to preserve old state so we can roll back to a known state

Revisit Re-Order
[Figure: pipeline IF, ID, Reorder, Bypass, EX (ALU, MPY, LD/ST), RF]
Complex (big) bypass logic.

Speculation and Re-Order Buffer
Compute and bypass values from the reorder buffer
At the end of the re-order buffer
– commit to RF (architectural state) in proper program order after branches are resolved
– if a branch was wrong, nullify its effect (results predicated upon it)
– flush the re-order buffer (pipeline)
   direct control flow back to the correct branch direction
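
A sketch of in-order commit with flush on a mispredicted branch. The entry fields, the class name `ReorderBuffer`, and the string return values are assumptions for illustration; bypassing from the buffer and memory operations are left out.

```python
from collections import deque


class ReorderBuffer:
    """FIFO of in-flight results; only the head may update architectural state."""

    def __init__(self):
        self.entries = deque()  # oldest instruction at the left

    def allocate(self, dest, is_branch=False):
        entry = {"dest": dest, "value": None, "done": False,
                 "is_branch": is_branch, "mispredicted": False}
        self.entries.append(entry)
        return entry

    def commit(self, regfile):
        """Retire finished entries from the head, in program order.

        Stops at the first unfinished entry. A mispredicted branch at the head
        discards every younger entry, so their results never reach `regfile`.
        """
        while self.entries and self.entries[0]["done"]:
            head = self.entries.popleft()
            if head["is_branch"]:
                if head["mispredicted"]:
                    self.entries.clear()
                    return "flush"   # caller redirects fetch to the correct path
                continue
            regfile[head["dest"]] = head["value"]
        return "ok"


regfile = {"R1": 0, "R2": 0}
rob = ReorderBuffer()
older = rob.allocate("R1")
branch = rob.allocate(None, is_branch=True)
speculative = rob.allocate("R2")            # issued past the predicted branch

speculative.update(value=9, done=True)      # finishes first, out of order
first = rob.commit(regfile)                 # head not done: nothing commits
older.update(value=5, done=True)
branch.update(done=True, mispredicted=True)
second = rob.commit(regfile)
print(first, second, regfile)
```

The speculative value for R2 was available for bypass inside the buffer but never became architectural state.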

Details
As before,
– exception delivery must be deferred until the instruction can commit
– memory operations require reorder/bypass/commit as well
History/Future File work
– …but transfer time may be more critical in this case

Register Update Unit (RUU)
SimpleScalar uses this
FIFO unit for instruction management
– serves for both issue and in-order commit
Decode: fills empty slots
Issue: picks the next set of runnable instructions
Execution results return here
Commit: completed instructions in order from the head of the FIFO

RUU
[Figure: pipeline IF, Decode, Queue, EX (ALU, MPY, LD/ST), RF, with the RUU]

RUU
Needs to hold all outstanding instructions
– from: considering for issue
– to: completion and final RF writeback
Needs to be relatively large
Complex?

Admin
No lecture W, F
– Many of us away at FCCM
Next class Monday, April 14th
– Handout today: supplemental reading
No class Wednesday, April 16th
– Student-Faculty Conference

Big Ideas
Parallelism does exist in the problem
– obscured by ISA linearization
Dataflow interpretation
– preserve dependencies, not control flow sequence
– rediscover the non-linear “graph”

Big Ideas
Interruptions in control flow limit our ability to exploit parallelism
There is structure in programs
– predictability in control flow
Make the common case fast
Predict/guess the common case control flow
– to generate larger blocks
Nullify the effects of erroneous instructions when the guess is wrong