CS252 Graduate Computer Architecture
Lecture 14: Prediction (Con't) (Dependencies, Load Values, Data Values)
John Kubiatowicz
Electrical Engineering and Computer Sciences, University of California, Berkeley

Review: Yeh and Patt classification
[Figure: the GAg, PAg, and PAp organizations, built from a global (GBHR) or per-address (PABHR) branch history register indexing a global (GPHT) or per-address (PAPHT) pattern history table]
GAg: Global History Register, Global History Table
PAg: Per-Address History Register, Global History Table
PAp: Per-Address History Register, Per-Address History Table
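A minimal sketch (ours, not from the slides) contrasting how the three organizations form their pattern-table index; the table sizes, hash, and 2-bit counter threshold are illustrative assumptions:

```c
#include <stdint.h>

#define HIST_BITS 12
#define PHT_SIZE  (1 << HIST_BITS)
#define BHR_SETS  1024             /* per-address history registers */

static uint16_t gbhr;                      /* global branch history register */
static uint16_t pabhr[BHR_SETS];           /* per-address history registers  */
static uint8_t  gpht[PHT_SIZE];            /* global pattern history table   */
static uint8_t  papht[BHR_SETS][PHT_SIZE]; /* per-address pattern tables     */

static int predict(uint32_t pc, int scheme) {
    uint32_t set = (pc >> 2) % BHR_SETS;
    switch (scheme) {
    case 0: /* GAg: global history indexes a global table */
        return gpht[gbhr & (PHT_SIZE - 1)] >= 2;
    case 1: /* PAg: per-address history indexes a global table */
        return gpht[pabhr[set] & (PHT_SIZE - 1)] >= 2;
    default:/* PAp: per-address history indexes a per-address table */
        return papht[set][pabhr[set] & (PHT_SIZE - 1)] >= 2;
    }
}

/* Counter training omitted; history registers shift in each outcome. */
static void update_history(uint32_t pc, int taken) {
    uint32_t set = (pc >> 2) % BHR_SETS;
    gbhr       = (uint16_t)((gbhr << 1) | (taken & 1));
    pabhr[set] = (uint16_t)((pabhr[set] << 1) | (taken & 1));
}
```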

Review: Other Global Variants
GAs: Global History Register, Per-Address (Set-Associative) History Table
Gshare: Global History Register, Global History Table with a simple attempt at anti-aliasing (the branch address is XORed with the global history)
[Figure: GAs indexes the PAPHT with the GBHR; Gshare indexes the GPHT with Address ⊕ GBHR]
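A minimal gshare sketch (ours, not from the slides). The XOR spreads different branches that share the same global history across the table, which is the "simple attempt at anti-aliasing" named above; sizing and the 2-bit counter policy are illustrative assumptions:

```c
#include <stdint.h>

#define HIST_BITS 12
#define PHT_SIZE  (1 << HIST_BITS)

static uint16_t gbhr;            /* global history, low HIST_BITS used */
static uint8_t  pht[PHT_SIZE];   /* 2-bit saturating counters          */

static int gshare_predict(uint32_t pc) {
    uint32_t idx = ((pc >> 2) ^ gbhr) & (PHT_SIZE - 1);
    return pht[idx] >= 2;        /* taken if counter in upper half */
}

static void gshare_update(uint32_t pc, int taken) {
    uint32_t idx = ((pc >> 2) ^ gbhr) & (PHT_SIZE - 1);
    if (taken  && pht[idx] < 3) pht[idx]++;
    if (!taken && pht[idx] > 0) pht[idx]--;
    gbhr = (uint16_t)(((gbhr << 1) | (taken & 1)) & (PHT_SIZE - 1));
}
```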

Review: Tournament Predictors
The motivation for correlating branch predictors is that the 2-bit predictor failed on important branches; by adding global information, performance improved.
Tournament predictors: use two predictors, one based on global information and one based on local information, and combine them with a selector.
Use the predictor that tends to guess correctly.
[Figure: address and history feed Predictor A and Predictor B; a selector picks between them]
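A minimal selector sketch, assuming two component predictors already exist (the stubs predict_global/predict_local stand in for them); the per-branch 2-bit chooser and its sizing are illustrative assumptions:

```c
#include <stdint.h>

#define CHOOSER_SIZE 4096
static uint8_t chooser[CHOOSER_SIZE]; /* 0-1 favor local, 2-3 favor global */

/* Stand-ins for the real component predictors. */
static int predict_global(uint32_t pc) { (void)pc; return 1; }
static int predict_local (uint32_t pc) { (void)pc; return 0; }

static int tournament_predict(uint32_t pc) {
    uint32_t idx = (pc >> 2) & (CHOOSER_SIZE - 1);
    return chooser[idx] >= 2 ? predict_global(pc) : predict_local(pc);
}

/* Move the chooser toward whichever predictor was right when they disagree;
 * the component predictors train themselves elsewhere. */
static void tournament_update(uint32_t pc, int taken, int g_pred, int l_pred) {
    uint32_t idx = (pc >> 2) & (CHOOSER_SIZE - 1);
    int g_ok = (g_pred == taken), l_ok = (l_pred == taken);
    if (g_ok && !l_ok && chooser[idx] < 3) chooser[idx]++;
    if (l_ok && !g_ok && chooser[idx] > 0) chooser[idx]--;
}
```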

Review: Memory Dependence Prediction
Is it important to speculate? Two extremes:
–Naïve speculation: always let the load go forward
–No speculation: always wait for dependencies to be resolved
Comparing naïve speculation to no speculation:
–False dependency: waiting when you don't have to
–Order violation: the result of speculating incorrectly
Goal of prediction:
–Avoid false dependencies and order violations
From "Memory Dependence Prediction using Store Sets", Chrysos and Emer.

Premise: Past Indicates Future
The basic premise is that past dependencies indicate future dependencies.
–Not always true! Hopefully true most of the time.
Store set: the set of store instructions that affect a given load. Example:

  Addr  Inst        Store set
  0     Store C
  4     Store A
  8     Store B
  12    Store C
  28    Load B   →  { PC 8 }
  32    Load D   →  { (null) }
  36    Load C   →  { PC 0, PC 12 }
  40    Load B   →  { PC 8 }

–Idea: a load's store set starts empty. If the load ever goes forward and this causes a violation, add the offending store to the load's store set.
Approach, for each indeterminate load:
–If a store from its store set is in the pipeline, stall; else let it go forward.
Does this work?

How well does "infinite" tracking work?
"Infinite" here means placing no limits on:
–The number of store sets
–The number of stores in a given set
Seems to do pretty well.
–Note: "Not Predicted" means the load had an empty store set
–Only Applu and Xlisp seem to have false dependencies

How to track store sets in reality?
SSIT (Store Set ID Table): assigns loads and stores to a Store Set ID (SSID)
–Notice that this requires each store to be in only one store set!
LFST (Last Fetched Store Table): maps SSIDs to the most recently fetched store
–When a load is fetched, this lets it find the most recent store in its store set that is still executing (if any) → it can stall until that store finishes
–When a store is fetched, this lets it wait for the previous store in its store set
»Pretty much the same type of ordering as enforced by the ROB anyway
»Transitivity → loads end up waiting for all active stores in their store set
What if a store needs to be in two store sets?
–Allow store sets to be merged together deterministically
»Two loads and multiple stores get the same SSID
We want periodic clearing of the SSIT to avoid:
–Aliasing problems across the program
–Out-of-control merging
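A minimal sketch of the SSIT/LFST pair described above. The split follows the slide (SSIT: PC → SSID; LFST: SSID → last fetched store), but the table sizes, the "inum" handle naming an in-flight store, and the very simple SSID allocation/merging are illustrative assumptions:

```c
#include <stdint.h>

#define SSIT_SIZE 4096
#define LFST_SIZE 256
#define NO_SSID   0xFFFF
#define NO_STORE  (-1)

static uint16_t ssit[SSIT_SIZE]; /* PC -> SSID                         */
static int32_t  lfst[LFST_SIZE]; /* SSID -> inum of last fetched store */

/* Periodic clearing, as motivated above; also call once at reset. */
static void clear_tables(void) {
    for (int i = 0; i < SSIT_SIZE; i++) ssit[i] = NO_SSID;
    for (int i = 0; i < LFST_SIZE; i++) lfst[i] = NO_STORE;
}

/* Load fetch: which in-flight store (if any) must this load wait for? */
static int32_t load_fetch(uint32_t pc) {
    uint16_t ssid = ssit[(pc >> 2) & (SSIT_SIZE - 1)];
    return (ssid == NO_SSID) ? NO_STORE : lfst[ssid];
}

/* Store fetch: wait for the previous store in the set, then become the
 * set's most recently fetched store. */
static int32_t store_fetch(uint32_t pc, int32_t my_inum) {
    uint16_t ssid = ssit[(pc >> 2) & (SSIT_SIZE - 1)];
    if (ssid == NO_SSID) return NO_STORE;
    int32_t prev = lfst[ssid];
    lfst[ssid] = my_inum;
    return prev;
}

/* On a memory-order violation, put the load and the offending store in
 * the same store set (real hardware merges existing sets deterministically). */
static void on_violation(uint32_t load_pc, uint32_t store_pc) {
    static uint16_t next_ssid = 0;
    uint16_t ssid = next_ssid++ % LFST_SIZE;
    ssit[(load_pc  >> 2) & (SSIT_SIZE - 1)] = ssid;
    ssit[(store_pc >> 2) & (SSIT_SIZE - 1)] = ssid;
}
```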

How well does this do?
Comparison against the Store Barrier Cache:
–It marks individual stores as "tending to cause memory violations"
–It is not specific to particular loads
Problem with Applu?
–Analyzed in the paper: it has a complex 3-level inner loop in which loads occasionally depend on stores
–This forces overly conservative stalls (i.e., false dependencies)

Load Value Predictability
Try to predict the result of a load before going to memory.
Paper: "Value Locality and Load Value Prediction"
–Mikko H. Lipasti, Christopher B. Wilkerson and John Paul Shen
Notion of value locality:
–The fraction of instances of a given load that match the last n different values
Is there any value locality in typical programs?
–Yes!
–With a history depth of 1, most integer programs show over 50% repetition
–With a history depth of 16, most integer programs show over 80% repetition
–Not everything does well: see cjpeg, swm256, and tomcatv
Locality varies by type:
–Quite high for instruction/data addresses
–Reasonable for integer values
–Not as high for FP values

Load Value Prediction Table
Load Value Prediction Table (LVPT):
–Untagged, direct-mapped
–Maps instruction address → predicted data
–Contains a history of the last n unique values from a given instruction
–Can contain aliases, since it is untagged
How to predict?
–When n = 1, easy
–When n = 16? Use an oracle
Is every load predictable?
–No! Why not?
–Must identify predictable loads somehow
[Figure: the instruction address indexes the LVPT to produce a prediction; load results update the table]
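A minimal LVPT sketch for the easy n = 1 (last-value) case: an untagged, direct-mapped table from instruction address to the last value that load produced. The table size is an illustrative assumption:

```c
#include <stdint.h>

#define LVPT_SIZE 1024
static uint64_t lvpt[LVPT_SIZE];   /* untagged: aliasing is possible */

static uint64_t lvpt_predict(uint32_t pc) {
    return lvpt[(pc >> 2) & (LVPT_SIZE - 1)];
}

static void lvpt_update(uint32_t pc, uint64_t loaded_value) {
    lvpt[(pc >> 2) & (LVPT_SIZE - 1)] = loaded_value;
}
```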

Load Classification Table
Load Classification Table (LCT):
–Untagged, direct-mapped
–Maps instruction address → a single bit: whether or not to predict
How to implement?
–Use saturating counters (2-bit or 1-bit)
–When the prediction is correct, increment
–When the prediction is incorrect, decrement
With a 2-bit counter:
–0, 1 → not predictable
–2 → predictable
–3 → constant (very predictable)
With a 1-bit counter:
–0 → not predictable
–1 → constant (very predictable)
[Figure: the instruction address indexes the LCT to answer "predictable?"; corrections update the counters]
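A minimal LCT sketch with 2-bit saturating counters, gating the LVPT sketch above; the size and indexing are illustrative assumptions:

```c
#include <stdint.h>

#define LCT_SIZE 1024
static uint8_t lct[LCT_SIZE]; /* 0-1: don't predict, 2: predict, 3: constant */

static int lct_classify(uint32_t pc) {
    return lct[(pc >> 2) & (LCT_SIZE - 1)];   /* 0..3, per the slide */
}

static void lct_update(uint32_t pc, int prediction_was_correct) {
    uint32_t idx = (pc >> 2) & (LCT_SIZE - 1);
    if (prediction_was_correct  && lct[idx] < 3) lct[idx]++;
    if (!prediction_was_correct && lct[idx] > 0) lct[idx]--;
}
```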

Accuracy of the LCT
The question of accuracy is about how well we avoid:
–Predicting an unpredictable load
–Failing to predict a predictable load
How well does this work?
–The difference between "Simple" and "Limit" is history depth:
»Simple: depth 1
»Limit: depth 16
–Limit tends to classify more things as predictable (since this works more often)
Basic principle:
–It often works better to have one structure decide on the basic "predictability" of a value stream
–Independent of the prediction structure itself

Constant Value Unit
Idea: identify a load instruction as "constant"
–Can skip the cache lookup entirely (no verification)
–Must enforce this by monitoring stores, removing "constant" status when the location is written
How well does this work?
–Seems to identify 6-18% of loads as constant
–A load must be unchanging enough for the LCT to classify it as constant
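A minimal CVU sketch. We assume (loosely following the paper's description) a small table of (data address, LVPT index) pairs for loads classified constant, which every store searches to invalidate; the sizing, the stored pair, and the lack of a replacement policy are our assumptions:

```c
#include <stdint.h>

#define CVU_SIZE 32
static struct { uint64_t addr; uint32_t lvpt_idx; int valid; } cvu[CVU_SIZE];

/* A "constant" load hits if its address/index pair is still valid:
 * the cached value can then be used with no cache access at all. */
static int cvu_hit(uint64_t addr, uint32_t lvpt_idx) {
    for (int i = 0; i < CVU_SIZE; i++)
        if (cvu[i].valid && cvu[i].addr == addr && cvu[i].lvpt_idx == lvpt_idx)
            return 1;
    return 0;
}

/* Every store removes any "constant" entries for the location it writes. */
static void cvu_store(uint64_t store_addr) {
    for (int i = 0; i < CVU_SIZE; i++)
        if (cvu[i].valid && cvu[i].addr == store_addr)
            cvu[i].valid = 0;
}
```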

Load Value Architecture
LCT/LVPT in the fetch stage, CVU in the execute stage
–Used to bypass the cache entirely
–(We know the result is good)
Results: some speedups
–The Alpha 21164 seems to do better than the PowerPC
–The authors think this is because its small first-level cache and in-order execution make the CVU more useful

Data Value Prediction
Why do it?
–Can "break the dataflow boundary"
–Before: critical path = 4 operations (probably worse)
–After: critical path = 1 operation (plus verification)
[Figure: a dataflow graph over inputs A, B, X, Y with chained +, *, and / operations; guessing an intermediate value lets the chain execute in parallel]

Data Value Predictability
"The Predictability of Data Values"
–Yiannakis Sazeides and James Smith, Micro 30, 1997
Three different types of patterns:
–Constant (C): …
–Stride (S): …
–Non-Stride (NS): …
Combinations (illustrative sequences for each class follow):
–Repeated Stride (RS): …
–Repeated Non-Stride (RNS): …
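The slide's example sequences are elided; the following sequences are our own illustrations (not the paper's) of what each pattern class looks like:

  Constant (C):               5  5  5  5  5 …
  Stride (S):                 1  2  3  4  5 …
  Non-Stride (NS):            28 13 99 107 23 …
  Repeated Stride (RS):       1  2  3  1  2  3 …
  Repeated Non-Stride (RNS):  1 13 99 7  1 13 99 7 …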

Computational Predictors
Last-value predictors:
–Predict that an instruction will produce the same value as last time
–Requires some form of hysteresis. Two subtle alternatives:
»A saturating counter incremented/decremented on success/failure; replace the value when the count falls below a threshold
»Keep the old value until the new value has been seen frequently enough
–The second version predicts a constant when the value appears temporarily constant
Stride predictors:
–Predict the next value by adding the difference of the two most recent values to the most recent value:
»If v_{n-1} and v_{n-2} are the two most recent values, then predict the next value will be v_{n-1} + (v_{n-1} - v_{n-2})
»The value (v_{n-1} - v_{n-2}) is called the "stride"
–Important variations in hysteresis:
»Change the stride only if a saturating counter falls below a threshold
»Or the "two-delta" method: two strides are maintained. The first (S1) is always updated by the difference between the two most recent values; the other (S2) is used for computing predictions. When the same S1 is seen twice in a row, S1 is copied into S2.
More complex predictors:
–Multiple strides for nested loops
–Complex computations for complex loops (polynomials, etc.!)
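A minimal sketch of the two-delta variant described above, for a single instruction's predictor entry; the field widths and the entry-per-instruction framing are assumptions:

```c
#include <stdint.h>

typedef struct {
    int64_t last;   /* most recent value v_{n-1}           */
    int64_t s1;     /* always updated: v_{n-1} - v_{n-2}   */
    int64_t s2;     /* prediction stride, updated lazily   */
} stride_entry_t;

static int64_t stride_predict(const stride_entry_t *e) {
    return e->last + e->s2;            /* v_{n-1} + stride */
}

static void stride_update(stride_entry_t *e, int64_t value) {
    int64_t new_s1 = value - e->last;
    if (new_s1 == e->s1)               /* same stride seen twice in a row */
        e->s2 = new_s1;                /* promote S1 into S2              */
    e->s1 = new_s1;
    e->last = value;
}
```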

Context-Based Predictors
Context-based predictor:
–Relies on tables to do the trick
–Classified according to order: an "n-th order" model takes the last n values and uses them to produce a prediction
»So a 0th-order predictor is entirely frequency based
Consider the sequence: a a a b c a a a b c a a a
–The next value is? (A frequency-based predictor guesses "a"; an order-3 model that has learned that "a a a" is followed by "b" predicts "b".)
"Blending": use the prediction of the highest order available
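A minimal order-n context-predictor sketch in the FCM style. The hash, table size, and the simplification of storing a single "value seen after this context" per entry (a real FCM keeps frequency counts per context) are our assumptions:

```c
#include <stdint.h>

#define ORDER      3
#define TABLE_SIZE 4096

static uint64_t history[ORDER];     /* last ORDER values, oldest first */
static uint64_t table[TABLE_SIZE];  /* context -> next value seen      */

static uint32_t context_hash(void) {
    uint64_t h = 0;
    for (int i = 0; i < ORDER; i++)
        h = h * 1099511628211ULL + history[i];   /* FNV-style mix */
    return (uint32_t)(h & (TABLE_SIZE - 1));
}

static uint64_t fcm_predict(void) {
    return table[context_hash()];
}

static void fcm_update(uint64_t actual) {
    table[context_hash()] = actual;     /* learn: context -> actual */
    for (int i = 0; i < ORDER - 1; i++) /* shift the context window */
        history[i] = history[i + 1];
    history[ORDER - 1] = actual;
}
```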

Which is better?
Stride-based:
–Learns faster
–Less state
–Much cheaper in terms of hardware!
–Runs into errors for any pattern that is not an infinite stride
Context-based:
–Takes much longer to train
–Performs perfectly once trained
–Much more expensive hardware

How predictable are data items?
Assumptions (we are looking for limits):
–Prediction is done with no table aliasing (every instruction has its own set of tables/strides/etc.)
–Only instructions that write into registers are measured
»Excludes stores, branches, jumps, etc.
Overall predictability:
–L = Last Value
–S = Stride (delta-2)
–FCMx = order-x context-based predictor

Correlation of Predicted Sets
How to interpret:
–l = last value
–s = stride
–f = fcm3
Combinations:
–ls = both l and s
–etc.
Conclusions?
–Only 18% are not predicted correctly by any model
–About 40% are captured by all predictors
–A significant fraction (over 20%) is captured only by fcm
–Stride does well!
»It captures over 60% of correct predictions
–Last-value seems to add very little value

Number of Unique Values
Data observations:
–Many static instructions (>50%) generate only one value
–The majority of static instructions (>90%) generate fewer than 64 values
–The majority of dynamic instructions (>50%) correspond to static instructions that generate fewer than 64 values
–Over 90% of dynamic instructions correspond to static instructions that generate fewer than 4096 unique values
This suggests that a relatively small number of values would be required for actual context prediction.

Conclusion
Dependence prediction: try to predict whether a load depends on earlier stores before the addresses are known
–Store set: the set of stores that have had dependencies with a load in the past
Last-value prediction:
–Predict that the value of a load will be similar to (the same as?) its previous value
–Works better than one might expect
Computation-based predictors:
–Try to construct a prediction based on some actual computation
–Last value is the trivial prediction
–Stride-based prediction is slightly more complex
»Uses a linear model of values
Context-based predictors:
–Table driven
–When a given sequence is seen, repeat what was seen last time
–Can reproduce complex patterns