Phase Capture and Prediction with Applications

Martin Hock, Brian Pellin, Karthik Jayaraman, Vivek Shrivastava
University of Wisconsin-Madison

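Phases
Definition: a period of execution during which the program exhibits the same characteristics, for example executing the same code and exercising the hardware in the same way.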

Motivation
- Programs go through different phases during their execution.
- Phases are often repeated at different times in the execution.
- During each phase, the hardware is exercised differently.

Sample Phase Behavior: gcc

Outline
- Phase Tracking
- Phase Prediction
- Applications
  - Phase Based Branch Prediction
  - Phase Based Cache Configuration
- Summary / Conclusions

Phase Tracking
- Goal: identify program phases with different behavior.
- Based on “Phase Tracking and Prediction” [Sherwood, Sair, Calder].
- Use reconfigurable hardware to take advantage of phase information: reconfigurable caches, instruction window size, dynamic branch predictor.

Detecting Phases
- Track execution in intervals of 10 million instructions.
- Collect information about the instructions in each interval and store it to build a phase footprint.
- After each 10 million instructions, compare the footprint with past footprints.
- If the footprint is close enough to a past one, the interval is considered a repetition of that phase.

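Accumulator
(Figure: the branch PC is hashed to select an accumulator entry, and the number of instructions since the last branch is added to that entry.)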

Accumulator
A branch occurs 20 instructions after the last branch and its PC hashes to entry 2, so entry 2 must be incremented by 20.

Accumulator
A new branch occurs 80 instructions later and its PC hashes to entry 3, so entry 3 is incremented by 80.

Accumulator
The accumulator now holds 20 in entry 2 and 80 in entry 3. After a phase completes, we need somewhere to store data about previous phases.

Past Footprint Table
(Figure: at 100 instructions, the end of the first interval in this example, the accumulator holds 20 and 80.)

Past Footprint Table
The accumulator data is stored in the Past Footprint Table as a past footprint.

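Past Footprint Table
At 200 instructions, take the Manhattan distance between the accumulator and the past footprints. Against the first footprint the per-entry differences are 90, 20, 75 and 5, so the distance is 90 + 20 + 75 + 5 = 190. This is far from the first footprint, so the interval is recorded as a new phase.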


Past Footprint Table
At 300 instructions, the accumulator holds 21 and 79 where the first footprint holds 20 and 80, so the Manhattan distance between this interval and the first phase is 1 + 1 = 2. This is close enough for the interval to be considered the same as phase one.

Past Footprint Table
At 30 million instructions, the Manhattan distance between this interval and the first phase is 2, so it is close enough to be considered the same as phase one.
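The footprint-and-distance walkthrough above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not the hardware design: the `PhaseTracker` class, the modulo hash and the `threshold` parameter are hypothetical, since the slides give neither the hash function nor the "close enough" cutoff. The usage replays the 100-instruction example with a five-entry accumulator so that entry numbers match the slides (entry 0 is unused).

```python
from typing import List, Optional


def manhattan(a: List[int], b: List[int]) -> int:
    """Sum of absolute per-entry differences between two footprints."""
    return sum(abs(x - y) for x, y in zip(a, b))


class PhaseTracker:
    """Software model of footprint-based phase detection (illustrative only)."""

    def __init__(self, num_entries: int, threshold: int) -> None:
        self.num_entries = num_entries
        self.threshold = threshold  # hypothetical "close enough" cutoff
        self.accumulator = [0] * num_entries
        self.past_footprints: List[List[int]] = []  # the Past Footprint Table

    def on_branch(self, branch_pc: int, insts_since_last_branch: int) -> None:
        # Hash the branch PC to an entry; modulo stands in for the real hash.
        entry = branch_pc % self.num_entries
        self.accumulator[entry] += insts_since_last_branch

    def end_interval(self) -> int:
        """Classify the finished interval and return its phase ID."""
        best_id: Optional[int] = None
        best_dist: Optional[int] = None
        for phase_id, footprint in enumerate(self.past_footprints):
            dist = manhattan(self.accumulator, footprint)
            if best_dist is None or dist < best_dist:
                best_id, best_dist = phase_id, dist
        if best_id is None or best_dist > self.threshold:
            # No stored footprint is close enough: record a new phase.
            self.past_footprints.append(self.accumulator)
            best_id = len(self.past_footprints) - 1
        self.accumulator = [0] * self.num_entries  # start the next interval
        return best_id


if __name__ == "__main__":
    tracker = PhaseTracker(num_entries=5, threshold=10)
    tracker.on_branch(2, 20)
    tracker.on_branch(3, 80)
    first = tracker.end_interval()   # 100 instructions: new phase 0
    tracker.on_branch(1, 90)
    tracker.on_branch(3, 5)
    tracker.on_branch(4, 5)
    second = tracker.end_interval()  # 200 instructions: distance 190, new phase 1
    tracker.on_branch(2, 21)
    tracker.on_branch(3, 79)
    third = tracker.end_interval()   # 300 instructions: distance 2, phase 0 again
    print(first, second, third)      # prints 0 1 0
```

Matching only against the closest stored footprint and comparing that one distance with a single threshold is the simplest rule consistent with the slides.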


Phase Prediction
- When we detect a phase, it is already over: a footprint is only complete at the end of its interval.
- To adjust the hardware, we need to know which phase we are in.
- Three strategies: last seen, Markov with RLE, and perceptron.

Last Seen
- Predict next phase = last phase.
- Because last seen is so simple, another predictor would have to beat it significantly to justify the added cost.

RLE Markov
- Adapted from Sherwood.
- Assumes that if we see phase X exactly Y times in a row followed by phase Z, then the next time we see phase X exactly Y times in a row, it will again be followed by Z.
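A minimal Python sketch of the last seen and RLE Markov predictors described above. The class names and the `count_hits` helper are hypothetical, and falling back to last seen when an RLE Markov entry is missing is an assumption, not something the slides specify. The toy trace is made up and deliberately periodic, which is the pattern RLE Markov targets; on gcc the slides found last seen slightly more accurate.

```python
from typing import Dict, List, Optional, Tuple


class LastSeenPredictor:
    """Predict that the next interval is in the same phase as the last one."""

    def __init__(self) -> None:
        self.last: Optional[int] = None

    def predict(self) -> Optional[int]:
        return self.last

    def update(self, actual: int) -> None:
        self.last = actual


class RLEMarkovPredictor:
    """Key on (current phase, run length); predict what followed that run last time."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[int, int], int] = {}
        self.current: Optional[int] = None
        self.run_length = 0

    def predict(self) -> Optional[int]:
        if self.current is None:
            return None
        # Assumption: fall back to last seen for an unseen (phase, run length) pair.
        return self.table.get((self.current, self.run_length), self.current)

    def update(self, actual: int) -> None:
        if self.current is not None:
            # Remember which phase followed a run of `run_length` intervals in `current`.
            self.table[(self.current, self.run_length)] = actual
        if actual == self.current:
            self.run_length += 1
        else:
            self.current, self.run_length = actual, 1


def count_hits(predictor, phases: List[int]) -> int:
    """Predict each interval's phase before revealing it; count correct predictions."""
    hits = 0
    for phase in phases:
        if predictor.predict() == phase:
            hits += 1
        predictor.update(phase)
    return hits


if __name__ == "__main__":
    trace = [0, 0, 1, 0, 0, 1, 0, 0, 1]  # toy trace, not gcc data
    print(count_hits(LastSeenPredictor(), trace), count_hits(RLEMarkovPredictor(), trace))
    # prints 3 6
```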

Perceptron
- Individual perceptrons work in binary (±1).
- Given a history h1, h2, …, hn (each ±1) and integer weights w0, w1, w2, …, wn, compute S = w0 + w1h1 + w2h2 + … + wnhn.
- If S ≥ 0, predict “yes”; otherwise predict “no”.
- To train: if hi equals the current outcome, increment wi, otherwise decrement it (for w0, add the current outcome).
- But there are many phases, not just 2, so perceptrons are combined for multivalue prediction.
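The output and training rule above, written as a minimal Python sketch (the class and method names are hypothetical). It trains on every outcome, exactly as the slide describes, and leaves the weights unbounded; branch-prediction perceptrons in the style of Jiménez and Lin typically train only on a misprediction or when |S| falls below a threshold, and hardware would saturate the weights.

```python
from typing import List


class BinaryPerceptron:
    """±1 perceptron with integer weights w0..wn, trained as on the slide."""

    def __init__(self, history_len: int) -> None:
        self.w = [0] * (history_len + 1)  # w[0] is the bias weight w0

    def output(self, h: List[int]) -> int:
        # S = w0 + w1*h1 + ... + wn*hn
        return self.w[0] + sum(wi * hi for wi, hi in zip(self.w[1:], h))

    def predict(self, h: List[int]) -> int:
        return 1 if self.output(h) >= 0 else -1  # 1 = "yes", -1 = "no"

    def train(self, h: List[int], outcome: int) -> None:
        self.w[0] += outcome  # bias: add the current outcome
        for i, hi in enumerate(h, start=1):
            # Increment wi when hi agrees with the outcome, decrement otherwise.
            self.w[i] += 1 if hi == outcome else -1


if __name__ == "__main__":
    p = BinaryPerceptron(history_len=2)
    p.train([1, -1], -1)
    print(p.w, p.predict([1, -1]))  # prints [-1, -1, 1] -1
```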

Multivalue Perceptron
- We have perceptrons P1, P2, …, Pn; perceptron Pi tries to predict phase i.
- Train Pi only when in phase i.
- History hi = 1 if it agrees with the current phase, -1 if it disagrees.
- The perceptrons vote on which phase is correct: the most positive output wins.
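One plausible reading of the multivalue scheme, sketched in Python. The encoding of the history (+1 where a past phase equals the phase a given perceptron is responsible for, -1 otherwise) is an assumption, as are all names. Following the slide literally, only the perceptron for the actual phase is trained, so its target outcome is always +1; the vote picks the perceptron with the largest output.

```python
from typing import List, Optional


class _Perceptron:
    """Compact ±1 perceptron (same rule as the single-perceptron slide)."""

    def __init__(self, n: int) -> None:
        self.w = [0] * (n + 1)

    def output(self, h: List[int]) -> int:
        return self.w[0] + sum(wi * hi for wi, hi in zip(self.w[1:], h))

    def train(self, h: List[int], outcome: int) -> None:
        self.w[0] += outcome
        for i, hi in enumerate(h, start=1):
            self.w[i] += 1 if hi == outcome else -1


class MultiValuePerceptron:
    """One perceptron per phase; the most positive output wins the vote."""

    def __init__(self, num_phases: int, history_len: int) -> None:
        self.history_len = history_len
        self.perceptrons = [_Perceptron(history_len) for _ in range(num_phases)]
        self.history: List[int] = []  # past phase IDs, most recent first

    def _inputs(self, phase: int) -> List[int]:
        # Assumed encoding: +1 where a past phase equals `phase`, -1 otherwise;
        # history slots not yet filled count as disagreement.
        padded: List[Optional[int]] = list(self.history)
        padded += [None] * (self.history_len - len(self.history))
        return [1 if past == phase else -1 for past in padded]

    def predict(self) -> int:
        scores = [p.output(self._inputs(i)) for i, p in enumerate(self.perceptrons)]
        return max(range(len(scores)), key=lambda i: scores[i])  # ties: lowest ID

    def update(self, actual: int) -> None:
        # Train only the perceptron for the phase we are actually in.
        self.perceptrons[actual].train(self._inputs(actual), 1)
        self.history = ([actual] + self.history)[: self.history_len]


if __name__ == "__main__":
    mv = MultiValuePerceptron(num_phases=2, history_len=2)
    for _ in range(3):
        mv.update(1)
    print(mv.predict())  # prints 1
```

Because each perceptron only ever sees positive targets, its bias weight grows with how often its phase occurs, which may be one reason a scheme like this adapts slowly.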

Phase Prediction Results (gcc)
- Last phase: 96% accurate
- RLE Markov: 94% accurate
- Perceptron: much lower

Phase Prediction Comments
- Sherwood reported lower accuracy for last phase (70%), perhaps due to oscillation between phases.
- The training cost of multiple perceptrons means the predictor does not always adapt quickly.
- Given the accuracy of last phase, improving the perceptron predictor is not worthwhile.


Phase Based Dynamic Branch Predictor
- Previous research shows the usefulness of adapting branch predictors at run time: “Dynamic History-Length Fitting: A Third Level of Adaptivity for Branch Prediction” [Juan, Sanjeevan, Navarro]; “Combining Branch Predictors” [McFarling].
- A single branch predictor may not perform well within and across different executions: “A Study of Branch Prediction Strategies” [Smith].
- Program behavior is almost uniform within a phase, so choose the best predictor for each phase.

Methodology
- Select a small group of relevant predictors.
- At the beginning of each new phase, sample all the predictors and choose the best.
- Save the best predictor for each phase and reuse it if the phase recurs.


(Figure: predictor sampling and selection for Phase 1 and Phase 2.)
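The per-phase selection and reuse described in the methodology can be sketched as a small memoizing selector in Python. The `PhasePredictorSelector` class and its `sample` hook (assumed to run one predictor for one profiling period in a given phase and return its miss rate) are hypothetical; in hardware the candidates would be sampled one after another over consecutive profiling periods, which this sketch abstracts away. Selection here is plain least miss rate.

```python
from typing import Callable, Dict, List


class PhasePredictorSelector:
    """Choose the best predictor the first time a phase is seen, then reuse it."""

    def __init__(self, predictors: List[str],
                 sample: Callable[[int, str], float]) -> None:
        self.predictors = predictors
        self.sample = sample  # hypothetical hook: (phase_id, predictor) -> miss rate
        self.best_for_phase: Dict[int, str] = {}
        self.samples_taken = 0

    def on_phase_start(self, phase_id: int) -> str:
        if phase_id in self.best_for_phase:
            return self.best_for_phase[phase_id]  # recurring phase: no sampling
        rates = {}
        for name in self.predictors:
            rates[name] = self.sample(phase_id, name)
            self.samples_taken += 1
        best = min(self.predictors, key=lambda name: rates[name])
        self.best_for_phase[phase_id] = best
        return best
```

For example, with made-up miss rates where bimodal wins phase 1 and the 2-level predictor wins phase 0, the phase sequence 0, 1, 0 triggers sampling only twice; the third call reuses the stored choice.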

Dynamic Adaptations
Possible dynamic adaptations:
- Multiple branch predictors (2-level, bimodal): sample each for one profiling period and select on the basis of [miss rate, number of mis-speculated instructions, …].
- Varying history lengths: history lengths in the range [0, 12]; some workloads give better performance with a shorter history.

Multiple Branch Predictors
- Set of predictors (2level parameters in SimpleScalar order, L1 size : L2 size : history width):
  - 2level [1:1024:8] (baseline predictor)
  - Bimodal [1024]
  - 2level [8:512:8]
  - 2level [1:512:8]
- Profile period: 10 million instructions
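To make the [L1 size : L2 size : history width] triples concrete, here is a generic two-level adaptive predictor in Python. It is a simplified sketch, not SimpleScalar's code: the L2 index (history concatenated with low PC bits, truncated to the table size) and the use of small integer branch addresses are assumptions, and SimpleScalar's exact indexing may differ. An L1 size of 1 means a single global history register; [8:512:8] gives eight per-address history registers.

```python
class TwoLevelPredictor:
    """Two-level adaptive predictor parameterized as [l1_size:l2_size:hist_bits]."""

    def __init__(self, l1_size: int, l2_size: int, hist_bits: int) -> None:
        assert l2_size & (l2_size - 1) == 0, "l2_size must be a power of two"
        self.l1_size = l1_size
        self.l2_size = l2_size
        self.hist_bits = hist_bits
        self.histories = [0] * l1_size  # l1_size == 1: one global history register
        self.counters = [1] * l2_size   # 2-bit counters, start weakly not-taken

    def _l2_index(self, pc: int) -> int:
        # Assumed indexing: history in the low bits, PC bits above, then truncate.
        history = self.histories[pc % self.l1_size]
        return ((pc << self.hist_bits) | history) & (self.l2_size - 1)

    def predict(self, pc: int) -> bool:
        return self.counters[self._l2_index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        idx = self._l2_index(pc)
        if taken:
            self.counters[idx] = min(3, self.counters[idx] + 1)
        else:
            self.counters[idx] = max(0, self.counters[idx] - 1)
        l1 = pc % self.l1_size
        mask = (1 << self.hist_bits) - 1
        self.histories[l1] = ((self.histories[l1] << 1) | int(taken)) & mask
```

With 8 history bits, a branch that alternates taken/not-taken maps to two stable histories after a short warm-up, so the baseline [1:1024:8] configuration predicts it perfectly from then on; a bimodal table cannot.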

Multiple Branch Predictors
- Simulator used: SimpleScalar v3.0d
- Benchmarks: gcc, vpr, mcf, ammp, art
- Selection criterion: least miss rate; if the miss rates of two predictors are within 1%, select the less expensive (simpler) one
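The selection criterion above, generalized from two predictors to a list, as a minimal Python sketch. Two points are assumptions: "within 1%" is read as one percentage point of absolute miss rate (`tolerance=0.01`), and each predictor carries a hypothetical relative cost, where lower means simpler. All predictors within the tolerance of the best are treated as tied and the cheapest wins.

```python
from typing import List, Tuple

# (name, relative_cost, miss_rate); costs are hypothetical, lower = simpler
Candidate = Tuple[str, int, float]


def select_predictor(candidates: List[Candidate], tolerance: float = 0.01) -> str:
    best_rate = min(rate for _, _, rate in candidates)
    # Every predictor within `tolerance` of the best miss rate counts as tied.
    near_best = [c for c in candidates if c[2] - best_rate <= tolerance]
    # Among tied predictors, take the cheapest, then the lower miss rate.
    return min(near_best, key=lambda c: (c[1], c[2]))[0]
```

With made-up miss rates of 5.0% for 2level [1:1024:8] and 5.8% for Bimodal [1024], the bimodal predictor is chosen because it is within one point and cheaper; at 6.5% it would lose.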

Multiple Branch Predictors: Results, IPC (gcc)

Multiple Branch Predictors: Results, Branch Predictor Misses (gcc)

Multiple Branch Predictors: Results, IPC (vpr)

Multiple Branch Predictors: Results, Branch Predictor Misses (vpr)

Multiple Branch Predictors: Results, Branch Predictor Misses (mcf)

Multiple Branch Predictors: IPC Comparison

Multiple Branch Predictors: Branch Prediction Misses Comparison

Varying History Length
- gshare predictor with varying history lengths
- History lengths sampled: 0, 3, 6, 8, 12
- Selection criterion: least miss rate; if the miss rates of two predictors are within 1%, select the less expensive (simpler) one

Varying History Length
- Benchmarks: gcc, mcf
- Simulator used: SimpleScalar v3.0d
- Profile period: 10 million instructions
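A minimal Python sketch of history-length fitting for gshare: simulate each sampled history length on a profiling trace and keep the best one. The 4096-entry table, the synthetic trace, and treating a shorter history as the "less expensive" choice under the 1% rule are assumptions for illustration. A history length of 0 reduces gshare to a PC-indexed, bimodal-like table.

```python
from typing import Dict, List, Sequence, Tuple


def gshare_miss_rate(trace: List[Tuple[int, bool]], hist_bits: int,
                     table_bits: int = 12) -> float:
    """Simulate gshare with `hist_bits` of global history on (pc, taken) pairs."""
    counters = [1] * (1 << table_bits)  # 2-bit counters, start weakly not-taken
    hist_mask = (1 << hist_bits) - 1    # hist_bits == 0: index is the PC alone
    table_mask = (1 << table_bits) - 1
    history = 0
    misses = 0
    for pc, taken in trace:
        idx = (pc ^ (history & hist_mask)) & table_mask  # XOR PC with history
        if (counters[idx] >= 2) != taken:
            misses += 1
        if taken:
            counters[idx] = min(3, counters[idx] + 1)
        else:
            counters[idx] = max(0, counters[idx] - 1)
        history = ((history << 1) | int(taken)) & hist_mask
    return misses / len(trace)


def choose_history_length(trace: List[Tuple[int, bool]],
                          lengths: Sequence[int] = (0, 3, 6, 8, 12),
                          tolerance: float = 0.01) -> int:
    rates: Dict[int, float] = {h: gshare_miss_rate(trace, h) for h in lengths}
    best = min(rates.values())
    # Assumption: among lengths within `tolerance` of the best, the shortest is cheapest.
    return min(h for h, r in rates.items() if r - best <= tolerance)
```

On a synthetic trace of one branch that alternates taken/not-taken 100 times, length 0 mispredicts every branch, while length 3 misses only twice during warm-up; longer histories need more warm-up, so 3 is selected.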

Varying History Length: Results, IPC (gcc)

Varying History Length: Results, Branch Predictor Misses (gcc)

Varying History Length: Results, Instruction Cache Misses (IL1) (gcc)


Cache Optimization
- Smaller caches use less power.
- Some phases of execution use less memory or execute a smaller region of code, and therefore need less cache.
- We can use a smaller cache for these phases without affecting performance.

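Methodology
- Try 4 possible configurations of the data cache and the instruction cache simultaneously.
- Data cache and instruction cache misses should be independent, so each cache can be sized on its own.
- Select the best combination.
(Figure: data and instruction cache configurations chosen for Phase 1 and Phase 2.)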
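Because data and instruction cache misses are assumed independent, each cache can be sized separately from the same profiling periods. The Python sketch below shows that idea; the candidate sizes, the miss rates, and the "best" criterion (smallest size whose miss rate is within a hypothetical `tolerance` of the lowest) are all assumptions, since the slide only says "select the best combination".

```python
from typing import Dict, Tuple


def pick_size(miss_rates: Dict[int, float], tolerance: float) -> int:
    """Smallest cache size (KB) whose miss rate is within `tolerance` of the best."""
    best = min(miss_rates.values())
    return min(size for size, rate in miss_rates.items() if rate - best <= tolerance)


def configure_phase(dcache: Dict[int, float], icache: Dict[int, float],
                    tolerance: float = 0.005) -> Tuple[int, int]:
    # Both caches are resized in the same profiling periods and chosen
    # independently, so 4 sampling periods cover all 4 x 4 = 16 combinations.
    return pick_size(dcache, tolerance), pick_size(icache, tolerance)
```

For a phase whose data-cache miss rate keeps falling until 32K while the instruction cache barely benefits beyond 8K, this picks 32K of data cache and 8K of instruction cache.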

Cache Optimization Results (gcc IPC)
- Fixed 32K cache (16K + 16K): 1.807
- Fixed 128K cache (64K + 64K): 1.896
- Optimizer: 1.855, average 49K total
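A short worked calculation on the gcc numbers above, in Python, reading "average 49K total" as the average total cache size chosen by the optimizer (an interpretation). It expresses the optimizer's IPC relative to each fixed configuration and how much of the IPC gap between them it recovers.

```python
# Figures from the gcc results slide.
ipc_fixed_32k, ipc_fixed_128k, ipc_optimizer = 1.807, 1.896, 1.855
avg_total_kb = 49  # interpreted as the optimizer's average total cache size

gain_vs_32k = (ipc_optimizer / ipc_fixed_32k - 1) * 100    # percent
loss_vs_128k = (ipc_optimizer / ipc_fixed_128k - 1) * 100  # percent (negative)
gap_recovered = ((ipc_optimizer - ipc_fixed_32k)
                 / (ipc_fixed_128k - ipc_fixed_32k) * 100)  # percent of IPC gap
size_vs_128k = avg_total_kb / 128 * 100                    # percent of 128K

print(f"{gain_vs_32k:.1f}% {loss_vs_128k:.1f}% {gap_recovered:.1f}% {size_vs_128k:.1f}%")
# prints 2.7% -2.2% 53.9% 38.3%
```

In other words, the optimizer recovers roughly half of the IPC difference between the two fixed caches while using, on average, under 40% of the larger cache's capacity.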

Cache comparison


Summary
- Significant reduction in branch mispredictions (29.88% to 44.35%) using phase-based branch predictors.
- Simpler predictors beat the more complex predictor in many phases.
- Marginal gains in IPC using multiple branch predictors (2.24% to 4.70%).
- Marginal gains in IL1 misses using phase-based multiple branch predictors.

Summary (cont.)
- Phase-based dynamic history-length fitting does not give good gains.

Conclusions [1]
- Phase-based optimizations provide scope for improvement using reconfigurable hardware.
- Using a phase-specific branch predictor gives good reductions in mispredictions.
- This is a good strategy for saving power, since fewer mispredictions may reduce the number of mis-speculated instructions.

Conclusions [2]
- However, varying the history length does not result in substantial savings.
- More benchmarks need to be considered to understand the effect of history-length adaptation.

Questions?