SARC (Samsung Austin R&D Center): Maximizing Branch Behavior Coverage for a Limited Simulation Budget. Maximilien Breughe, 06/18/2016, Championship Branch Prediction (CBP-5).


Page 1: Maximizing Branch Behavior Coverage for a Limited Simulation Budget. Maximilien Breughe, 06/18/2016. Championship Branch Prediction (CBP-5), in conjunction with ISCA 2016.

Page 2: Reduce thousands of workloads into a small training set. Thousands of workloads from popular benchmark suites (Suite A, Suite B, Suite C) are reduced to a representative training set of 200 small (100M) + 23 big (1B) workloads.

Page 3: How to select workloads?
1. Characterization through statistics: characterize branch behavior through 16 statistics.
2. PCA: use principal component analysis to reduce the correlation between the 16 statistics.
3. K-means clustering: classify workloads based on their distance to neighbors.
4. Select representative workloads per cluster.

Page 4: How to characterize branch behavior? Various metrics exist, e.g., MPKI, branch miss rate, branch predictability, etc., as well as branch entropy [De Pestel et al., 2015].
Example table (columns: branch address, history pattern, # not taken, # taken, entropy):
- 0xf2c…: always taken for this pattern
- 0xf2c…: never taken for this pattern
- 0xfde…: taken half of the time
- 0xfde…: taken 4x as often as not taken
Branch entropy is microarchitecture independent and predicts MPKI. It answers the question: "For a given workload and n history bits, what is the overall complexity to predict a branch?"
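To make the four example rows concrete, the sketch below computes the entropy of a single (address, pattern) entry from hypothetical taken/not-taken counts. It assumes Shannon entropy in bits; the slide does not give the formula, so a linear variant, 2·min(p, 1−p), is shown alongside as a second assumption (we believe it is the "linear branch entropy" of De Pestel et al., but treat that attribution as unverified here). Both give 0 for a branch that always or never goes one way and 1 for a coin flip.

```python
import math

def shannon_entropy(n_not_taken: int, n_taken: int) -> float:
    """Shannon entropy (bits) of one branch's direction under one history pattern."""
    total = n_not_taken + n_taken
    if total == 0:
        return 0.0
    h = 0.0
    for count in (n_not_taken, n_taken):
        if count:  # 0 * log2(0) is taken as 0
            p = count / total
            h -= p * math.log2(p)
    return h

def linear_entropy(n_not_taken: int, n_taken: int) -> float:
    """Assumed linear variant: 2 * min(p, 1 - p), with p = P(taken)."""
    total = n_not_taken + n_taken
    if total == 0:
        return 0.0
    p = n_taken / total
    return 2 * min(p, 1 - p)

# Hypothetical counts (not from the slide) matching the four example rows.
cases = {
    "always taken": (0, 10),
    "never taken": (10, 0),
    "taken half of the time": (5, 5),
    "taken 4x as often as not taken": (2, 8),
}
for name, (nt, t) in cases.items():
    print(f"{name:32s} shannon={shannon_entropy(nt, t):.3f} linear={linear_entropy(nt, t):.3f}")
```

The 4:1 case lands between the extremes (about 0.722 bits with Shannon entropy), which is the intuition behind using entropy as a measure of how hard a branch is to predict.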

Page 5: Branch behavior vector:
$$(\mathrm{MPKI}_1, \mathrm{MPKI}_2, \mathrm{MPKI}_3, \mathrm{IPC}_1, \mathrm{IPC}_2, \mathrm{IPC}_3, \mathrm{MR}, E_{L32}, E_{L64}, E_{G32}, E_{G64}, E_{T32}, E_{T64}, \mathrm{SBC}, \mathrm{DBC}, \mathrm{ILP})$$
- MPKI_1, MPKI_2, MPKI_3: MPKI for three sizes of branch predictors
- IPC_1, IPC_2, IPC_3: IPC for three sizes of branch predictors
- MR: miss rate for one branch predictor
- E_L32, E_L64: local branch entropy (different sizes)
- E_G32, E_G64: global branch entropy (different sizes)
- E_T32, E_T64: tournament branch entropy (different sizes)
- SBC, DBC: static and dynamic branch count
- ILP: instruction-level parallelism, inversely proportional to the misprediction penalty [Eyerman et al., 2006]

Page 6: Branch behavior space. One vector per workload gives N data points:
$$\mathbf{x}(i) = \big(\mathrm{MPKI}_1(i), \mathrm{MPKI}_2(i), \mathrm{MPKI}_3(i), \mathrm{IPC}_1(i), \mathrm{IPC}_2(i), \mathrm{IPC}_3(i), \mathrm{MR}(i), E_{L32}(i), E_{L64}(i), E_{G32}(i), E_{G64}(i), E_{T32}(i), E_{T64}(i), \mathrm{SBC}(i), \mathrm{DBC}(i), \mathrm{ILP}(i)\big), \quad i = 1, \ldots, N$$
These N points live in a 16-dimensional space whose dimensions are likely correlated.
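A minimal sketch of assembling the N × 16 behavior matrix, one row per workload. The slides do not say whether the statistics were normalized before PCA; z-scoring each column is assumed here because the 16 statistics have very different units and magnitudes (a dynamic branch count would otherwise dominate the variance). The field names in `STATS` are hypothetical labels for the slide's 16 statistics.

```python
import numpy as np

# Hypothetical names for the 16 statistics, in the slide's order.
STATS = ["MPKI1", "MPKI2", "MPKI3", "IPC1", "IPC2", "IPC3", "MR",
         "EL32", "EL64", "EG32", "EG64", "ET32", "ET64", "SBC", "DBC", "ILP"]

def behavior_matrix(workloads: list[dict[str, float]]) -> np.ndarray:
    """Stack one 16-element behavior vector per workload into an N x 16 matrix."""
    return np.array([[w[s] for s in STATS] for w in workloads], dtype=float)

def standardize(X: np.ndarray) -> np.ndarray:
    """Z-score each column (assumed preprocessing, not stated on the slides)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # a constant column becomes all zeros instead of dividing by zero
    return (X - mean) / std

# Usage with random stand-in workloads.
rng = np.random.default_rng(0)
workloads = [{s: rng.random() for s in STATS} for _ in range(5)]
Z = standardize(behavior_matrix(workloads))
print(Z.shape)  # (5, 16)
```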


Page 8: Principal component analysis example (figure with axes PC1 and PC2). PC1 captures 99% of the variance of the data, so we can remove PC2 and reduce the space to 1 dimension without significant loss of information.
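The 2-D example can be reproduced with a small PCA built on NumPy's SVD: center the data, take the singular vectors as principal axes, and read each component's share of the variance from the squared singular values. The toy data below is an assumption chosen so that, as on the slide, one component carries almost all of the variance.

```python
import numpy as np

def pca(X: np.ndarray):
    """Principal axes (rows of vt) and the fraction of total variance each explains."""
    Xc = X - X.mean(axis=0)                       # center every column
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2                                  # proportional to covariance eigenvalues
    return vt, var / var.sum()

# Toy 2-D data where the second coordinate is almost a multiple of the first.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
X = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=500)])

axes, ratio = pca(X)
print(ratio.round(3))          # PC1 explains almost all of the variance

# Drop PC2: project the centered data onto the first axis only.
X_1d = (X - X.mean(axis=0)) @ axes[:1].T
print(X_1d.shape)              # (500, 1)
```

The same function applies unchanged to the N × 16 behavior matrix; only the number of retained axes changes.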

Page 9: PCA reduces the 16D space to a 5D space. The first 5 PCs capture 92% of the information, and the first two PCs alone capture 65% of the variance. The slide shows a graphical view of how the first two PCs are composed, with the labels "Average MPKI - Average 63-bit entropy", "Projection of non-entropy but microarchitecture-independent stats", "Avg entropy + Avg MPKI - Avg IPC", and "2x (Avg IPC) + Avg local/tournament entropy".
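Choosing how many PCs to keep amounts to finding the smallest k whose cumulative explained variance reaches a target. The ratios below are hypothetical, shaped only so that the first two sum to 0.65 and the first five to 0.92 as on the slide; the 0.90 threshold is likewise an assumed example, not a value from the talk.

```python
from itertools import accumulate

def components_needed(ratios, target):
    """Smallest k whose first k principal components explain at least `target` of the variance."""
    for k, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative >= target:
            return k
    return len(ratios)

# Hypothetical explained-variance ratios for 16 PCs (they sum to 1.0).
ratios = [0.40, 0.25, 0.12, 0.09, 0.06,
          0.02, 0.015, 0.01, 0.01, 0.008, 0.006, 0.004, 0.003, 0.002, 0.001, 0.001]
print(components_needed(ratios, 0.90))   # 5
```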


Page 11: K-means example for K=12 in 2 dimensions (figure). Traditional K-means selects the data point closest to each cluster center. We set K=200 and use the 5 PCs as dimensions. Accuracy improvement: select the longest workload among those closest to the cluster center.
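A sketch of the clustering and selection step, assuming plain Lloyd's k-means with random initial centers. The "size preferred" rule is interpreted here, hypothetically, as "the longest workload among the m members nearest to the center"; with m=1 it reduces to traditional selection. The parameter `m`, the toy blobs, and the random workload lengths are all illustrative assumptions, and K=3 stands in for the slide's K=200.

```python
import numpy as np

def assign(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the nearest center for every row of X."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # shape (N, k)
    return d.argmin(axis=1)

def kmeans(X: np.ndarray, k: int, iters: int = 100, seed: int = 0):
    """Lloyd's k-means with random initial centers; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(X, centers)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, assign(X, centers)

def representatives(X, centers, labels, lengths, m: int = 1) -> list[int]:
    """One workload index per non-empty cluster.

    m=1: the member closest to the center (traditional selection).
    m>1: the longest of the m members closest to the center (hypothetical size-preferred rule).
    """
    reps = []
    for c in range(len(centers)):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            continue
        dist = np.linalg.norm(X[members] - centers[c], axis=1)
        nearest = members[np.argsort(dist)[:m]]
        reps.append(int(nearest[np.argmax(lengths[nearest])]))
    return reps

# Toy data: 3 blobs in 5 dimensions (standing in for the 5 PCs), random workload lengths.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.1, size=(20, 5)) for loc in (0.0, 1.0, 2.0)])
lengths = rng.integers(100, 1000, size=len(X))
centers, labels = kmeans(X, k=3)
traditional = representatives(X, centers, labels, lengths, m=1)
size_preferred = representatives(X, centers, labels, lengths, m=3)
print(traditional, size_preferred)
```

Because the m nearest members always include the closest one, the size-preferred pick is never shorter than the traditional pick for the same cluster.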

Page 12: MPKI and IPC prediction results (figure panels: traditional K-means; adjustment for workload size).

Method                MPKI error   IPC error
Traditional K-means   < 2%         < 1%
Size preferred        < 1%         < 0.1%

Preferring larger workloads (size preferred) increases accuracy.
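The slides do not say how the suite-wide MPKI or IPC is predicted from the representatives. One common approach, assumed here, weights each representative's value by the number of workloads in its cluster and compares the result with the true suite mean. All numbers below are hypothetical.

```python
def predict_mean(rep_values, cluster_sizes):
    """Estimate the full-suite mean from one value per cluster, weighted by cluster size."""
    total = sum(cluster_sizes)
    return sum(v * n for v, n in zip(rep_values, cluster_sizes)) / total

def relative_error(pred, true):
    return abs(pred - true) / abs(true)

# Hypothetical suite: two clusters, {2.0, 2.2} and {8.0, 7.6, 8.4} MPKI.
all_mpki = [2.0, 2.2, 8.0, 7.6, 8.4]
true_mean = sum(all_mpki) / len(all_mpki)
pred = predict_mean(rep_values=[2.0, 8.0], cluster_sizes=[2, 3])
print(f"predicted {pred:.2f}, true {true_mean:.2f}, error {relative_error(pred, true_mean):.2%}")
```

Weighting by cluster size lets each representative stand in for all the workloads it replaces; weighting by instruction count would be an equally plausible alternative.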

Page 13: Statistics collection speed (compared with native execution).
- 1 functional simulation, 3 orders of magnitude slower (e.g., gem5: 3 MIPS), collects local, global, and tournament branch entropy (different sizes), static and dynamic branch count, and instruction-level parallelism.
- 3 detailed simulations, 5 orders of magnitude slower (e.g., gem5: 40 KIPS), collect MPKI and IPC for three sizes of branch predictors and the miss rate for one branch predictor.
What if we removed the microarchitecture-dependent statistics?
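A quick back-of-the-envelope calculation using the slide's example gem5 rates (3 MIPS functional, 40 KIPS detailed). It assumes the "100M" and "1B" workload sizes from Page 2 are instruction counts; the helper name `sim_seconds` is hypothetical. The result is per workload, so it is multiplied by the thousands of original workloads that must be characterized.

```python
def sim_seconds(instructions: float, rate_ips: float, runs: int = 1) -> float:
    """Wall-clock seconds to simulate `instructions` at `rate_ips`, repeated `runs` times."""
    return runs * instructions / rate_ips

SMALL = 100e6          # assumed: 100M instructions
BIG = 1e9              # assumed: 1B instructions
FUNCTIONAL_IPS = 3e6   # gem5 functional example from the slide: 3 MIPS
DETAILED_IPS = 40e3    # gem5 detailed example from the slide: 40 KIPS

for name, n in (("small", SMALL), ("big", BIG)):
    f = sim_seconds(n, FUNCTIONAL_IPS)
    d = sim_seconds(n, DETAILED_IPS, runs=3)
    print(f"{name}: functional {f / 60:.1f} min, 3 detailed {d / 3600:.1f} h")
# small: functional 0.6 min, 3 detailed 2.1 h
# big: functional 5.6 min, 3 detailed 20.8 h
```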

Page 14: PCA on microarchitecture-independent statistics. The first 5 PCs capture 89.1% of the information. The figure labels the leading PCs as "Average entropy", "Projection of non-entropy stats", "f1(non-entropy stats) + global entropy - local entropy", and "f2(non-entropy stats) - global entropy + local entropy".

Page 15: MPKI and IPC prediction results (figure panels: traditional K-means; adjustment for workload size; using only microarchitecture-independent metrics).

Method                          MPKI error   IPC error   Simulation time (x native execution)
Traditional K-means             < 2%         < 1%        3N x 10^5
Size preferred                  < 1%         < 0.1%      3N x 10^5
Microarchitecture independent   < 8.2%       < 2.5%      N x 10^3

This illustrates the trade-off between accuracy and simulation overhead.
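Taking the simulation-time column at face value, the overhead ratio between the two collection flows is independent of the number of workloads N: switching to microarchitecture-independent statistics makes statistics collection roughly 300 times cheaper.

$$\frac{3N \times 10^5}{N \times 10^3} = 3 \times 10^2 = 300$$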

Page 16: Conclusion
- PCA and K-means clustering reduce the number of workloads.
- Training set of 200 small workloads, evaluation set of 400 small workloads: MPKI and IPC prediction with less than 1% and 0.1% error, respectively. Statistics collection: 3 detailed simulations for all original workloads.
- Microarchitecture-independent statistics reduce the collection overhead. Training set of 23 big workloads, evaluation set of 40 big workloads: MPKI and IPC prediction with less than 8% and 2.5% error, respectively. Statistics collection: 1 functional simulation for all original workloads.

Page 17: References
- [De Pestel et al., 2015] Sander De Pestel, Stijn Eyerman, and Lieven Eeckhout, "Micro-Architecture Independent Branch Behavior Characterization," IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2015.
- [Eyerman et al., 2006] Stijn Eyerman, James E. Smith, and Lieven Eeckhout, "Characterizing the Branch Misprediction Penalty," IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2006.
- [Joshi et al., 2006] Ajay M. Joshi, Aashish Phansalkar, Lieven Eeckhout, and Lizy K. John, "Measuring Program Similarity Using Inherent Program Characteristics," IEEE Transactions on Computers, Vol. 55, No. 6, 2006.

Page 18: Backup slides

Page 19 (backup): How to characterize branch behavior? Various metrics exist, e.g., MPKI, branch miss rate, branch predictability, etc., as well as branch entropy [De Pestel et al., 2015].
For each branch address $i$ (from 0x… to 0xFFFFFF) and each pattern $j$ of $n$ history bits, the table records $N_0(i,j)$ not-taken and $N_1(i,j)$ taken outcomes, from which
$$\Pr[\text{dir} = \text{taken} \mid \text{addr} = i,\ \text{pattern} = j] = \frac{N_1(i,j)}{N_0(i,j) + N_1(i,j)}.$$
Calculate the entropy $E_L(i,j)$ for all $i$ and $j$, then compute the weighted average entropy $E$: the branch entropy with $n$ history bits for this workload. Branch entropy is microarchitecture independent and predicts MPKI.
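The procedure above can be sketched end to end on a branch trace. Several details are assumptions, since the slide with the entropy calculation did not survive: Shannon entropy per (address, pattern) pair, weights equal to how often each pair occurs, and a choice between a global history register and a per-address (local) history. The trace format, a list of `(address, taken)` tuples, and the function name `branch_entropy` are hypothetical.

```python
import math
from collections import defaultdict

def branch_entropy(trace, n_bits: int, local: bool = False) -> float:
    """Weighted average entropy over (branch address, n-bit history pattern) pairs.

    trace: iterable of (address, taken) for conditional branches in program order.
    Assumptions: Shannon entropy; weights = occurrence count of each pair;
    global history by default, per-address history if local=True.
    """
    counts = defaultdict(lambda: [0, 0])      # (addr, pattern) -> [N0, N1]
    local_hist = defaultdict(int)             # addr -> last n outcomes of that branch
    global_hist = 0
    mask = (1 << n_bits) - 1
    for addr, taken in trace:
        t = int(bool(taken))
        pattern = local_hist[addr] if local else global_hist
        counts[(addr, pattern)][t] += 1
        local_hist[addr] = ((local_hist[addr] << 1) | t) & mask
        global_hist = ((global_hist << 1) | t) & mask

    total = sum(n0 + n1 for n0, n1 in counts.values())
    if total == 0:
        return 0.0
    e = 0.0
    for n0, n1 in counts.values():
        n = n0 + n1
        p = n1 / n                            # estimate of Prob[taken | addr, pattern]
        h = 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
        e += (n / total) * h                  # weight by occurrence count
    return e

# One branch alternating taken / not taken.
trace = [(0x400, i % 2 == 0) for i in range(1000)]
print(branch_entropy(trace, n_bits=0))  # no history: a coin flip, 1.0
print(branch_entropy(trace, n_bits=1))  # one history bit makes it fully predictable, 0.0
```

The example shows why entropy depends on $n$: with no history the alternating branch looks random, while a single history bit removes all uncertainty.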

Page 20: Entropy calculation

Page 21: Reducing to 5 dimensions retains 92% of the information (92.3%).