Workload Design: Selecting Representative Program-Input Pairs Lieven Eeckhout Hans Vandierendonck Koen De Bosschere Ghent University, Belgium PACT 2002


Workload Design: Selecting Representative Program-Input Pairs Lieven Eeckhout Hans Vandierendonck Koen De Bosschere Ghent University, Belgium PACT 2002, September 23, 2002

Introduction
– Microprocessor design: simulation of a workload = set of programs + inputs
  – constrained in size due to simulation time limitations
  – taken from suites, e.g., SPEC, TPC, MediaBench
– Workload design: which programs? which inputs?
– Representative workload: large variation in behavior; benchmark-input pairs should be “different”

Main idea
– The workload design space is a p-dimensional space
  – with p = number of relevant program characteristics
  – p is too large for understandable visualization
  – there is correlation between the p characteristics
– Idea: reduce the p-D space to a q-D space
  – with q small (typically 2 to 4)
  – without losing important information
  – with no correlation between dimensions
  – achieved by multivariate data analysis techniques: PCA and cluster analysis

Goal
– Measure the impact of input data sets on program behavior
  – “far away” or weak clustering: different behavior
  – “close” or strong clustering: similar behavior
– Applications:
  – selecting representative program-input pairs, e.g., one pair per cluster, taking the pair with the smallest dynamic instruction count
  – gaining insight into the influence of input data sets
  – profile-guided optimization
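The selection rule on this slide (one pair per cluster, preferring the cheapest to simulate) can be sketched in a few lines. The pair names, cluster assignments, and instruction counts below are illustrative, not the paper's actual data; only the compress ref-vs-reduced counts echo numbers shown later in the talk.

```python
# Hypothetical sketch of "one representative program-input pair per cluster,
# taking the pair with the smallest dynamic instruction count".

def select_representatives(pairs, cluster_of, insn_count):
    """pairs: list of program-input pair names.
    cluster_of: dict mapping pair -> cluster id.
    insn_count: dict mapping pair -> dynamic instruction count (billions)."""
    best = {}
    for p in pairs:
        c = cluster_of[p]
        # keep the cheapest-to-simulate pair seen so far in this cluster
        if c not in best or insn_count[p] < insn_count[best[c]]:
            best[c] = p
    return sorted(best.values())

# made-up example data
pairs = ["gcc.emit-rtl", "gcc.varasm", "compress.ref", "compress.reduced"]
cluster_of = {"gcc.emit-rtl": 0, "gcc.varasm": 0,
              "compress.ref": 1, "compress.reduced": 1}
insn_count = {"gcc.emit-rtl": 1.2, "gcc.varasm": 0.8,
              "compress.ref": 60.0, "compress.reduced": 2.0}

print(select_representatives(pairs, cluster_of, insn_count))
# -> ['compress.reduced', 'gcc.varasm']
```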

Overview
– Introduction
– Workload characterization
– Data analysis
  – Principal components analysis (PCA)
  – Cluster analysis
– Evaluation
– Discussion
– Conclusion

Workload characterization (1)
– Instruction mix
  – int, logic, shift & byte, load/store, control
– Branch prediction accuracy
  – bimodal (8K×2 bits), gshare (8K×2 bits) and hybrid (meta: 8K×2 bits) branch predictors
– Data and instruction cache miss rates
  – five caches with varying size and associativity

Workload characterization (2)
– Number of instructions between two taken branches
– Instruction-level parallelism
  – IPC of an infinite-resource machine with only read-after-write dependencies
– In total: p = 20 variables
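As a concrete illustration of one slice of this p = 20 characteristic vector, here is a minimal sketch of computing the instruction-mix fractions from a dynamic trace. This is not the paper's ATOM-based tooling; the trace and category labels are made up for illustration.

```python
# Illustrative sketch: turn a dynamic instruction trace (one category label
# per executed instruction) into the instruction-mix part of the
# characteristic vector, over the five categories listed on the slide.
from collections import Counter

CATEGORIES = ["int", "logic", "shift_byte", "load_store", "control"]

def instruction_mix(trace):
    """Return the fraction of dynamic instructions in each category."""
    counts = Counter(trace)
    total = sum(counts.values())
    return [counts.get(c, 0) / total for c in CATEGORIES]

# toy 10-instruction trace
trace = ["int", "load_store", "int", "control", "logic",
         "load_store", "int", "shift_byte", "int", "control"]
print(instruction_mix(trace))  # -> [0.4, 0.1, 0.1, 0.2, 0.2]
```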

Overview
– Introduction
– Workload characterization
– Data analysis
  – Principal components analysis (PCA)
  – Cluster analysis
– Evaluation
– Discussion
– Conclusion

PCA
– Many program characteristics (variables) are correlated
– PCA computes new variables: p principal components PC_i
  – linear combinations of the original characteristics
  – uncorrelated
  – containing the same total variance over all benchmarks
  – Var(PC1) > Var(PC2) > Var(PC3) > …
  – most have near-zero variance (essentially constant)
– Reduces the dimension of the workload space to q = 2 to 4
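To make the PCA idea concrete on the smallest possible case, here is a sketch for just two correlated characteristics: the variances of PC1 and PC2 are the eigenvalues of the 2×2 covariance matrix, which have a closed form, so no linear-algebra library is needed. The data values are made up; the paper operates on p = 20 characteristics, not 2.

```python
# Minimal 2-variable PCA sketch (illustrative, not the paper's tooling):
# the principal components' variances are the eigenvalues of the 2x2
# sample covariance matrix.
import math

def pca2(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # sample covariance matrix entries
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # eigenvalues of [[sxx, sxy], [sxy, syy]] via trace and determinant
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    disc = math.sqrt(tr * tr / 4 - det)
    return tr / 2 + disc, tr / 2 - disc  # Var(PC1) >= Var(PC2)

# two strongly correlated made-up characteristics
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1]
v1, v2 = pca2(xs, ys)
# because the variables are correlated, PC1 captures almost all the variance
print(v1 / (v1 + v2))
```

The same mechanism scales to p dimensions; there the eigendecomposition is done numerically and most of the p eigenvalues turn out near zero, which is exactly why the slide can keep only q = 2 to 4 components.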

PCA: Interpretation
– Principal components (PCs) lie along the main axes of the data ellipse
– Var(PC1) > Var(PC2) > …
– PC2 is less important for explaining the variation over program-input pairs
– Reduce the number of PCs: throw out PCs with negligible variance
(Figure: 2-D scatter plot over axes Variable 1 and Variable 2, with PC1 and PC2 along the main axes of the ellipse)
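The reduction rule above ("throw out PCs with negligible variance") can be sketched as keeping the leading PCs until they explain a chosen fraction of the total variance. The variance values and the 0.95 threshold below are illustrative assumptions, not from the paper.

```python
# Hedged sketch: choose q, the number of principal components to retain,
# by cumulative fraction of total variance explained.

def num_components(variances, threshold=0.95):
    """variances: Var(PC1) >= Var(PC2) >= ..., one entry per PC."""
    total = sum(variances)
    cum = 0.0
    for q, v in enumerate(variances, start=1):
        cum += v
        if cum / total >= threshold:
            return q
    return len(variances)

# e.g., 20 characteristics whose variance concentrates in the first few PCs
variances = [12.0, 5.0, 2.0, 0.5] + [0.03] * 16
print(num_components(variances))  # a small q, as on the slide
```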

Cluster analysis
– Hierarchical clustering
– Based on the distance between program-input pairs
– Can be represented by a dendrogram
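A toy sketch of the idea: agglomerative clustering repeatedly merges the two closest clusters of program-input pairs, and the resulting merge order and distances are exactly what a dendrogram draws. This example uses single linkage on made-up 2-D points (e.g., PC scores); the paper's actual linkage method and data are not reproduced here.

```python
# Illustrative single-linkage hierarchical clustering in pure Python.

def single_linkage(points):
    """points: dict name -> feature tuple. Returns the list of merges
    (cluster_a, cluster_b, linkage_distance) in the order they happen."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

    clusters = {name: [name] for name in points}
    merges = []
    while len(clusters) > 1:
        # find the pair of clusters with the smallest single-linkage distance
        best = None
        for a in clusters:
            for b in clusters:
                if a < b:
                    d = min(dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                    if best is None or d < best[0]:
                        best = (d, a, b)
        d, a, b = best
        merges.append((a, b, round(d, 3)))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges

# two tight, well-separated groups of made-up program-input pairs
pts = {"p1": (0.0, 0.0), "p2": (0.2, 0.1), "p3": (3.0, 3.0), "p4": (3.1, 2.9)}
for m in single_linkage(pts):
    print(m)  # close pairs merge first, at small linkage distances
```

Small linkage distances at a merge correspond to "strong clustering" (similar behavior) on the earlier Goal slide; a large final merge distance signals groups with different behavior.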

Overview
– Introduction
– Workload characterization
– Data analysis
  – Principal components analysis (PCA)
  – Cluster analysis
– Evaluation
– Discussion
– Conclusion

Methodology
– Benchmarks
  – SPECint95: inputs from SPEC (train and ref), inputs from the web (ijpeg), reduced inputs (compress)
  – TPC-D on postgres v6.3
  – compiled with -O4 on Alpha
  – 79 program-input pairs in total
– ATOM: instrumentation, measuring the characteristics
– STATISTICA: statistical analysis

GCC: principal components
– 2 PCs capture 96.9% of the total variance

GCC
(Figure: GCC inputs plotted in the PC space — emit-rtl, insn-emit, protoize, varasm, explow, recog, reload1, expr, cp-decl, insn-recog, print-tree, dbxout, toplev, plus a cluster of 7 inputs; regions annotated with high branch prediction accuracy, high I-cache miss rates, high D-cache miss rates, many control & shift instructions, and many loads/stores with high ILP)

Workload space
– 4 PCs capture 93.1% of the total variance
– ijpeg, compress and go are isolated
  – go: low branch prediction accuracy
  – compress: high data cache miss rate
  – ijpeg: high load/store rate, low rate of control operations

Workload space
(Figure: the workload space, showing strong clustering)

Small versus large inputs
– Vortex
  – train: 3.2B instructions; ref: 92.5B instructions
  – similar behavior: linkage distance ≈ 1.4
– Not so for m88ksim: linkage distance ≈ 4
– The reference input for compress can be reduced without significantly impacting behavior: 2B vs. 60B instructions

Impact of input on behavior
– For the TPC-D queries: weak clustering, i.e., the input has a large impact, notably on I-cache behavior
– In general, the variation between programs is larger than the variation between input sets for the same program
  – however, there are exceptions where the input has a large impact on behavior, e.g., TPC-D and perl

Overview
– Introduction
– Workload characterization
– Data analysis
  – Principal components analysis (PCA)
  – Cluster analysis
– Evaluation
– Discussion
– Conclusion

Conclusion
– Workload design: representative, yet not long-running
– Principal components analysis (PCA) and cluster analysis help detect input data sets that result in similar or different behavior of a program
– Applications:
  – workload design: representativeness while taking simulation time into account
  – impact of input data sets on program behavior
  – profile-guided optimizations