Discovering and Exploiting Program Phases Timothy Sherwood, Erez Perelman, Greg Hamerly, Suleyman Sair, Brad Calder CSE 231 Presentation by Justin Ma.

Presentation transcript:

400 Million Instructions New Compiler Non-Existent Processor New Processor Simulator Benchmark Spec2000

400 Million Instructions: Suppose you have a time budget… Less than half a second of execution time. What would you simulate? – Beginning? – Middle? – End?

400 Million Instructions [Plots: gzip, gcc] Programs exhibit diverse modes of behavior.

400 Million Instructions: Suppose you have a time budget… Less than half a second of execution time. What would you simulate? – Beginning? – Middle? – End? – Samples of different modes of behavior

Program Phases: Observation: programs exhibit various modes of periodic behavior. These modes are program phases. Challenge: extract these automatically.

Phase Basics: Intervals – slices in time. Phases – intervals with similar behavior. [Plot: IPC over time (instruction count)]

Defining “Similar Behavior”: Metric for comparing intervals? – Cache misses? – IPC? – Branch misprediction rates? Problem: performance alone is too architecture dependent.

Defining “Similar Behavior”: Code path traversal – Directly affects time-varying behavior – Executing the same code gives the same performance – Architecture independent. Metrics for code path traversal – Frequency of branches – Frequency of function calls – Frequency of basic block executions

Basic Block Vector [Figure: basic blocks B1, B2, B3, B4 and the vector of per-block counts (B1, B2, B3, B4) for the interval at time t]

Basic Block Vector [Figure: basic blocks B1, B2, B3, B4 and the vector of per-block counts (B1, B2, B3, B4) at time t; at time t + 1 the vector starts at (0, 0, 0, 0)]

Basic Block Vector [Figure: basic blocks B1, B2, B3, B4 and the vector of per-block counts (B1, B2, B3, B4) at time t; at time t + 1 the vector is (1, 1, 0, 1)]

Basic Block Vector [Figure: basic blocks B1, B2, B3, B4 and the vector of per-block counts (B1, B2, B3, B4) at time t; at time t + 1 the vector is (2, 2, 0, 2)] Manhattan distance = |1 – 2| + |1 – 0| = 2. Euclidean distance = sqrt((1 – 2)^2 + (1 – 0)^2) = sqrt(2).
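
As a rough sketch of the idea on this slide, the Python below builds one basic block vector per interval from a trace of executed block IDs and then compares vectors with the two distances shown. The trace format, the function names, and the choice to measure interval length in executed blocks (rather than instructions) are assumptions made for illustration; they are not taken from the presentation.

```python
import math
from collections import Counter

def build_bbvs(trace, num_blocks, interval_len):
    """Split a trace of basic block IDs (0..num_blocks-1) into fixed-length
    intervals and count how often each block executes in each interval."""
    bbvs = []
    for start in range(0, len(trace), interval_len):
        counts = Counter(trace[start:start + interval_len])
        bbvs.append([counts.get(block, 0) for block in range(num_blocks)])
    return bbvs

def manhattan(a, b):
    """Sum of absolute per-dimension differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance between the two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical trace: IDs 0..3 stand for B1..B4.
trace = [0, 1, 3, 0, 1, 3, 0, 2, 3, 0, 2, 3]
bbvs = build_bbvs(trace, num_blocks=4, interval_len=6)
print(bbvs)                          # one count vector per interval
print(manhattan(bbvs[0], bbvs[1]))   # distance between the two intervals

# The two-point example from the slide: (1, 1) versus (2, 0).
print(manhattan((1, 1), (2, 0)), euclidean((1, 1), (2, 0)))
```

Intervals that execute the same code in the same proportions produce nearby vectors, which is what makes the distance usable as a similarity measure.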

Basic Block Similarity Matrix gzip

Basic Block Similarity Matrix: gcc. BBV similarity between intervals reflects performance similarity.

Automatic Phase Classification: Classify intervals into phases – We do not know a priori which BBVs correspond to particular phases. k-means clustering – Iterative clustering algorithm – Dimension reduction: random linear projection – Try different k values; use BIC to choose the best
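
The clustering steps listed above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the projection matrix entries are drawn uniformly from [-1, 1], the initial centroids are sampled from the data, and the function names, seeds, and iteration cap are all hypothetical. Choosing k with BIC is deliberately left out of the sketch.

```python
import random

def random_projection(vectors, out_dim, seed=0):
    """Reduce dimensionality by multiplying each vector with a random matrix
    (entries uniform in [-1, 1]; the distribution is an assumption)."""
    rng = random.Random(seed)
    in_dim = len(vectors[0])
    matrix = [[rng.uniform(-1.0, 1.0) for _ in range(out_dim)]
              for _ in range(in_dim)]
    return [[sum(v[i] * matrix[i][j] for i in range(in_dim))
             for j in range(out_dim)] for v in vectors]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iters=50, seed=0):
    """Plain k-means: assign each point to its nearest centroid, move each
    centroid to the mean of its members, repeat until labels stop changing."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

# Hypothetical normalized BBVs from two visibly different code regions.
bbvs = [[0.9, 0.1, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0],
        [0.0, 0.0, 0.5, 0.5], [0.0, 0.1, 0.5, 0.4]]
reduced = random_projection(bbvs, out_dim=2)
labels, centroids = kmeans(reduced, k=2)
print(labels)  # one phase label per interval
```

In practice one would run this for several values of k and keep the clustering that a model-selection score such as BIC prefers.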

Automatic Phase Classification

Clustering accurately distinguishes phases automatically

SimPoint: Simulate large programs on a budget. Perform detailed simulation on representative code snippets – Choose the centroid interval from each phase (10 million instructions). Extrapolate large program performance – Weighted by frequency of phase
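
The selection and extrapolation steps can be sketched like this, assuming the intervals have already been clustered into phases. The function names, the dictionary-based return values, and the sample IPC numbers are hypothetical; "centroid interval" is read here as the interval closest to its cluster centroid.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pick_simulation_points(points, labels, centroids):
    """For each phase, pick the interval nearest the phase centroid and
    weight it by the fraction of all intervals that belong to the phase."""
    best = {}  # phase label -> (interval index, squared distance)
    for idx, (p, lab) in enumerate(zip(points, labels)):
        d = sq_dist(p, centroids[lab])
        if lab not in best or d < best[lab][1]:
            best[lab] = (idx, d)
    sim_points = {lab: idx for lab, (idx, _) in best.items()}
    weights = {lab: labels.count(lab) / len(labels) for lab in best}
    return sim_points, weights

def estimate_metric(sim_points, weights, measured):
    """Weighted average of a metric measured only on the chosen intervals."""
    return sum(weights[lab] * measured[sim_points[lab]] for lab in sim_points)

# Hypothetical one-dimensional interval signatures and a given clustering.
points = [[0.0], [0.2], [0.1], [5.0], [5.2], [5.4]]
labels = [0, 0, 0, 1, 1, 1]
centroids = [[0.1], [5.2]]
sim_points, weights = pick_simulation_points(points, labels, centroids)

# Pretend detailed simulation of the two chosen intervals gave these IPCs.
measured_ipc = {2: 1.0, 4: 2.0}
print(sim_points, weights, estimate_metric(sim_points, weights, measured_ipc))
```

Only the chosen intervals need detailed simulation; every other interval is represented by the result for its phase.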

SimPoint: simulate 400 million instructions total. Accurate estimate despite the instruction budget.

Why SimPoint Succeeds: Program behavior varies over time. SimPoint intelligently chooses which intervals to simulate. Regularity within program phases allows accurate extrapolation.

Online Classification: Detect phases as the program is running. Applications – Thread scheduling – Power management – Predicting future phases. Challenges – One pass of input – Limited storage
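
One way to respect both challenges (a single pass and limited storage) is to keep a small table of phase signatures and match each new interval against it. The sketch below is an assumption about how such a classifier could look, not the scheme from the presentation: the class name, the Manhattan-distance threshold, the table size, and the least-recently-used eviction are all hypothetical choices.

```python
def normalize(bbv):
    """Scale counts so they sum to 1 (leave an all-zero vector unchanged)."""
    total = sum(bbv)
    return [c / total for c in bbv] if total else list(bbv)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

class OnlinePhaseClassifier:
    """Single-pass classifier with a bounded table of phase signatures."""

    def __init__(self, threshold=0.5, max_phases=8):
        self.threshold = threshold    # max distance to count as the same phase
        self.max_phases = max_phases  # storage limit for signatures
        self.signatures = {}          # phase id -> signature, oldest first
        self.next_id = 0

    def classify(self, bbv):
        sig = normalize(bbv)
        best_id, best_d = None, None
        for pid, stored in self.signatures.items():
            d = manhattan(sig, stored)
            if best_d is None or d < best_d:
                best_id, best_d = pid, d
        if best_id is not None and best_d <= self.threshold:
            # Known phase: move it to the most-recently-used position.
            self.signatures[best_id] = self.signatures.pop(best_id)
            return best_id
        if len(self.signatures) >= self.max_phases:
            # Table full: evict the least recently used signature.
            del self.signatures[next(iter(self.signatures))]
        pid = self.next_id
        self.next_id += 1
        self.signatures[pid] = sig
        return pid

# Hypothetical stream of per-interval BBVs, seen once and in order.
clf = OnlinePhaseClassifier(threshold=0.5)
stream = [[9, 1, 0, 0], [8, 2, 0, 0], [0, 0, 5, 5], [0, 1, 5, 4], [9, 1, 0, 0]]
print([clf.classify(v) for v in stream])
```

The threshold trades off the number of phases against how uniform each phase is, so it would need tuning against the variance of the metric of interest.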

Online Classification

High variance in metrics across the full trace. Low variance within each phase shows that online classification succeeds in finding phases.

Conclusions: Phases are a vital abstraction – Performance varies greatly within a program – Attributable to different modes of behavior. Can discover phases automatically – Offline: k-means clustering – Online. Code path characterization – Strong correlation with actual performance – SimPoint exploits this with great success

Outline: Introduction (motivate); Basics (definitions, BBV, BBMatrix); Offline Phase Classification – SimPoints; Online Phase Classification; Conclusions

Limitations of Clustering

Bayesian Information Criterion Fit to Gaussians

Self-Modifying Code [Figure labels: Self-modifying code; Program Phases]

Learning Phases