Discovering and Exploiting Program Phases Timothy Sherwood, Erez Perelman, Greg Hamerly, Suleyman Sair, Brad Calder CSE 231 Presentation by Justin Ma
400 Million Instructions New Compiler Non-Existent Processor New Processor Simulator Benchmark Spec2000
400 Million Instructions Suppose you have a time budget… Less than half a second of execution time What would you simulate? –Beginning? –Middle? –End?
400 Million Instructions gzip gcc Programs exhibit diverse modes of behavior
400 Million Instructions Suppose you have a time budget… Less than half a second of execution time What would you simulate? –Beginning? –Middle? –End? –Samples of different modes of behavior
Program Phases Observation: programs exhibit various modes of periodic behavior These modes are program phases Challenge: Extract these automatically
Phase Basics Intervals – slices of time Phases – intervals with similar behavior Time (Instruction Count) IPC
Defining “Similar Behavior” Metric for comparing intervals? –Cache misses? –IPC? –Branch misprediction rates? Problem: Performance alone is too architecture dependent
Defining “Similar Behavior” Code path traversal –Directly affects time-varying behavior –Execute same code, same performance –Architecture independent Metrics for code path traversal –Frequency of branches –Frequency of function calls –Frequency of basic block executions
Basic Block Vector B1 B2B3 B B1B2B3B4 Time t
Basic Block Vector B1 B2B3 B B1B2B3B4 Time t 0000 B1B2B3B4 Time t + 1
Basic Block Vector B1 B2B3 B B1B2B3B4 Time t 1101 B1B2B3B4 Time t + 1
Basic Block Vector B1 B2B3 B B1B2B3B4 Time t 2202 B1B2B3B4 Time t + 1 Manhattan Distance = |1 – 2| + |1 – 0| = 2 Euclidean Distance = sqrt((1 – 2)^2 + (1 – 0)^2) = sqrt(2)
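A minimal sketch of how a basic block vector could be built and compared. The names build_bbv, manhattan, and euclidean, and the trace below, are hypothetical and chosen only for illustration. The last two lines reuse the slide's arithmetic, comparing (1, 1) with (2, 0).

```python
from collections import Counter
from math import sqrt

def build_bbv(trace, block_ids):
    """Count how many times each basic block executes within one interval."""
    counts = Counter(trace)
    return [counts[b] for b in block_ids]

def manhattan(u, v):
    """Sum of absolute per-block differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    """Square root of the sum of squared per-block differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

blocks = ["B1", "B2", "B3", "B4"]
# Hypothetical trace of executed blocks for one interval.
bbv_t = build_bbv(["B1", "B2", "B4"], blocks)

# The two-component comparison from the slide: (1, 1) versus (2, 0).
slide_manhattan = manhattan([1, 1], [2, 0])
slide_euclidean = euclidean([1, 1], [2, 0])
```

Here slide_manhattan evaluates to 2 and slide_euclidean to sqrt(2), matching the numbers on the slide.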
Basic Block Similarity Matrix gzip
Basic Block Similarity Matrix gcc BBV similarity between intervals reflects performance similarity
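A similarity matrix of this kind could be computed as sketched below. The sketch assumes each BBV is first normalized so its entries sum to 1, and that intervals are compared by Manhattan distance. The function names and sample vectors are hypothetical.

```python
def normalize(bbv):
    """Scale a BBV so its entries sum to 1 (an all-zero vector is returned unchanged)."""
    total = sum(bbv)
    return [x / total for x in bbv] if total else list(bbv)

def similarity_matrix(bbvs):
    """Entry [i][j] is the Manhattan distance between normalized intervals i and j.

    With normalized vectors, 0 means identical code usage and 2 means no overlap.
    """
    normed = [normalize(v) for v in bbvs]
    n = len(normed)
    return [
        [sum(abs(a - b) for a, b in zip(normed[i], normed[j])) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical intervals: the first two run the same code in the same
# proportions, the third runs entirely different code.
matrix = similarity_matrix([[1, 1, 0, 1], [2, 2, 0, 2], [0, 0, 3, 0]])
```

Small entries mark pairs of intervals that execute the same code in similar proportions.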
Automatic Phase Classification Classify intervals into phases –We do not know which BBVs correspond to particular phases a priori k-means clustering –Iterative clustering algorithm –Dimension Reduction Random Linear Projection –Try different k values Use BIC to choose best
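The pipeline on this slide can be sketched as follows. This is an illustrative approximation only. All function names are hypothetical, the projection matrix is drawn uniformly from [-1, 1] as an assumption, and bic_score is a simplified spherical-Gaussian score (higher is better), not necessarily the formulation SimPoint uses.

```python
import math
import random

def random_projection(vectors, out_dim, rng):
    """Reduce dimensionality by multiplying with a random matrix (entries in [-1, 1])."""
    in_dim = len(vectors[0])
    proj = [[rng.uniform(-1.0, 1.0) for _ in range(out_dim)] for _ in range(in_dim)]
    return [[sum(v[i] * proj[i][j] for i in range(in_dim)) for j in range(out_dim)]
            for v in vectors]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, rng, max_iters=100):
    """Plain k-means: alternate assignment and centroid update until labels stop changing."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def bic_score(points, labels, centroids):
    """Simplified BIC: goodness of fit minus a penalty that grows with k."""
    n, d, k = len(points), len(points[0]), len(centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    sse = max(sse, 1e-12)  # guard against log(0) on a perfect fit
    return -0.5 * n * math.log(sse / n) - 0.5 * k * d * math.log(n)

def choose_k(points, max_k, rng):
    """Try k = 1..max_k and keep the clustering with the highest score."""
    best = None
    for k in range(1, max_k + 1):
        labels, centroids = kmeans(points, k, rng)
        score = bic_score(points, labels, centroids)
        if best is None or score > best[1]:
            best = (k, score, labels, centroids)
    return best

# Hypothetical 2-D points forming two well-separated groups.
demo_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
best_k, best_score, demo_labels, demo_centroids = choose_k(
    demo_points, 3, random.Random(0))
```

In practice the BBVs would be passed through random_projection before clustering. The demo skips that step because its points are already low-dimensional.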
Automatic Phase Classification
Clustering accurately distinguishes phases automatically
SimPoint Simulate large programs on a budget Perform detailed simulation on representative code snippets –Choose centroid interval from each phase (10 million instructions) Extrapolate large program performance –Weighted by frequency of phase
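The selection and extrapolation steps might look like the sketch below. It assumes a clustering (labels and centroids) is already available and that the simulated metric is CPI, so that a weighted average is meaningful. The function names and numbers are hypothetical.

```python
def pick_simulation_points(points, labels, centroids):
    """For each phase, return the index of the interval closest to its centroid."""
    best = {}
    for idx, (p, lab) in enumerate(zip(points, labels)):
        d = sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
        if lab not in best or d < best[lab][0]:
            best[lab] = (d, idx)
    return {lab: idx for lab, (_, idx) in best.items()}

def weighted_estimate(labels, sim_points, measured):
    """Weight each simulated interval's metric by the fraction of intervals in its phase."""
    n = len(labels)
    return sum(labels.count(lab) / n * measured[idx]
               for lab, idx in sim_points.items())

# Hypothetical data: four intervals, three in phase 0 and one in phase 1.
points = [[0.0], [1.0], [2.0], [10.0]]
labels = [0, 0, 0, 1]
centroids = [[1.0], [10.0]]
sim_points = pick_simulation_points(points, labels, centroids)

# Hypothetical CPI values from detailed simulation of the chosen intervals only.
measured_cpi = {1: 1.0, 3: 3.0}
estimated_cpi = weighted_estimate(labels, sim_points, measured_cpi)
```

Only the intervals in sim_points need detailed simulation. All other intervals contribute through their phase's weight.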
Simulate 400 million instructions total SimPoint Accurate estimate despite instruction budget
Why SimPoint Succeeds Program behavior varies over time SimPoint intelligently chooses which intervals to simulate Regularity within program phases allows accurate extrapolation
Online Classification Detect phases as program is running Applications –Thread scheduling –Power management –Predicting future phases Challenges –One pass of input –Limited storage
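One way to respect both constraints (a single pass and limited storage) is a small table of phase signatures with a distance threshold. The sketch below is a hypothetical software illustration, not the mechanism from the talk. The class name, threshold, table size, and least-recently-used eviction policy are all assumptions.

```python
class OnlinePhaseClassifier:
    """Assign each incoming interval to a phase using a bounded signature table."""

    def __init__(self, threshold=0.5, max_phases=4):
        self.threshold = threshold    # max Manhattan distance counted as a match
        self.max_phases = max_phases  # storage limit on remembered phases
        self.table = {}               # phase id -> (signature, last-used tick)
        self.next_id = 0
        self.clock = 0

    def classify(self, bbv):
        self.clock += 1
        total = sum(bbv) or 1
        sig = [x / total for x in bbv]  # normalize so interval length does not matter
        best_id, best_d = None, None
        for pid, (stored, _) in self.table.items():
            d = sum(abs(a - b) for a, b in zip(sig, stored))
            if best_d is None or d < best_d:
                best_id, best_d = pid, d
        if best_id is not None and best_d <= self.threshold:
            self.table[best_id] = (self.table[best_id][0], self.clock)
            return best_id
        if len(self.table) >= self.max_phases:
            # Table full: evict the least recently used phase.
            victim = min(self.table, key=lambda p: self.table[p][1])
            del self.table[victim]
        pid = self.next_id
        self.next_id += 1
        self.table[pid] = (sig, self.clock)
        return pid

# Hypothetical stream of per-interval BBVs, seen once and in order.
classifier = OnlinePhaseClassifier()
stream = [[9, 1, 0, 0], [8, 2, 0, 0], [0, 0, 5, 5], [9, 1, 0, 0], [0, 0, 6, 4]]
phase_ids = [classifier.classify(bbv) for bbv in stream]
```

Each interval is examined once and then discarded. Only the signature table persists between intervals.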
Online Classification
High variance in metrics across full trace Low variance shows online classification succeeds in finding phases
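One way to quantify this comparison is the coefficient of variation (standard deviation divided by mean) of a metric, computed over the whole trace and separately within each phase. The sketch below uses hypothetical IPC values and phase labels.

```python
from statistics import mean, pstdev

def cov(values):
    """Coefficient of variation: population standard deviation divided by the mean."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0

def per_phase_cov(metric, labels):
    """Group the per-interval metric by phase label and compute the CoV of each group."""
    groups = {}
    for value, lab in zip(metric, labels):
        groups.setdefault(lab, []).append(value)
    return {lab: cov(vals) for lab, vals in groups.items()}

# Hypothetical per-interval IPC and the phase assigned to each interval.
ipc = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9]
phase_labels = [0, 0, 0, 1, 1, 1]

overall_cov = cov(ipc)
phase_cov = per_phase_cov(ipc, phase_labels)
```

If the classification is good, every value in phase_cov should be well below overall_cov.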
Conclusions Phases are a vital abstraction –Performance varies greatly w/in program –Attributable to different modes of behavior Can discover phases automatically –Offline: k-means clustering –Online Code path characterization –Strong correlation with actual performance –SimPoint exploits this with great success
Outline Introduction (motivate) Basics (definitions, BBV, BBMatrix) Offline Phase Classification –SimPoints Online Phase Classification Conclusions
Limitations of Clustering
Bayesian Information Criterion Fit to Gaussians
Self-Modifying Code Self-modifying code Program Phases
Learning Phases