CISC Machine Learning for Solving Systems Problems John Cavazos Dept of Computer & Information Sciences University of Delaware Machine Learning applied to Static Compilation Lecture 2
Hardware constantly changing
► Heterogeneous Processors in Gaming Devices
► Massively Parallel Graphics Processing Units
► Heterogeneous Processors in Supercomputers
► Powerful Embedded Devices
Compilers changing slower
► In the early days of compilers …
[Figure: 1957: The FORTRAN Automatic Coding System. Stage groups: Front End, Middle End, Back End. Phases: Front End, Index Optimization, Code Merge, Flow Analysis, Register Allocation, Final Assembly.]
Compilers changing slower
► And 50 years later …
► Compilers have not changed much
► Inadequate support for modern architectures
[Figure: 2007: Typical Compiler. Stage groups: Front End, Middle End, Back End. Phases: Front End, High-Level Optimization, Mid-Level Optimization, Flow Analysis, Register Allocation, Final Assembly.]
Proposed Solution
► Intelligent Compilers
► Using AI (i.e., machine learning) techniques
► Learn to optimize
► Specialize to architecture
[Figure labels: Feedback; Intelligent Compiler (Ex: Neural Networks, Decision Trees, Reinforcement Learning); Applications; Architecture.]
Intelligent Compilers?
► Compiler improves itself
► By being shown examples of the behaviour we want
[Figure labels: Unroll, Tiling, Fusion, Fission.]
Applying Machine Learning
► Inputs: program characterization
► Outputs: set of optimizations to apply
Case Study
► Whole Program Optimization
► Paper: Rapidly Selecting Good Compiler Optimizations using Performance Counters, Cavazos et al., CGO 2007
Whole Program Optimization
► Automatically construct a “model”
► Model maps performance counters to good optimizations
► Model predicts optimizations to apply
► Programs are characterized by their performance counters
Inputs: Performance Counters
Columns: Mnemonic, Description, Avg Values
► FPU_IDL (Floating Unit Idle)
► VEC_INS (Vector Instructions)
► BR_INS (Branch Instructions)
► L1_ICH (L1 Icache Hits)
[Figure label: Application]
Outputs: Optimizations
Optimization levels: Opt Level O0, Opt Level O1, Opt Level O2
Optimizations controlled (in slide order):
► Branch Opts Low
► Constant Prop / Local CSE
► Reorder Code
► Copy Prop / Tail Recursion
► Static Splitting / Branch Opt Med
► Simple Opts Low
► While into Untils / Loop Unroll
► Branch Opt High / Redundant BR
► Simple Opts Med / Load Elim
► Expression Fold / Coalesce
► Global Copy Prop / Global CSE
► SSA
Training Compiler
► Present a training database of:
  ► Characteristics of application
  ► “Right” optimizations to use
[Figure: two training examples, each pairing a choice among Unroll, Tiling, Fusion, Fission with a feature vector, (.91,.32,.40,51) and (.61,.12,.50,81), fed into the Model.]
Using Trained Compiler
► Present characteristics of “new” application
► Compiler predicts how to optimize it
[Figure: feature vector (.81,.35,.40,69) fed into the Model.]
Performance Counters
Characterization of 181.mcf
► Perf cntrs relative to several benchmarks
► Problem: greater number of memory accesses per instruction than average
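One way to read "relative to several benchmarks" is as a ratio of each counter to its mean over a benchmark suite. The sketch below illustrates that reading only. The counter name `MEM_ACC`, the function `relative_to_suite`, and all numbers are hypothetical and are not taken from the slides or from measurements of 181.mcf.

```python
from statistics import mean
from typing import Dict, List


def relative_to_suite(program: Dict[str, float],
                      suite: List[Dict[str, float]]) -> Dict[str, float]:
    """Return each counter of `program` divided by the suite mean of that counter.

    A ratio above 1.0 means the program exercises that event more than the
    suite average. Assumes every suite mean is non-zero.
    """
    return {
        name: value / mean(bench[name] for bench in suite)
        for name, value in program.items()
    }


# Made-up per-instruction counter values for two reference benchmarks.
suite = [
    {"MEM_ACC": 0.30, "BR_INS": 0.10},
    {"MEM_ACC": 0.50, "BR_INS": 0.30},
]
# Made-up values for a memory-bound program.
memory_bound = {"MEM_ACC": 0.80, "BR_INS": 0.20}

ratios = relative_to_suite(memory_bound, suite)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x suite average")
```

A ratio well above 1.0 for a memory counter is the kind of signal the slide flags as the "Problem".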
Training the Perf Cntr Model
► Select programs to train the model (different from the test program).
► Do baseline runs to capture performance counter values.
► Obtain performance counter values for each benchmark.
► Run the best optimizations to get speedup values.
► Perform training on a large set of programs.
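The training steps can be sketched as building one row per training program: counter values from the baseline run, normalized to cycles, paired with labels taken from the best optimization run. This is a minimal sketch under stated assumptions. The helper names, the data layout, and the rule "label 1 if the flag was enabled in the best run" are illustrative and are not confirmed by the slides.

```python
from typing import Dict, List, Tuple

Row = Tuple[List[float], List[int]]


def normalize_counters(raw: Dict[str, float], cycles: float) -> List[float]:
    """Divide each raw counter by the cycle count, in sorted-name order."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return [raw[name] / cycles for name in sorted(raw)]


def build_training_rows(
    baseline: Dict[str, Tuple[Dict[str, float], float]],
    best_flags: Dict[str, List[int]],
) -> List[Row]:
    """Pair each program's normalized counters with its best-run flag labels."""
    rows = []
    for program, (raw, cycles) in baseline.items():
        rows.append((normalize_counters(raw, cycles), best_flags[program]))
    return rows


# Made-up baseline measurements: (raw counter values, cycles executed).
baseline = {
    "prog_a": ({"BR_INS": 200.0, "VEC_INS": 50.0}, 1000.0),
    "prog_b": ({"BR_INS": 100.0, "VEC_INS": 400.0}, 2000.0),
}
# Made-up labels: 1 if the optimization was on in the best run found.
best_flags = {"prog_a": [1, 0], "prog_b": [0, 1]}

rows = build_training_rows(baseline, best_flags)
for features, labels in rows:
    print(features, labels)
```

The test program is kept out of `baseline`, matching the slide's note that training programs differ from the test program.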
Using the Perf Cntr Model
► Start with a new program for which we want good performance.
► Do a baseline run to capture performance counter values.
► Input performance counter values to the model.
► Model predicts optimization sequences to apply.
► Model can predict multiple optimization sequences to try.
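One plausible way for a model to yield multiple sequences is to treat its per-optimization outputs as independent probabilities and sample from them. The sketch below assumes exactly that. The function `sample_sequences`, the flag names, and the probabilities are hypothetical, and an optimization "sequence" is simplified here to an unordered on/off setting per flag.

```python
import random
from typing import Dict, List, Tuple


def sample_sequences(probs: Dict[str, float], count: int,
                     seed: int = 0) -> List[Tuple[str, ...]]:
    """Draw up to `count` distinct flag settings from per-flag probabilities.

    Each flag is switched on independently with its predicted probability.
    The attempt cap keeps the loop finite when few distinct settings exist.
    """
    rng = random.Random(seed)
    names = sorted(probs)
    seen = set()
    result = []
    attempts = 0
    while len(result) < count and attempts < 100 * count:
        attempts += 1
        candidate = tuple(n for n in names if rng.random() < probs[n])
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


# Made-up model outputs: probability that each optimization is beneficial.
probs = {"unroll": 0.9, "tiling": 0.2, "fusion": 0.6, "fission": 0.1}

candidates = sample_sequences(probs, count=5)
for candidate in candidates:
    print(candidate)
```

Each candidate would then be compiled and timed, and the fastest one kept.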
Logistic Regression
► Variation of ordinary regression
► Inputs
  ► Continuous, discrete, or a mix
  ► 60 performance counters
  ► All normalized to cycles executed
► Outputs
  ► Number between 0 and 1
  ► Probability an optimization is beneficial
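To make the input/output shape concrete, here is a minimal logistic regression sketch for a single optimization, trained with plain stochastic gradient descent. It uses two made-up features instead of 60 counters, and the learning rate, epoch count, and data are assumptions chosen for illustration, not the paper's setup.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function; output lies in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: List[float], bias: float, features: List[float]) -> float:
    """Probability that the optimization is beneficial for these features."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit(rows: List[List[float]], labels: List[int],
        lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y  # gradient of log loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Made-up normalized counters; label 1 means the optimization helped.
rows = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.7], [0.1, 0.9]]
labels = [1, 1, 0, 0]

weights, bias = fit(rows, labels)
p_new = predict(weights, bias, [0.85, 0.15])
print(f"probability beneficial: {p_new:.3f}")
```

One such model per optimization gives one probability per optimization, which matches the "number between 0 and 1" output described on the slide.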
Experimental Methodology
► PathScale industrial-strength compiler
  ► Compare to highest opt level (-Ofast)
  ► Orchestrate 121 compiler optimizations
► AMD Athlon processor
  ► Real machine, not simulation
► 57 benchmarks
  ► SPEC (95, 2000), MiBench, Polyhedral
Evaluated Search Strategies
► RAND
  ► Randomly select 500 optimization seqs
► Combined Elimination (CE)
  ► State-of-the-art search technique [CGO ‘06]
► Performance Counter (PC) Model
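The RAND baseline can be sketched as uniform random sampling of flag settings with a fixed evaluation budget. In the sketch below, `toy_speedup` is a hypothetical stand-in for compiling and running a program, and the flag names and scoring rule are made up. Only the budget of 500 comes from the slide.

```python
import random
from typing import Callable, List, Tuple


def random_search(flags: List[str],
                  evaluate: Callable[[Tuple[str, ...]], float],
                  budget: int = 500,
                  seed: int = 0) -> Tuple[Tuple[str, ...], float]:
    """Evaluate `budget` random flag settings and return the best one found."""
    rng = random.Random(seed)
    best_seq: Tuple[str, ...] = ()
    best_score = float("-inf")
    for _ in range(budget):
        # Each flag is switched on with probability 0.5, independently.
        seq = tuple(f for f in flags if rng.random() < 0.5)
        score = evaluate(seq)
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq, best_score


def toy_speedup(seq: Tuple[str, ...]) -> float:
    """Hypothetical speedup model; a real run would compile and time the code."""
    score = 1.0
    if "unroll" in seq:
        score += 0.10
    if "fusion" in seq and "fission" in seq:
        score -= 0.15  # made-up negative interaction between two flags
    if "tiling" in seq:
        score += 0.05
    return score


flags = ["unroll", "tiling", "fusion", "fission"]
best_seq, best_score = random_search(flags, toy_speedup)
print(best_seq, round(best_score, 2))
```

Because every sample is independent of the previous ones, this strategy cannot get trapped by a bad starting point, although it spends its whole budget regardless of what it finds.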
PCModel vs CE
► 9 benchmarks improve by over 20%, with 17% improvement on average
► Over 25% improvement on 6 benchmarks
► On average over -Ofast, CE obtains 9% and PC Model 17%
Performance vs Evaluations
[Figure: performance versus number of evaluations. Curve labels: PC Model (17%), Random (17%), Combined Elim (12%).]
CE worse than RAND?
► Combined Elimination
  ► Easily gets stuck in local minima
► RAND and PC Model
  ► Probabilistic techniques
  ► Depend on the distribution of good points
  ► Not susceptible to local minima
Static vs Dynamic Features
Conclusions
► Using machine learning is successful
  ► Outperforms production-quality compiler
► Using performance counters
  ► Automatically determines important characteristics
  ► Optimizations applied only when beneficial
Example Projects
► Use performance counters to predict “how” and “when” to apply an optimization
  ► Individual opts: e.g., how many times to unroll a loop?
  ► Optimization sequences: which opts to apply?
► Malware identification
  ► Can malware be identified by performance counter characteristics?