Efficiently Exploring Compiler Optimization Sequences With Pairwise Pruning Milind Chabbi John Mellor-Crummey Keith Cooper RICE UNIVERSITY DEPARTMENT OF.

Presentation transcript:

Efficiently Exploring Compiler Optimization Sequences With Pairwise Pruning
Milind Chabbi, John Mellor-Crummey, Keith Cooper
RICE UNIVERSITY, DEPARTMENT OF COMPUTER SCIENCE
This work is funded by the Defense Advanced Research Projects Agency (DARPA) through the Air Force Research Lab (AFRL).

Compiler Optimization Phase-Ordering Problem
• The order in which compiler optimizations are applied drastically changes measured performance
• Kulkarni et al. [CGO '06] show 38% average code size reduction
• Zhao et al. [CGO '09] show up to 32% speedup
• Production compilers still use a fixed order
• Exascale systems multiply the cost of poor node performance
[Figure credit: Zhao et al. [CGO '09]]

Phase-Order Selection Is Hard
• Selecting the best phase order is non-trivial
• Program dependent
• Relations between optimizations are complex
  • One optimization can enable/disable another
• Exhaustive empirical exploration is expensive and unrealistic
• 20 optimizations ⇒ about 2.5 × 10^18 possible optimization sequences (20! ≈ 2.43 × 10^18)
• "Exhaustive optimization phase order space exploration." [Kulkarni et al. CGO '06]
  • Many optimization orders lead to structurally identical function instances
• Approaches
• Analytically modeling code and the effects of optimizations is non-trivial and still in its infancy
  • "A framework for exploring optimization properties." [Zhao et al. CC '09]
• Other techniques have been tried and proven effective
  • Genetic algorithms [Cooper et al. SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems 1999]

Roadmap
• Phase order selection using pairwise constraints between optimizations
• Graph model
• Regression model
• Conditional Sampling model
Effectiveness is shown throughout the discussion on a sample numerical program, FMIN, with dynamic instruction count (DIC) as the optimization metric.

Interaction Is Significant Between Pairs
• Interaction is significant between pairs
• Capture the ordering of pairs without regard to their absolute positions
[Figure: example sequences with a and b at different absolute positions, labeled Good and Bad.]

Pruning Using Pairwise Constraints
[Figure: diagram over relative orderings of a and b.]

Background And Effectiveness Of Pairwise Pruning
• Used by the software testing community
• In software testing, multiple input variables taking multiple values cause a combinatorial explosion
• Pairwise (a.k.a. all-pairs) testing is based on the observation that most faults are caused by interactions of at most two factors
• Pairwise-generated test suites cover all combinations of two factors, so they are much smaller than exhaustive suites yet still very effective at finding defects
References: K. Burr and W. Young [STAR '98]; D. R. Wallace and D. R. Kuhn [International Journal of Reliability, Quality and Safety Engineering, 2001]

Roadmap
• Phase order selection using pairwise constraints between optimizations
• Graph model
• Regression model
• Conditional Sampling model

Graph Model
• Nodes represent optimizations, e.g. {a, b, c}
• Directed edges represent optimization orders
• Graph construction
• Empirically evaluate all pairs to add edges
  • ab < ba ⇒ edge (a,b)
  • ac < ca ⇒ edge (a,c)
  • cb < bc ⇒ edge (c,b)
• Add weights to edges based on profitability
  • E.g. (ab) vs. (ba) has a profit of 20%
• The graph may be cyclic or acyclic
[Figure: graph over nodes a, b, c.]
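The construction step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_pair_graph`, the `cost` callback, and the measurements in `measured` are hypothetical and not taken from the slides, and edge weights here are absolute differences rather than percentages.

```python
from itertools import permutations

def build_pair_graph(opts, cost):
    """Return {(x, y): weight}; edge (x, y) exists when cost(xy) < cost(yx)."""
    edges = {}
    for x, y in permutations(opts, 2):
        gain = cost((y, x)) - cost((x, y))  # how much better xy is than yx
        if gain > 0:
            edges[(x, y)] = gain
    return edges

# Hypothetical measurements (e.g. dynamic instruction counts) per ordered pair.
measured = {
    ("a", "b"): 80, ("b", "a"): 100,
    ("a", "c"): 90, ("c", "a"): 95,
    ("c", "b"): 70, ("b", "c"): 85,
}
edges = build_pair_graph("abc", lambda pair: measured[pair])
print(edges)
```

With these made-up numbers the resulting edges are (a,b), (a,c) and (c,b), the same shape as the example on the slide.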

Phase Order Selection For Acyclic Graphs
• Topologically sort the graph nodes to get a sequence
• Such a sequence (if it exists) maintains all pair-ordering constraints
[Figure: graph over nodes a, b, c, annotated "Model found best sequence".]
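A minimal sketch of the acyclic case, using Python's standard `graphlib` module (Python 3.9+). The helper name `order_from_edges` is hypothetical; the edge list reuses the (a,b), (a,c), (c,b) example from the graph model slide.

```python
from graphlib import TopologicalSorter, CycleError

def order_from_edges(opts, edges):
    """Return a sequence honoring every (before, after) edge, or None if cyclic."""
    sorter = TopologicalSorter({o: set() for o in opts})
    for before, after in edges:
        sorter.add(after, before)  # 'after' must come later than 'before'
    try:
        return list(sorter.static_order())
    except CycleError:
        return None

print(order_from_edges("abc", [("a", "b"), ("a", "c"), ("c", "b")]))
print(order_from_edges("abc", [("a", "b"), ("b", "c"), ("c", "a")]))
```

The first call has exactly one valid order (a, then c, then b); the second contains a cycle, so no order satisfies all constraints and the helper returns `None`.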

Phase Order Selection For Graphs With Cycles
• Cyclic ordering constraints:
  • ab < ba ⇒ edge (a,b)
  • bc < cb ⇒ edge (b,c)
  • ca < ac ⇒ edge (c,a)
• Select an edge to break in each cycle
• Select the edge that minimizes the total weight of deleted edges (this minimizes the cost of violating pair-ordering constraints)
• E.g. break edge (c,a)
• Optimal sequence is: abc
[Figure: cyclic graph over nodes a, b, c.]
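For a handful of optimizations, the minimum-weight choice can be illustrated by brute force: try every ordering and keep the one whose violated edges have the smallest total weight. This is a sketch, not the authors' procedure; the weights in `cyclic` are hypothetical, and enumerating all n! orderings is only feasible for small n.

```python
from itertools import permutations

def violated_weight(seq, edges):
    """Total weight of edges (x, y) whose preferred order x-before-y is broken."""
    pos = {o: i for i, o in enumerate(seq)}
    return sum(w for (x, y), w in edges.items() if pos[x] > pos[y])

def best_order(opts, edges):
    """Ordering that minimizes the weight of violated (i.e. deleted) edges."""
    return min(permutations(opts), key=lambda s: violated_weight(s, edges))

# Hypothetical weights on the cyclic constraints from the slide.
cyclic = {("a", "b"): 20, ("b", "c"): 15, ("c", "a"): 5}
best = best_order("abc", cyclic)
print("".join(best), violated_weight(best, cyclic))
```

With these weights the cheapest edge to give up is (c,a), which yields the sequence abc, matching the slide's example.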

Graph Model On FMIN
[Figure: graph model for FMIN.]

Performance Estimation
• Want to predict the performance of any arbitrary sequence
• Useful to ensure that a sequence optimized for one objective function does not dramatically worsen another objective
• E.g. speed vs. code size
• Provides an analytical model for performance prediction

Graph Model For Performance Estimation
• The graph model has a built-in ability to estimate the performance of a given sequence
• To estimate the performance of an arbitrary sequence:
  • Walk the graph following the given sequence
  • Add the weights of the ordering preferences violated along the walk to the (already known) performance of the model-found best sequence

Example Graph Model For Performance Estimation
• Let the observed performance of the model-found best sequence (abcd) be 1200 instructions
• Estimated performance of sequence dacb: 1200 + (weights of the edges violated by dacb) = 1340
• Edges are decorated with absolute differences, not relative percentages
[Figure: graph over nodes a, b, c, d and the walk d, a, c, b.]
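The estimate can be reproduced with a short sketch. The individual edge weights did not survive in this transcript, so the values in `edges` below are hypothetical; they are chosen only so that the edges violated by dacb sum to 140, which gives the slide's total of 1340. The function name `estimate` is also an assumption.

```python
def estimate(seq, edges, best_perf):
    """Best-sequence performance plus the weight of every violated edge."""
    pos = {o: i for i, o in enumerate(seq)}
    penalty = sum(w for (x, y), w in edges.items() if pos[x] > pos[y])
    return best_perf + penalty

# Hypothetical absolute weights; edge (x, y) means "x before y is better".
edges = {
    ("a", "b"): 10, ("a", "c"): 25, ("a", "d"): 30,
    ("b", "c"): 40, ("b", "d"): 50, ("c", "d"): 20,
}
# dacb violates (a,d), (b,d), (c,d) and (b,c): 30 + 50 + 20 + 40 = 140.
print(estimate("dacb", edges, 1200))
print(estimate("abcd", edges, 1200))
```

The best sequence abcd violates nothing, so its estimate equals its observed performance.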

Performance Estimation With Graph Model On FMIN
• 6 optimizations, i.e. 6! = 720 sequences
• Divergence + phase mismatch

Issues With Graph Model
• Considered only sequences of length 2 (pairs of optimizations)
• Neglected the global behavior of optimizations
• Assumed the weights or behaviors of pairs to be context-insensitive (i.e. the same even in a full-length sequence)
• Want a model that is context-sensitive

Roadmap
• Phase order selection using pairwise constraints between optimizations
• Graph model
• Regression model
• Conditional Sampling model

Getting Context Sensitive With Regression Model
• Take into account the context of the pairs by sampling full-length sequences
• Represent sequences by regression equations
• Represent all possible pairs as a parameter vector
• Presence / absence of pairs in a sequence as input variables
• Observed performance of a sequence as the measured value
Input variables × Parameter vector = Measured value

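Example Linear Regression Model
• Optimizations: {a, b, c}
• One equation per sequence; each input variable is 1 if the pair occurs in that order in the sequence and 0 otherwise

Sequence   X_ab  X_ba  X_ac  X_ca  X_bc  X_cb   Measured value
abc         1     0     1     0     1     0     X_abc
bac         0     1     1     0     1     0     X_bac
…           …     …     …     …     …     …     …

(X_ab … X_cb form the parameter vector.)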
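The encoding of a sequence into presence/absence input variables can be sketched as follows. The helper names are hypothetical, and the parameter values in `params` are made up for illustration; fitting the parameters from sampled sequences would be done with a least-squares solver, which is not shown here.

```python
from itertools import combinations

def pair_names(opts):
    """Ordered pairs in the order ab, ba, ac, ca, bc, cb."""
    names = []
    for x, y in combinations(opts, 2):
        names += [(x, y), (y, x)]
    return names

def pair_features(seq, opts):
    """1 if x precedes y anywhere in seq, else 0, for every ordered pair."""
    pos = {o: i for i, o in enumerate(seq)}
    return [1 if pos[x] < pos[y] else 0 for x, y in pair_names(opts)]

def predict(seq, opts, params):
    """Dot product of the input variables with the parameter vector."""
    return sum(f * p for f, p in zip(pair_features(seq, opts), params))

print(pair_features("abc", "abc"))
print(pair_features("bac", "abc"))
params = [400, 420, 410, 430, 390, 450]  # hypothetical fitted values
print(predict("cba", "abc", params))
```

Because each pair's relative order is taken without regard to absolute position, the sequence cba activates exactly the ba, ca and cb parameters.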

Analytical Model For A Sequence
[Figure: analytical model for the sequence cba.]

Regression Model On FMIN
• Sequences of length 6
• 6! = 720 total sequences
• No phase mismatch, less divergence

Analysis of Regression-equation: Optimization Grouping Effect
• Sequences of length 6
• 6! = 720 total sequences
• Pair groups marked in the figure: {gn, ln, mn} and {lg, lm}

Refined Regression Model
• 100% sampling to solve the regression equation
• Superior projections, perfect correlation

Regression Model With Reduced Sampling Rate
• 12% sampling

Roadmap
• Phase order selection using pairwise constraints between optimizations
• Graph model
• Regression model
• Conditional Sampling model

Properties of Pairs Across Phase Shifts
Pair frequency on the two sides of the phase shift:
• (m,n): 0% vs. 66.6%
• (l,n): 0% vs. 66.6%
• (g,n): 0% vs. 66.6%

Properties of Pairs Across Phase Shifts
• Marked in the figure: mn, ln, gn shift
• (l,g): 0% vs. 75%
• (l,m): 0% vs. 75%

Properties of Pairs Across Phase Shifts
• Marked in the figure: mn, ln, gn shift; lm, lg shift
• (c,d): 0% vs. 100%
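The pair statistics across a phase shift can be sketched as below. The slides do not define "phase shift" precisely in this transcript, so this sketch assumes it is the largest jump between neighbouring values once samples are sorted by performance. The function names and the six samples are hypothetical.

```python
def largest_phase_shift(samples):
    """Split (sequence, performance) samples at the biggest jump in sorted performance."""
    ranked = sorted(samples, key=lambda s: s[1])
    gaps = [ranked[i + 1][1] - ranked[i][1] for i in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ranked[:cut], ranked[cut:]

def pair_frequency(side, x, y):
    """Percentage of sequences on one side in which x precedes y."""
    hits = sum(1 for seq, _ in side if seq.index(x) < seq.index(y))
    return 100.0 * hits / len(side)

# Hypothetical samples: lower performance numbers (instruction counts) are better.
samples = [("cdg", 1000), ("cgd", 1010), ("gcd", 1005),
           ("dcg", 1500), ("dgc", 1510), ("gdc", 1495)]
better, worse = largest_phase_shift(samples)
print(pair_frequency(better, "c", "d"), pair_frequency(worse, "c", "d"))
```

In this toy data every sequence on the better side has c before d and none on the worse side does, the same 100% vs. 0% pattern that the slides report for (c,d).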

Conditional Sampling Model
• Sample k << n! full-length sequences that satisfy a set of pairwise ordering constraints C
  • Initially C = {}
  • We sampled 100 sequences in our implementation
• Identify the largest phase shift
• Obtain the pattern on either side of the largest phase shift
  • e.g. pairs present with 100% or 0% on one side
• Add pairwise constraints favoring better performance to C
• Repeat sampling and refining C until we reach a performance plateau
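The sampling step can be sketched with simple rejection sampling: shuffle the optimizations and keep only orderings that satisfy every constraint in C. This is an assumption about one possible implementation, not the authors' code; the names `satisfies` and `sample_sequences` and the `max_tries` guard are hypothetical. Rejection sampling becomes slow as C grows, and it can never succeed if C is contradictory, hence the guard.

```python
import random

def satisfies(seq, constraints):
    """True if x precedes y in seq for every (x, y) in constraints."""
    pos = {o: i for i, o in enumerate(seq)}
    return all(pos[x] < pos[y] for x, y in constraints)

def sample_sequences(opts, constraints, k, rng, max_tries=100_000):
    """Draw up to k random full-length sequences that honor the constraints."""
    out = []
    for _ in range(max_tries):
        if len(out) == k:
            break
        seq = list(opts)
        rng.shuffle(seq)
        if satisfies(seq, constraints):
            out.append("".join(seq))
    return out

opts = "abcdglmnoqtvz"              # the 13 optimizations used for FMIN
C = {("o", "d"), ("v", "d")}        # constraints od and vd
batch = sample_sequences(opts, C, 100, random.Random(0))
print(len(batch), batch[0])
```

Each refinement round would measure the sampled sequences, inspect the pair frequencies on either side of the largest phase shift, add the favorable pairs to `C`, and sample again.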

Conditional Sampling On FMIN
• 13 optimizations: {a, b, c, d, g, l, m, n, o, q, t, v, z}
• Pair frequency across the largest phase shift: (o,d): 100% vs. 17%
• Conditions: od

Conditional Sampling On FMIN
• 13 optimizations: {a, b, c, d, g, l, m, n, o, q, t, v, z}
• Pair frequency across the largest phase shift: (v,d): 100% vs. 60%
• Conditions: od; vd

Conditional Sampling On FMIN
• On one side of the largest phase shift: an, oa, bn, cn, dn, gn, ln, ol, mn, on, qn, tn, vn, zn, oq, ov = 100%
• On the other side: an = 39%, cn = 39%, dn = 43%, gn = 37%, ln = 37%, ol = 79%, mn = 40%, on = 71%, qn = 37%, oq = 79%, ov = 100%, tn = 37%, oa = 80%, bn = 46%, vn = 13%, zn = 61%
• Conditions: od; vd; an, oa, bn, cn, dn, gn, ln, ol, mn, on, qn, tn, vn, zn, oq, ov

Conditional Sampling On FMIN
• 13 optimizations: {a, b, c, d, g, l, m, n, o, q, t, v, z}
• Pair frequency across the largest phase shift: (c,d): 100% vs. 0%; (c,v): 100% vs. 0%
• Conditions: od; vd; an, oa, bn, cn, dn, gn, ln, ol, mn, on, qn, tn, vn, zn, oq, ov; cd, cv

Conditional Sampling On FMIN
• Required 500 samples, i.e. about 8 × 10^-6 % sampling
• Conditions: od; vd; an, oa, bn, cn, dn, gn, ln, ol, mn, on, qn, tn, vn, zn, oq, ov; cd, cv
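As a quick check of the sampling rate, assuming the search space is all 13! orderings of the 13 optimizations (full-length sequences without repetition), 500 samples correspond to roughly 8 × 10^-6 percent of the space:

```python
import math

total_sequences = math.factorial(13)          # orderings of 13 optimizations
rate_percent = 100.0 * 500 / total_sequences  # share of the space that was sampled

print(total_sequences)
print(f"{rate_percent:.2e} %")
```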

Summary
• The order in which compiler optimizations are applied has a dramatic effect on performance
• "Pairwise pruning" reduces the empirical search space by several orders of magnitude, yet remains effective
• Three models of pairwise pruning
  • Context-insensitive graph model
  • Context-sensitive regression model
  • Context-sensitive Conditional Sampling model
• Initial results are encouraging
• The technique can be used to augment other search space pruning techniques

Backup slides

Challenges And Opportunities
• Not a silver bullet strategy
• Sometimes patterns may not be as distinct as 0% or 100%; we may have to choose a pattern based on the higher percentage on one side
  • E.g. 90% on the left vs. 30% on the right
• In our experiments we always took 100 samples; the sample count can be tuned with various techniques
• Vuduc et al. [International Journal of High Performance Computing Applications ] propose a statistical early stopping criterion that indicates when sampling can be stopped

Graph Model On FMIN
• Six optimizations: {c, d, g, l, m, n}
• Model-found optimal sequence: cndgml
• The model-found sequence had a dynamic instruction count of 1221, the best among all 720 possible sequences