Optimizing General Compiler Optimization
M. Haneda, P.M.W. Knijnenburg, and H.A.G. Wijshoff

Problem: Optimizing optimizations
- A compiler usually has many optimization settings (e.g., peephole, delayed branch)
  - gcc 3.3 has 54 optimization options
  - gcc 4 has over 100 possible settings
- Very little is known about how these options affect each other
- Compiler writers typically provide switches that bundle many optimization options together: gcc -O1, -O2, -O3

…but can we do better?
- It is possible to perform better than these predefined optimization settings, but doing so requires extensive knowledge of both the code and the available optimization options
- How do we define one set of options that works well across a large variety of programs?

Motivation for this paper
- There are too many optimization settings for an exhaustive search to be affordable: gcc 3 alone yields 2^50 different combinations!
- We want a systematic method that finds the best settings over a reduced search space
- Ideally, this should require minimal knowledge of what the options actually do

Big Idea
- Find the largest subsets of compiler options that interact positively with each other
- Combine these subsets, under the condition that they do not negatively affect each other
- Select the final optimal compiler setting from the result of these set combinations

Full vs. Fractional Factorial Design
- Full factorial design: explores the entire search space, with every possible combination
  - Given k on/off options, this requires O(2^k) experiments
- Fractional factorial design: explores a reduced search space that is representative of the full one
  - This can be done using orthogonal arrays
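
As a rough illustration of the size gap (a sketch in Python; the 56-run figure is a standard Plackett-Burman design size, not a number taken from the paper):

    # Search-space sizes for k two-level (on/off) compiler options.
    k = 54                        # option count for gcc 3.3, from the slides
    full_factorial_runs = 2 ** k  # every on/off combination
    print(f"full factorial: {full_factorial_runs:,} runs")   # ~1.8e16
    # A strength-2 orthogonal array grows far more slowly: a 56-run
    # Plackett-Burman design already accommodates up to 55 two-level factors.
    fractional_runs = 56
    print(f"orthogonal array: {fractional_runs} runs")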

Orthogonal Arrays
- An orthogonal array is a matrix of 0's and 1's
  - The rows represent the experiments to be performed
  - The columns represent the factors that the experiment tries to analyze
- Any option is equally likely to be turned on or off
- Given a particular experiment with a particular option turned on, all the other options are still equally likely to be turned on or off
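
A minimal sketch of these two balance properties, checked on the textbook 8-run array for up to seven two-level factors (Taguchi's L8); this is a standard example array, not the one used in the paper:

    from itertools import combinations
    from collections import Counter

    # Taguchi's L8: 8 experiments (rows) x 7 options (columns), strength 2.
    L8 = [
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 1, 1],
        [0, 1, 1, 0, 0, 1, 1],
        [0, 1, 1, 1, 1, 0, 0],
        [1, 0, 1, 0, 1, 0, 1],
        [1, 0, 1, 1, 0, 1, 0],
        [1, 1, 0, 0, 1, 1, 0],
        [1, 1, 0, 1, 0, 0, 1],
    ]

    # Property 1: each option is on in exactly half of the experiments.
    for j in range(7):
        assert sum(row[j] for row in L8) == len(L8) // 2

    # Property 2 (strength 2): for every pair of columns, all four on/off
    # combinations appear equally often, so fixing one option leaves the
    # remaining options balanced.
    for a, b in combinations(range(7), 2):
        counts = Counter((row[a], row[b]) for row in L8)
        assert all(c == len(L8) // 4 for c in counts.values())

    print("L8 satisfies both balance properties")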

Orthogonal Arrays (figure)

Algorithm - Step 1
Finding maximum subsets of positively interacting options
- Step 1.1: Find a set of options that gives the best overall improvement
  - For each single optimization option i, compute the average speedup over all the settings in the search space in which i is turned on
  - Select the M options with the highest average improvement
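
A sketch of step 1.1, reusing the L8 array from the sketch above; the speedup numbers are invented for illustration, and the value of M is an arbitrary choice here:

    # One measured speedup per orthogonal-array row (invented numbers).
    speedups = [1.00, 1.08, 1.05, 1.12, 0.97, 1.10, 1.03, 1.09]

    def mean_speedup_when_on(option, rows, results):
        # Average speedup over the experiments in which `option` was on.
        on = [s for row, s in zip(rows, results) if row[option] == 1]
        return sum(on) / len(on)

    M = 3  # how many top-ranked options to keep as seeds
    ranked = sorted(range(7),
                    key=lambda j: mean_speedup_when_on(j, L8, speedups),
                    reverse=True)
    initial_sets = [{j} for j in ranked[:M]]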

Step 1.1: Selecting the initial set (figure)

Algorithm - Step 1 (cont.)
- Step 1.2: Iteratively add new options to the sets already obtained, to reach a maximal set of positively reinforcing optimizations
  - Example: if using options A and B together produces a better setting than A alone, then add B
  - If using {A, B} together with C produces a better setting than {A, B}, then add C to {A, B}
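
A sketch of step 1.2; measure_speedup is a hypothetical stand-in for "compile with exactly these options enabled and time the benchmarks", faked deterministically here so the sketch runs end to end:

    import random

    random.seed(0)
    _cache = {}

    def measure_speedup(option_set):
        # Placeholder for compiling and timing the benchmarks with exactly
        # these options on; a cached deterministic fake for illustration.
        key = frozenset(option_set)
        if key not in _cache:
            _cache[key] = 1.0 + random.uniform(-0.05, 0.15)
        return _cache[key]

    def grow(seed, all_options):
        # Greedily add options as long as each addition improves the set.
        current, best = set(seed), measure_speedup(seed)
        improved = True
        while improved:
            improved = False
            for opt in all_options - current:
                trial = measure_speedup(current | {opt})
                if trial > best:  # opt positively reinforces the set
                    current, best, improved = current | {opt}, trial, True
        return current

    subsets = [grow(s, set(range(7))) for s in initial_sets]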

Algorithm - Step 2
- Take the sets we already have and try to combine them, provided they do not negatively influence each other; this maximizes the number of options turned on in each set
- Example:
  - If {A, B, C} and {D, E} do not counteract each other, combine them into {A, B, C, D, E}
  - Otherwise, leave them separate
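
Continuing the same sketch, step 2 merges two subsets only when the union performs at least as well as each subset alone:

    def combine_sets(sets):
        sets = [set(s) for s in sets]
        merged = True
        while merged:
            merged = False
            for i in range(len(sets)):
                for j in range(i + 1, len(sets)):
                    union = sets[i] | sets[j]
                    # Merge only if the two sets do not counteract each other.
                    if measure_speedup(union) >= max(measure_speedup(sets[i]),
                                                     measure_speedup(sets[j])):
                        sets[i], merged = union, True
                        del sets[j]
                        break
                if merged:
                    break
        return sets

    combined = combine_sets(subsets)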

Algorithm - Step 3
- Take the resulting sets from step 2 and select the one with the best overall improvement
- The result is the final combination of optimization settings, according to this methodology
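
In the same sketch, step 3 reduces to a single selection over the surviving sets:

    best_setting = max(combined, key=measure_speedup)
    print("chosen options:", sorted(best_setting))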

Final Result (figure)

Comparing with existing compiler switches (figure)

Comparing results
- The compiler setting obtained by this methodology outperforms -O1, -O2, and -O3 on almost all the SPECint95 benchmarks
  - -O3 performs better only on li (39.2% vs. 38.4%)
  - The new setting delivers the best performance on perl (18.4% vs. 10.5%)

Conclusion
The paper introduced a systematic way of combining compiler optimization settings:
- It uses a reduced search space, constructed as an orthogonal array
- It requires no knowledge of what the individual options actually do
- It works independently of the target architecture
- It can be applied to a wide variety of applications

Future work
- Use the same methodology to find a good optimization setting for a particular domain of applications
- Apply the methodology to newer versions of the gcc compiler, such as gcc 4.0.1