Automatic Feature Generation for Setting Compiler Heuristics. Hugh Leather, Elad Yom-Tov, Mircea Namolaru, Ari Freund.

Presentation transcript:

Automatic Feature Generation for Setting Compiler Heuristics Hugh Leather*, Elad Yom-Tov**, Mircea Namolaru**, Ari Freund** *Institute for Computing Systems Architecture, University of Edinburgh, UK **IBM Haifa Research Lab, Israel

Overview
Simplified view of ML in compilers
Input projection problem
Features for a toy compiler
Features for GCC
Results
Further and ongoing work


Simplified view of ML in compilers
Goal: replace a heuristic with a machine-learned one.
For example, learn better loop unrolling for GCC.

Simplified view of ML in compilers
Start with compiler data structures: AST, RTL, SSA, CFG, DDG, etc.
A human expert determines a mapping to a feature vector: number of instructions, mean dependency depth, branch count, loop nest level, trip count, ...

Simplified view of ML in compilers
Now collect many examples of programs and determine their feature values.
Execute the programs with different compilation strategies and find the best strategy for each.

Simplified view of ML in compilers
Now give these examples to a machine learner. It learns a model.
[Figure: examples (features and desired outputs) feed a supervised machine learner, which produces a model]

Simplified view of ML in compilers
This model can then be used to predict the best compiler strategy from the features of a new program. Our heuristic is replaced.
[Figure: the model maps features to desired outputs]
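The workflow on these slides (feature vectors plus best-known strategies in, predictive model out) can be sketched in a few lines. This is a minimal illustration, not the learner used in the talk: it swaps in a 1-nearest-neighbour rule, and the feature choice (instruction count, branch count) and all training values are made up for the example.

```python
from math import dist

# Hypothetical training data: (feature vector, best unroll factor found by timing).
# The two features stand for (instruction count, branch count).
TRAINING = [
    ((12.0, 1.0), 4),
    ((15.0, 1.0), 4),
    ((80.0, 6.0), 1),
    ((95.0, 7.0), 1),
]


def predict(features, training=TRAINING):
    """Return the label of the closest training example (1-nearest-neighbour)."""
    _, label = min(training, key=lambda example: dist(example[0], features))
    return label
```

Calling `predict((14.0, 1.0))` returns the unroll factor of the most similar training loop; the prediction is only as good as the features' ability to separate loops that need different strategies.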


Input projection problem
The expert must do a good job of projecting down to features.
Unhelpful projections, for example: amount of white space, average identifier length, age of programmer, name of program, ...

Input projection problem
Earlier work [1] showed that both positive and negative classification examples can receive the same feature values.
Machine learning works well when all examples associated with one feature value have the same type.
Machine learning doesn't work if the features don't distinguish the examples.
[Figure: examples plotted against Feature 1 and a second feature axis]
[1] Antoine Monsifrot, Francois Bodin, Rene Quiniou. A machine learning approach to automatic production of compiler heuristics.

Input projection problem
Better features might allow classification.
[Figure: the same feature plot, split into two panels: Feature 3 = 0 and Feature 3 = 1]
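The point about indistinguishable examples can be checked mechanically. The sketch below uses invented data and a hypothetical helper, `conflicting_points`, to report the projected feature values that carry more than one class label; adding a third feature removes the conflicts.

```python
from collections import defaultdict

# Hypothetical examples: (feature values, class label).
EXAMPLES = [
    ((1, 2, 0), "unroll"),
    ((1, 2, 1), "no-unroll"),
    ((3, 1, 0), "unroll"),
    ((3, 1, 1), "no-unroll"),
]


def conflicting_points(examples, feature_indices):
    """Return projected feature tuples that are shared by examples of different classes."""
    labels_at = defaultdict(set)
    for features, label in examples:
        key = tuple(features[i] for i in feature_indices)
        labels_at[key].add(label)
    return {key for key, labels in labels_at.items() if len(labels) > 1}
```

With only the first two features, every point is ambiguous and no learner can separate the classes; including the third feature makes each point unambiguous.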

Input projection problem
But there is an existence proof of better features:
Execute the program and decide the strategy.
The strategy becomes the feature (ML is the identity).
Not very practical.

Input projection problem
There are much more subtle interactions between the features and the induction algorithm.
There is an infinite number of possible features.
Suppose the expert thinks of the feature "number of addition nodes".
Perhaps this is better: "number of addition nodes whose left child is a constant".


A feature language for a toy compiler
Toy language the compiler accepts: variables, integers, '+', '*', parentheses.
Examples:
a = 10
b = 20
c = a * b + 12
d = a * (( b + c * c ) * ( ))

A feature language for a toy compiler
What type of features might we want?
count-nodes-matching( is-times && left-child-matches( is-plus ) && right-child-matches( is-constant ) )
[Figure: expression tree for the example below]
a = ((b+c)*2 + d) * 9 + (b+2)*4
Value = 3
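To make the example concrete, here is a small sketch that evaluates this feature over a hand-built tree for a = ((b+c)*2 + d) * 9 + (b+2)*4. The `Node` class and helper names are assumptions made for illustration; the slides do not show the toy compiler's data structures.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    kind: str  # "plus", "times", "constant" or "variable"
    left: Optional["Node"] = None
    right: Optional["Node"] = None


Matcher = Callable[[Node], bool]


def matches(kind: str, left: Optional[Matcher] = None,
            right: Optional[Matcher] = None) -> Matcher:
    """Build a predicate: the node has this kind and its children satisfy the sub-matchers."""
    def check(node: Node) -> bool:
        if node.kind != kind:
            return False
        if left is not None and (node.left is None or not left(node.left)):
            return False
        if right is not None and (node.right is None or not right(node.right)):
            return False
        return True
    return check


def count_nodes_matching(root: Optional[Node], matcher: Matcher) -> int:
    """Count every node in the tree for which the matcher holds."""
    if root is None:
        return 0
    return (int(matcher(root))
            + count_nodes_matching(root.left, matcher)
            + count_nodes_matching(root.right, matcher))


def var() -> Node:
    return Node("variable")


def const() -> Node:
    return Node("constant")


def plus(left: Node, right: Node) -> Node:
    return Node("plus", left, right)


def times(left: Node, right: Node) -> Node:
    return Node("times", left, right)


# ((b+c)*2 + d) * 9 + (b+2)*4
EXPR = plus(
    times(plus(times(plus(var(), var()), const()), var()), const()),
    times(plus(var(), const()), const()),
)

# is-times && left-child-matches(is-plus) && right-child-matches(is-constant)
FEATURE = matches("times", left=matches("plus"), right=matches("constant"))
```

`count_nodes_matching(EXPR, FEATURE)` gives 3, matching the value on the slide: the three multiplications each have an addition on the left and a constant on the right.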

A feature language for a toy compiler
Define a simple feature language:
<feature> ::= "count-nodes-matching(" <matches> ")"
<matches> ::= "is-constant" | "is-variable" | "is-any-type"
  | ( "is-plus" | "is-times" )
    ( "&& left-child-matches(" <matches> ")" ) ?
    ( "&& right-child-matches(" <matches> ")" ) ?

A feature language for a toy compiler
Now generate sentences from the grammar to give features.
Grammar: <A> ::= <A> <A> <A> | "b"
Start with the root non-terminal: A
Choose randomly among productions and replace: AAA
Repeat for each non-terminal still in the sentence: bAAAb
Continue until there are no more non-terminals: bbbbb
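The expansion procedure described on this slide can be sketched as follows, applied to the toy feature language. The non-terminal names (`<feature>`, `<matches>`, `<op>`, `<left>`, `<right>`) are assumptions introduced for the example, and every production is chosen with equal probability.

```python
import random

# Each non-terminal maps to a list of productions; a production is a list of symbols.
# Optional parts of the toy grammar are modelled as non-terminals with an empty production.
GRAMMAR = {
    "<feature>": [["count-nodes-matching(", "<matches>", ")"]],
    "<matches>": [["is-constant"], ["is-variable"], ["is-any-type"],
                  ["<op>", "<left>", "<right>"]],
    "<op>": [["is-plus"], ["is-times"]],
    "<left>": [[], [" && left-child-matches(", "<matches>", ")"]],
    "<right>": [[], [" && right-child-matches(", "<matches>", ")"]],
}


def generate(grammar, start, rng):
    """Repeatedly replace every non-terminal with a randomly chosen production."""
    sentence = [start]
    while any(symbol in grammar for symbol in sentence):
        expanded = []
        for symbol in sentence:
            if symbol in grammar:
                expanded.extend(rng.choice(grammar[symbol]))
            else:
                expanded.append(symbol)
        sentence = expanded
    return "".join(sentence)
```

For example, `generate(GRAMMAR, "<feature>", random.Random(0))` yields one random feature expression. With these uniform choices each `<matches>` produces on average fewer than one new `<matches>`, so expansion tends to die out quickly; a grammar with more recursive productions would need weighted choices.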

A feature language for a toy compiler
Be careful about non-termination and runaway expansion.
Grammar: <A> ::= <A> <A> <A> | "b"
Sentence: AAA
The sentence AAA has only a 1/8 chance of containing fewer non-terminals after expansion.
Add probabilities to productions to fix runaways.
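The 1/8 figure can be reproduced by enumeration. The sketch below assumes a grammar in which the non-terminal A expands either to three copies of itself or to the terminal "b" (an assumption, chosen because it is consistent with the 1/8 on the slide), and shows how raising the probability of the terminal production tames the expansion.

```python
from fractions import Fraction
from itertools import product


def chance_of_shrinking(n_nonterminals, p_terminal, fanout=3):
    """Probability that a sentence of n non-terminals has fewer after one round.

    Each non-terminal becomes a terminal with probability p_terminal,
    otherwise it becomes `fanout` non-terminals.
    """
    total = Fraction(0)
    for choices in product((True, False), repeat=n_nonterminals):
        probability = Fraction(1)
        for to_terminal in choices:
            probability *= p_terminal if to_terminal else 1 - p_terminal
        remaining = fanout * choices.count(False)
        if remaining < n_nonterminals:
            total += probability
    return total


def expected_growth(p_terminal, fanout=3):
    """Expected number of non-terminals produced by expanding one non-terminal."""
    return fanout * (1 - p_terminal)
```

With equal probabilities, `chance_of_shrinking(3, Fraction(1, 2))` is 1/8 and each non-terminal produces 3/2 new ones on average, so sentences tend to grow without bound. Weighting the terminal production at 3/4 brings the expected growth down to 3/4, below the threshold of 1.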

A feature language for a toy compiler
Grammars can be arbitrarily complex:
Produce sentences in any language
Use 'semantic actions' and dynamic terminals
Feature languages can easily include:
Conjunctive and disjunctive forms
Tests for attribute values
Follow graph structures
etc.


Features for GCC
Diagram of the tool chain
[Figure: tool chain. Stages: GCC exports data structures to XML; structure analysed; grammar created; sentence generator compiled; random features generated; feature values computed as a matrix. Inputs: benchmarks, program data, grammar stylesheet. Tools: GCC, XSL engine, XSL + javac, XQuery engine. File types: .c, .xml, .xsl, .bnf, .class, .xql, .csv]

Features for GCC
Start by dumping data structures to XML.


Features for GCC
Find out the structure found in the benchmarks.
This allows the system to know the data format without hard coding.

Features for GCC
Grammar is constructed from structure.

Features for GCC
Grammar is constructed from structure. This creates a huge grammar.
Features are in XQuery; features like:
// Count the number of 'label_expr's followed by a 'cond_expr' whose first
// child was a 'neexpr'
sum(bb/stmts/count(*[name()="labelexpr"][following-sibling::*[1][name()="condexpr"][*[1][name()="neexpr"]]]))
// Take the average of the number of 'realcst' nodes in each basic block
avg(bb/stmts//count(*[name()="realcst"]))
// Take the minimum number of predecessors across the basic blocks with the
// 'fallthru' flag set but not the 'exec' flag
min(bb/preds/count(*
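As a rough illustration of what evaluating such a feature involves, the sketch below computes the second feature as its comment describes it (average number of 'realcst' nodes per basic block) over a tiny hand-written XML document. The XML layout and element names are assumptions for the example, not GCC's actual export format, and Python's `xml.etree.ElementTree` stands in for the XQuery engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML dump of one function with two basic blocks.
SAMPLE = """
<function>
  <bb>
    <stmts>
      <modify_expr><realcst/><realcst/></modify_expr>
    </stmts>
  </bb>
  <bb>
    <stmts>
      <cond_expr><neexpr/></cond_expr>
    </stmts>
  </bb>
</function>
"""


def average_realcst_per_block(xml_text):
    """Average, over basic blocks, of the number of 'realcst' nodes under 'stmts'."""
    root = ET.fromstring(xml_text)
    counts = []
    for block in root.findall("bb"):
        stmts = block.find("stmts")
        # iter() walks all descendants of stmts with the given tag name.
        count = 0 if stmts is None else sum(1 for _ in stmts.iter("realcst"))
        counts.append(count)
    return sum(counts) / len(counts) if counts else 0.0
```

On the sample, the first block contains two 'realcst' nodes and the second none, so the feature value is 1.0. Running every generated feature over every benchmark function in this way is what fills the feature-value matrix.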

Features for GCC
Grammar compiled down to Java.

Features for GCC
Thousands of random XQuery features generated.

Features for GCC
Features are evaluated over benchmark data.


Results - GCC loop unrolling
Set up:
Modified GCC; benchmarks from UTDSP and MediaBench
Pentium 4; 2.8 GHz; 512 MB RAM
Benchmarks run in a RAM disk to reduce I/O variability
Found the best unroll factor in [0-16] for each loop
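Finding the best unroll factor per loop amounts to picking the fastest measured configuration. A minimal sketch, assuming timings have already been collected into a dictionary (the function name and the sample timings are invented for illustration):

```python
def best_unroll_factor(timings):
    """Return the unroll factor with the lowest runtime.

    `timings` maps unroll factor -> measured runtime in seconds.
    Ties are broken in favour of the smaller factor.
    """
    return min(timings, key=lambda factor: (timings[factor], factor))


# Hypothetical measurements for one loop.
SAMPLE_TIMINGS = {0: 1.92, 1: 1.90, 2: 1.71, 4: 1.71, 8: 1.85, 16: 2.10}
```

In practice each timing would be an aggregate over repeated runs, since a single measurement is noisy even with I/O variability reduced.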

Results - GCC loop unrolling
Loops grouped according to K-means method.
[Table columns: Group, Group size, Unroll factor, Average speedup (compared to GCC default)]

Results - GCC loop unrolling
Decision tree (C4.5) maps loop to K-means group.
If not in improved group then
  unroll by GCC heuristic
Else
  unroll by 2 (best factor across improved group)
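The resulting heuristic is a thin wrapper around the classifier. The sketch below shows the control flow only; `classify` stands in for the trained decision tree, `gcc_heuristic` for GCC's existing choice, and both (along with the group labels) are hypothetical placeholders.

```python
def make_unroll_heuristic(classify, improved_groups, gcc_heuristic, improved_factor=2):
    """Build a heuristic that overrides GCC only for loops predicted to improve.

    classify:        maps a feature vector to a predicted group label
    improved_groups: set of group labels that benefited from unrolling
    gcc_heuristic:   maps a feature vector to GCC's default unroll factor
    """
    def heuristic(features):
        if classify(features) in improved_groups:
            return improved_factor
        return gcc_heuristic(features)
    return heuristic
```

Falling back to the existing heuristic for every other group keeps the learned model from making loops worse where it has no evidence of a benefit.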

Results - GCC loop unrolling
Top features of the decision tree
Feature 1: Count the number of "plus" nodes whose first operand is a "reg" node with both "volatile" and "frame-related" flags set
count(descendant-or-self::*[name()="plus"][*
Feature 2: Count the number of "set" nodes whose source operand is a "mult" node for which the first operand is a "reg" node
count(descendant-or-self::*[name()="set"][*[2][name()="mult"][*[1][name()="reg"]]])
Feature 3: Count "compare" nodes whose first child is a "reg" node which has its "frame-related" flag set and has "mode=SI"
count(descendant-or-self::*[name()="compare"][*
Feature 4: Count instructions that set a register to the value of a comparison against a register
count(descendant-or-self::*[name()="insn"][*[6][name()="set"][*[1][name()="reg"]][*[2][name()="compare"][*[2][name()="reg"]]]])
Feature 5: Count the number of basic blocks that might be hot and have estimated frequency less than 5426. They must also contain a call instruction whose last field is null.
count(descendant-or-self::*[name()="basic-block"] lt 5426][*[name()="call_insn"][*[10][name()="null"]]])

Results - GCC loop unrolling
Average speedup: 1.8%
Best speedup: 5.3%
Expect better when more data (DDG, etc.) is made available.

Results - GCC pass reordering
Set up:
Modified GCC; benchmarks from SPECint 2000
PowerPC 5; 1.5 GHz; 4-core
Executed each benchmark with 1956 different orders of: constant propagation, partial redundancy elimination, dead code elimination, forward propagation, copy propagation, merge phi node

Results - GCC pass reordering
Functions grouped according to K-means method.
[Table columns: Group, Group size, Variation number, Average speedup (compared to GCC default)]

Results - GCC pass reordering
Looked at 326 functions with runtime > 0.1 sec.
Decision tree (C4.5) maps function to K-means group.
If not in improved group then
  use GCC default pass order
Else
  use maximal pass order

Results - GCC pass reordering
Top features of the decision tree
Feature 1: Count the number of "indirect_ref"s of "pointer_type" anywhere in each basic block and then take the average across all basic blocks in the function.
avg(bb/stmts//count(*
Feature 2: Count "cond_expr"s anywhere in each basic block whose condition is a greater-than expression whose second operand is a constant, invariant address expression
avg(bb/stmts//count(*[name()="cond_expr"][*[*] )

Results - GCC pass reordering
Average speedup: 4.0%
Best speedup: 36.4%
Expect better when more data (DDG, etc.) is made available.
Expect better when predicting the precise pass order, not just the K-means group (needs more examples).

Further and ongoing work
Add more data: include DDG, CFG, etc.
Enable searching over the grammar.
Explore graph structures.

Questions? Thank you