Whole-Program Linear-Constant Analysis with Applications to Link-Time Optimization Ludo Van Put – Dominique Chanet – Koen De Bosschere Ghent University.

Presentation transcript:

Whole-Program Linear-Constant Analysis with Applications to Link-Time Optimization Ludo Van Put – Dominique Chanet – Koen De Bosschere Ghent University

Ludo Van Put - Ghent University - SCOPES

Link-time optimization
[Diagram: compiler, object code, libraries, linker, executable program, link-time optimizer, modified program]

Link-time optimization
executable program → link-time binary rewriting → modified program
Interesting for embedded systems, e.g. ARM [De Bus et al., 2004]:
– Reduce code size (~15%)
– Improve performance (~10%)
– Reduce energy consumption (~8%)
Can we go even further?

Outline
motivation
theory
– linear-constant equations & analysis
practice
– data structure & operations
experience
– ARM optimizations
conclusion

We could do better but…
memory & stack: black box
low-level code: no compile-time information
need more (complex) analyses to further exploit the whole-program overview

Propagate linear expressions
machine instructions introduce simple relations between registers, e.g. ADD r0,r4,#4 → r0 - r4 - 4 = 0
this is constant information that we can propagate through the graph and use
not a new idea [Karr, 1976], but a different context: huge graph, simple instructions, fixed number of registers, explicit memory accesses & addresses
→ less general relations and less complex computations
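How an instruction yields such a relation can be sketched in a few lines of Python. This is only an illustration under assumptions: it handles three immediate-operand forms (ADD, SUB, MOV), uses the sign convention of the MOV example in this talk (MOV r0,#8 gives r0 minus the zero register minus 8 equals 0), and the names `equation_for` and `ZERO` are hypothetical, not taken from the presentation.

```python
import re

ZERO = "rz"  # hypothetical name for the virtual zero-valued register


def equation_for(instr):
    """Return (x, y, c), meaning x - y + c = 0, or None if no relation is derived."""
    m = re.fullmatch(r"\s*(ADD|SUB)\s+(\w+)\s*,\s*(\w+)\s*,\s*#(-?\d+)\s*", instr)
    if m:
        op, rd, rn, imm = m.group(1), m.group(2), m.group(3), int(m.group(4))
        if rd == rn:
            # The register is updated from its own old value; a full transfer
            # function would adjust existing equations instead (not shown).
            return None
        # ADD: rd = rn + imm  ->  rd - rn - imm = 0
        # SUB: rd = rn - imm  ->  rd - rn + imm = 0
        return (rd, rn, -imm if op == "ADD" else imm)
    m = re.fullmatch(r"\s*MOV\s+(\w+)\s*,\s*#(-?\d+)\s*", instr)
    if m:
        # MOV: rd = imm  ->  rd - zero - imm = 0
        return (m.group(1), ZERO, -int(m.group(2)))
    return None


print(equation_for("ADD r0,r4,#4"))
print(equation_for("MOV r0, #8"))
```

Any other instruction that writes a register would, in a real analysis, at least invalidate the equations mentioning that register; the sketch leaves that part out.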

Dataflow analysis
[Figure: a program statement transforms data set A into data set A’]
Domain: sets of linear-constant equations, partially ordered under set inclusion of their ‘closure’, e.g. {r0 - r1 + a = 0} ⊆ {r0 - r1 + a = 0, r1 - r2 + b = 0}
+ operator ‘intersection of closures’: semi-lattice
Transfer function: transforms the data set, reflects program semantics
Program instructions invalidate and create linear-constant equations.

Linear-constant equations
Here: y = x + c, with y and x registers and c an integer constant
Combining linear-constant equations: (x_1 – y_1 + c_1 = 0) + (x_2 – y_2 + c_2 = 0), which requires x_1 = y_2 or y_1 = x_2
Closure of set: all combinations
Restriction: underdetermined or exactly determined sets
– by construction in straight-line code
– by intersection of closure at merge points
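The combination rule can be illustrated with a small Python sketch. It assumes an equation is stored as a triple (x, y, c) standing for x – y + c = 0, i.e. y = x + c; the `Eq` type and the `combine` function are hypothetical names used only for this example.

```python
from typing import NamedTuple, Optional


class Eq(NamedTuple):
    x: str  # register with coefficient +1
    y: str  # register with coefficient -1
    c: int  # constant, so that x - y + c = 0 (y = x + c)


def combine(e1: Eq, e2: Eq) -> Optional[Eq]:
    """Add two equations if a shared register cancels out, else return None."""
    if e1.y == e2.x:
        # (x1 - y1 + c1) + (y1 - y2 + c2) = x1 - y2 + (c1 + c2)
        return Eq(e1.x, e2.y, e1.c + e2.c)
    if e1.x == e2.y:
        # (x1 - y1 + c1) + (x2 - x1 + c2) = x2 - y1 + (c1 + c2)
        return Eq(e2.x, e1.y, e1.c + e2.c)
    return None


# r1 = r0 + 4 and r2 = r1 + 8 combine into r2 = r0 + 12.
print(combine(Eq("r0", "r1", 4), Eq("r1", "r2", 8)))
```

Two equations without a register in the required position yield `None`, which mirrors the requirement stated on the slide.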

Data set representation
assign registers unique number: r0 … r(n-1)
add virtual zero-valued register, r⊥:
– MOV r0, #8 → r0 - r⊥ - 8 = 0
– constant propagation
normalize set of equations (r_x – r_y + c = 0): x < y, and the first registers of the equations all differ (≠)
Example: {r0 - r⊥ - 8 = 0, r1 - r⊥ - 4 = 0, r3 – r2 – 12 = 0} normalizes to {r0 – r1 – 4 = 0, r1 - r⊥ - 4 = 0, r2 – r3 + 12 = 0}
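One way to picture the normalization is the following Python sketch. It is an assumption-laden illustration, not the presented implementation: registers are plain indices 0 … n-1, the virtual zero-valued register is given index n (assumed to be the highest), the input set is assumed to be consistent, and the normal form chosen here chains the registers of each group of related registers in increasing index order. The function name `normalize` is hypothetical.

```python
def normalize(n, equations):
    """Normalize equations (x, y, c), each meaning r_x - r_y + c = 0.

    Returns a list `table` of length n where table[x] is (y, c) with x < y,
    or None if r_x is not the first register of any equation.
    Index n stands for the virtual zero-valued register (an assumption).
    """
    # Each equation links two registers: r_y = r_x + c and r_x = r_y - c.
    adj = {i: [] for i in range(n + 1)}
    for x, y, c in equations:
        adj[x].append((y, c))
        adj[y].append((x, -c))

    table = [None] * n
    seen = set()
    for start in range(n + 1):
        if start in seen:
            continue
        # Offset of every related register relative to `start`.
        offset = {start: 0}
        stack = [start]
        while stack:
            u = stack.pop()
            for v, c in adj[u]:
                if v not in offset:
                    offset[v] = offset[u] + c
                    stack.append(v)
        seen.update(offset)
        # Chain the related registers in increasing index order.
        members = sorted(offset)
        for a, b in zip(members, members[1:]):
            table[a] = (b, offset[b] - offset[a])
    return table


# r0 = 8, r1 = 4, r3 = r2 + 12, with index 4 as the zero register.
print(normalize(4, [(0, 4, -8), (1, 4, -4), (3, 2, -12)]))
```

With this layout every register is the first register of at most one equation, so the set fits in an array with one slot per register.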

What about addresses?
Graph representation: addresses are meaningless, symbolic references instead: ADDR r2, $reference
Many address producers at link-time: optimization opportunity
Extend linear-constant equations: r_x – r_y + c – addr_x + addr_y = 0
Combination requirement: addr_x1 = addr_y2 or addr_y1 = addr_x2
Example: {r0 - r⊥ - 8 = 0, r1 - r⊥ - 4 – ref1 = 0, r3 – r2 – 12 = 0} normalizes to {r0 – r1 – 4 + ref1 = 0, r1 - r⊥ - 4 – ref1 = 0, r2 – r3 + 12 = 0}

Compact, fast data structure
at most n equations: fixed-size array
0: r0 – r1 – 4 + ref1 = 0
1: r1 - r⊥ - 4 – ref1 = 0
2: r2 – r3 + 12 = 0
3:
Redundant! (the first register is implied by the array index) Reuse it for speedup
0: r0, r1, -4, ref1
1: r0, r⊥, -4, ref1
2: r2, r3, 12
3: r2

Operations
lookup: index, O(1)
remove register: index, combination, O(1)
insert equation
– r_x = r_y: change constant c, O(1)
– r_x ≠ r_y: combine until normalized, O(n)
meet operation: compute closures, intersect closures and normalize resulting set, each O(n²)
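The meet operation can be sketched in Python as follows. This is a simplified illustration under assumptions: equations are triples (x, y, c) meaning r_x – r_y + c = 0, index n stands for the virtual zero-valued register, the input sets are consistent, and closures are materialized as plain Python sets. The helper names `offsets`, `closure` and `meet` are hypothetical.

```python
from itertools import combinations


def offsets(n, equations):
    """Map each register index to (group id, offset within its group)."""
    adj = {i: [] for i in range(n + 1)}
    for x, y, c in equations:
        adj[x].append((y, c))    # r_y = r_x + c
        adj[y].append((x, -c))   # r_x = r_y - c
    info = {}
    for start in range(n + 1):
        if start in info:
            continue
        info[start] = (start, 0)
        stack = [start]
        while stack:
            u = stack.pop()
            for v, c in adj[u]:
                if v not in info:
                    info[v] = (start, info[u][1] + c)
                    stack.append(v)
    return info


def closure(n, equations):
    """All derivable equations (x, y, c) with x < y: O(n^2) pairs."""
    info = offsets(n, equations)
    return {(x, y, info[y][1] - info[x][1])
            for x, y in combinations(range(n + 1), 2)
            if info[x][0] == info[y][0]}


def meet(n, a, b):
    """Intersect the closures of a and b, then normalize the result."""
    common = closure(n, a) & closure(n, b)
    best = {}
    for x, y, c in common:
        # Keep, per first register, the equation with the smallest partner.
        if x not in best or y < best[x][0]:
            best[x] = (y, c)
    return sorted((x, y, c) for x, (y, c) in best.items())


# Path A: r0 = 8, r1 = r0 + 4.  Path B: r0 = 16, r1 = r0 + 4.
path_a = [(0, 3, -8), (0, 1, 4)]
path_b = [(0, 3, -16), (0, 1, 4)]
print(meet(3, path_a, path_b))
```

In the usage example the two paths disagree on the constant value of r0, so that equation is dropped at the merge point, while the relation between r0 and r1 holds on both paths and survives.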

Example applications
copy elimination, remove redundant code
stack analysis & dead spill code elimination

Remove redundant code

Dead spill code
ARM spill code: load-multiple & store-multiple instructions, e.g. STMDB sp!,{r4,r5,r6,r7,ra} saves 5 registers on the stack
What if (one of) these registers is dead? Can we remove it from the list?
Arguments are read using explicit offsets
[Figure: two stack layouts, each showing argument, spilled registers, local registers and sp]

Stack analysis
at start of a procedure: {r_sp - r⊥ - ref_sp = 0}
propagate through procedure, assuming:
– function calls restore r_sp
– spilled registers & local variables cannot be accessed by callee
mark instructions that use r_sp + offset
remove spill code & adjust offsets where needed
[Figure: procedure from proc entry to proc exit containing LDR r2,[sp,#32]; r_sp - r⊥ - ref_sp = 0 holds at entry, the equation at the load is marked “?”]
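The marking step can be illustrated for straight-line code with a short Python sketch. It is a simplification under assumptions: only a few textual instruction forms are recognized, registers are 4 bytes wide (32-bit ARM), and a single integer `delta` stands in for the equation relating r_sp to its value at procedure entry. The function name `mark_stack_accesses` is hypothetical.

```python
import re


def mark_stack_accesses(instrs):
    """Return (instruction index, offset relative to sp at procedure entry)
    for every sp-relative load or store in a straight-line sequence."""
    delta = 0  # current sp minus sp at procedure entry
    marked = []
    for i, ins in enumerate(instrs):
        m = re.fullmatch(r"(ADD|SUB)\s+sp\s*,\s*sp\s*,\s*#(\d+)", ins)
        if m:
            amount = int(m.group(2))
            delta += amount if m.group(1) == "ADD" else -amount
            continue
        m = re.fullmatch(r"STMDB\s+sp!\s*,\s*\{(.*)\}", ins)
        if m:
            # Store-multiple with writeback: sp drops by 4 bytes per register.
            delta -= 4 * len(m.group(1).split(","))
            continue
        m = re.fullmatch(r"(LDR|STR)\s+\w+\s*,\s*\[sp\s*,\s*#(\d+)\]", ins)
        if m:
            marked.append((i, delta + int(m.group(2))))
    return marked


code = ["STMDB sp!,{r4,r5,r6,r7,ra}", "SUB sp,sp,#12", "LDR r2,[sp,#32]"]
print(mark_stack_accesses(code))
```

In this example the load reads the word at the entry value of sp, i.e. above the spilled registers. If one register were removed from the STMDB list, the load's offset would have to shrink by 4 to keep addressing the same word, which is the offset adjustment the slide refers to.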

Compaction & power savings
ARM ADS 1.1 toolchain, ‘–Os’; benchmarks from De Bus et al., 2004
[Chart: without lca vs. with lca & optimizations, in % of original]

Conclusion
fast and practical link-time analysis
enabler for new analyses and optimizations
further reduction in code size & energy consumption
program understanding, stack analysis


Compaction & power savings
D: Diablo without linear-constant analysis
D+CA: Diablo with linear-constant analysis
Toolchain: ARM ADS 1.1

x86 argument forwarding
x86 has few general purpose registers
many constants
x86 passes arguments via the stack: limitation for constant propagation
stack analysis detects argument definitions & uses
    push %ebp
    mov %esp,%ebp
    mov 0x8(%ebp),%eax
    …
    push $0x4
    call …

Results