Interprocedural Optimization — a much older version of the lecture — Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled.

Presentation transcript:

Interprocedural Optimization — a much older version of the lecture — Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved. Comp 512 Spring COMP 512, Rice University

2 Enough Analysis, What Can We Do?
There are interprocedural optimizations:
- Choosing custom procedure linkages
- Interprocedural common subexpression elimination
- Interprocedural code motion
- Interprocedural register allocation
- Memo-function implementation
- Cross-jumping
- Procedure recognition & abstraction

3 Linkage Tailoring
Should choose the best linkage for circumstances
- Inline, clone, semi-open, semi-closed, closed
- Estimate execution frequencies & improvements
- Assign styles to call sites
The choices interact

4 Linkage Tailoring
Linkage styles
- Traditional closed linkage
- Open linkage (inlined)
  - Eliminate the call overhead
  - Create tailored private copy of callee
[Figure: procedures p and q, drawn once for each linkage style]

5 Linkage Tailoring
Linkage styles
- Procedure cloning
  - Partition call sites by environment
  - Create a location where desired facts can be true
[Figure: call graph with nodes t, s, p, q0, q1, r]
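The "partition call sites by environment" step can be sketched as follows. This is a minimal illustration, not the lecture's algorithm: the representation of an environment as a dict of known constant parameters, and the names `partition_call_sites` and `assign_clones`, are assumptions made for the example.

```python
from collections import defaultdict

def partition_call_sites(call_sites):
    """Group the call sites of one callee by the facts known at each site.

    call_sites: list of (site_name, env) pairs, where env maps a formal
    parameter to a known constant (unknown parameters are simply absent).
    This shape for env is an assumption of the sketch.
    """
    groups = defaultdict(list)
    for site, env in call_sites:
        key = tuple(sorted(env.items()))  # hashable summary of the environment
        groups[key].append(site)
    return dict(groups)

def assign_clones(callee, call_sites):
    """Give every distinct environment its own clone: q0, q1, ..."""
    groups = partition_call_sites(call_sites)
    clones = {}
    for index, (env, sites) in enumerate(sorted(groups.items())):
        clones[f"{callee}{index}"] = {"facts": dict(env), "sites": sites}
    return clones

# Hypothetical call sites: two callers know n == 100, one knows nothing.
sites = [
    ("t", {"n": 100}),
    ("s", {"n": 100}),
    ("p", {}),
]
clones = assign_clones("q", sites)
```

Each clone is a place where its recorded facts hold on every entry, so an optimizer could specialize the body of `q1` for `n == 100` while `q0` stays general.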

6 Linkage Tailoring
Linkage styles
- Traditional closed linkage
- Semi-open linkage
  - Move prolog & epilog across call & tailor them
  - For call in loop, parts of linkage are loop invariant
  - Think of this as an aggressive closed linkage
[Figure: procedures p and q, drawn once for each linkage style]

7 Linkage Tailoring
Should choose the best linkage for circumstances
- Inline, clone, semi-open, semi-closed, closed
- Estimate execution frequencies & improvements
- Assign styles to call sites
Practical approach
- Limit choices (standard, cloned, inlined)
- Clone for better information & to specialize (based on idfa)
- Inline for high-payoff optimizations
- Adopt an aggressive standard linkage
  - Move parameter addressing code out of callee (& out of loop)
The choices interact

8 Linkage Tailoring
A low-level, post-compilation idea
- Compute register summaries for each procedure
  - How many registers are really used?
- Rotate assignment to move them into caller-saves registers
  - Leave callee-saves registers unused, if possible
  - Avoid saving them in the callee
- Leaf routine with little demand for registers
- Adjust caller-saves/callee-saves boundary
- Remove caller’s stores for registers unused in callee
Some authors call this interprocedural register allocation
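A rough sketch of the "rotate assignment" step for a leaf routine is shown below. The register names, the list-based register summary, and the function `rotate_into_caller_saves` are all hypothetical; a real post-compilation tool would work on machine code and would also have to respect calling-convention registers.

```python
def rotate_into_caller_saves(used, caller_saves, callee_saves):
    """Rename a leaf routine's registers so callee-saves ones stay untouched.

    used: registers the routine actually references (its register summary).
    Returns (mapping, must_save): mapping renames old -> new registers, and
    must_save lists callee-saves registers the routine must still preserve.
    """
    free_caller = [r for r in caller_saves if r not in used]
    mapping = {}
    must_save = []
    for reg in used:
        if reg in callee_saves and free_caller:
            mapping[reg] = free_caller.pop(0)  # move into a caller-saves slot
        else:
            mapping[reg] = reg
            if reg in callee_saves:
                must_save.append(reg)          # no room left; keep the save
    return mapping, must_save

# Hypothetical convention: r1-r4 are caller-saves, r5-r8 are callee-saves.
caller_saves = ["r1", "r2", "r3", "r4"]
callee_saves = ["r5", "r6", "r7", "r8"]
mapping, must_save = rotate_into_caller_saves(
    ["r1", "r5", "r6"], caller_saves, callee_saves
)
```

In this example the routine ends up using only caller-saves registers, so its prolog and epilog need no save or restore code at all.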

9 Interprocedural Register Allocation
Some work has been done
- Chow’s compiler for the MIPS machine, “shrink wrapping”
  - Often slowed down the code
- Wall and Goodwin at DEC SRC did link-time allocation
  - Target machine had scads of registers
  - Essentially, adjusted caller-saves & callee-saves
What about full-blown, allocate the whole thing, allocation?
- Arithmetic of costs is pretty complex
- Requires good profile or frequency information
- Need a fair basis for comparing different uses for r_i
- Real issue would be spilling (“spill everywhere” would be a disaster)

10 Interprocedural Common Subexpression Elimination
Sounds good, but consider the real opportunities
- Procedures only share parameters, globals, & constants
- No local variables in an ICSE, by definition
- Not a very large set of expressions
Possible schemes
- Create a global data area to hold ICSEs
  - Sidestep register pressure issue
- Elide unnecessary parameters or pass-through parameters
  - “commoning” in Cocke’s terminology
  - Shrink the pre-call sequence

11 Invocation Invariant Expressions
- Do IIEs exist? (yes)
- Is moving them profitable? (it should be)
- Can we engineer this into a compiler? (this is tougher)
[Chart: 8.9% to 0.2%, 1.66% on average; static counts, not dynamic counts]

12 Interprocedural Code Motion
What could we do? What could we find?
- Find and mark loop nests in the call graph
- Compute interprocedural AVAIL?
- Move code across procedure boundaries
All are difficult
Two ideas
- Invocation invariant expressions
  - Expression whose value is determined at point of call
  - Hoist them to the prolog, or hoist them across the call
- Loop embedding and extraction
  - Move control-flow for entire nest into one procedure
  - Enable optimization using intra-procedural techniques
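Hoisting an invocation invariant expression across the call might look like the sketch below. The functions are invented for illustration: `1.0 / p` depends only on the parameter `p`, so its value is fixed at the point of call and it can be computed in the caller, where it also becomes loop invariant.

```python
def norm_before(vec, p):
    # 1.0 / p depends only on a parameter: its value is fixed at the call.
    inv_p = 1.0 / p
    return sum(abs(v) ** p for v in vec) ** inv_p

def caller_before(rows, p):
    return [norm_before(row, p) for row in rows]

def norm_after(vec, p, inv_p):
    # The invocation invariant expression now arrives as an extra argument.
    return sum(abs(v) ** p for v in vec) ** inv_p

def caller_after(rows, p):
    inv_p = 1.0 / p  # hoisted across the call, then out of the caller's loop
    return [norm_after(row, p, inv_p) for row in rows]

rows = [[3.0, 4.0], [1.0, 0.0], [2.0, 2.0]]
same = caller_before(rows, 2) == caller_after(rows, 2)
```

The transformed callee has a different signature, so every call site must be rewritten together. That coupling is one reason the engineering is harder than the idea.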

13 Interprocedural Code Motion
Moving loops across a call
Original:
    do i = 1 to 100
      do j = 1 to 100
        call fee(a,b,c,i,j)
      end
    end
    fee(x,y,z,i,j)
      do k = 1 to 100
        x(i,j) = x(i,j) + y(i,k) * z(k,j)
      end
Loop embedding:
    call fee(a,b,c)
    fee(x,y,z)
      do i = 1 to 100
        do j = 1 to 100
          do k = 1 to 100
            x(i,j) = x(i,j) + y(i,k) * z(k,j)
          end
        end
      end
Think of this as the dual of partial inlining (partial outlining?)

14 Interprocedural Code Motion
Moving loops across a call
Original:
    do i = 1 to 100
      do j = 1 to 100
        call fee(a,b,c,i,j)
      end
    end
    fee(x,y,z,i,j)
      do k = 1 to 100
        x(i,j) = x(i,j) + y(i,k) * z(k,j)
      end
Loop extraction:
    do i = 1 to 100
      do j = 1 to 100
        do k = 1 to 100
          call fee(a,b,c,i,j,k)
        end
      end
    end
    fee(x,y,z,i,j,k)
      x(i,j) = x(i,j) + y(i,k) * z(k,j)
Think of this as partial inlining
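The original code and its loop-embedded and loop-extracted forms can be compared in a small runnable sketch. The bound `N = 4` stands in for the 100 used on the slides, and the `run_*` wrappers and call counters are additions for the illustration.

```python
N = 4  # small stand-in for the bound of 100 on the slides

def fee_original(x, y, z, i, j):
    for k in range(N):
        x[i][j] += y[i][k] * z[k][j]

def run_original(a, b, c):
    calls = 0
    for i in range(N):
        for j in range(N):
            fee_original(a, b, c, i, j)
            calls += 1
    return calls

def fee_embedded(x, y, z):
    # Loop embedding: the whole nest now lives in the callee.
    for i in range(N):
        for j in range(N):
            for k in range(N):
                x[i][j] += y[i][k] * z[k][j]

def run_embedded(a, b, c):
    fee_embedded(a, b, c)
    return 1

def fee_extracted(x, y, z, i, j, k):
    # Loop extraction: only the loop body remains in the callee.
    x[i][j] += y[i][k] * z[k][j]

def run_extracted(a, b, c):
    calls = 0
    for i in range(N):
        for j in range(N):
            for k in range(N):
                fee_extracted(a, b, c, i, j, k)
                calls += 1
    return calls

def compare():
    b = [[i + j for j in range(N)] for i in range(N)]
    c = [[i * j + 1 for j in range(N)] for i in range(N)]
    results, call_counts = [], []
    for run in (run_original, run_embedded, run_extracted):
        a = [[0] * N for _ in range(N)]  # fresh output matrix per variant
        call_counts.append(run(a, b, c))
        results.append(a)
    return results, call_counts

results, call_counts = compare()
```

All three variants compute the same matrix product; they differ in how many calls execute and in which procedure the loop nest is visible to intra-procedural optimization.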

15 Memo-function implementation
Idea
- Find pure functions & turn them into hashed lookups
Implementation
- Use interprocedural analysis to identify pure functions
- Insert stub with lookup between call & evaluation
Benefits
- Replace evaluations with table lookup
- Potential for substantial run-time savings
- Should share table implementation with other functions
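The stub-with-lookup idea can be sketched at the source level as below. Here the purity of the function is simply assumed rather than established by interprocedural analysis, and `memoize`, `slow_square`, and the `evaluations` log are names invented for the example.

```python
def memoize(pure_fn, table=None):
    """Wrap pure_fn in a stub that consults a hash table before evaluating."""
    table = {} if table is None else table

    def stub(*args):
        # Keying on the function name lets several functions share one table.
        key = (pure_fn.__name__, args)
        if key not in table:
            table[key] = pure_fn(*args)  # evaluate only on a miss
        return table[key]

    return stub

evaluations = []  # instrumentation only, to show how often the body runs

def slow_square(n):
    evaluations.append(n)
    return n * n

shared_table = {}
fast_square = memoize(slow_square, shared_table)
answers = [fast_square(7), fast_square(7), fast_square(3)]
```

The second call with argument 7 is served from the table, so the body runs twice for three calls. In everyday Python the standard library's `functools.lru_cache` plays a similar role; the hand-written stub is used here only to make the lookup visible.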

16 Cross-jumping
Idea
- Procedure epilogs come in two flavors
  - Returned value & no returned value
- Eliminate duplicates & save space
Implementation
- At start of each block, compare ops before predecessor branch
- If identical, move it across the branch
- Repeat until code stops changing
Presents new challenges to the debugger
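The compare-and-move loop can be sketched on a toy representation. This assumes each predecessor block is a list of op strings ending in an implicit branch to the join block, and that the join block has no other predecessors; the function name `cross_jump` and the op text are hypothetical.

```python
def cross_jump(preds, join):
    """Move identical trailing ops from every predecessor into the join block.

    preds: list of op lists, each ending with an implicit branch to join.
    join: op list of the block they all branch to.
    Returns new (preds, join) lists; the inputs are left unmodified.
    """
    preds = [list(p) for p in preds]
    join = list(join)
    # Repeat until the ops just before the branches stop matching.
    while preds and all(preds) and len({p[-1] for p in preds}) == 1:
        op = preds[0][-1]
        for p in preds:
            p.pop()          # delete the duplicate from each predecessor
        join.insert(0, op)   # keep a single copy after the branch
    return preds, join

# Two paths that share the tail of an epilog.
paths = [
    ["r1 = a + b", "restore fp", "pop frame"],
    ["r1 = c", "restore fp", "pop frame"],
]
new_paths, new_join = cross_jump(paths, ["return"])
```

After the pass only the differing assignments remain in the predecessors, and the shared epilog ops appear once in the join block.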

17 Procedure Abstraction
Idea
- Recognize common instruction sequences
- Replace them with (very) cheap calls
Experience
- Need to abstract register names & local labels
- Use suffix trees (bio-informatics)
- Produces smaller code with longer paths
  - ~1% slower for each 2% smaller
- Wreaks havoc with the debugger
Cooper & McIntosh, PLDI 99, May 1999
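Why register names must be abstracted before matching can be seen in a small sketch. It deliberately swaps the suffix tree for a naive fixed-length window scan, which is far slower on real code but easier to read; the instruction syntax and both function names are assumptions of the example.

```python
import re
from collections import defaultdict

def abstract_registers(window):
    """Rename registers by order of first use so equivalent patterns match."""
    names = {}

    def rename(match):
        return names.setdefault(match.group(0), f"v{len(names)}")

    return tuple(re.sub(r"\br\d+\b", rename, op) for op in window)

def repeated_sequences(ops, length):
    """Return {abstracted sequence: [start indices]} for repeats.

    Naive window scan, not a suffix tree; overlapping matches are not
    filtered out in this sketch.
    """
    seen = defaultdict(list)
    for start in range(len(ops) - length + 1):
        seen[abstract_registers(ops[start:start + length])].append(start)
    return {seq: starts for seq, starts in seen.items() if len(starts) > 1}

code = [
    "load r1, [sp+8]",
    "add r2, r1, r1",
    "store r2, [sp+8]",
    "mul r9, r9, r9",
    "load r5, [sp+8]",
    "add r6, r5, r5",
    "store r6, [sp+8]",
]
matches = repeated_sequences(code, 3)
```

The two load/add/store runs use different registers yet map to the same abstracted sequence, so they are candidates for replacement by a call to one shared copy.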

18 Putting It All Together
Key questions that a compiler writer must answer
- What should the optimizer & back end do?
- How should the compiler represent the program?
Broad answers
- Figure out what the real problems are
- Learn everything you can about the target machine
- Attack each problem with a specific plan
- Separate concerns as long as possible

19 Lessons
Experience is a tough teacher
- Understand what the code looks like
  - On input to the compiler and on output from the compiler
  - Hard to see problems in large procedures
- A good compiler need not do everything
  - Study the code & figure out what’s needed
  - Do those things well & do them thoroughly
- Hand simulation before implementation pays off
  - Avoid implementing things that solve no problem
- Find the right level of abstraction for each optimization
  - Constant propagation can be done at source level
  - Redundancy elimination should be done at a low level

20 Lessons
Experience is a tough teacher
- Intermediate representations are important
  - Determine the level of exposed detail
  - Every pass traverses the IR, so efficient manipulation is critical
- Code shape and name spaces are important
  - Determine opportunities & size of data structures
  - Have a large impact on the effectiveness of algorithms
- Compile-time space counts
  - Limits size of procedures that can be compiled
  - Compiler touches all that space
  - More complex analyses take lots of space

21 And that’s the end of my story... Comp 512

22 Lessons
Experience is a tough teacher
- Separation of concerns is vital to progress
  - Resource constraints from rearrangement
  - Allocation from optimization
  - Parsing from code generation
- Separation of concerns interferes with optimization
  - Allocation & scheduling
  - Evaluation order & redundancy elimination
  - Code shape decisions affect later passes: loop shape, case stmts., annotations (ILOC tags)