Order from Chaos (the big picture). COMP 512, Rice University, Houston, Texas, Fall 2003. Copyright 2003, Keith D. Cooper & Linda Torczon, all rights reserved.

Order from Chaos (the big picture)
COMP 512, Rice University, Houston, Texas, Fall 2003
Copyright 2003, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use.

Optimization

The subject is confusing
  - Whole notion of optimality
  - Incredible number of transformations
  - Odd, inconsistent terminology
Maybe this stuff is inherently hard
  - Many intractable problems
  - Many NP-complete problems
  - Much overlap between problems and between solutions
If optimization wasn't confusing, why take COMP 512?

Slide annotations:
  - { Value numbering, Redundancy elimination, Common subexpressions }
  - Cooper, McKinley, & Torczon cite 237 distinct papers in the survey!

Optimization

A brief catalog (circa 1982)
And this was before the literature exploded

Optimization

The literature throws fuel on the fire
  - Terminology is non-standard & non-intuitive
  - Explanations are terse and incomplete
  - Little comparative data that is believable
  - No sense of perspective
Papers give conflicting advice
An example: Is inline substitution profitable?
  - Holler's thesis: it almost always helps
  - Hall's thesis: it occasionally helps, but has lots of problems
  - MacFarland's thesis: it causes instruction cache misses
  - Reality lies somewhere in the middle

Slide annotations:
  - { Revival, Partially-dead code, Forward propagation }
  - Not all those papers can be the best!

Optimization

Improvement should be objective
  - Easy to quantify
  - Produce concrete improvements
Taking measurements seems easy
  - Code either gets better or it gets worse
But, ...
  - Linear-time heuristics for hard problems
  - Unforeseen consequences & poorly understood interactions
  - "Obvious wins" have non-obvious downsides
  - Multiple ways to achieve the same end
Experimental computer science takes a lot of work

The Role of Comp 512

Bringing order out of chaos
Provide a framework for thinking about optimization
  - Differentiate analysis from transformation †
  - Think about how things help, not what they do
Goal: a rational approach to the subject matter
  - Objective criteria for evaluating ideas & papers
  - Bring high school level science back into the game

† The Comp 512 Motto: Knowledge alone does not make code run faster. You have to change the code to make it run faster.

Classic Taxonomy

Machine independent transformations
  - Applicable across a broad range of machines
  - Decrease ratio of overhead to real work
  - Reduce running time or space
  - Examples: dead code elimination
Machine dependent transformations
  - Capitalize on specific machine properties
  - Improve the mapping from IR to this machine
  - Might use an exotic instruction (shift the reg. window for a loop)
  - Example: instruction scheduling

Classic Taxonomy

Distinction is not always clear
  - Replacing multiply with shifts and adds
  - Eliminating a redundant expression
The truth is somewhat muddled
  - Machine independent means that we deliberately & knowingly ignore target-specific constraints
  - Machine dependent means that we explicitly consider target-specific constraints
Redundancy elimination might fit in either category
  - Versions that consider register pressure
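The first muddled case above, replacing a multiply with shifts and adds, can be illustrated with a minimal sketch. The function name `multiply_by_constant` is hypothetical and the code handles only non-negative constants. Whether the rewrite pays off depends on the target's relative cost of multiply, shift, and add, which is why the transformation sits awkwardly between the two classic categories.

```python
def multiply_by_constant(x: int, c: int) -> int:
    """Compute x * c for a non-negative constant c using only shifts and adds.

    Illustrative sketch: a compiler would emit the shift/add sequence at
    compile time instead of looping over the bits of c at run time.
    """
    if c < 0:
        raise ValueError("this sketch handles non-negative constants only")
    result = 0
    shift = 0
    while c:
        if c & 1:
            # Each set bit of c contributes x * 2**shift to the product.
            result += x << shift
        c >>= 1
        shift += 1
    return result


# x * 10 becomes (x << 1) + (x << 3)
product = multiply_by_constant(7, 10)
```

A constant with many set bits needs many adds, so a real compiler would compare the length of the shift/add sequence against the multiply latency before committing to the rewrite.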

The Comp 512 Taxonomy

An effects-based classification (for speed)
Five machine-independent ways to speed up code
  - Eliminate a redundant computation
  - Move code to a place where it executes less often
  - Eliminate dead code
  - Specialize a computation based on context
  - Enable another transformation
Three machine-dependent ways to speed up the code
  - Manage or hide latency
  - Take advantage of special hardware features
  - Manage finite resources
For scalar optimization, this covers most of them

The Comp 512 Taxonomy: Machine Independent
From §6 of Cooper, McKinley, & Torczon

  - Redundancy: Redundancy elim., Partial red. elim., Consolidation
  - Code motion: Loop-invariant c.m., Consolidation, Global Scheduling, Constant propagation
  - Dead code: Dead code elim., Partial d.c.e., Constant propagation, Algebraic identities
  - Create opportunities: Reassociation, Replication
  - Specialization: Replication, Strength Reduction, Method caching, Heap-to-stack allocation, Tail recursion elimination

Slide legend: "Will see in 512" and "Seen already in 512" (the marking that distinguishes the two did not survive in this transcript)

The Comp 512 Taxonomy: Machine Dependent
From §6 of Cooper, McKinley, & Torczon

  - Hide latency: Scheduling, Blocking references, Prefetching, Code layout, Data packing
  - Manage resources: Allocate (registers, tlb slots), Schedule, Data packing, Coloring memory locations
  - Special features: Instruction selection, Peephole optimization

Slide legend: "Will see in 512" and "Saw in 412" (the marking that distinguishes the two did not survive in this transcript)

What have we seen so far?

Redundancy elimination
  - DAG building, LVN, SVN, DVN, AVAIL
  - It is a category in taxonomy by itself
Dead store elimination
  - Form of dead code elimination
Hoisting
  - Form of code motion (for space, not speed)
Global Constant Propagation
  - Form of specialization, form of code motion (dead code, too?)
Useless code elimination
  - Form of dead code elimination (obviously)
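As a reminder of what the first entry in the list above looks like in practice, here is a minimal sketch of local value numbering (LVN) over one basic block. The tuple format `(dest, op, arg1, arg2)`, the `"copy"` pseudo-op, and the function name are assumptions made for illustration and are not taken from the course materials.

```python
from itertools import count

COMMUTATIVE = {"+", "*"}


def local_value_numbering(block):
    """Rewrite redundant operations in one basic block as copies.

    block is a list of (dest, op, arg1, arg2) tuples (hypothetical format).
    """
    fresh = count()
    var_vn = {}   # variable name -> value number it currently holds
    expr_vn = {}  # (op, vn1, vn2) -> value number of that expression
    holder = {}   # value number -> a variable that was assigned that value
    out = []

    def vn_of(name):
        # Names not yet seen in the block get a fresh value number.
        if name not in var_vn:
            var_vn[name] = next(fresh)
        return var_vn[name]

    for dest, op, a, b in block:
        args = (vn_of(a), vn_of(b))
        if op in COMMUTATIVE:
            args = tuple(sorted(args))  # a + b and b + a share one key
        key = (op, *args)
        vn = expr_vn.get(key)
        src = holder.get(vn) if vn is not None else None
        if src is not None and var_vn.get(src) == vn:
            # Value already computed and still held by src: reuse it.
            out.append((dest, "copy", src, None))
        else:
            if vn is None:
                vn = next(fresh)
                expr_vn[key] = vn
            holder[vn] = dest
            out.append((dest, op, a, b))
        var_vn[dest] = vn
    return out


example = [
    ("a", "+", "b", "c"),
    ("b", "-", "a", "d"),
    ("c", "+", "b", "c"),
    ("d", "-", "a", "d"),
]
rewritten = local_value_numbering(example)
```

In this block the fourth operation recomputes `a - d`, whose value still lives in `b`, so it becomes a copy. The third operation looks textually identical to the first but is not redundant, because `b` was redefined in between. The check `var_vn.get(src) == vn` guards against reusing a name that has since been overwritten.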

What about Fortran H? (Lecture 2)

Eliminate a redundant computation
  - Commoning
Move code to a place where it executes less often
  - Backward motion
Eliminate dead code
  - (must have done it, but don't talk about it)
Specialize a computation based on context
  - Strength reduction
Enable another transformation
  - Reassociation
Manage or hide latency
Take advantage of special hardware features
Manage finite resources
  - Register allocation

Slide annotation (brace): Didn't really talk about these, if they did them

Scope of Optimization (another axis)

Local
  - Handles individual basic blocks
      - Maximal length sequence of straight line code
      - Each of the b_i's is a basic block
  - Basic blocks are easy to analyze
  - Can prove strongest results
  - Code quality suffers at block boundaries
Local methods
  - Value numbering, instruction scheduling

[Figure: control-flow graph with blocks b1 through b6]
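The definition of a basic block above (a maximal-length sequence of straight-line code) can be sketched as a partition of a linear instruction list at "leaders". The instruction encoding below (`"op"`, `"label"`, `"jump"`, `"cbr"`, `"ret"` tuples) and the function name are hypothetical. The sketch conservatively assumes every label may be a branch target.

```python
def find_basic_blocks(code):
    """Split a linear list of instruction tuples into basic blocks.

    An instruction is a leader if it is the first instruction, carries a
    label, or immediately follows a branch or return.
    """
    leaders = {0} if code else set()
    for i, instr in enumerate(code):
        kind = instr[0]
        if kind == "label":
            leaders.add(i)      # a potential branch target starts a block
        elif kind in ("jump", "cbr", "ret") and i + 1 < len(code):
            leaders.add(i + 1)  # the instruction after a branch starts a block
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [code[s:e] for s, e in zip(starts, ends)]


loop_code = [
    ("op", "i = 0"),
    ("label", "L1"),
    ("op", "t = i < n"),
    ("cbr", "t", "L2"),
    ("jump", "L3"),
    ("label", "L2"),
    ("op", "i = i + 1"),
    ("jump", "L1"),
    ("label", "L3"),
    ("ret",),
]
blocks = find_basic_blocks(loop_code)
```

Here the ten instructions fall into five blocks. Every local method in the list above works on one such block at a time, which is why code quality suffers at the boundaries between them.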

Scope of Optimization

Superlocal
  - Handles extended basic blocks
      - Sequence of blocks where each (other than the first) has a unique predecessor
      - Use results for b_i to help with b_j
  - Analysis & transformation over larger region
  - Fewer rough edges
  - Can make it efficient by reusing results
Superlocal methods
  - Value numbering, instruction scheduling

[Figure: control-flow graph with blocks b1 through b6]
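A minimal sketch of how extended basic blocks (EBBs) might be found from a control-flow graph follows. The slide's figure did not survive in the transcript, so the six-block CFG used here is a made-up example, not the one from the lecture. The function name and the successor-dictionary representation are also assumptions. The sketch expects every block to appear as a key and assumes no duplicate edges.

```python
def extended_basic_blocks(succ, entry):
    """Group CFG blocks into extended basic blocks.

    succ maps each block name to a list of successor names.
    Returns a dict from each EBB root to the list of blocks in that EBB.
    """
    preds = {b: [] for b in succ}
    for b, targets in succ.items():
        for t in targets:
            preds[t].append(b)

    # A root is the entry block or any block without exactly one predecessor.
    roots = [b for b in succ if b == entry or len(preds[b]) != 1]

    ebbs = {}
    for root in roots:
        members, work = [root], [root]
        while work:
            b = work.pop()
            for s in succ[b]:
                # A successor with a unique predecessor extends this EBB.
                if s != entry and len(preds[s]) == 1:
                    members.append(s)
                    work.append(s)
        ebbs[root] = members
    return ebbs


# Hypothetical CFG: b4 and b6 are join points, so each starts a new EBB.
cfg = {
    "b1": ["b2", "b3"],
    "b2": ["b4"],
    "b3": ["b4", "b5"],
    "b4": ["b6"],
    "b5": ["b6"],
    "b6": [],
}
ebbs = extended_basic_blocks(cfg, "b1")
```

In this example the EBB rooted at `b1` also contains `b2`, `b3`, and `b5`, so facts established in `b1` can be reused along each path through that tree. This is the reuse of results that the bullets above describe.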

Scope of Optimization

Regional
  - Arbitrary subset of blocks (loop nests, dominator subtrees)
  - Use results from one block to improve others
  - Limiting scope can increase focus on performance critical regions
  - Can eliminate some global impediments
Regional methods
  - Loop xforms (unroll, fuse, interchange, strip mining, blocking), DVNT, OSR, register promotion, prefetch insertion, software pipelining, trace scheduling, ...

Slide annotation: Remember Fortran H

[Figure: control-flow graph with blocks b1 through b5]

Scope of Optimization

Whole procedure (global or intraprocedural)
  - Handles entire procedure
  - Make decisions based on global knowledge & global benefit
  - No rough edges inside procedure (tied to compilation unit)
  - Classic data-flow analysis
Global methods
  - CSE (AVAIL, PRE, LCM), constant propagation, GCRA, dead code elim., hoisting, copy coalescing, ...

[Figure: control-flow graph with blocks b1 through b6]
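To make "classic data-flow analysis" concrete, here is a minimal sketch of an iterative solver for available expressions (AVAIL), the first global method named above. It assumes the usual formulation: an expression is available on entry to a block if, along every predecessor p, it is either computed in p and survives to the end of p, or is available on entry to p and not killed in p. The function name, the set names `de_expr` and `expr_kill`, and the diamond-shaped CFG are all illustrative assumptions.

```python
def available_expressions(preds, de_expr, expr_kill, entry):
    """Iteratively solve AVAIL over a CFG.

    preds:     block -> list of predecessor blocks
    de_expr:   block -> expressions computed in the block that reach its end
    expr_kill: block -> expressions invalidated by definitions in the block
    """
    all_exprs = set().union(*de_expr.values())
    # Optimistic start: everything available everywhere except at the entry.
    avail = {b: (set() if b == entry else set(all_exprs)) for b in preds}

    changed = True
    while changed:
        changed = False
        for b in preds:
            if b == entry:
                continue
            new = set(all_exprs)
            for p in preds[b]:
                # Intersect what each predecessor makes available at its exit.
                new &= de_expr[p] | (avail[p] - expr_kill[p])
            if new != avail[b]:
                avail[b] = new
                changed = True
    return avail


# Hypothetical diamond: b1 branches to b2 and b3, which both flow into b4.
preds = {"b1": [], "b2": ["b1"], "b3": ["b1"], "b4": ["b2", "b3"]}
de_expr = {"b1": {"a+b"}, "b2": {"c+d"}, "b3": {"c+d"}, "b4": set()}
expr_kill = {"b1": set(), "b2": set(), "b3": {"a+b"}, "b4": set()}
avail = available_expressions(preds, de_expr, expr_kill, "b1")
```

At the join `b4`, `c+d` is available because both arms compute it. `a+b` is not, because `b3` kills it on one path. A local or superlocal method cannot see across that join, which is the kind of rough edge the global scope removes.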

Scope of Optimization

Whole program (interprocedural)
  - Handles more than one procedure, up to entire program
  - Create even larger scopes for optimization
  - Limited interactions between procedures
      - Parameters + global variables
  - Analysis problems are harder
  - Opportunities are different
Interprocedural methods
  - Inline substitution, procedure cloning, constant propagation, using whole program analysis to support global xforms

[Figure: three control-flow graphs, each with blocks b1 through b6]

Road Map

  - Next week, Tim will lecture on something (Thursday)
  - CFG Clean Up, Unreachable code elimination
  - Operator Strength Reduction
  - Lazy Code Motion
  - Algebraic Reassociation of Expressions
  - Optimizing for Space
  - Profile Guided Code Positioning
  - Register Allocation: Beyond 412
  - Interprocedural Analysis and Optimization
  - Adaptive Optimization

What about the Experience Papers?