Eliminating Array Bounds Checks — and related problems — 1COMP 512, Rice University Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved.

Slides:

Advertisements

Similar presentations

Static Single-Assignment ? ? Introduction: Over last few years [1991] SSA has been Stablished as… Intermediate program representation.

Advertisements

8. Static Single Assignment Form Marcus Denker. © Marcus Denker SSA Roadmap  Static Single Assignment Form (SSA)  Converting to SSA Form  Examples.

Operator Strength Reduction From Cooper, Simpson, & Vick, “Operator Strength Reduction”, ACM TOPLAS, 23(5), See also § of EaC2e. 1COMP 512,

ECE 454 Computer Systems Programming Compiler and Optimization (I) Ding Yuan ECE Dept., University of Toronto

Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis – today’s class –Classic analyses.

A Deeper Look at Data-flow Analysis Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University.

Code Motion of Control Structures From the paper by Cytron, Lowry, and Zadeck, COMP 512, Rice University Copyright 2011, Keith D. Cooper & Linda.

SSA-Based Constant Propagation, SCP, SCCP, & the Issue of Combining Optimizations 1COMP 512, Rice University Copyright 2011, Keith D. Cooper & Linda Torczon,

The Last Lecture Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University have explicit permission.

Lazy Code Motion Comp 512 Spring 2011

Loop Invariant Code Motion — classical approaches — 1COMP 512, Rice University Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students.

Introduction to Code Optimization Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice.

Code Shape III Booleans, Relationals, & Control flow Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled.

Data Flow Analysis Compiler Design October 5, 2004 These slides live on the Web. I obtained them from Jeff Foster and he said that he obtained.

CS 412/413 Spring 2007Introduction to Compilers1 Lecture 29: Control Flow Analysis 9 Apr 07 CS412/413 Introduction to Compilers Tim Teitelbaum.

Introduction to Optimization Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

Instruction Scheduling II: Beyond Basic Blocks Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp.

Machine-Independent Optimizations Ⅰ CS308 Compiler Theory1.

Global optimization. Data flow analysis To generate better code, need to examine definitions and uses of variables beyond basic blocks. With use- definition.

PSUCS322 HM 1 Languages and Compiler Design II IR Code Optimization Material provided by Prof. Jingke Li Stolen with pride and modified by Herb Mayer PSU.

Dynamic Optimization as typified by the Dynamo System See “Dynamo: A Transparent Dynamic Optimization System”, V. Bala, E. Duesterwald, and S. Banerjia,

Compiler Code Optimizations. Introduction Introduction Optimized codeOptimized code Executes faster Executes faster efficient memory usage efficient memory.

Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.

Code Optimization, Part III Global Methods Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412.

Structural Data-flow Analysis Algorithms: Allen-Cocke Interval Analysis Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students.

Introduction to Optimization, II Value Numbering & Larger Scopes Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.

First Principles (with examples from value numbering) C OMP 512 Rice University Houston, Texas Fall 2003 Copyright 2003, Keith D. Cooper & Linda Torczon,

The Procedure Abstraction, Part VI: Inheritance in OOLs Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled.

The Procedure Abstraction, Part V: Support for OOLs Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in.

Using SSA Dead Code Elimination & Constant Propagation C OMP 512 Rice University Houston, Texas Fall 2003 Copyright 2003, Keith D. Cooper & Linda Torczon,

Operator Strength Reduction C OMP 512 Rice University Houston, Texas Fall 2003 Copyright 2003, Keith D. Cooper & Linda Torczon, all rights reserved. Students.

Building SSA Form, III 1COMP 512, Rice University This lecture presents the problems inherent in out- of-SSA translation and some ways to solve them. Copyright.

Global Redundancy Elimination: Computing Available Expressions Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled.

Cleaning up the CFG Eliminating useless nodes & edges C OMP 512 Rice University Houston, Texas Fall 2003 Copyright 2003, Keith D. Cooper & Linda Torczon,

Algebraic Reassociation of Expressions Briggs & Cooper, “Effective Partial Redundancy Elimination,” Proceedings of the ACM SIGPLAN 1994 Conference on Programming.

Introduction to Code Generation Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice.

Compiler Optimizations ECE 454 Computer Systems Programming Topics: The Role of the Compiler Common Compiler (Automatic) Code Optimizations Cristiana Amza.

Dead Code Elimination This lecture presents the algorithm Dead from EaC2e, Chapter 10. That algorithm derives, in turn, from Rob Shillner’s unpublished.

Boolean & Relational Values Control-flow Constructs Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in.

Terminology, Principles, and Concerns, II With examples from superlocal value numbering (Ch 8 in EaC2e) Copyright 2011, Keith D. Cooper & Linda Torczon,

Cleaning up the CFG Eliminating useless nodes & edges This lecture describes the algorithm Clean, presented in Chapter 10 of EaC2e. The algorithm is due.

Profile-Guided Code Positioning See paper of the same name by Karl Pettis & Robert C. Hansen in PLDI 90, SIGPLAN Notices 25(6), pages 16–27 Copyright 2011,

Profile Guided Code Positioning C OMP 512 Rice University Houston, Texas Fall 2003 Copyright 2003, Keith D. Cooper & Linda Torczon, all rights reserved.

©SoftMoore ConsultingSlide 1 Code Optimization. ©SoftMoore ConsultingSlide 2 Code Optimization Code generation techniques and transformations that result.

CS 412/413 Spring 2005Introduction to Compilers1 CS412/CS413 Introduction to Compilers Tim Teitelbaum Lecture 30: Loop Optimizations and Pointer Analysis.

Definition-Use Chains

Code Optimization Code produced by compilation algorithms can often be improved (ideally optimized) in terms of run-time speed and the amount of memory.

Code Optimization Overview and Examples

Introduction to Optimization

Code Optimization.

Static Single Assignment

Optimization Code Optimization ©SoftMoore Consulting.

Introduction to Optimization

Code Generation Part III

Introduction to Code Generation

Optimizing Transformations Hal Perkins Autumn 2011

Wrapping Up Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit.

The Procedure Abstraction Part V: Run-time Structures for OOLs

Code Shape III Booleans, Relationals, & Control flow

Compiler Code Optimizations

Code Optimization Overview and Examples Control Flow Graph

Code Generation Part III

The Last Lecture COMP 512 Rice University Houston, Texas Fall 2003

Copyright 2003, Keith D. Cooper & Linda Torczon, all rights reserved.

Data Flow Analysis Compiler Design

Introduction to Optimization

Copyright 2003, Keith D. Cooper & Linda Torczon, all rights reserved.

Algebraic Reassociation of Expressions COMP 512 Rice University Houston, Texas Fall 2003 P. Briggs & K.D. Cooper, “Effective Partial Redundancy Elimination,”

Intermediate Code Generation

The Partitioning Algorithm for Detecting Congruent Expressions COMP 512 Rice University Houston, Texas Fall 2003 Copyright 2003, Keith D. Cooper.

Presentation transcript:

Eliminating Array Bounds Checks — and related problems — 1COMP 512, Rice University Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved. Comp 512 Spring 2011

COMP 512, Rice University2 The Problem Programmers are not perfect  Write programs that perform out-of-bounds references  All of these are questionable; some are malicious Technology for avoiding out-of-bounds references is easy  Bounds check each reference  1970s compilers could insert such checks (PL/I) Code with checking runs slowly  Obvious opportunity for optimization  We need compilers that implement this kind of checking  Buffer overflow attacks & other subtle bugs PL.8 philosophy: check everything & optimize checks

COMP 512, Rice University3 Obvious Solution Add a dynamic check to each reference check(a,i) performs two tests  Min(a) ≥ i raises an exception (lbcheck(a,i))  i ≥ Max(a) raises an exception (ubcheck(a,i)) In a loop, the compiler may be able to move one or both tests based on knowledge of the induction variable’s behavior Same abstraction fits structures of arrays, arrays of structures, … …  a[i] check(a,i) …  a[i] Assume a[1:100], so Min(a) is 1 and Max(a) is 100 IBM PL/I compilers implemented “check” as a built-in function (a procedure call).

COMP 512, Rice University4 Obvious Solution Implementing check in the compiler Treat check as an atomic action in the IR  Each check represents a potential exit ( abnormal termination )  Each check entails tests and control-flow operations  Each check has a reasonably high overhead Easier to optimize check as an atomic operation Implemented this way in the early IBM PL/I compilers and PL.8 …  a[i] check(a,i) …  a[i] check(a,i) if (lb(a) ≥ i) then raise exception if (I ≥ ub(a)) then raise exception

COMP 512, Rice University5 Obvious Solution References in loops can be optimized Repeated checks replaced by check of endpoints Code motion combined with special case reasoning about arithmetic comparisons  Min(a) ≤ j ≤ Max(a) and Min(a) ≤ k ≤ Max(a) and j ≤ k > is equivalent to Min(a) ≤ j and Min(a) ≤ k and j ≤ k  Is equivalent to lbcheck(a,j) and ubcheck(a,k) if (j ≤ k) then check(a,j) check(a,k) for i = j to k …  a[i] for i = j to k check(a,i) …  a[i]

COMP 512, Rice University6 Complications Unfortunately, it is not that simple … Our hand transformation assumes that k is invariant If loop modifies k, transformation must be more complex  Pre-loop check cannot determine range of i  Need a pre-loop check and a post-loop check  Recall the loop-exit landing pads in Cytron, Lowry, & Zadeck if (j ≤ k) then lbcheck(a,j) ubcheck(a,k) for i = j to k …  a[i] k  fee(i) for i = j to k check(a,i) …  a[i] k  fee(i) No longer correct

COMP 512, Rice University7 Complications Unfortunately, it is not that simple … Loop exits normally if k ≤ Max(a) Loop exits prematurely if k > Max(a)  Original loop would have referenced beyond a’s bounds. Post-loop test determines which exit was taken …  “raise exception” is same as ubcheck(a,i) if (j ≤ k) then lbcheck(a,j) for i = j to min(k,Max(a)) …  a[i] if k > Max(a) then raise exception (die) for i = j to k check(a,i) …  a[i]

COMP 512, Rice University 8 Minor improvement Peeling first iteration can eliminate one extra test This example is a minor win from code shape Trades minor code space for minor speed eliminates one dynamic test if (j ≤ k) then lbcheck(a,j) …  a[i] for i = j+1 to min(k,Max(a)) …  a[i] if k > Max(a) then raise exception (die) for i = j to k check(a,i) …  a[i]

COMP 512, Rice University9 Using Contextual Knowledge With known loop bounds, the checks become static Overhead of checking goes to zero in this case (pretty good) Known lower bound eliminates pre-loop check  for i = 1 to k is a common case …  for i = 1 to 63 is less common, but still happens … for i = 1 to 100 check(a,i) …  a[i] if (1 ≤ 100) then lbcheck(a,1) for i = 1 to 100 …  a[i] if (100 > Max(a)) then raise exception evaluate at compile time

COMP 512, Rice University10 Details Markstein, Cocke, and Markstein viewed this transformation as a form of strength reduction Replaces repeated strong tests (check) inside a loop with two weaker tests (lbcheck and ubcheck) Focuses on tests related to induction variables (as with OSR ) Strength Reduction for check The variable used in the check, t, must be linearly related to the induction variable, i, used in the end-of-loop test  i x c 1 - t = c 2 must hold where c 1 & c 2 are region constants The check must occur in an articulation node of the loop  The “loop” is an SCC of the CFG  Articulation point has property that it lies on every path through the SCC ( e.g., loop header is an articulation point )

COMP 512, Rice University11 Details The algorithm Create lbcheck in loop’s landing pad & copy support operations Replace loop exit test  Original test was i < n  New test is i < min(n,Max(a)) Insert a new test after the loop exit  On exit, if i > ub, original loop would have tripped on check  Raise exception if i > ub if (j ≤ k) then lbcheck(a,j) for i = j to min(k,Max(a)) …  a[i] if k > Max(a) then ubcheck(a,I) for i = j to k check(a,i) …  a[i] In terms of first iteration In terms of new exit test Original check is dead Will always fail

COMP 512, Rice University12 Details The algorithm If possible, place the exit test in the entry landing pad  Creates local common subexpressions & simplifies test Safety conditions:  Loop has a single exit  Loop ending branch is in an articulation point  Induction variable increment is ± 1  Upper bound is invariant in the loop With these conditions, can place exit check in entry landing pad  Now, code looks like version produced by hand … if (j ≤ k) then lbcheck(A,j) ubcheck(A,k) for i = j to k …  A[i] for i = j to k check(A,i) …  A[i] Otherwise, place it in a loop exit landing pad (CLZ) One minor issue is that reduced check triggers exception too early

COMP 512, Rice University13 Extensions to Markstein, Cocke, Markstein Local Improvements check can be redundant  Value number them or use special case algorithm One check can subsume another ( multiple references )  lbcheck(a,i) and lbcheck(a,j)  lbcheck(a,i) if i ≤ j  ubcheck(a,i) and upcheck(a,j)  ubcheck(a,j) if i ≤ j  If the subsumption is local, applying this insight is easy Global check elimination Subsuming checks can cover the entry or exit of the loop Similar to available expressions and very busy expressions Can formulate check hoisting & sinking as DF problems, too  Similar results to MC&M’s OSR of range checking

COMP 512, Rice University14 Control Flow The examples show simple loops with no internal control flow Conditionals pose serious problem  Evaluate check if reference is evaluated  Other references may subsume the guarded reference  Markstein, Cocke, Markstein does not address the issue  Other strategies (Gupta, LCM) have problems Room for further work on this issue  Replication, more aggressive code motion, … for i  j to k … if (f(i)) then …  a[i+c] … …  a[i] c = 0  check is subsumed c ≥ 1  lbcheck is subsumed c ≤ -1  ubcheck is subsumed

COMP 512, Rice University15 And What About Pointers? Pointer checking is, in principle, a similar problem … Easy to see in simple cases for (i = 1; i <= n; i++) *p ++ = 0; if (1 < n) { check(p); check(p+n*sizeof(*p)); } for (i = 1; i <= n; i++) *p ++ = 0; for (i = 1; i <= n; i++ ) { check(p); *p ++ = 0; } Even this case has difficulties What does p reference? How big is it?  check needs to know Can use run-time tags on the pointer or the object  Requires more memory references  Need sizes on any object whose address is taken ( &x ) Can eliminate some checks (& tags) statically  Known sizes & unambiguous pointers

COMP 512, Rice University16 Bibliography “Optimization of Range Checking,” V. Markstein, J. Cocke, P. Markstein, Proceedings of the 1982 ACM SIGPLAN Conference on Compiler Construction, SIGPLAN Notices, pages “Optimizing Array Bound Checks Using Flow Analysis,” R. Gupta, ACM Letters on Programming Languages and Systems (LOPLAS), 2(1-4), pages