© 2002 IBM Corporation IBM Toronto Software Lab October 6, 2004 | CASCON2004 Interprocedural Strength Reduction Shimin Cui Roch Archambault Raul Silvera.

Slides:



Advertisements
Similar presentations
Example of Constructing the DAG (1)t 1 := 4 * iStep (1):create node 4 and i 0 Step (2):create node Step (3):attach identifier t 1 (2)t 2 := a[t 1 ]Step.
Advertisements

CSC 4181 Compiler Construction Code Generation & Optimization.
Compiler Support for Superscalar Processors. Loop Unrolling Assumption: Standard five stage pipeline Empty cycles between instructions before the result.
Data-Flow Analysis II CS 671 March 13, CS 671 – Spring Data-Flow Analysis Gather conservative, approximate information about what a program.
7. Optimization Prof. O. Nierstrasz Lecture notes by Marcus Denker.
A Process Splitting Transformation for Kahn Process Networks Sjoerd Meijer.
1 CS 201 Compiler Construction Lecture 3 Data Flow Analysis.
Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers Presentation by Patrick Kaleem Justin.
Computer Architecture Lecture 7 Compiler Considerations and Optimizations.
Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis – today’s class –Classic analyses.
1 Code Optimization Code produced by compilation algorithms can often be improved (ideally optimized) in terms of run-time speed and the amount of memory.
Rational XL C/C++ Compiler Development © 2007 IBM Corporation Identifying Aliasing Violations in Source Code A Points-to Analysis Approach Ettore Tiotto,
1 S. Tallam, R. Gupta, and X. Zhang PACT 2005 Extended Whole Program Paths Sriraman Tallam Rajiv Gupta Xiangyu Zhang University of Arizona.
Fall 2011SYSC 5704: Elements of Computer Systems 1 SYSC 5704 Elements of Computer Systems Optimization to take advantage of hardware.
Software Group © 2005 IBM Corporation Compiler Technology October 17, 2005 Array privatization in IBM static compilers -- technical report CASCON 2005.
Trace-Based Automatic Parallelization in the Jikes RVM Borys Bradel University of Toronto.
CS 330 Programming Languages 10 / 16 / 2008 Instructor: Michael Eckmann.
TM Pro64™: Performance Compilers For IA-64™ Jim Dehnert Principal Engineer 5 June 2000.
Software Group © 2005 IBM Corporation Compilation Technology Controlling parallelization in the IBM XL Fortran and C/C++ parallelizing compilers Priya.
Figure 2.8 Compiler phases Compiling. Figure 2.9 Object module Linking.
Next Section: Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis (Wilson & Lam) –Unification.
CS 536 Spring Intermediate Code. Local Optimizations. Lecture 22.
4/23/09Prof. Hilfinger CS 164 Lecture 261 IL for Arrays & Local Optimizations Lecture 26 (Adapted from notes by R. Bodik and G. Necula)
9. Optimization Marcus Denker. 2 © Marcus Denker Optimization Roadmap  Introduction  Optimizations in the Back-end  The Optimizer  SSA Optimizations.
Assemblers Dr. Monther Aldwairi 10/21/20071Dr. Monther Aldwairi.
Run-Time Storage Organization
1 CS 201 Compiler Construction Lecture 3 Data Flow Analysis.
U NIVERSITY OF M ASSACHUSETTS, A MHERST D EPARTMENT OF C OMPUTER S CIENCE Emery Berger University of Massachusetts, Amherst Advanced Compilers CMPSCI 710.
Intermediate Code. Local Optimizations
Prof. Fateman CS164 Lecture 211 Local Optimizations Lecture 21.
Register Allocation and Spilling via Graph Coloring G. J. Chaitin IBM Research, 1982.
Reduced Instruction Set Computers (RISC) Computer Organization and Architecture.
Procedure Optimizations and Interprocedural Analysis Chapter 15, 19 Mooly Sagiv.
5.3 Machine-Independent Compiler Features
1 Chapter 5: Names, Bindings and Scopes Lionel Williams Jr. and Victoria Yan CSci 210, Advanced Software Paradigms September 26, 2010.
Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
CH13 Reduced Instruction Set Computers {Make hardware Simpler, but quicker} Key features  Large number of general purpose registers  Use of compiler.
CIS Computer Programming Logic
Runtime Environments Compiler Construction Chapter 7.
Toward Efficient Flow-Sensitive Induction Variable Analysis and Dependence Testing for Loop Optimization Yixin Shou, Robert A. van Engelen, Johnnie Birch,
Computer architecture Lecture 11: Reduced Instruction Set Computers Piotr Bilski.
Basic Semantics Associating meaning with language entities.
Property of Jack Wilson, Cerritos College1 CIS Computer Programming Logic Programming Concepts Overview prepared by Jack Wilson Cerritos College.
1 CS 201 Compiler Construction Introduction. 2 Instructor Information Rajiv Gupta Office: WCH Room Tel: (951) Office.
Compiler Principles Fall Compiler Principles Lecture 0: Local Optimizations Roman Manevich Ben-Gurion University.
Synopsys University Courseware Copyright © 2012 Synopsys, Inc. All rights reserved. Compiler Optimization and Code Generation Lecture - 1 Developed By:
Chapter 1 Introduction. Chapter 1 - Introduction 2 The Goal of Chapter 1 Introduce different forms of language translators Give a high level overview.
RUN-Time Organization Compiler phase— Before writing a code generator, we must decide how to marshal the resources of the target machine (instructions,
CS536 Semantic Analysis Introduction with Emphasis on Name Analysis 1.
® IBM Software Group © 2009 IBM Corporation Assist Threads for Data Prefetching in IBM XL Compilers Gennady Pekhimenko, Yaoging Gao, IBM Toronto, Zehra.
Chapter 1 Introduction Major Data Structures in Compiler
OPTIMIZING DSP SCHEDULING VIA ADDRESS ASSIGNMENT WITH ARRAY AND LOOP TRANSFORMATION Chun Xue, Zili Shao, Ying Chen, Edwin H.-M. Sha Department of Computer.
CSC 8505 Compiler Construction Runtime Environments.
Machine Independent Optimizations Topics Code motion Reduction in strength Common subexpression sharing.
Programming for Performance CS 740 Oct. 4, 2000 Topics How architecture impacts your programs How (and how not) to tune your code.
Compiler Construction CPCS302 Dr. Manal Abdulaziz.
/ PSWLAB Evidence-Based Analysis and Inferring Preconditions for Bug Detection By D. Brand, M. Buss, V. C. Sreedhar published in ICSM 2007.
© M. Gross, ETH Zürich, 2014 Informatik I für D-MAVT (FS 2014) Exercise 7 – Pointers.
3/2/2016© Hal Perkins & UW CSES-1 CSE P 501 – Compilers Optimizing Transformations Hal Perkins Autumn 2009.
 Data Type is a basic classification which identifies different types of data.  Data Types helps in: › Determining the possible values of a variable.
©SoftMoore ConsultingSlide 1 Code Optimization. ©SoftMoore ConsultingSlide 2 Code Optimization Code generation techniques and transformations that result.
Code Optimization Code produced by compilation algorithms can often be improved (ideally optimized) in terms of run-time speed and the amount of memory.
Compiler Construction (CS-636)
Optimization Code Optimization ©SoftMoore Consulting.
Code Optimization I: Machine Independent Optimizations
A Framework for Safe Automatic Data Reorganization Shimin Cui (Speaker), Yaoqing Gao, Roch Archambault, Raul Silvera IBM Toronto Software Lab Peng Peers.
Compiler Code Optimizations
Chapter 12 Pipelining and RISC
ENERGY 211 / CME 211 Lecture 11 October 15, 2008.
Run-time environments
Presentation transcript:

© 2002 IBM Corporation IBM Toronto Software Lab October 6, 2004 | CASCON2004 Interprocedural Strength Reduction Shimin Cui Roch Archambault Raul Silvera Yaoqing Gao IBM Toronto Software Lab

IBM Software Group Interprocedural Strength Reduction CASCON2004 Outline  Introduction  Examples  Algorithm in TPO  Summary

IBM Software Group Interprocedural Strength Reduction CASCON2004 Reduction in Strength  Replace costly computations with less expensive ones.  Example of operator strength reduction. Shift, add instead of multiply or divide 16*x  xx << 4  Well known and widely used in compilers – largely restricted to optimize code within a single procedure.

IBM Software Group Interprocedural Strength Reduction CASCON2004 Interprocedural Strength Reduction - Overview  Precompute costly computations in less frequently executed locations.  Replace costly computations with less expensive ones.  Reduce size of global objects referenced repeatedly in loops.  Provide opportunities for further optimizations.  Implemented in the link path of IBM product compiler. A method to reduce the costly computations in strength interprocedurally to improve the code performance.

IBM Software Group Interprocedural Strength Reduction CASCON2004 Example 1: Global Scalars  Pre-compute costly computations on global scalar variables. init() { // execute count: 1 a = …; b = …; c = …; d = …; } bar() { … for (i = 0; i < 1000; ++i) { … // execute count: 1000 x += expr(a, b); y += c/d; … foo(); … } foo() { … if (cond) { // execute count: 10 b = …; d = …; } … // execute count: 1000 c = …; … }

IBM Software Group Interprocedural Strength Reduction CASCON2004 Example 1 (cont.)  Pre-compute costly computations in less frequently executed locations. init() { // execute count: 1 a = …; b = …; isr1 = expr(a, b); c = …; d = …; } bar() { … for (i = 0; i < 1000; ++i) { … // execute count: 1000 x += expr(a, b); isr1; y += c/d; … foo(); … } foo() { … if (cond) { // execute count: 10 b = …; isr1 = expr(a, b); d = …; } … // execute count: 1000 c = …; … }

IBM Software Group Interprocedural Strength Reduction CASCON2004 Example 1 (cont.)  Replace costly computations with less expensive computations. init() { // execute count: 1 a = …; b = …; isr1 = expr(a, b); c = …; d = …; isr2 = (m(d), s(d)); } bar() { … for (i = 0; i < 1000; ++i) { … // execute count: 1000 x += expr(a, b) isr1; y += c/d; (c*isr2.m) >> isr2.s; … foo(); … } foo() { … if (cond) { // execute count: 10 b = …; isr1 = expr(a, b); d = …; isr2 = (m(d), s(d)); } … // execute count: 1000 c = …; … }

IBM Software Group Interprocedural Strength Reduction CASCON2004 Example 2: Global Objects  Pre-compute costly computations on global arrays and dynamic objects. float **a; int b[M][N]; int bar() { a = malloc(sizeof(float*)*M); for (i =0 ; i < M; i++) { a[i] = malloc(sizeof(float)*N); } … } int foo() { … for (i =0 ; i < M; i++) { for (j = 0 ; j < N; j++) { if (expra(a[i][j]) > 1) { x += 1; } y += exprb(b[i][j])% 256 ; … } … } … }

IBM Software Group Interprocedural Strength Reduction CASCON2004 Example 2 (cont.)  Reduce size of global objects referenced repeatedly in loops to improve the data cache performance. float **a; int b[M][N]; bool **isra; char isrb[M][N]; int bar() { a = malloc(sizeof(float*)*M); isra = malloc(sizeof(bool*)*M); for (i =0 ; i < M; i++) { a[i] = malloc(sizeof(float)*N); isra[i] = malloc(sizeof(bool)*N); } … } int foo() { … for (i =0 ; i < M; i++) { for (j = 0 ; j < N; j++) { if (expra(a[i][j]) > 1 isra[i][j]) { x += 1; } y += exprb(b[i][j])% 256 isrb[i][j]; … } … } … }

IBM Software Group Interprocedural Strength Reduction CASCON2004 Algorithm in IPA Link Phase  Only supported at -O5 (IPA level 2). (Backward Data Flow Pass) Modify intermediate representation of code (Forward Data Flow Pass) Gather def/use information Analyze the collected information

IBM Software Group Interprocedural Strength Reduction CASCON2004 Gather Information  A global variable is considered only when all of its possible definition locations are known. Aliases of global objects Size of the global objects (static or runtime profile)  Identify the costly computations which operates only on global variables and collect execution cost related information. Costly computation Computation that can be mapped to an object reference of smaller size data type  Identify the stores where global variables are modified and collect execution cost related information.

IBM Software Group Interprocedural Strength Reduction CASCON2004 Cost analysis  Select the candidate computations for reduction in strength based on the cost analysis for the whole program. The total cost for both computations and precomputations The total size of the global objects  Create a global variable for each selected computation. Initialization of the global variable Indexing or indirect symbols Aliasing symbols

IBM Software Group Interprocedural Strength Reduction CASCON2004 Code Transformation  Replace the candidate computation by a weaker computation. Load of the created global variable - direct, indirect, indexing Multiply-shift sequence using magic number  Insert the store of global variables (and their aliases) at definition points of all the variables used in the selected computations. Store operation Memory allocation Pointer assignment

IBM Software Group Interprocedural Strength Reduction CASCON2004 Summary  Precompute costly computations in less frequently executed place To reduce total number of costly computation  Replace costly computations with less expensive ones To reduce the strength of operation  Reduce size of global objects referenced repeatedly in loops To improve data cache utilization

IBM Software Group Interprocedural Strength Reduction CASCON2004 Questions?