Rational Apex 4.0 Optimization “Beware the benchmark!”

Presentation Outline
• Outline Rational Apex optimization behaviour
• Demonstrate some of the optimization techniques being used by modern compilers
• Show how these techniques defeat many of the assumptions made by traditional benchmarking suites

Rational Apex Optimization
• Optimization with Apex has 3 levels, controlled by the OPTIMIZATION_LEVEL switch
  • Level 0: No optimization, maximize debuggability
    - This is the default
  • Level 1: Many optimizations performed, some debuggability maintained
  • Level 2: All optimizations performed, debugging may be very limited in some code
• Optimization with Apex can have one of two objectives
  • Time: try to generate code that will execute in minimal time
  • Space: try to generate code that is as compact as possible
• These two objectives are not mutually exclusive!

Rational Apex Optimization
• Apex performs optimization in several different places
  • Front End, post semantics
    - Common sub-expression elimination
    - Code in-lining
    - Loop unrolling
    - Remove unused code from local scope
  • Machine independent instruction stream optimizer “optim”
    - Loop invariant hoisting
    - Range propagation
    - Constraint check elimination
    - Reduce memory movement
  • Machine specific code generator
    - Peep-hole optimization
• All optimization consumes extra CPU during compilation
  • The default is off: OPTIMIZATION_LEVEL: 0
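Two of the transformations named on this slide, common sub-expression elimination and loop invariant hoisting, can be illustrated by applying them by hand. The sketch below uses Python purely for illustration; the function names are hypothetical, and it shows the effect of the transformations, not how Apex implements them.

```python
def scale_naive(values, a, b):
    """Recomputes (a + b) twice on every iteration."""
    out = []
    for v in values:
        out.append(v * (a + b) + (a + b))
    return out


def scale_transformed(values, a, b):
    """Same result after both transformations are applied by hand."""
    # (a + b) is a common sub-expression and does not change inside the
    # loop, so it is computed once, before the loop starts.
    k = a + b
    out = []
    for v in values:
        out.append(v * k + k)
    return out


if __name__ == "__main__":
    data = [1, 2, 3]
    print(scale_naive(data, 4, 5))        # [18, 27, 36]
    print(scale_transformed(data, 4, 5))  # [18, 27, 36]
```

An optimizing compiler performs this kind of rewrite automatically, which is why the generated code can stop resembling the source line by line.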

Example Code - Summation of SQRT
• Simple routine that sums up square roots and prints the result

Example Code - Body of G_E_F.Sqrt

Example Code - Body of G_E_F.Hardware

Optimization Level 0
• No inlining, no code elimination, no check elimination
• Disassembly of sum_sqrt.2.ada is lines long
• No unused code has been eliminated: all the code for generic_elementary_functions remains

Optimization Level 0 - Disassembly of “for” loop

Optimization Level 0 - Disassembly of sqrt
• 163 lines of assembly
• Slightly abridged

Optimization Level 0 - Disassembly of hardware
• 56 lines of disassembly for SQRT
• 10 instructions for SQRT_32

Optimization Level 0 - Summary
• Total of over 220 instructions generated for the code that we are interested in
  • Lots of it will be unused
  • Not to mention the rest of the code for the instantiation
• Code maps back to source easily
  • Code layout follows source
• Lots of overhead for this straightforward code
  • Subprogram prolog/epilog code
    - Stack checks
    - Register management
  • Subprogram call/return code (3 levels deep)
  • No delayed branch slots being filled

Optimization Level 2 - Disassembly of “for” loop

Optimization Level 2 - Observations
• Disassembly of sum_sqrt.2.ada is 85 lines long
• Entire loop and all the called subprogram code is now 12 instructions long
  • 5 instructions for “for” loop management
    - Includes 2 instructions for branching
  • 4 instructions for integer to float conversion
    - 2 are identical, as one copy is used to fill a delayed branch slot at the bottom of the loop
  • 1 instruction for the Text_Io code is used to fill a branch delay slot
  • 2 instructions to perform the actual Sqrt and summation

Optimization Level 2 - Observations
• The optimization objective was Time
  • Time is certainly optimized, but Space also benefited enormously
• Different optimization techniques combined to produce very effective code
  • Inlining of 3 levels of subprogram call eliminated a significant amount of subprogram prolog/epilog
  • Range propagation determined that the argument to SQRT could never be less than zero, which allowed the argument check to be removed
  • Evaluation of compile-time static expressions resulted in a lot of code not being generated
    - Kind of floating point type: no case statement needed
    - Availability of Hardware SQRT: no call needed to Has_Sqrt
  • Register lifetime analysis on the resulting code meant that the loop control variable and the summation variable could live in registers
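The range propagation step can be pictured as simple interval reasoning: if the loop variable is known to lie in a non-negative range, so does the value converted to float, and a "less than zero" check on the SQRT argument can never fire. The following is a minimal sketch of that reasoning in Python, used only for illustration; the `Interval` type, the function names, and the loop bounds are all hypothetical and are not taken from Apex or from the original Ada source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed range [lo, hi] of the values a variable can take."""
    lo: float
    hi: float


def int_to_float(r: Interval) -> Interval:
    # Integer-to-float conversion is monotonic, so the bounds carry over.
    return Interval(float(r.lo), float(r.hi))


def check_is_redundant(arg: Interval, minimum: float) -> bool:
    """True if a check that rejects values below `minimum` can never fire."""
    return arg.lo >= minimum


if __name__ == "__main__":
    loop_var = Interval(1, 1000)               # hypothetical loop bounds
    sqrt_arg = int_to_float(loop_var)
    print(check_is_redundant(sqrt_arg, 0.0))   # True: the check can be dropped
```

With a range such as `Interval(-5, 10)` the same query returns `False`, and the check would have to stay.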

Performing Benchmarks
• Benchmarks usually consist of two distinct loops
  • A “Null Timing” loop to determine the overhead of the loop code itself
  • The Code Under Test loop, which has the same structure as the Null Timing loop with the inside of the loop replaced with the C.U.T.
• Timing equation looks like
  $T_{\mathrm{CUT}} = (T_{\mathrm{CUT\_loop}} - T_{\mathrm{null\_loop}}) / n$
    - Where n is the number of iterations
    - Usually n has to be very high so that the resolution of the system clock is not significant in the result
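The two-loop structure and the timing equation can be sketched as a small harness. This is an illustrative Python sketch, not code from any benchmark suite; `time_loop`, `per_iteration_time`, `null_body`, and `code_under_test` are hypothetical names, and `math.sqrt(2.0)` merely stands in for the code being measured.

```python
import math
import time


def time_loop(body, n):
    """Return elapsed seconds for n calls of body()."""
    start = time.perf_counter()
    for _ in range(n):
        body()
    return time.perf_counter() - start


def per_iteration_time(t_cut_loop, t_null_loop, n):
    """Timing equation: T_CUT = (T_CUT_loop - T_null_loop) / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return (t_cut_loop - t_null_loop) / n


def null_body():
    """Empty body: measures only the loop and call overhead."""


def code_under_test():
    """Stand-in for the code being measured."""
    return math.sqrt(2.0)


if __name__ == "__main__":
    n = 200_000  # large enough that clock resolution matters less
    t_null = time_loop(null_body, n)
    t_cut = time_loop(code_under_test, n)
    print(per_iteration_time(t_cut, t_null, n))
```

The printed value depends on the machine and on timer noise, so it is not reproduced here.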

Performing Benchmarks
• One effect we notice is that sometimes a benchmark suite reports slower times for code even though we know we have improved our optimizations!
• What’s happening?
  • The Null Timing loops of benchmark suites attempt to defeat compiler optimizations that skew their results
  • Compilers are better at getting rid of unnecessary code, often defeating the smart null loop
  • So now the equation looks like: $T_{\mathrm{CUT}} = (T_{\mathrm{CUT\_loop}} - 0) / n$
  • So the remaining loop overhead time gets included in the time of the Code Under Test, making it look worse than before
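A small worked example makes the skew concrete. The numbers below are invented for illustration (5 ns of loop overhead, code under test improving from 10 ns to 8 ns), and `reported_time_ns` is a hypothetical helper that applies the benchmark's own equation; Python is used only to carry out the arithmetic.

```python
def reported_time_ns(cut_ns, overhead_ns, n, null_loop_eliminated):
    """Per-iteration time that a null-loop-subtracting benchmark would print."""
    t_cut_loop = (cut_ns + overhead_ns) * n
    # If the compiler removes the null loop entirely, it measures as zero.
    t_null_loop = 0 if null_loop_eliminated else overhead_ns * n
    return (t_cut_loop - t_null_loop) / n


if __name__ == "__main__":
    n = 1_000_000
    old = reported_time_ns(10, 5, n, null_loop_eliminated=False)
    new = reported_time_ns(8, 5, n, null_loop_eliminated=True)
    print(old, new)  # 10.0 13.0
```

The code under test became faster (10 ns to 8 ns per iteration), yet the printed figure rose from 10 to 13 because the 5 ns of loop overhead is no longer subtracted.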

Performing Benchmarks
• One other effect we observe is that benchmarks often don’t do anything with the results they calculate
• Compilers can detect this and conclude that running the code has no effect and (very importantly) no side-effects
  • Range propagation concludes that overflow cannot be raised
  • Result is never used
  • Code is thrown away
• A good example is the Hennessy Benchmark in the PIWG suite
  • Large matrix multiplications, using a range of values that will not result in overflow
  • Apex 4.0 reports zero time for that test
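One common way for a benchmark to avoid being thrown away is to make its result observable, for example by folding it into a checksum that is returned and printed. The sketch below shows that structure in Python for illustration only; CPython does not perform this kind of elimination, so the point is the shape of the benchmark, which would carry over to a compiled language. The function names are hypothetical, and this is not the PIWG code. With compile-time-constant inputs, a sufficiently aggressive optimizer could still precompute the answer, so real benchmarks usually also read their inputs at run time.

```python
def matmul(a, b):
    """Plain triple-loop matrix product of two lists of lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def checksum(m):
    """Fold every element into one value so the product is actually used."""
    return sum(sum(row) for row in m)


def run_benchmark(a, b, iterations):
    total = 0
    for _ in range(iterations):
        total += checksum(matmul(a, b))  # result feeds the returned value
    return total


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(run_benchmark(a, b, 3))  # 402
```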

Performing Benchmarks
• When trying to compare different compiler technologies you need to look beyond the results printed by a benchmark program
  • Printed numbers can be very misleading
  • Look at absolute times and iteration counts
• Benchmarks don’t translate well between processor variants and processor types
• The best benchmark is your application
  • Or a sizable portion of it