Visual C++ 2005 New Optimizations
Ayman Shoukry, Program Manager, Visual C++, Microsoft Corporation

How can your application run faster?
► Maximize optimization for each file.
► Whole Program Optimization (WPO) goes beyond individual files.
► Profile Guided Optimization (PGO) specializes optimizations specifically for your application.
► New floating point model.
► OpenMP.
► 64-bit code generation.

Maximum Optimization for Each File
► The compiler optimizes each source code file to get the best runtime performance
  - The only type of optimization available in Visual C++ 6
► Visual C++ 2005 has better optimization algorithms
  - Specialized support for newer processors such as the Pentium 4
  - Improved speed and better precision of floating point operations
  - New optimization techniques such as loop unrolling
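Loop unrolling is applied by the compiler itself; the C++ sketch below only shows what the transformation amounts to when written out by hand. The function names and the unroll factor of 4 are illustrative choices, not necessarily what the Visual C++ optimizer picks.

```cpp
#include <cstddef>
#include <vector>

// Straightforward loop: one add and one loop-condition check per element.
long long sum_simple(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// The same loop unrolled by a factor of 4: the loop condition is checked
// once per four elements, and a remainder loop handles the leftovers when
// v.size() is not a multiple of 4.
long long sum_unrolled4(const std::vector<int>& v) {
    long long total = 0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; ++i) {  // remainder loop
        total += v[i];
    }
    return total;
}
```

Both functions return the same value for any input; the unrolled version simply does less loop bookkeeping per element.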

Whole Program Optimization
► Typically, Visual C++ optimizes programs by generating code for each object file separately
► Introducing whole program optimization
  - First introduced in an earlier Visual C++ release and improved since
  - The compiler and linker take new options (/GL and /LTCG)
  - The compiler has freedom to do additional optimizations
    ► Cross-module inlining
    ► Custom calling conventions
  - Visual C++ 2005 supports this on all platforms
  - Whole program optimization is widely used for Microsoft products
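As a minimal sketch of cross-module inlining, assume a helper defined in one source file and called from another (the file and function names are hypothetical). The build commands in the comments use the /GL and /LTCG switches named above; the code is shown as a single file only so it compiles on its own.

```cpp
// Conceptually these functions live in two files, util.cpp and main.cpp
// (hypothetical names). Typical build with whole program optimization:
//   cl /c /O2 /GL util.cpp main.cpp
//   link /LTCG util.obj main.obj
// Without /GL, main.obj sees only a declaration of scale() and must emit a
// real call. With /GL and /LTCG, code generation is deferred to link time,
// so scale() can be inlined into sum_scaled() across the file boundary.

// --- util.cpp ---
int scale(int x) {
    return 3 * x;
}

// --- main.cpp ---
int scale(int x);  // only this declaration is visible in main.cpp

int sum_scaled(const int* values, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += scale(values[i]);  // candidate for cross-module inlining
    }
    return total;
}
```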

Profile Guided Optimization
► Static analysis leaves many optimization questions open for the compiler, which leads to conservative optimizations
► Visual C++ programs can be tuned for expected user scenarios by collecting information from the running application
► Introducing profile guided optimization
  - Optimizes code based on running the program the way its customers use it
  - Runs optimizations at link time, like whole program optimization
  - Available in Visual Studio 2005
  - Widely adopted in Microsoft
Example:
    if (p != NULL) {
        /* Perform action with p */
    } else {
        /* Error code */
    }
Is it common for p to be NULL? If it is not, the error code should be placed with other infrequently used code.

PGO: Instrumentation
► We instrument with "probes" inserted into the code
► Two main types of probes
  - Value probes
    ► Used to construct a histogram of values
  - Count (simple/entry) probes
    ► Used to count the number of times a path is taken
► We try to insert the minimum number of probes needed for full coverage
  - This minimizes the cost of instrumentation
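To make the two probe kinds concrete, here is a hand-instrumented C++ sketch. ProfileData, g_profile and classify are hypothetical names; the real PGO instrumentation and profile file format differ, and a real instrumenter places fewer probes than this naive version and derives the remaining counts.

```cpp
#include <map>

// Hypothetical stand-in for the profile data an instrumented image
// collects. It only illustrates what each kind of probe records.
struct ProfileData {
    long long entry_count = 0;            // count (entry) probe
    long long default_count = 0;          // count (simple) probe on one path
    std::map<int, long long> value_hist;  // value probe: histogram of values
};

ProfileData g_profile;

int classify(int i) {
    ++g_profile.entry_count;    // count probe: function entry
    ++g_profile.value_hist[i];  // value probe on the switch operand
    switch (i) {
        case 1:  return 100;
        case 2:  return 200;
        case 3:  return 300;
        default:
            ++g_profile.default_count;  // count probe: default path taken
            return -1;
    }
}
```

A histogram in which one value dominates is exactly the information that switch expansion relies on.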

PGO Optimizations
► Switch expansion
► Better inlining decisions
► Cold code separation
► Virtual call speculation
► Partial inlining

Profile Guided Optimization
1. Compile with /GL and optimizations on (e.g. /O2): source files produce object files.
2. Link with /LTCG:PGI: object files produce an instrumented image.
3. Run the instrumented image on your scenarios: the output is profile data.
4. Link with /LTCG:PGO: object files plus profile data produce the optimized image.

PGO: Inlining Sample
► Profile Guided Optimization uses call graph path profiling.
► Inlining decisions are made at each call site.
[Diagrams: a call graph over the functions a, foo, bat, bar and baz, annotated with profile counts along each call path, shown before and after inlining; after inlining, bar and baz appear once per call path]

PGO – Switch Expansion
Most frequent values are pulled out of the switch. Before, with the profile showing i == 10 90% of the time:
    switch (i) {
        case 1: …
        case 2: …
        case 3: …
        default: …
    }
After:
    if (i == 10) goto default_case;  // hot value skips the switch
    switch (i) {
        case 1: …
        case 2: …
        case 3: …
        default:
        default_case: …
    }
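A minimal C++ sketch of the expansion, assuming the profile reported i == 10 on 90% of calls. The function names and case bodies are hypothetical placeholders that return constants so the two versions can be compared.

```cpp
// Original dispatch: every call goes through the switch, even though the
// profile says i == 10 (handled by default) 90% of the time.
int dispatch_original(int i) {
    switch (i) {
        case 1:  return 100;
        case 2:  return 200;
        case 3:  return 300;
        default: return -1;
    }
}

// After expansion: the dominant value is tested first with a single
// compare, and only the remaining calls reach the switch.
int dispatch_expanded(int i) {
    if (i == 10) {
        return -1;  // same result as the default case
    }
    switch (i) {
        case 1:  return 100;
        case 2:  return 200;
        case 3:  return 300;
        default: return -1;
    }
}
```

The transformation must not change behavior, so both functions agree on every input; only the cost of the common case changes.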

PGO – Code Separation
Basic blocks are ordered so that the most frequent path falls through.
[Diagram: control flow graph of basic blocks A, B, C and D, shown in the default layout and in the optimized layout]

PGO – Virtual Call Speculation
    class Base { … virtual void call(); };
    class Foo : public Base { … void call(); };
    class Bar : public Base { … void call(); };
Before:
    void Func(Base *A) {
        …
        while (true) {
            …
            A->call();
            …
        }
    }
After:
    void Func(Base *A) {
        …
        while (true) {
            …
            if (typeid(*A) == typeid(Foo)) {
                // inlined body of Foo::call()
            } else {
                A->call();
            }
            …
        }
    }
According to the profiles, the type of object A in function Func was almost always Foo.
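A self-contained C++ sketch of the same speculation (guarded devirtualization). The names and return values are hypothetical, and typeid is used only because it is portable; a compiler is free to use a cheaper check on the object's dynamic type.

```cpp
#include <typeinfo>

struct Base {
    virtual ~Base() = default;
    virtual int call() const = 0;
};

struct Foo : Base {
    int call() const override { return 1; }
};

struct Bar : Base {
    int call() const override { return 2; }
};

// If the dynamic type is the one the profile predicted (Foo), call
// Foo::call() directly (a qualified call, so it can be inlined);
// otherwise fall back to the ordinary virtual call.
int speculative_call(const Base* a) {
    if (typeid(*a) == typeid(Foo)) {
        return static_cast<const Foo*>(a)->Foo::call();  // direct call
    }
    return a->call();  // virtual dispatch for every other type
}

int sum_calls(const Base* const* objects, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += speculative_call(objects[i]);
    }
    return total;
}
```

The fallback branch keeps the code correct for any type; the speculation only pays off when the profile's prediction holds most of the time.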

PGO – Partial Inlining
[Diagram: Basic Block 1 branches on Cond to either Hot Code or Cold Code; both paths continue to More Code]
The hot path is inlined, but NOT the cold path.
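PGO makes this split automatically from the profile. As a hand-written approximation (all names hypothetical), the hot path lives in a small inline function and the cold path is a separate out-of-line function, so only the cold branch remains a call.

```cpp
#include <string>

// Cold path: rarely executed, so it stays out of line instead of being
// copied into every caller.
std::string describe_error(int code) {
    return "error " + std::to_string(code);
}

// Hot path: small enough to inline at each call site.
inline std::string describe(int code) {
    if (code == 0) {
        return "ok";              // hot code: inlined
    }
    return describe_error(code);  // cold code: remains a call
}
```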

Demo: Optimizing applications with Visual C++

New Floating Point Model
► /Op made your code run slowly
  - There was no intermediate switch
► New floating point model
  - /fp:fast
  - /fp:precise (default)
  - /fp:strict
  - /fp:except

/fp:precise
► The default floating point switch
► Balances performance and precision
► IEEE conformant
► Rounds to the appropriate precision
  - At assignments, casts and function calls

/fp:fast
► When performance matters most
► When you know your application does only simple floating point operations
► What can /fp:fast do?
  - Association
  - Distribution
  - Factoring inverse
  - Scalar reduction
  - Copy propagation
  - And others …
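To illustrate why scalar reduction needs the freedom /fp:fast grants, the sketch below compares a strict left-to-right sum with a two-partial-sum version of the kind a vectorizing compiler might generate. It assumes IEEE double arithmetic without extended-precision intermediates (for example SSE2 code); the function names are hypothetical, and this is not the compiler's actual code generation.

```cpp
#include <cstddef>
#include <vector>

// Strict left-to-right summation: the order a compiler must keep when it
// may not reassociate floating point additions.
double sum_in_order(const std::vector<double>& v) {
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        s += v[i];
    }
    return s;
}

// Reordered reduction: two partial sums (even and odd indices) that are
// combined at the end, as a vectorized loop might do.
double sum_two_lanes(const std::vector<double>& v) {
    double even = 0.0, odd = 0.0;
    std::size_t i = 0;
    for (; i + 2 <= v.size(); i += 2) {
        even += v[i];
        odd += v[i + 1];
    }
    if (i < v.size()) {
        even += v[i];  // odd-length tail
    }
    return even + odd;
}
```

For {1e20, 10.0, -1e20, 10.0}, the in-order sum is 10.0 (the first 10.0 is lost when added to 1e20), while the two-lane sum is 20.0, which happens to be the exact answer. Reordering changes the result, not necessarily for the worse.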

/fp:except
► Reliable floating point exceptions
► Exceptions are raised when expected, and not raised otherwise
  - Faults and traps, when reliable, should occur at the line that causes the exception
  - FWAIT instructions might be added on x86
► Cannot be used with /fp:fast or in managed code

/fp:strict
► The strictest FP option
  - Turns off contractions
  - Assumes the floating point control word can change, or that the user will examine the status flags
► /fp:except is implied
► Low double-digit percentage slowdown versus /fp:fast
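Code that reads the floating point status flags is the case /fp:strict has to protect. The sketch below uses the standard <cfenv> functions; it assumes a platform that defines FE_DIVBYZERO (such as x86 or x64), and the function name is hypothetical. In Visual C++, #pragma fenv_access (on) is another way to tell the compiler that code accesses the floating point environment.

```cpp
#include <cfenv>

// Divides by zero at run time and reports whether the divide-by-zero
// status flag was raised. volatile prevents the compiler from folding the
// division at compile time or moving it past the flag check.
bool division_raises_divbyzero() {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double numerator = 1.0;
    volatile double denominator = 0.0;
    volatile double result = numerator / denominator;  // sets FE_DIVBYZERO
    (void)result;
    return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

Without guarantees like those of /fp:strict, an optimizer could evaluate or reorder such a division so that the flag no longer reflects the source order.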

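What is the output?
    #include <stdio.h>

    int main() {
        double x, y, z;
        double sum;
        x = 1e20;
        y = -1e20;
        z = 10.0;
        sum = x + y + z;
        printf("sum=%f\n", sum);
        return 0;
    }
► /fp:strict /O2: sum = 10.0, because the additions are performed left to right, as (x + y) + z.
► /fp:fast /O2: the compiler may reassociate the sum as x + (y + z). In double precision, -1e20 + 10.0 rounds back to -1e20, so that order yields 0.0.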
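The two evaluation orders can be written out explicitly. This sketch assumes IEEE double arithmetic with every intermediate rounded to double (for example SSE2 code); with x87 extended-precision intermediates the reassociated result could differ. The function names are hypothetical, and the comments describe what happens for the slide's inputs.

```cpp
// (x + y) + z: the order /fp:strict preserves.
double sum_left_to_right(double x, double y, double z) {
    double t = x + y;  // 1e20 + (-1e20) is exactly 0.0
    return t + z;      // 0.0 + 10.0 is 10.0
}

// x + (y + z): a reassociation /fp:fast permits.
double sum_reassociated(double x, double y, double z) {
    double t = y + z;  // -1e20 + 10.0 rounds back to -1e20
    return x + t;      // 1e20 + (-1e20) is 0.0
}
```

The value 10.0 is far smaller than the spacing between adjacent doubles near 1e20 (16384), which is why it disappears in the second order.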

OpenMP
► A specification for writing multithreaded programs
► Consists of a set of simple #pragmas and runtime routines
► Makes it very easy to parallelize loop-based code
► Helps with load balancing, synchronization, etc.
► In Visual Studio, only available in C++

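OpenMP Parallelization
► Can parallelize loops and straight-line code
► Includes synchronization constructs
[Diagram: the iterations from first = 1 to last = 1000 split into four ranges of i, one per thread]
    // Hypothetical declarations; the slide does not show a, b and c.
    double a[1001], b[1001], c[1001];

    void test(int first, int last) {
        #pragma omp parallel for
        for (int i = first; i <= last; ++i) {
            a[i] = b[i] + c[i];
        }
    }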
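When iterations update a shared variable, the loop needs synchronization. A minimal sketch using the OpenMP reduction clause for a parallel sum (the function name is hypothetical):

```cpp
#include <vector>

// Each thread accumulates a private partial sum; the reduction clause
// combines the partial sums when the loop ends, so no explicit lock is
// needed. OpenMP 2.0 (the version Visual C++ 2005 implements) requires a
// signed integer loop index, hence int rather than size_t.
// Compile with /openmp; without it the pragma is ignored and the loop
// runs serially with the same result.
long long parallel_sum(const std::vector<int>& v) {
    long long total = 0;
    const int n = static_cast<int>(v.size());
    #pragma omp parallel for reduction(+: total)
    for (int i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}
```

Writing total += v[i] without the reduction clause would be a data race between threads.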

64-bit Compiler in VC2005
► 64-bit compiler cross tools
  - The compiler is 32-bit, but the resulting image is 64-bit
► 64-bit compiler native tools
  - The compiler and the resulting image are both 64-bit binaries
► All of the previous optimizations apply to 64-bit code as well

Resources
► Visual C++ Dev Center
  - This is the place to go for all our news and whitepapers
  - Also VC2005-specific forums at
► Myself