Visual C++ Optimizations Jonathan Caves Principal Software Engineer Visual C++ Microsoft Corporation.

Presentation transcript:

Visual C++ Optimizations Jonathan Caves Principal Software Engineer Visual C++ Microsoft Corporation

How can your application run faster?
► Maximize optimization for each file.
► Whole Program Optimization (WPO) goes beyond individual files.
► Profile Guided Optimization (PGO) specializes optimizations for your application.
► New Floating Point Model.
► OpenMP.
► 64-bit Code Generation.

Maximum Optimization for Each File
► Compiler optimizes each source code file to get the best runtime performance
  ▪ The only type of optimization available in Visual C++ 6
► Visual C added better optimization algorithms
  ▪ Specialized support for newer processors such as Pentium 4
  ▪ Improved speed and better precision of floating point operations
  ▪ New optimization techniques like loop unrolling
► Typical expectation for performance after rebuild
  ▪ 10-20% improvement from Visual C++ 6 to Visual C
  ▪ 20-30% improvement from Visual C++ 6 to Visual C

Whole Program Optimization
► Typically Visual C++ optimizes programs by generating code for each object file separately
► Introducing whole program optimization
  ▪ First introduced with Visual C and has since improved
  ▪ Compiler and linker are given new options (/GL and /LTCG)
  ▪ Compiler has freedom to do additional optimizations
    ► Cross-module inlining
    ► Custom calling conventions
  ▪ Visual C supports this on all platforms
  ▪ Whole program optimization is widely used for Microsoft products such as SQL Server
► Typically expect significant performance improvement
  ▪ About 30% improvement from Visual C to Visual C

Profile Guided Optimization
► Static analysis leaves many optimization questions open for the compiler, leading to conservative optimizations
► Visual C++ programs can be tuned for expected user scenarios by collecting information from the running application
► Introducing profile guided optimization
  ▪ Optimizes code based on how customers actually use the program
  ▪ Runs optimizations at link time, like whole program optimization
  ▪ Available in Visual Studio 2005
  ▪ Widely adopted within Microsoft
Example:
  if (p != NULL) { /* Perform action with p */ } else { /* Error code */ }
Is it common for p to be NULL? If it is not, the error code should be placed with other infrequently used code.

PGO: Instrumentation
► We instrument with “probes” inserted into the code
► Two main types of probes
  ▪ Value probes
    ► Used to construct a histogram of values
  ▪ Count (simple/entry) probes
    ► Used to count the number of times a path is taken
► We try to insert the minimum number of probes to get full coverage
  ▪ Minimizes the cost of instrumentation

PGO Optimizations
► Switch expansion
► Better inlining decisions
► Cold code separation
► Virtual call speculation
► Partial inlining

Profile Guided Optimization
► Compile the source with /GL and optimizations on (e.g. /O2) to produce object files
► Link the object files with /LTCG:PGI to produce an instrumented image
► Run the instrumented image through scenarios; the output is profile data
► Link the object files and the profile data with /LTCG:PGO to produce the optimized image

PGO: Inlining Sample
► Profile guided optimization uses call graph path profiling.
[Figure: call graph with nodes a, foo, bat, bar and baz]

PGO: Inlining Sample (cont)
► Profile guided optimization uses call graph path profiling.
[Figure: call graph of a, foo, bat, bar and baz annotated with call counts per path]

PGO – Inlining Sample (cont)
► Inlining decisions are made at each call site.
[Figure: call graph of a, foo and bat, with separate bar and baz nodes for each call site]

PGO – Switch Expansion
► Most frequent values are pulled out.
Before:
  // 90% of the time i = 10
  switch (i) { case 1: … case 2: … case 3: … default: … }
After:
  if (i == 10) goto default;
  switch (i) { case 1: … case 2: … case 3: … default: … }

PGO – Code Separation
► Basic blocks are ordered so that the most frequent path falls through.
[Figure: control flow graph of basic blocks A, B, C and D, shown in its default layout and in its optimized layout]

PGO – Virtual Call Speculation
► The profiles show that the type of object A in function Func was almost always Foo.
  class Base { … virtual void call(); };
  class Foo : Base { … void call(); };
  class Bar : Base { … void call(); };
Before:
  void Func(Base *A) {
    …
    while (true) { … A->call(); … }
  }
After:
  void Func(Base *A) {
    …
    while (true) {
      …
      if (type(A) == Foo:Base) { /* inline of A->call(); */ }
      else A->call();
      …
    }
  }

PGO – Partial Inlining
[Figure: flow graph: Basic Block 1 → Cond → Cold Code or Hot Code → More Code]

PGO – Partial Inlining (cont)
► Hot path is inlined, but NOT the cold
[Figure: flow graph: Basic Block 1 → Cond → Cold Code or Hot Code → More Code]

Demo Optimizing applications with Visual C++

New Floating Point Model
► /Op made your code run slow
  ▪ No intermediate switch
► New Floating Point Model
  ▪ /fp:fast
  ▪ /fp:precise (default)
  ▪ /fp:strict
  ▪ /fp:except

/fp:precise
► The default floating point switch
► Performance and Precision
► IEEE Conformant
► Round to the appropriate precision
  ▪ At assignments, casts and function calls

/fp:fast
► When performance matters most
► You know your application does simple floating point operations
► What can /fp:fast do?
  ▪ Association
  ▪ Distribution
  ▪ Factoring inverse
  ▪ Scalar reduction
  ▪ Copy propagation
  ▪ And others …

/fp:except
► Reliable floating point exceptions
► Thrown and not thrown when expected
  ▪ Faults and traps, when reliable, should occur at the line that causes the exception
  ▪ FWAITs on x86 might be added
► Cannot be used with /fp:fast or in managed code

/fp:strict
► The strictest FP option
  ▪ Turns off contractions
  ▪ Assumes floating point control word can change or that the user will examine flags
► /fp:except is implied
► Low double digit percent slowdown versus /fp:fast

What is the output?
  #include <stdio.h>
  int main() {
    double x, y, z;
    double sum;
    x = 1e20; y = -1e20; z = 10.0;
    sum = x + y + z;
    printf("sum=%f\n", sum);
  }
/fp:fast /O2 = 0.000
/fp:strict /O2 = 10.0

OpenMP
▪ A specification for writing multithreaded programs
▪ It consists of a set of simple #pragmas and runtime routines
▪ Makes it very easy to parallelize loop-based code
▪ Helps with load balancing, synchronization, etc…
▪ In Visual Studio, only available in C++

OpenMP Parallelization
► Can parallelize loops and straight-line code
► Includes synchronization constructs
  void test(int first, int last) {
    #pragma omp parallel for
    for (int i = first; i <= last; ++i) { a[i] = b[i] + c[i]; }
  }
[Figure: with first = 1, the iterations up to i ≤ 1000 are split into four consecutive sub-ranges of i]

64-bit Compilers
► 64-bit Compiler Cross Tools
  ▪ Compiler is 32-bit but resulting image is 64-bit
► 64-bit Compiler Native Tools
  ▪ Compiler and resulting image are 64-bit binaries.
► All previous optimizations apply for 64-bit as well.

Understanding of Your Source Code
► Visual Studio Team System 2005 provides tools that help you understand defects and behavior of your source code
► Static code analysis
  ▪ Finds defects in source code at build time
► Profiler
  ▪ Determines where application spends time
► Code coverage
  ▪ Verifies that code paths are used as expected

Static Code Analysis
► Static code analysis helps developers find defects in code (/analyze)
  ▪ Reports code defects
  ▪ Warns about possible security vulnerabilities
  ▪ Suggests ways to improve performance
  ▪ Identifies possible design issues
  ▪ Enforces best practices
► Warns about defects and displays the path to a problem
  void vulnerable(char* p) {
    wchar_t buf[16];
    int ret;
    ret = MultiByteToWideChar(CP_ACP, 0, p, -1, buf, sizeof(buf));
    printf("%d\n", ret);
  }
Do you see the buffer overrun? This caused Code Red.

Static Code Analysis
  char *name = new char[10];
  if (x < n) return ERR_CODE;
  delete name;
[Figure: the source code and the .EXE are turned into an intermediate representation, which is the input to code analysis]

Defect Detection
Categories: Defects, Security, Design, Policy, Performance
  char *name = new char[10];
  if (x < n) return ERR_CODE;
  delete name;
Potential Memory Leak!

Security Defect Detection
Categories: Defects, Security, Design, Policy, Performance
  class Buffer {
    char buffer[10];
  public:
    void* Fill(int value, int fillCount) {
      while (--fillCount) buffer[fillCount] = value;
      *buffer = value;
      return buffer;
    }
  };
Integer Overflow Error

Profiler
► Examine performance for the entire application or for specific parts of it
► Helps to find runtime bottlenecks of programs
  ▪ Option of collecting information via sampling or instrumentation
  ▪ Collect up to 15 performance counters
► Significantly better than the profiler in Visual C++ 6

Resources
► Visual C++ Dev Center
  ▪ This is the place to go for all our news and whitepapers
  ▪ Also VC2005 specific forums at
► Myself