Performance and Code Tuning

Slides:



Advertisements
Similar presentations
Code Tuning Strategies and Techniques
Advertisements

Code Tuning Strategies and Techniques CS524 – Software Engineering Azusa Pacific University Dr. Sheldon X. Liang Mike Rickman.
Software & Services Group, Developer Products Division Copyright© 2010, Intel Corporation. All rights reserved. *Other brands and names are the property.
Dr. Ken Hoganson, © August 2014 Programming in R COURSE NOTES 2 Hoganson Language Translation.
ECE 454 Computer Systems Programming Compiler and Optimization (I) Ding Yuan ECE Dept., University of Toronto
Copyright © 2003, SAS Institute Inc. All rights reserved. Where's Waldo Uncovering Hard-to-Find Application Killers Claire Cates SAS Institute, Inc
1 Lecture 6 Performance Measurement and Improvement.
Reference Book: Modern Compiler Design by Grune, Bal, Jacobs and Langendoen Wiley 2000.
1 Programming & Programming Languages Overview l Machine operations and machine language. l Example of machine language. l Different types of processor.
27-Jun-15 Profiling code, Timing Methods. Optimization Optimization is the process of making a program as fast (or as small) as possible Here’s what the.
30-Jun-15 Profiling. Optimization Optimization is the process of making a program as fast (or as small) as possible Here’s what the experts say about.
Performance and Code Tuning Overview CPSC 315 – Programming Studio Fall 2009.
1 CS101 Introduction to Computing Lecture 19 Programming Languages.
Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
High level & Low level language High level programming languages are more structured, are closer to spoken language and are more intuitive than low level.
Introduction and Overview Questions answered in this lecture: What is an operating system? How have operating systems evolved? Why study operating systems?
CS101 Introduction to Computing Lecture Programming Languages.
Lecture 1: Performance EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2013, Dr. Rozier.
How to Build a CPU Cache COMP25212 – Lecture 2. Learning Objectives To understand: –how cache is logically structured –how cache operates CPU reads CPU.
Chapter 25: Code-Tuning Strategies. Chapter 25  Code tuning is one way of improving a program’s performance, You can often find other ways to improve.
Problem Solving using the Science of Computing MSE 2400 EaLiCaRA Spring 2015 Dr. Tom Way.
CSE 303 Concepts and Tools for Software Development Richard C. Davis UW CSE – 12/6/2006 Lecture 24 – Profilers.
CS 3500 L Performance l Code Complete 2 – Chapters 25/26 and Chapter 7 of K&P l Compare today to 44 years ago – The Burroughs B1700 – circa 1974.
Copyright © Curt Hill Parallelism in Processors Several Approaches.
CSC 322 Operating Systems Concepts Lecture - 7: by Ahmed Mumtaz Mustehsan Special Thanks To: Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall,
Computer Organization Instruction Set Architecture (ISA) Instruction Set Architecture (ISA), or simply Architecture, of a computer is the.
What is it and why do we need it? Chris Ward CS147 10/16/2008.
Measuring Performance Based on slides by Henri Casanova.
CSC 108H: Introduction to Computer Programming Summer 2011 Marek Janicki.
Threads prepared and instructed by Shmuel Wimer Eng. Faculty, Bar-Ilan University 1July 2016Processes.
Why don’t programmers have to program in machine code?
Advanced Computer Systems
Component 1.6.
Algorithms and Problem Solving
Code Optimization.
September 2 Performance Read 3.1 through 3.4 for Tuesday
Algorithmic complexity: Speed of algorithms
IS301 – Software Engineering Dept of Computer Information Systems
CS101 Introduction to Computing Lecture 19 Programming Languages
Lecture 3 of Computer Science II
Optimization Code Optimization ©SoftMoore Consulting.
Lecture 5: GPU Compute Architecture
Some Basics for Problem Analysis and Solutions
MapReduce Simplied Data Processing on Large Clusters
Compiler Construction
Optimizing Malloc and Free
Lecture 5: GPU Compute Architecture for the last time
Objective of This Course
Programming Languages
UNIT 3 CHAPTER 1 LESSON 4 Using Simple Commands.
CSCE 313 – Introduction to UNIx process
Algorithmic complexity: Speed of algorithms
COMP60621 Fundamentals of Parallel and Distributed Systems
Algorithms and Problem Solving
Programming with Shared Memory Specifying parallelism
PROGRAMMING FUNDAMENTALS Lecture # 03. Programming Language A Programming language used to write computer programs. Its mean of communication between.
Lecture 2 The Art of Concurrency
Tonga Institute of Higher Education IT 141: Information Systems
Introduction to Primitives
Algorithmic complexity: Speed of algorithms
Introduction to Primitives
CSC3050 – Computer Architecture
Tonga Institute of Higher Education IT 141: Information Systems
Memory System Performance Chapter 3
COMP60611 Fundamentals of Parallel and Distributed Systems
Performance and Code Tuning Overview
Performance.
Lecture 11: Machine-Dependent Optimization
Programming language translators
Functions, Procedures, and Abstraction
Presentation transcript:

Performance and Code Tuning CSCE 315 – Programming Studio, Fall 2017 Tanzir Ahmed

Is Performance Important? Performance tends to improve with time HW Improvements Other things can be more important Accuracy Robustness Code Readability Worrying about it can cause problems “More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason – including blind stupidity.” – William A. Wulf

So Why Worry About It? Sometimes code performance is critical Very large-scale problems Inefficiencies that make things seem infeasible The gains in hardware improvement may be coming to an end Or, at least need more tricks to take advantage of It is not straight-forward to take full advantage of a 48-core or 1024 CPUs/GPUs Problems such as inter-core or inter-socket communication overhead, cache efficiency “Performance Engineering” will likely be important in making long-term improvements

So, How Can We Improve Performance? First, there are ways to improve performance that don’t involve code tuning Before doing code tuning, you need to know what to tune Again, guess work is not helpful Finally, you can tune your code Carefully, measuring along the way

Performance Increases without Code Tuning Lower your Standards/Requirements (!!) Performance tuning is expensive May require h/w s/w optimization Performance is stated as a requirement far more than it is actually a requirement

Performance Increases without Code Tuning Lower your Standards/Requirements High Level Design The overall program structure can play a huge role

Performance Increases without Code Tuning Lower your Standards/Requirements High Level Design Class/Routine Design Algorithms used have real differences Can have largest effect, especially asymptotically

Performance Increases without Code Tuning Lower your Standards/Requirements High Level Design Class/Routine Design Interactions with Operating System Hidden OS calls within libraries – their performance affects overall code Just removing repeating printing to standard output may increase performance significantly for some programs

Performance Increases without Code Tuning Lower your Standards/Requirements High Level Design Class/Routine Design Interactions with Operating System Compiler Optimizations “Automatic” Optimization Getting better and better, but not perfect Different compilers work differently

Performance Increases without Code Tuning Lower your Standards/Requirements High Level Design Class/Routine Design Interactions with Operating System Compiler Optimizations Upgrade Hardware Straightforward, if possible…

Code Profiling Pareto Rule Determine where code is spending time More than 80% of the time is spent on less than 20% of code In reality this proportion is even more (e.g., 5% code contributing to 90%, just an example) Determine where code is spending time No sense in optimizing where no time is spent Provide measurement basis Determine whether “improvement” really improved anything Need to take precise measurements

Profiling Techniques Profiler – compile with profiling options, and run through profiler Gets list of functions/routines, and amount of time spent in each Use system timer (e.g., $time ./a.out) Less ideal Might need test harness for functions Graph results for understanding Multiple profile results: see how profile changes for different input types

What Is Tuning? Making small-scale adjustments to correct code in order to improve performance After code is written and working Affects only small-scale: a few lines, or at most one routine Examples: adjusting details of loops, expressions Code tuning can sometimes improve code efficiency tremendously

What Tuning is Not Reducing lines of code Not an indicator of efficient code A guess at what might improve things Know what you’re trying, and measure results Optimizing as you go Wait until finished, then go back to improve Optimizing while programming is often a waste A “first choice” for improvement Worry about other details/design first It is not Refactoring Refactoring improves code readability and quality, while Tuning often diminishes both

Common Inefficiencies Unnecessary I/O operations File access especially slow Paging/Memory issues Can vary by system System Calls Requires context switch which involves OS and scheduling overhead beyond the program’s control Interpreted Languages Instead of being compiled as a whole, each line is interpreted and converted to machine language individually Table Source: Code Complete book

Operation Costs Different operations take different times Integer division longer than other ops Much slower than bit-shifting to achieve the same Transcendental functions (sin, sqrt, etc.) even longer Knowing this can help when tuning Vary by language In C++, private routine calls take about twice the time of an integer op, and in Java about half the time.

An Example Stealing an example from Charles Leiserson’s Distinguised Lecture on Performance Engineering February 8, 2017 Joint Work with Bradley Kuszmaul and Tao Schardl. Performance Engineering a 4K x 4K Matrix Multiplication Machine: Dual-Socket Intel Xeon E5-2666 v3 (Haswell) 18 cores 2.9 GHz 60 GB of DRAM

Performance Engineering: 4Kx4K Matrix Multiplication Implementation Time (seconds) Straightforward Python Implementation 25,552.48 (> 7 hours)

Performance Engineering: 4Kx4K Matrix Multiplication Implementation Time (seconds) Straightforward Python Implementation 25,552.48 (> 7 hours) Straightforward Java Implementation 2372.68 (~ 40 mins)

Performance Engineering: 4Kx4K Matrix Multiplication Implementation Time (seconds) Straightforward Python Implementation 25,552.48 (> 7 hours) Straightforward Java Implementation 2372.68 (~ 40 mins) Straightforward C Implementation 542.67 (<10 mins)

Performance Engineering: 4Kx4K Matrix Multiplication Implementation Time (seconds) Straightforward Python Implementation 25,552.48 (> 7 hours) Straightforward Java Implementation 2372.68 (~ 40 mins) Straightforward C Implementation 542.67 (<10 mins) Parallel Loops 69.80 (~ 1 min)

Performance Engineering: 4Kx4K Matrix Multiplication Implementation Time (seconds) Straightforward Python Implementation 25,552.48 (> 7 hours) Straightforward Java Implementation 2372.68 (~ 40 mins) Straightforward C Implementation 542.67 (<10 mins) Parallel Loops 69.80 (~ 1 min) Parallel Divide and Conquer 3.80

Performance Engineering: 4Kx4K Matrix Multiplication Implementation Time (seconds) Straightforward Python Implementation 25,552.48 (> 7 hours) Straightforward Java Implementation 2372.68 (~ 40 mins) Straightforward C Implementation 542.67 (<10 mins) Parallel Loops 69.80 (~ 1 min) Parallel Divide and Conquer 3.80 add Vectorization (streaming the parallel operations) 1.10 add AVX intrinsics (processor commands for vector operations) 0.41 Strassen’s algorithm 0.38

The Lesson from this Example Code “performance engineering” can make incredible improvements in performance. 67,243 times faster in this case! It is very rewarding However, this effort is often meaningless and even hurtful when used in “tuning as you go” style during development Tends to decrease code readability and reusability Often it is hard to identify the real bottleneck early on

Remember Code readability/maintainability/etc. is usually more important than efficiency Always start with well-written code, and only tune at the end Measure!