Performance and Code Tuning Overview

Slides:



Advertisements
Similar presentations
Code Tuning Strategies and Techniques
Advertisements

When and How to Improve Code Performance? Ivaylo Bratoev Telerik Corporation
Code Tuning Strategies and Techniques CS524 – Software Engineering Azusa Pacific University Dr. Sheldon X. Liang Mike Rickman.
Software & Services Group, Developer Products Division Copyright© 2010, Intel Corporation. All rights reserved. *Other brands and names are the property.
ECE 454 Computer Systems Programming Compiler and Optimization (I) Ding Yuan ECE Dept., University of Toronto
Performance What differences do we see in performance? Almost all computers operate correctly (within reason) Most computers implement useful operations.
1 Lecture 6 Performance Measurement and Improvement.
COMP205 Comparative Programming Languages Part 1: Introduction to programming languages Lecture 3: Managing and reducing complexity, program processing.
Chapter 4 Assessing and Understanding Performance
27-Jun-15 Profiling code, Timing Methods. Optimization Optimization is the process of making a program as fast (or as small) as possible Here’s what the.
30-Jun-15 Profiling. Optimization Optimization is the process of making a program as fast (or as small) as possible Here’s what the experts say about.
Performance and Code Tuning Overview CPSC 315 – Programming Studio Fall 2009.
High level & Low level language High level programming languages are more structured, are closer to spoken language and are more intuitive than low level.
Lecture 1: Performance EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2013, Dr. Rozier.
Abstraction IS 101Y/CMSC 101 Computational Thinking and Design Tuesday, September 17, 2013 Marie desJardins University of Maryland, Baltimore County.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
Analysis of algorithms Analysis of algorithms is the branch of computer science that studies the performance of algorithms, especially their run time.
Chapter 25: Code-Tuning Strategies. Chapter 25  Code tuning is one way of improving a program’s performance, You can often find other ways to improve.
Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.
Problem Solving using the Science of Computing MSE 2400 EaLiCaRA Spring 2015 Dr. Tom Way.
CSE 303 Concepts and Tools for Software Development Richard C. Davis UW CSE – 12/6/2006 Lecture 24 – Profilers.
CS 3500 L Performance l Code Complete 2 – Chapters 25/26 and Chapter 7 of K&P l Compare today to 44 years ago – The Burroughs B1700 – circa 1974.
CS 460/660 Compiler Construction. Class 01 2 Why Study Compilers? Compilers are important – –Responsible for many aspects of system performance Compilers.
Software Development. Software Development Loop Design  Programmers need a solid foundation before they start coding anything  Understand the task.
Computer Organization Instruction Set Architecture (ISA) Instruction Set Architecture (ISA), or simply Architecture, of a computer is the.
The single most important skill for a computer programmer is problem solving Problem solving means the ability to formulate problems, think creatively.
What is it and why do we need it? Chris Ward CS147 10/16/2008.
CSE 340 Computer Architecture Summer 2016 Understanding Performance.
Measuring Performance Based on slides by Henri Casanova.
Our Graphics Environment Landscape Rendering. Hardware  CPU  Modern CPUs are multicore processors  User programs can run at the same time as other.
CSC 108H: Introduction to Computer Programming Summer 2012 Marek Janicki.
CSC 108H: Introduction to Computer Programming Summer 2011 Marek Janicki.
Performance. Moore's Law Moore's Law Related Curves.
A computational ecosystem for near real-time satellite data processing
Why don’t programmers have to program in machine code?
Software Development.
Course Contents KIIT UNIVERSITY Sr # Major and Detailed Coverage Area
High or Low Level Programming Language? Justify your decision.
September 2 Performance Read 3.1 through 3.4 for Tuesday
Compiler, Interpreter, and Bootstrapping
John D. McGregor Session 9 Testing Vocabulary
Algorithmic complexity: Speed of algorithms
High Performance Computing on an IBM Cell Processor --- Bioinformatics
CS101 Introduction to Computing Lecture 19 Programming Languages
Lecture 3 of Computer Science II
John D. McGregor Session 9 Testing Vocabulary
Lecture 5: GPU Compute Architecture
Performance and Code Tuning
John D. McGregor Session 9 Testing Vocabulary
Compiler Construction
Objective of This Course
P A R A L L E L C O M P U T I N G L A B O R A T O R Y
Programming Languages
UNIT 3 CHAPTER 1 LESSON 4 Using Simple Commands.
Winter 2018 CISC101 12/2/2018 CISC101 Reminders
Algorithmic complexity: Speed of algorithms
COMP60621 Fundamentals of Parallel and Distributed Systems
Operating Systems Lecture 3.
Asst. Dr.Surasak Mungsing
Algorithms and Problem Solving
CSE 373 Data Structures and Algorithms
Multithreading Why & How.
Tonga Institute of Higher Education IT 141: Information Systems
Algorithmic complexity: Speed of algorithms
Tonga Institute of Higher Education IT 141: Information Systems
Course Description Algorithms are: Recipes for solving problems.
COMP60611 Fundamentals of Parallel and Distributed Systems
Performance.
Programming language translators
Algorithm Analysis How can we demonstrate that one algorithm is superior to another without being misled by any of the following problems: Special cases.
Presentation transcript:

Performance and Code Tuning Overview CPSC 315 – Programming Studio Spring 2017

Is Performance Important? Performance tends to improve with time HW Improvements Other things can be more important Accuracy Robustness Code Readability Worrying about it can cause problems “More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason – including blind stupidity.” – William A. Wulf

So Why Worry About It? Sometimes code performance is critical Very large-scale problems Inefficiencies that make things seem infeasible The gains in hardware improvement may be coming to an end Or, at least need more tricks to take advantage of “Performance Engineering” will likely be important in making long-term improvements

So, How Can We Improve Performance? First, there are ways to improve performance that don’t involve code tuning Before doing code tuning, you need to know what to tune Finally, you can tune your code Carefully, measuring along the way

Performance Increases without Code Tuning Lower your Standards/Requirements (!!) Asking for more than is needed leads to trouble Example: Return in 1 second Always? On Average? 99% of the time?

Performance Increases without Code Tuning Lower your Standards/Requirements High Level Design The overall program structure can play a huge role

Performance Increases without Code Tuning Lower your Standards/Requirements High Level Design Class/Routine Design Algorithms used have real differences Can have largest effect, especially asymptotically

Performance Increases without Code Tuning Lower your Standards/Requirements High Level Design Class/Routine Design Interactions with Operating System Hidden OS calls within libraries – their performance affects overall code

Performance Increases without Code Tuning Lower your Standards/Requirements High Level Design Class/Routine Design Interactions with Operating System Compiler Optimizations “Automatic” Optimization Getting better and better, but not perfect Different compilers work better/worse

Performance Increases without Code Tuning Lower your Standards/Requirements High Level Design Class/Routine Design Interactions with Operating System Compiler Optimizations Upgrade Hardware Straightforward, if possible…

Code Profiling Determine where code is spending time No sense in optimizing where no time is spent Provide measurement basis Determine whether “improvement” really improved anything Need to take precise measurements

Profiling Techniques Profiler – compile with profiling options, and run through profiler Gets list of functions/routines, and amount of time spent in each Use system timer Less ideal Might need test harness for functions Use system-supported real time Only slightly better than wristwatch… Graph results for understanding Multiple profile results: see how profile changes for different input types

What Is Tuning? Making small-scale adjustments to correct code in order to improve performance After code is written and working Affects only small-scale: a few lines, or at most one routine Examples: adjusting details of loops, expressions Code tuning can sometimes improve code efficiency tremendously

What Tuning is Not Reducing lines of code Not an indicator of efficient code A guess at what might improve things Know what you’re trying, and measure results Optimizing as you go Wait until finished, then go back to improve Optimizing while programming is often a waste A “first choice” for improvement Worry about other details/design first

Common Inefficiencies Unnecessary I/O operations File access especially slow Paging/Memory issues Can vary by system System Calls Interpreted Languages C/C++/VB tend to be “best” Java about 1.5 times slower PHP/Python about 100 times slower

Operation Costs Different operations take different times Integer division longer than other ops Transcendental functions (sin, sqrt, etc.) even longer Knowing this can help when tuning Vary by language In C++, private routine calls take about twice the time of an integer op, and in Java about half the time.

An Example Stealing an example from Charles Leiserson’s Distinguised Lecture on Performance Engineering February 8, 2017 Joint Work with Bradley Kuszmaul and Tao Schardl. Performance Engineering a 4K x 4K Matrix Multiplication Machine: Dual-Socket Intel Xeon E5-2666 v3 (Haswell) 18 cores 2.9 GHz 60 GB of DRAM

Performance Engineering: 4Kx4K Matrix Multipication Implementation Time (seconds) Straightforward Python Implementation ??

Performance Engineering: 4Kx4K Matrix Multipication Implementation Time (seconds) Straightforward Python Implementation 25,552.48 (>7 hours!) ????

Performance Engineering: 4Kx4K Matrix Multipication Implementation Time (seconds) Straightforward Python Implementation 25,552.48 Straightforward Java Implementation ?

Performance Engineering: 4Kx4K Matrix Multipication Implementation Time (seconds) Straightforward Python Implementation 25,552.48 Straightforward Java Implementation 2372.68 (almost 40 min)

Performance Engineering: 4Kx4K Matrix Multipication Implementation Time (seconds) Straightforward Python Implementation 25,552.48 Straightforward Java Implementation 2372.68 Straightforward C Implementation 542.67

Performance Engineering: 4Kx4K Matrix Multipication Implementation Time (seconds) Straightforward Python Implementation 25,552.48 Straightforward Java Implementation 2372.68 Straightforward C Implementation 542.67 Parallel Loops 69.80

Performance Engineering: 4Kx4K Matrix Multipication Implementation Time (seconds) Straightforward Python Implementation 25,552.48 Straightforward Java Implementation 2372.68 Straightforward C Implementation 542.67 Parallel Loops 69.80 Parallel Divide and Conquer 3.80

Performance Engineering: 4Kx4K Matrix Multipication Implementation Time (seconds) Straightforward Python Implementation 25,552.48 Straightforward Java Implementation 2372.68 Straightforward C Implementation 542.67 Parallel Loops 69.80 Parallel Divide and Conquer 3.80 add Vectorization (streaming the parallel operations) 1.10

Performance Engineering: 4Kx4K Matrix Multipication Implementation Time (seconds) Straightforward Python Implementation 25,552.48 Straightforward Java Implementation 2372.68 Straightforward C Implementation 542.67 Parallel Loops 69.80 Parallel Divide and Conquer 3.80 add Vectorization (streaming the parallel operations) 1.10 add AVX intrinsics (processor commands for vector operations) 0.41

Performance Engineering: 4Kx4K Matrix Multipication Implementation Time (seconds) Straightforward Python Implementation 25,552.48 Straightforward Java Implementation 2372.68 Straightforward C Implementation 542.67 Parallel Loops 69.80 Parallel Divide and Conquer 3.80 add Vectorization (streaming the parallel operations) 1.10 add AVX intrinsics (processor commands for vector operations) 0.41 Strassen’s algorithm 0.38

The Lesson from this Example Code “performance engineering” can make incredible improvements in performance. 67,243 times faster in this case!

Remember Code readability/maintainability/etc. is usually more important than efficiency Always start with well-written code, and only tune at the end Measure!