Download presentation
Presentation is loading. Please wait.
1
Performance and Code Tuning
CSCE 315 – Programming Studio, Fall 2017 Tanzir Ahmed
2
Is Performance Important?
Performance tends to improve with time HW Improvements Other things can be more important Accuracy Robustness Code Readability Worrying about it can cause problems “More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason – including blind stupidity.” – William A. Wulf
3
So Why Worry About It? Sometimes code performance is critical
Very large-scale problems Inefficiencies that make things seem infeasible The gains in hardware improvement may be coming to an end Or, at least need more tricks to take advantage of It is not straight-forward to take full advantage of a 48-core or 1024 CPUs/GPUs Problems such as inter-core or inter-socket communication overhead, cache efficiency “Performance Engineering” will likely be important in making long-term improvements
4
So, How Can We Improve Performance?
First, there are ways to improve performance that don’t involve code tuning Before doing code tuning, you need to know what to tune Again, guess work is not helpful Finally, you can tune your code Carefully, measuring along the way
5
Performance Increases without Code Tuning
Lower your Standards/Requirements (!!) Performance tuning is expensive May require h/w s/w optimization Performance is stated as a requirement far more than it is actually a requirement
6
Performance Increases without Code Tuning
Lower your Standards/Requirements High Level Design The overall program structure can play a huge role
7
Performance Increases without Code Tuning
Lower your Standards/Requirements High Level Design Class/Routine Design Algorithms used have real differences Can have largest effect, especially asymptotically
8
Performance Increases without Code Tuning
Lower your Standards/Requirements High Level Design Class/Routine Design Interactions with Operating System Hidden OS calls within libraries – their performance affects overall code Just removing repeating printing to standard output may increase performance significantly for some programs
9
Performance Increases without Code Tuning
Lower your Standards/Requirements High Level Design Class/Routine Design Interactions with Operating System Compiler Optimizations “Automatic” Optimization Getting better and better, but not perfect Different compilers work differently
10
Performance Increases without Code Tuning
Lower your Standards/Requirements High Level Design Class/Routine Design Interactions with Operating System Compiler Optimizations Upgrade Hardware Straightforward, if possible…
11
Code Profiling Pareto Rule Determine where code is spending time
More than 80% of the time is spent on less than 20% of code In reality this proportion is even more (e.g., 5% code contributing to 90%, just an example) Determine where code is spending time No sense in optimizing where no time is spent Provide measurement basis Determine whether “improvement” really improved anything Need to take precise measurements
12
Profiling Techniques Profiler – compile with profiling options, and run through profiler Gets list of functions/routines, and amount of time spent in each Use system timer (e.g., $time ./a.out) Less ideal Might need test harness for functions Graph results for understanding Multiple profile results: see how profile changes for different input types
13
What Is Tuning? Making small-scale adjustments to correct code in order to improve performance After code is written and working Affects only small-scale: a few lines, or at most one routine Examples: adjusting details of loops, expressions Code tuning can sometimes improve code efficiency tremendously
14
What Tuning is Not Reducing lines of code
Not an indicator of efficient code A guess at what might improve things Know what you’re trying, and measure results Optimizing as you go Wait until finished, then go back to improve Optimizing while programming is often a waste A “first choice” for improvement Worry about other details/design first It is not Refactoring Refactoring improves code readability and quality, while Tuning often diminishes both
15
Common Inefficiencies
Unnecessary I/O operations File access especially slow Paging/Memory issues Can vary by system System Calls Requires context switch which involves OS and scheduling overhead beyond the program’s control Interpreted Languages Instead of being compiled as a whole, each line is interpreted and converted to machine language individually Table Source: Code Complete book
16
Operation Costs Different operations take different times
Integer division longer than other ops Much slower than bit-shifting to achieve the same Transcendental functions (sin, sqrt, etc.) even longer Knowing this can help when tuning Vary by language In C++, private routine calls take about twice the time of an integer op, and in Java about half the time.
17
An Example Stealing an example from Charles Leiserson’s Distinguised Lecture on Performance Engineering February 8, 2017 Joint Work with Bradley Kuszmaul and Tao Schardl. Performance Engineering a 4K x 4K Matrix Multiplication Machine: Dual-Socket Intel Xeon E v3 (Haswell) 18 cores 2.9 GHz 60 GB of DRAM
18
Performance Engineering: 4Kx4K Matrix Multiplication
Implementation Time (seconds) Straightforward Python Implementation 25, (> 7 hours)
19
Performance Engineering: 4Kx4K Matrix Multiplication
Implementation Time (seconds) Straightforward Python Implementation 25, (> 7 hours) Straightforward Java Implementation (~ 40 mins)
20
Performance Engineering: 4Kx4K Matrix Multiplication
Implementation Time (seconds) Straightforward Python Implementation 25, (> 7 hours) Straightforward Java Implementation (~ 40 mins) Straightforward C Implementation 542.67 (<10 mins)
21
Performance Engineering: 4Kx4K Matrix Multiplication
Implementation Time (seconds) Straightforward Python Implementation 25, (> 7 hours) Straightforward Java Implementation (~ 40 mins) Straightforward C Implementation 542.67 (<10 mins) Parallel Loops 69.80 (~ 1 min)
22
Performance Engineering: 4Kx4K Matrix Multiplication
Implementation Time (seconds) Straightforward Python Implementation 25, (> 7 hours) Straightforward Java Implementation (~ 40 mins) Straightforward C Implementation 542.67 (<10 mins) Parallel Loops 69.80 (~ 1 min) Parallel Divide and Conquer 3.80
23
Performance Engineering: 4Kx4K Matrix Multiplication
Implementation Time (seconds) Straightforward Python Implementation 25, (> 7 hours) Straightforward Java Implementation (~ 40 mins) Straightforward C Implementation 542.67 (<10 mins) Parallel Loops 69.80 (~ 1 min) Parallel Divide and Conquer 3.80 add Vectorization (streaming the parallel operations) 1.10 add AVX intrinsics (processor commands for vector operations) 0.41 Strassen’s algorithm 0.38
24
The Lesson from this Example
Code “performance engineering” can make incredible improvements in performance. 67,243 times faster in this case! It is very rewarding However, this effort is often meaningless and even hurtful when used in “tuning as you go” style during development Tends to decrease code readability and reusability Often it is hard to identify the real bottleneck early on
25
Remember Code readability/maintainability/etc. is usually more important than efficiency Always start with well-written code, and only tune at the end Measure!
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.