Performance Improvement

Slides:



Advertisements
Similar presentations
Code Tuning Strategies and Techniques
Advertisements

C Language.
ECE 454 Computer Systems Programming Compiler and Optimization (I) Ding Yuan ECE Dept., University of Toronto
GNU gprof Profiler Yu Kai Hong Department of Mathematics National Taiwan University July 19, 2008 GNU gprof 1/22.
ADA: 5. Quicksort1 Objective o describe the quicksort algorithm, it's partition function, and analyse its running time under different data conditions.
CMPT 225 Sorting Algorithms Algorithm Analysis: Big O Notation.
1 Sorting Problem: Given a sequence of elements, find a permutation such that the resulting sequence is sorted in some order. We have already seen: –Insertion.
CS 162 Intro to Programming II Quick Sort 1. Quicksort Maybe the most commonly used algorithm Quicksort is also a divide and conquer algorithm Advantage.
1 Sorting Algorithms (Part II) Overview  Divide and Conquer Sorting Methods.  Merge Sort and its Implementation.  Brief Analysis of Merge Sort.  Quick.
1 Lecture 6 Performance Measurement and Improvement.
Quicksort. 2 Introduction * Fastest known sorting algorithm in practice * Average case: O(N log N) * Worst case: O(N 2 ) n But, the worst case seldom.
Quicksort.
Quicksort.
Chapter 4 Assessing and Understanding Performance
CSc 352 Performance Tuning Saumya Debray Dept. of Computer Science The University of Arizona, Tucson
Sorting II/ Slide 1 Lecture 24 May 15, 2011 l merge-sorting l quick-sorting.
Performance Measuring on Blue Horizon and Sun HPC Systems: Timing, Profiling, and Reading Assembly Language NPACI Parallel Computing Institute 2000 Sean.
Cmpt-225 Simulation. Application: Simulation Simulation  A technique for modeling the behavior of both natural and human-made systems  Goal Generate.
Simple Sorting Algorithms. 2 Bubble sort Compare each element (except the last one) with its neighbor to the right If they are out of order, swap them.
Chocolate Bar! luqili. Milestone 3 Speed 11% of final mark 7%: path quality and speed –Some cleverness required for full marks –Implement some A* techniques.
Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
Recursion, Complexity, and Searching and Sorting By Andrew Zeng.
1 Time Analysis Analyzing an algorithm = estimating the resources it requires. Time How long will it take to execute? Impossible to find exact value Depends.
1 ENERGY 211 / CME 211 Lecture 13 October 20, 2008.
Recursion, Complexity, and Sorting By Andrew Zeng.
Timing and Profiling ECE 454 Computer Systems Programming Topics: Measuring and Profiling Cristiana Amza.
CS 61B Data Structures and Programming Methodology July 28, 2008 David Sun.
Application Profiling Using gprof. What is profiling? Allows you to learn:  where your program is spending its time  what functions called what other.
CSE 303 Concepts and Tools for Software Development Richard C. Davis UW CSE – 12/6/2006 Lecture 24 – Profilers.
CSC 211 Data Structures Lecture 13
EFFICIENCY & SORTING II CITS Scope of this lecture Quicksort and mergesort Performance comparison.
Timing Programs and Performance Analysis Tools for Analysing and Optimising advanced Simulations.
CS 3500 L Performance l Code Complete 2 – Chapters 25/26 and Chapter 7 of K&P l Compare today to 44 years ago – The Burroughs B1700 – circa 1974.
1 Data Structures. 2 Motivating Quotation “Every program depends on algorithms and data structures, but few programs depend on the invention of brand.
Adv. UNIX: Profile/151 Advanced UNIX v Objectives –introduce profiling based on execution times and line counts Special Topics in Comp.
1 Announcements  Homework 4 out today  Dec 7 th is the last day you can turn in Lab 4 and HW4, so plan ahead.
Review 1 Selection Sort Selection Sort Algorithm Time Complexity Best case Average case Worst case Examples.
Performance Improvement
CSE 374 Programming Concepts & Tools Hal Perkins Fall 2015 Lecture 11 – gdb and Debugging.
1 Performance Issues CIS*2450 Advanced Programming Concepts.
The Sorting Methods Lecture Notes 10. Sorts Many programs will execute more efficiently if the data they process is sorted before processing begins. –
Sorting divide and conquer. Divide-and-conquer  a recursive design technique  solve small problem directly  divide large problem into two subproblems,
Copyright 2014 – Noah Mendelsohn Performance Analysis Tools Noah Mendelsohn Tufts University Web:
PREVIOUS SORTING ALGORITHMS  BUBBLE SORT –Time Complexity: O(n 2 ) For each item, make (n –1) comparisons Gives: Comparisons = (n –1) + (n – 2)
Week 13 - Wednesday.  What did we talk about last time?  NP-completeness.
Performance* Objective: To learn when and how to optimize the performance of a program. “ The first principle of optimization is don ’ t. ” –Knowing how.
QuickSort. Yet another sorting algorithm! Usually faster than other algorithms on average, although worst-case is O(n 2 ) Divide-and-conquer: –Divide:
The Linear and Binary Search and more Lecture Notes 9.
329 3/30/98 CSE 143 Searching and Sorting [Sections 12.4, ]
Introduction to Computer Programming Concepts M. Uyguroğlu R. Uyguroğlu.
Optimization. How to Optimize Code Conventional Wisdom: 1.Don't do it 2.(For experts only) Don't do it yet.
July 10, 2016ISA's, Compilers, and Assembly1 CS232 roadmap In the first 3 quarters of the class, we have covered 1.Understanding the relationship between.
Recursion Riley Chapter 14 Sections 14.1 – 14.5 (14.6 optional)
Two notions of performance
Profiling with GNU GProf
Precept 10 : Building and Performance
Simple Sorting Algorithms
Chapter 5 Conclusion CIS 61.
Chapter 7 Sorting Spring 14
Applied Algorithms (Lecture 17) Recursion Fall-23
CO 303 Algorithm Analysis And Design Quicksort
slides adapted from Marty Stepp
CSE 373 Data Structures and Algorithms
Min Heap Update E.g. remove smallest item 1. Pop off top (smallest) 3
Algorithms: Design and Analysis
Data Structures & Algorithms
Performance Tuning.
CSc 352 Performance Tuning
Presentation transcript:

Performance Improvement The material for this lecture is drawn, in part, from The Practice of Programming (Kernighan & Pike) Chapter 7

Goals of this Lecture Help you learn about: Why? Techniques for improving program performance: how to make your programs run faster and/or use less memory The GPROF execution profiler Why? In a large program, typically a small fragment of the code consumes most of the CPU time and/or memory A power programmer knows how to identify such code fragments A power programmer knows techniques for improving the performance of such code fragments

Performance Improvement Pros Techniques described in this lecture can yield answers to questions such as: How slow is my program? Where is my program slow? Why is my program slow? How can I make my program run faster? How can I make my program use less memory?

Performance Improvement Cons Techniques described in this lecture can yield code that: Is less clear/maintainable Might confuse debuggers Might contain bugs Requires regression testing So…

When to Improve Performance “The first principle of optimization is don’t. Is the program good enough already? Knowing how a program will be used and the environment it runs in, is there any benefit to making it faster?” -- Kernighan & Pike

Improving Execution Efficiency Steps to improve execution (time) efficiency: (1) Do timing studies (2) Identify hot spots (3) Use a better algorithm or data structure (4) Enable compiler speed optimization (5) Tune the code Let’s consider one at a time…

Timing Studies (1) Do timing studies To time a program… Run a tool to time program execution E.g., UNIX time command Output: Real: Wall-clock time between program invocation and termination User: CPU time spent executing the program System: CPU time spent within the OS on the program’s behalf But, which parts of the code are the most time consuming? $ time sort < bigfile.txt > output.txt real 0m12.977s user 0m12.860s sys 0m0.010s

Timing Studies (cont.) To time parts of a program... Call a function to compute wall-clock time consumed E.g., UNIX gettimeofday() function (time since Jan 1, 1970) Not defined by C90 standard #include <sys/time.h> struct timeval startTime; struct timeval endTime; double wallClockSecondsConsumed; gettimeofday(&startTime, NULL); <execute some code here> gettimeofday(&endTime, NULL); wallClockSecondsConsumed = endTime.tv_sec - startTime.tv_sec + 1.0E-6 * (endTime.tv_usec - startTime.tv_usec); UNIX time is relative to 00:00:00 UTC on January 1, 1970 UTC is Coordinated Universal Time (i.e., GMT)

Timing Studies (cont.) To time parts of a program... Call a function to compute CPU time consumed E.g. clock() function Defined by C90 standard #include <time.h> clock_t startClock; clock_t endClock; double cpuSecondsConsumed; startClock = clock(); <execute some code here> endClock = clock(); cpuSecondsConsumed = ((double)(endClock - startClock)) / CLOCKS_PER_SEC;

Identify Hot Spots (2) Identify hot spots Gather statistics about your program’s execution How much time did execution of a function take? How many times was a particular function called? How many times was a particular line of code executed? Which lines of code used the most time? Etc. How? Use an execution profiler Example: gprof (GNU Performance Profiler)

GPROF Example Program Example program for GPROF analysis Sort an array of 10 million random integers Artificial: consumes much CPU time, generates no output #include <string.h> #include <stdio.h> #include <stdlib.h> enum {MAX_SIZE = 10000000}; int a[MAX_SIZE]; /* Too big to fit in stack! */ void fillArray(int a[], int size) { int i; for (i = 0; i < size; i++) a[i] = rand(); } void swap(int a[], int i, int j) { int temp = a[i]; a[i] = a[j]; a[j] = temp; …

GPROF Example Program (cont.) Example program for GPROF analysis (cont.) … int partition(int a[], int left, int right) { int first = left-1; int last = right; for (;;) { while (a[++first] < a[right]) ; while (a[right] < a[--last]) if (last == left) break; if (first >= last) swap(a, first, last); } swap(a, first, right); return first;

GPROF Example Program (cont.) Example program for GPROF analysis (cont.) … void quicksort(int a[], int left, int right) { if (right > left) { int mid = partition(a, left, right); quicksort(a, left, mid - 1); quicksort(a, mid + 1, right); } int main(void) { fillArray(a, MAX_SIZE); quicksort(a, 0, MAX_SIZE - 1); return 0;

Using GPROF Step 1: Instrument the program gcc217 –pg mysort.c –o mysort Adds profiling code to mysort, that is… “Instruments” mysort Step 2: Run the program mysort Creates file gmon.out containing statistics Step 3: Create a report gprof mysort > myreport Uses mysort and gmon.out to create textual report Step 4: Examine the report cat myreport

The GPROF Report Flat profile Each line describes one function name: name of the function %time: percentage of time spent executing this function cumulative seconds: [skipping, as this isn’t all that useful] self seconds: time spent executing this function calls: number of times function was called (excluding recursive) self s/call: average time per execution (excluding descendents) total s/call: average time per execution (including descendents) % cumulative self self total time seconds seconds calls s/call s/call name 84.54 2.27 2.27 6665307 0.00 0.00 partition 9.33 2.53 0.25 54328749 0.00 0.00 swap 2.99 2.61 0.08 1 0.08 2.61 quicksort 2.61 2.68 0.07 1 0.07 0.07 fillArray

The GPROF Report (cont.) Call graph profile index % time self children called name <spontaneous> [1] 100.0 0.00 2.68 main [1] 0.08 2.53 1/1 quicksort [2] 0.07 0.00 1/1 fillArray [5] ----------------------------------------------- 13330614 quicksort [2] 0.08 2.53 1/1 main [1] [2] 97.4 0.08 2.53 1+13330614 quicksort [2] 2.27 0.25 6665307/6665307 partition [3] 2.27 0.25 6665307/6665307 quicksort [2] [3] 94.4 2.27 0.25 6665307 partition [3] 0.25 0.00 54328749/54328749 swap [4] 0.25 0.00 54328749/54328749 partition [3] [4] 9.4 0.25 0.00 54328749 swap [4] 0.07 0.00 1/1 main [1] [5] 2.6 0.07 0.00 1 fillArray [5]

The GPROF Report (cont.) Call graph profile (cont.) Each section describes one function Which functions called it, and how much time was consumed? Which functions it calls, how many times, and for how long? Usually overkill; we won’t look at this output in any detail

GPROF Report Analysis Observations Conclusions swap() is called very many times; each call consumes little time; swap() consumes only 9% of the time overall partition() is called many times; each call consumes little time; but partition() consumes 85% of the time overall Conclusions To improve performance, try to make partition() faster Don’t even think about trying to make fillArray() or quicksort() faster

GPROF Design Incidentally… How does GPROF work? Good question! Essentially, by randomly sampling the code as it runs … and seeing what line is running, & what function it’s in

Algorithms and Data Structures (3) Use a better algorithm or data structure Example: For mysort, would mergesort work better than quicksort? Depends upon: Data Hardware Operating system …

Compiler Speed Optimization (4) Enable compiler speed optimization gcc217 –Ox mysort.c –o mysort Compiler spends more time compiling your code so… Your code spends less time executing x can be: 1: optimize 2: optimize more 3: optimize yet more See “man gcc” for details Beware: Speed optimization can affect debugging E.g. Optimization eliminates variable => GDB cannot print value of variable

Tune the Code (5) Tune the code Some common techniques Factor computation out of loops Example: Faster: for (i = 0; i < strlen(s); i++) { /* Do something with s[i] */ } length = strlen(s); for (i = 0; i < length; i++) { /* Do something with s[i] */ }

Tune the Code (cont.) Some common techniques (cont.) Inline function calls Example: Maybe faster: Beware: Can introduce redundant/cloned code Some compilers support “inline” keyword directive void g(void) { /* Some code */ } void f(void) { … g(); void f(void) { … /* Some code */ }

Tune the Code (cont.) Some common techniques (cont.) Unroll loops – some compilers have flags for it, like –funroll-loops Example: Maybe faster: Maybe even faster: for (i = 0; i < 6; i++) a[i] = b[i] + c[i]; for (i = 0; i < 6; i += 2) { a[i+0] = b[i+0] + c[i+0]; a[i+1] = b[i+1] + c[i+1]; } a[i+0] = b[i+0] + c[i+0]; a[i+1] = b[i+1] + c[i+1]; a[i+2] = b[i+2] + c[i+2]; a[i+3] = b[i+3] + c[i+3]; a[i+4] = b[i+4] + c[i+4]; a[i+5] = b[i+5] + c[i+5];

Tune the Code (cont.) Some common techniques (cont.): Rewrite in a lower-level language Write key functions in assembly language instead of C As described in 2nd half of course Beware: Modern optimizing compilers generate fast code Hand-written assembly language code could be slower than compiler-generated code, especially when compiled with speed optimization

Improving Memory Efficiency These days, memory is cheap, so… Memory (space) efficiency typically is less important than execution (time) efficiency Techniques to improve memory (space) efficiency…

Improving Memory Efficiency (1) Use a smaller data type E.g. short instead of int (2) Compute instead of storing E.g. To determine linked list length, traverse nodes instead of storing node count (3) Enable compiler size optimization gcc217 -Os mysort.c –o mysort

Summary Steps to improve execution (time) efficiency: (1) Do timing studies (2) Identify hot spots * (3) Use a better algorithm or data structure (4) Enable compiler speed optimization (5) Tune the code * Use GPROF Techniques to improve memory (space) efficiency: (1) Use a smaller data type (2) Compute instead of storing (3) Enable compiler size optimization And, most importantly…

Don’t improve performance unless you must!!! Summary (cont.) Clarity supersedes performance Don’t improve performance unless you must!!!