
1 Performance* Objective: To learn when and how to optimize the performance of a program. “The first principle of optimization is don’t.” –Knowing how a program will be used and the environment it runs in, is there any benefit to making it faster? *The examples in these slides come from Brian W. Kernighan and Rob Pike, “The Practice of Programming”, Addison-Wesley, 1999.

2 Approach The best strategy is to use the simplest, cleanest algorithms and data structures appropriate for the task. Then measure performance to see if changes are needed. Enable compiler options to generate the fastest possible code.

3 Approach Assess what changes to the program will have the most effect (profile the code). Make changes one at a time and reassess (always retest to verify correctness). –Consider alternative algorithms –Tune the code –Consider a lower-level language (just for time-sensitive components)

4 Topics A Bottleneck Timing and Profiling –time and clock –algorithm analysis –prof and gprof –gcov Concentrate on hot spots Strategies for speed Tuning the code

5 A Bottleneck isspam example from the text –Heavily used –Existing implementation not fast enough in current environment Benchmark Profile –Tune code –Change algorithm

6 isspam()
/* isspam: test mesg for occurrence of any pat */
int isspam(char *mesg)
{
    int i;

    for (i = 0; i < npat; i++)
        if (strstr(mesg, pat[i]) != NULL) {
            printf("spam: match for '%s'\n", pat[i]);
            return 1;
        }
    return 0;
}

7 strstr()
/* simple strstr: use strchr to look for first character */
char *strstr(const char *s1, const char *s2)
{
    int n;

    n = strlen(s2);
    for (;;) {
        s1 = strchr(s1, s2[0]);
        if (s1 == NULL)
            return NULL;
        if (strncmp(s1, s2, n) == 0)
            return (char *) s1;
        s1++;
    }
}

8 Inefficiencies strlen() is used to calculate pattern length But patterns are fixed, so calculate once and save strncmp() has complex inner loop –Comparing string bytes –Checking for \0 –Counting down Know string lengths, so don’t check for \0

9 Inefficiencies strchr() also checks for \0 –This is unnecessary Overhead of function calls to strchr(), strlen() and strncmp() adds up Make no function calls in strstr() Making these changes gave 30% speed-up But still too slow!
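
The changes listed on slides 8 and 9 can be sketched roughly as follows. This is a minimal illustration, not the code from the text: the name `find_pat` and its `patlen` parameter are assumptions introduced here. The caller computes each pattern length once and passes it in, and the search makes no library calls.

```c
#include <assert.h>
#include <stddef.h>

/* find_pat (hypothetical name): return a pointer to the first occurrence
 * of pat in mesg, or NULL. patlen must equal strlen(pat); the caller
 * computes it once per pattern instead of once per search. */
const char *find_pat(const char *mesg, const char *pat, size_t patlen)
{
    assert(mesg != NULL && pat != NULL);
    if (patlen == 0)
        return mesg;                /* empty pattern matches at the start */
    for (; *mesg != '\0'; mesg++) {
        size_t k;

        if (*mesg != pat[0])
            continue;               /* quick reject on the first character */
        /* Compare the rest inline. A '\0' in mesg can never equal pat[k]
         * for k < patlen, so the end of mesg stops the loop by mismatch. */
        for (k = 1; k < patlen && mesg[k] == pat[k]; k++)
            ;
        if (k == patlen)
            return mesg;
    }
    return NULL;
}
```

A caller would keep the lengths in an array filled in when the patterns are loaded, so `strlen` runs once per pattern rather than once per message.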

10 Further Improvements Analyze and improve algorithm
    for (i = 0; i < npat; i++)
        if (strstr(mesg, pat[i]) != NULL)
            return 1;
Invert loop
    for (j = 0; (c = mesg[j]) != '\0'; j++)
        if (some pattern matches starting at mesg[j])
            return 1;
–Don’t need to iterate through all patterns –Patterns stored in table
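
One way the inverted loop might look is sketched below, assuming the table is indexed by each pattern's first character. The pattern strings, the array sizes, and the names `init_patterns` and `isspam_inverted` are illustrative assumptions, not taken from the slides.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

enum { NPAT = 4, NSTART = 8 };      /* illustrative sizes */

/* Example patterns, invented for this sketch. */
static const char *pat[NPAT] = {
    "buy now", "free money", "best deal", "big win"
};
static size_t patlen[NPAT];                   /* precomputed lengths */
static int starting[UCHAR_MAX + 1][NSTART];   /* patterns by first char */
static int nstarting[UCHAR_MAX + 1];          /* how many per first char */

/* init_patterns: build the first-character table once, before any search */
void init_patterns(void)
{
    memset(nstarting, 0, sizeof nstarting);
    for (int i = 0; i < NPAT; i++) {
        unsigned char c = (unsigned char) pat[i][0];

        assert(nstarting[c] < NSTART);
        starting[c][nstarting[c]++] = i;
        patlen[i] = strlen(pat[i]);
    }
}

/* isspam_inverted: walk the message once; at each position try only the
 * patterns whose first character matches the current character */
int isspam_inverted(const char *mesg)
{
    unsigned char c;

    for (size_t j = 0; (c = (unsigned char) mesg[j]) != '\0'; j++) {
        for (int i = 0; i < nstarting[c]; i++) {
            int k = starting[c][i];

            /* strncmp stops at the end of mesg, so it never reads past it */
            if (strncmp(mesg + j, pat[k], patlen[k]) == 0)
                return 1;
        }
    }
    return 0;
}
```

For most message characters `nstarting[c]` is zero, so the inner loop does no work at all.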

11 Timing In Unix environment –time command –writes the total time elapsed, the time consumed by system overhead, and the compute time used to execute the command Example (time quicksort from chapter 2) –head -10000 in.txt –gcc -o sort1 sort1.c quicksort.c –time sort1 /dev/null

12 Algorithm Analysis Consider the asymptotic analysis of your program and the algorithms you are using. For quicksort, let T(n) be the runtime as a function of the size of the input array (the time will depend on the particular input array!). The expected runtime is $\Theta(n \log n)$ –If each partition roughly splits the array in half, then the computing time satisfies $T(n) \approx 2T(n/2) + cn$ The worst case is $\Theta(n^2)$ –If each partition splits the array into two pieces of unequal size (in the extreme, 1 and n-1) –$T(n) = T(n-1) + cn = \Theta(n^2)$
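
Both recurrences can be unrolled to see where the bounds come from. The sketch below assumes that n is a power of 2 in the balanced case and that the recurrences hold with exact equality; it is a simplification, not a full proof.

```latex
\begin{align*}
T(n) &= 2T(n/2) + cn
      \;\Longrightarrow\; \frac{T(n)}{n} = \frac{T(n/2)}{n/2} + c
      \;\Longrightarrow\; T(n) = n\,T(1) + cn\log_2 n = \Theta(n \log n), \\
T(n) &= T(n-1) + cn
      \;\Longrightarrow\; T(n) = T(1) + c\sum_{k=2}^{n} k
      = T(1) + c\left(\frac{n(n+1)}{2} - 1\right) = \Theta(n^2).
\end{align*}
```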

13 Worst Case for Quicksort Modify the code to remove the random selection of the pivot –This makes it possible to deterministically construct a worst case input (this is why randomization was used) –The worst case will occur for sorted or reverse sorted input –For sorted input, the number of comparisons Q(n) as a function of input size satisfies –Q(n) = Q(n-1) + n-1, Q(1) = 0 –Q(n) = n(n-1)/2
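
The closed form Q(n) = n(n-1)/2 can be checked by counting comparisons directly. The sketch below is an assumption-laden stand-in for the quicksort in the text: it always takes the first element as the pivot, and the names `quicksort_first`, `count_sorted`, and `ncmp` are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static unsigned long ncmp;          /* element comparisons made so far */

static void swap_int(int v[], size_t i, size_t j)
{
    int t = v[i];

    v[i] = v[j];
    v[j] = t;
}

/* quicksort_first: quicksort that always uses v[0] as the pivot */
static void quicksort_first(int v[], size_t n)
{
    size_t i, last = 0;

    if (n <= 1)
        return;
    for (i = 1; i < n; i++) {
        ncmp++;                     /* one comparison against the pivot */
        if (v[i] < v[0])
            swap_int(v, ++last, i);
    }
    swap_int(v, 0, last);           /* move pivot to its final position */
    quicksort_first(v, last);
    quicksort_first(v + last + 1, n - last - 1);
}

/* count_sorted: comparisons used to sort the already-sorted input 0..n-1 */
unsigned long count_sorted(size_t n)
{
    int *v;
    size_t i;

    assert(n > 0);
    v = malloc(n * sizeof *v);
    assert(v != NULL);
    for (i = 0; i < n; i++)
        v[i] = (int) i;
    ncmp = 0;
    quicksort_first(v, n);
    free(v);
    return ncmp;
}
```

On sorted input every partition leaves the pivot in place and recurses on the remaining n-1 elements, so the recursion depth is also n; keep n modest when trying this out.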

14 What does Asymptotic Analysis mean for Actual Runtimes If $T(n) = \Theta(n^2)$ –Doubling the input size increases the time by a factor of 4 –$T(2n)/T(n) = (4cn^2 + o(n^2))/(cn^2 + o(n^2))$, which in the limit is equal to 4. Here $o(n^2)$ means lower-order terms. If $T(n) = \Theta(n \log n)$ –Doubling the input size roughly doubles the time [same as linear] –$T(2n)/T(n) = (2cn\log(2n) + o(n \log n))/(cn\log n + o(n \log n)) = (2cn\log n + o(n \log n))/(cn\log n + o(n \log n))$, which in the limit is equal to 2

15 Empirical Confirmation Run and time quicksort (without random pivot) on sorted inputs of size 10,000, 20,000, and 40,000. Compute the ratio of times to see if it is a factor of 4. What if random inputs are used?

16 Growth Rates and Limits Suppose $T(n) = \Theta(f(n))$ [grows at same rate] –$\lim_{n \to \infty} T(n)/f(n) = c$, a constant > 0. –[Actually this is not strictly true: there may be separate limsup and liminf, but as a first approximation you can view it as true.] Suppose $T(n) = o(f(n))$ [grows slower] –$\lim_{n \to \infty} T(n)/f(n) = 0$ Suppose $T(n) = \omega(f(n))$ [grows faster] –$\lim_{n \to \infty} T(n)/f(n) = \infty$

17 Determining Growth Rate Empirically Time quicksort with a range of input sizes –e.g. 1000, 2000, 3000, …, 10000 Write a program that times sort for a range of inputs. Use the clock function to time code inside a program. –T(1000), T(2000), T(3000), …, T(10000) –plot times for range of input to visualize –Compute ratios to compare to known functions –$T(1000)/1000^2$, $T(2000)/2000^2$, …, $T(10000)/10000^2$ –Does the ratio approach a constant, go to 0, or go to $\infty$? –I.e., is it growing at the same rate as, slower than, or faster than the comparison function?
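
A timing harness along these lines might look like the sketch below. It uses the standard `clock` function and compares against n^2; the first-element-pivot quicksort and the names `time_sorted` and `report_ratios` are assumptions made for this illustration, not the program from the slides.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static void swap_int(int v[], size_t i, size_t j)
{
    int t = v[i];

    v[i] = v[j];
    v[j] = t;
}

/* quicksort with the first element as pivot: sorted input is a worst case */
static void quicksort_first(int v[], size_t n)
{
    size_t i, last = 0;

    if (n <= 1)
        return;
    for (i = 1; i < n; i++)
        if (v[i] < v[0])
            swap_int(v, ++last, i);
    swap_int(v, 0, last);
    quicksort_first(v, last);
    quicksort_first(v + last + 1, n - last - 1);
}

/* time_sorted: CPU seconds spent sorting an already-sorted array of n ints */
double time_sorted(size_t n)
{
    clock_t before, after;
    int *v;

    assert(n > 0);
    v = malloc(n * sizeof *v);
    assert(v != NULL);
    for (size_t i = 0; i < n; i++)
        v[i] = (int) i;
    before = clock();
    quicksort_first(v, n);
    after = clock();
    free(v);
    return (double) (after - before) / CLOCKS_PER_SEC;
}

/* report_ratios: print n, T(n), and T(n)/n^2 for count sizes from start */
void report_ratios(size_t start, size_t step, size_t count)
{
    for (size_t k = 0; k < count; k++) {
        size_t n = start + k * step;
        double t = time_sorted(n);

        printf("%zu %.6f %.3e\n", n, t, t / ((double) n * (double) n));
    }
}
```

Calling `report_ratios(1000, 1000, 10)` from `main` covers the sizes 1000 through 10000. For small n a single run may be shorter than the clock resolution and report 0, so repeating each sort several times and averaging is likely to give steadier ratios.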

18 Obtaining Range of Times sortr 1000 10 1000 –sorts and times sorted arrays of size 1000, 2000, 3000, …, 10000

19 Profiling with gprof Reports on time spent in different functions (also gives number of times functions called) –Shows the hotspots
gcc -pg sort1.c quicksort.c -o sort1
sort1 /dev/null
gprof sort1 gmon.out

20 Profiling with gcov Uses source code analysis provided by the compiler to analyze the number of times each statement in the source code is executed. –$gcc -fprofile-arcs -ftest-coverage sorti.c quicksorti.c -o sorti –$sorti 10 –$gcov sorti.c –$gcov quicksorti.c

21 Strategies for Speed Concentrate on hot spots –Pay attention to which functions take the most time and how much time they take Plot performance data –Highlights effects of parameter changes, comparisons of algorithms and data structures –Identifies unexpected behaviors

22 Strategies for Speed Use better algorithms and data structures –Be aware of space and time complexity of algorithms and data structures Enable compiler optimizations –Not during code development, slows compilation –Check that code is still correct Tune the code –Adjust details of loops and expressions –Check that code is still correct

23 Strategies for Speed Make sure each change continues to make program faster –Interaction between changes could slow code Don’t optimize what doesn’t matter –Be sure you work on sections of code that take the most time How much effort to make code faster? –The programmer time spent making a program faster should be less than the time the speed-up will recover in the lifetime of the program.

24 Tuning the Code Collect common subexpressions –Compute them only once Replace expensive operations with cheaper ones –x*x*x vs. pow(x,3) –Don’t use sqrt during distance calculations if not necessary Unroll or eliminate loops
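
Two of these tunings are sketched below: repeated multiplication in place of `pow(x, 3)`, and a distance test that compares squared values so that `sqrt` is never called. The function names `cube` and `within_radius` are hypothetical. Whether either change helps in a given program is something to measure, since compilers and math libraries may already handle these cases well.

```c
#include <assert.h>

/* cube: x*x*x instead of the more general pow(x, 3) */
double cube(double x)
{
    return x * x * x;
}

/* within_radius: is (x2,y2) within distance r of (x1,y1)?
 * Comparing squared distances gives the same answer for r >= 0
 * and avoids the square root. */
int within_radius(double x1, double y1, double x2, double y2, double r)
{
    double dx = x2 - x1;            /* common subexpressions computed once */
    double dy = y2 - y1;

    assert(r >= 0.0);
    return dx * dx + dy * dy <= r * r;
}
```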

25 Tuning the Code Cache frequently-used values –Compute them only once Write a special-purpose memory allocator Buffer input and output Precompute results, e.g. strlen(pat) Use approximate values, double vs. int Rewrite in a lower-level language

26 Summary Choose the right algorithm Get code working correctly, then optimize Measure, i.e. time and profile Focus on a few places that will make the most difference Verify correctness Measure again (Rinse and repeat) Stop optimizing as soon as possible

