240-491 Special Topics in Comp. Eng. 2: Advanced UNIX, Profiling


Profiling (Special Topics in Comp. Eng. 2, Semester 2)

Objectives:
- introduce profiling based on execution times and line counts

Contents
1. What is Profiling?
2. Profiling Primes Programs
3. Primes v.1 (p1.c)
4. Primes v.2 (p2.c)
5. Primes v.3 (p3.c)
6. Primes v.4 (p4.c)
7. Primes v.5 (p5.c)
8. Care with Timings
9. Quick Timings
10. Function Call Trees

1. What is Profiling?

Profiling a program involves collecting numerical data about its execution, e.g.:
- the total running time
- the running time of each function
- the number of times each function or statement is executed

This information can be used for speed optimisation and for debugging.

2. Profiling Primes Programs

We will profile five versions of a primes program, each of which prints all the prime numbers between 2 and 70,000.

Two types of profiling are carried out:
- time profiling of the functions
- counting the number of times statements are executed

Time Profiling

Detailed timing information for a program (e.g. foo.c) is obtained in three steps:

    $ gcc -pg -o foo foo.c
    $ ./foo
    $ gprof -b foo

Running foo writes its profile data to gmon.out in the current directory, which gprof then reads. The -b option suppresses the long explanations of the output format that gprof normally prints.

gprof reports:
- the execution time of each function
- information on how functions call each other, including the number of times each function has been called

gprof Information
- man gprof
- the Web page "GNU gprof" ( info/gprof.html ), which explains some of the undocumented features of gprof and gives examples of its use

Line Counting Profiling

Count the number of times each line in the program has been executed:

    $ gcc -fprofile-arcs -ftest-coverage -o foo foo.c
    $ ./foo
    $ gcov foo

gcov generates an annotated source listing of foo.c, stored in foo.c.gcov; the listing includes an execution count for each line.

The counts accumulate across runs: running foo again adds to the recorded counts, and calling gcov foo again then updates foo.c.gcov with the combined totals.

gcov Information
- man gcov (very brief)
- the Web page "gcov: a Test Coverage Program" ( gcov_1.html )

3. Primes v.1 (p1.c)

This program calculates the primes between 2 and 70,000 (MAXPRIME) by calling prime() for each integer. The primes are printed in rows, 9 (NUMCOLS) primes per row.

p1.c

    #include <stdio.h>

    #define NUMCOLS 9
    #define MAXPRIME 70000

    int prime (int n);

    int main ()
    {
      int i;
      int colCount = 0;

      for (i = 2; i <= MAXPRIME; i++)
        if (prime (i)) {
          colCount++;
          if (colCount%NUMCOLS == 0) {
            printf ("%5d\n", i);
            colCount = 0;
          } else
            printf ("%5d ", i);
        }
      putchar('\n');
      return 0;
    }

    int prime (int n)
    /* Is n a prime? Return 1 if yes, 0 otherwise */
    {
      int i;
      for (i = 2; i < n; i++)
        if (n % i == 0)
          return 0;
      return 1;
    }

p1.c Timings

    $ gcc -pg -o p1 p1.c
    $ ./p1
    $ gprof -b p1
    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  ns/call  ns/call  name
                                                          prime
      :

    Call graph

    granularity: each sample hit covers 4 byte(s) for 0.04% of … seconds

    index % time    self  children    called     name
                                     69999/69999     main [2]
    [1]                              69999         prime [1]
    [2]                                            main [2]
                                     69999/69999     prime [1]

    Index by function name

       [1] prime

The cumulative seconds column of the flat profile gives the total running time.

p1.c Line Counts

    $ gcc -fprofile-arcs -ftest-coverage -o p1 p1.c
    $ ./p1 > /dev/null
    $ gcov p1
    …% of 20 source lines executed in file p1.c
    Creating p1.c.gcov.
    $ cat p1.c.gcov
                      #include <stdio.h>

                      #define NUMCOLS 9
                      #define MAXPRIME 70000

                      int prime (int n);

                      int main ()
                 1    {
                        int i;
                 1      int colCount = 0;

                        for (i = 2; i <= MAXPRIME; i++)
             69999        if (prime (i)) {
              6935          colCount++;
              6935          if (colCount%NUMCOLS == 0) {
               770            printf ("%5d\n", i);
               770            colCount = 0;
               770          } else
              6165            printf ("%5d ", i);
                          }
                 1      putchar('\n');
                 1      return 0;
                 1    }

                      int prime (int n)
                      {
                        int i;
         229394196      for (i = 2; i < n; i++)
         229387261        if (n % i == 0)
             63064          return 0;
              6935      return 1;
                      }

The count on colCount++ (6935) is the number of primes found. In prime(), the number of non-primes (63,064) plus the number of primes (6,935) equals the size of the range (69,999). The expensive operations in prime() are the loop and the factor test.

4. Primes v.2 (p2.c)

The analysis of p1.c shows that its "hot spots" are the loop and the if-test inside prime(); speeding these up would be good.

Mathematical theory says that if n has a factor, then it will occur between 2 and sqrt(n): if n = a * b with a <= b, then a * a <= n, so a <= sqrt(n). We can use this to reduce the loop range to 2..root(n).

p2.c

    #include <stdio.h>
    #include <math.h>

    #define NUMCOLS 9
    #define MAXPRIME 70000

    int prime (int n);
    int root(int n);

    int main ()
    {
      int i;
      int colCount = 0;

      for (i = 2; i <= MAXPRIME; i++)
        if (prime (i)) {
          colCount++;
          if (colCount%NUMCOLS == 0) {
            printf ("%5d\n", i);
            colCount = 0;
          } else
            printf ("%5d ", i);
        }
      putchar('\n');
      return 0;
    }

    int prime (int n)
    {
      int i;
      for (i = 2; i <= root(n); i++)
        if (n % i == 0)
          return 0;
      return 1;
    }

    int root(int n)
    {
      return (int) sqrt( (float)n );
    }

p2.c Timings

    $ gcc -pg -o p2 p2.c -lm
    $ ./p2
    $ gprof -b p2
    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  ns/call  ns/call  name
                                                          root
                                                          prime
      :

    Call graph

    granularity: each sample hit covers 4 byte(s) for 1.79% of 0.56 seconds

    index % time    self  children    called     name
                                     69999/69999     main [2]
    [1]                              69999         prime [1]
                                 1682490/1682490     root [3]
    [2]                                            main [2]
                                     69999/69999     prime [1]
                                 1682490/1682490     prime [1]
    [3]                            1682490         root [3]

    Index by function name

       [1] prime                   [3] root

The cumulative seconds column of the flat profile gives the total running time.

Some Observations
- p2.c is very much faster than p1.c: it runs in 0.56 secs, about 50 times faster.
- The reason is that prime() in p2.c takes only 0.28 secs, far less than prime() in p1.c. To see why, it helps to know the line counts inside prime().

p2.c Line Counts

    $ gcc -fprofile-arcs -ftest-coverage -o p2 p2.c -lm
    $ ./p2 > /dev/null
    $ gcov p2
    …% of 22 source lines executed in file p2.c
    Creating p2.c.gcov.
    $ cat p2.c.gcov
                      #include <stdio.h>
                      #include <math.h>

                      #define NUMCOLS 9
                      #define MAXPRIME 70000

                      int prime (int n);
                      int root(int n);

                      int main ()
                 1    {
                        int i;
                 1      int colCount = 0;

                        for (i = 2; i <= MAXPRIME; i++)
             69999        if (prime (i)) {
              6935          colCount++;
              6935          if (colCount%NUMCOLS == 0) {
               770            printf ("%5d\n", i);
               770            colCount = 0;
               770          } else
              6165            printf ("%5d ", i);
                          }
                 1      putchar('\n');
                 1      return 0;
                 1    }

                      int prime (int n)
                      {
                        int i;
           1682490      for (i = 2; i <= root(n); i++)
           1675555        if (n % i == 0)
             63064          return 0;
              6935      return 1;
                      }

                      int root(int n)
                      {
           1682490      return (int) sqrt( (float)n );
                      }

The number of primes (6935) is the same as in p1.c, and prime() returns the same numbers of non-primes and primes as in p1.c.

Some Observations

The loop and if-test in p2.c's prime() are executed far fewer times than in p1.c:
- p2.c loop: 1,682,490; if-test: 1,675,555
- p1.c loop: 229,394,196; if-test: 229,387,261

The calls to root() are very expensive:
- they take 50% of the total execution time (0.28 secs)
- there are 1,682,490 calls to sqrt() inside root()

5. Primes v.3 (p3.c)

This version is very similar to p2.c, but the call to root() inside prime() has been moved out of the loop.

p3.c

    #include <stdio.h>
    #include <math.h>

    #define NUMCOLS 9
    #define MAXPRIME 70000

    int prime (int n);
    int root(int n);

    int main ()
    {
      int i;
      int colCount = 0;

      for (i = 2; i <= MAXPRIME; i++)
        if (prime (i)) {
          colCount++;
          if (colCount%NUMCOLS == 0) {
            printf ("%5d\n", i);
            colCount = 0;
          } else
            printf ("%5d ", i);
        }
      putchar('\n');
      return 0;
    }

    int prime (int n)
    {
      int i, bound;
      bound = root(n);
      for (i = 2; i <= bound; i++)
        if (n % i == 0)
          return 0;
      return 1;
    }

    int root(int n)
    {
      return (int) sqrt( (float)n );
    }

p3.c Timings

    $ gcc -pg -o p3 p3.c -lm
    $ ./p3
    $ gprof -b p3
    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  ns/call  ns/call  name
                                                          prime
                                                          root
      :

    Call graph

    granularity: each sample hit covers 4 byte(s) for 6.67% of 0.15 seconds

    index % time    self  children    called     name
                                     69999/69999     main [2]
    [1]                              69999         prime [1]
                                     69999/69999     root [3]
    [2]                                            main [2]
                                     69999/69999     prime [1]
                                     69999/69999     prime [1]
    [3]                              69999         root [3]

    Index by function name

       [1] prime                   [3] root

Some Observations
- p3.c is faster than p2.c: 0.15 secs compared to 0.56 secs (~4 times)
- The speed-up is due to root():
  - p3.c root() time: 0.15 secs, no. of calls: 69,999
  - p2.c root() time: 0.28 secs, no. of calls: 1,682,490

p3.c Line Counts

    $ gcc -fprofile-arcs -ftest-coverage -o p3 p3.c -lm
    $ ./p3 > /dev/null
    $ gcov p3
    …% of 23 source lines executed in file p3.c
    Creating p3.c.gcov.
    $ cat p3.c.gcov
                      #include <stdio.h>
                      #include <math.h>

                      #define NUMCOLS 9
                      #define MAXPRIME 70000

                      int prime (int n);
                      int root(int n);

                      int main ()
                 1    {
                        int i;
                 1      int colCount = 0;

                        for (i = 2; i <= MAXPRIME; i++)
             69999        if (prime (i)) {
              6935          colCount++;
              6935          if (colCount%NUMCOLS == 0) {
               770            printf ("%5d\n", i);
               770            colCount = 0;
               770          } else
              6165            printf ("%5d ", i);
                          }
                 1      putchar('\n');
                 1      return 0;
                 1    }

                      int prime (int n)
                      {
                        int i, bound;
             69999      bound = root(n);
           1682490      for (i = 2; i <= bound; i++)
           1675555        if (n % i == 0)
             63064          return 0;
              6935      return 1;
                      }

                      int root(int n)
                      {
             69999      return (int) sqrt( (float)n );
                      }

The number of primes is the same as before, as are the numbers of non-primes and primes returned by prime(), but sqrt() is now called much less often.

6. Primes v.4 (p4.c)

This version makes two modifications to the code in p3.c:
- add divisibility tests for 2, 3 and 5 to prime(), to filter out numbers before the expensive loop (in the process we add some bugs!)
- have the for-loop start at 7, and increment in steps of 2

p4.c

    #include <stdio.h>
    #include <math.h>

    #define NUMCOLS 9
    #define MAXPRIME 70000

    int prime (int n);
    int root(int n);

    int main ()
    {
      int i;
      int colCount = 0;

      for (i = 2; i <= MAXPRIME; i++)
        if (prime (i)) {
          colCount++;
          if (colCount%NUMCOLS == 0) {
            printf ("%5d\n", i);
            colCount = 0;
          } else
            printf ("%5d ", i);
        }
      putchar('\n');
      return 0;
    }

    int prime (int n)
    {
      int i, bound;
      if (n%2 == 0)
        return 0;
      if (n%3 == 0)
        return 0;
      if (n%5 == 0)
        return 0;
      bound = root(n);
      for (i = 7; i <= bound; i = i+2)
        if (n % i == 0)
          return 0;
      return 1;
    }

    int root(int n)
    {
      return (int) sqrt( (float)n );
    }

p4.c Timings

    $ gcc -pg -o p4 p4.c -lm
    $ ./p4
    $ gprof -b p4
    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  ns/call  ns/call  name
                                                          prime
                                                          main
                                                          root
      :

    Call graph

    granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds

    index % time    self  children    called     name
    [1]                                            main [1]
                                     69999/69999     prime [2]
                                     69999/69999     main [1]
    [2]                              69999         prime [2]
                                     18665/18665     root [3]
                                     18665/18665     prime [2]
    [3]                              18665         root [3]

    Index by function name

       [1] main                    [2] prime                   [3] root

Notice that the primes printed by ./p4 no longer include 2, 3 and 5.

Some Observations
- p4.c is a tiny bit faster than p3.c: 0.13 secs compared to 0.15 secs
- There is an error somewhere, since the primes 2, 3 and 5 were not printed.
- root() is called far fewer times:
  - p4.c root() calls: 18,665
  - p3.c root() calls: 69,999
  - but both execution times are close to 0 secs

p4.c Line Counts

    $ gcc -fprofile-arcs -ftest-coverage -o p4 p4.c -lm
    $ ./p4 > /dev/null
    $ gcov p4
    …% of 29 source lines executed in file p4.c
    Creating p4.c.gcov.
    $ cat p4.c.gcov
                      #include <stdio.h>
                      #include <math.h>

                      #define NUMCOLS 9
                      #define MAXPRIME 70000

                      int prime (int n);
                      int root(int n);

                      int main ()
                 1    {
                        int i;
                 1      int colCount = 0;

                        for (i = 2; i <= MAXPRIME; i++)
             69999        if (prime (i)) {
              6932          colCount++;
              6932          if (colCount%NUMCOLS == 0) {
               770            printf ("%5d\n", i);
               770            colCount = 0;
               770          } else
              6162            printf ("%5d ", i);
                          }
                 1      putchar('\n');
                 1      return 0;
                 1    }

                      int prime (int n)
                      {
                        int i, bound;
                        if (n%2 == 0)
             35000        return 0;
                        if (n%3 == 0)
             11667        return 0;
                        if (n%5 == 0)
              4667        return 0;
             18665      bound = root(n);
            767154      for (i = 7; i <= bound; i = i+2)
                          if (n % i == 0)
                            return 0;
              6932      return 1;
                      }

                      int root(int n)
                      {
             18665      return (int) sqrt( (float)n );
                      }

This is not the same number of primes as before (6932 instead of 6935).

Some Observations
- prime() is returning three fewer primes (6932) than it should.
- The error is in the return statements for the divisibility checks for 2, 3 and 5: instead of always returning 0, they should return 1 when n is 2, 3 or 5.

The extra tests filter out many numbers (51,334, ~73% of the range):
- test for 2: 35,000 (half of the input)
- test for 3: 11,667 (about 1/3 of what's left)
- test for 5: 4,667 (about 1/5 of what's left)

The for-loop executes about 55% less often:
- p4.c for-loop count: 767,154
- p3.c for-loop count: 1,682,490

7. Primes v.5 (p5.c)

This final version fixes the divisibility bugs in the p4.c code. Also, the call to root() is replaced by a much faster multiplication (the loop test becomes i*i <= n), so there is no longer any need for the maths library.

p5.c

    #include <stdio.h>

    #define NUMCOLS 9
    #define MAXPRIME 70000

    int prime (int n);

    int main ()
    {
      int i;
      int colCount = 0;

      for (i = 2; i <= MAXPRIME; i++)
        if (prime (i)) {
          colCount++;
          if (colCount%NUMCOLS == 0) {
            printf ("%5d\n", i);
            colCount = 0;
          } else
            printf ("%5d ", i);
        }
      putchar('\n');
      return 0;
    }

    int prime (int n)
    {
      int i;
      if (n%2 == 0)
        return (n == 2);
      if (n%3 == 0)
        return (n == 3);
      if (n%5 == 0)
        return (n == 5);
      for (i = 7; i*i <= n; i = i+2)
        if (n % i == 0)
          return 0;
      return 1;
    }

p5.c Timings

    $ gcc -pg -o p5 p5.c
    $ ./p5
    $ gprof -b p5
    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  ns/call  ns/call  name
                                                          prime
                                                          main
      :

    Call graph

    granularity: each sample hit covers 4 byte(s) for 8.33% of 0.12 seconds

    index % time    self  children    called     name
    [1]                                            main [1]
                                     69999/69999     prime [2]
                                     69999/69999     main [1]
    [2]                              69999         prime [2]

    Index by function name

       [1] main                    [2] prime

All the primes are printed again.

Some Observations
- p5.c is a tiny bit faster than p4.c: 0.12 secs compared to 0.13 secs

p5.c Line Counts

    $ gcc -fprofile-arcs -ftest-coverage -o p5 p5.c
    $ ./p5 > /dev/null
    $ gcov p5
    …% of 26 source lines executed in file p5.c
    Creating p5.c.gcov.
    $ cat p5.c.gcov
                      #include <stdio.h>

                      #define NUMCOLS 9
                      #define MAXPRIME 70000

                      int prime (int n);

                      int main ()
                 1    {
                        int i;
                 1      int colCount = 0;

                        for (i = 2; i <= MAXPRIME; i++)
             69999        if (prime (i)) {
              6935          colCount++;
              6935          if (colCount%NUMCOLS == 0) {
               770            printf ("%5d\n", i);
               770            colCount = 0;
               770          } else
              6165            printf ("%5d ", i);
                          }
                 1      putchar('\n');
                 1      return 0;
                 1    }

                      int prime (int n)
                      {
                        int i;
                        if (n%2 == 0)
             35000        return (n == 2);
                        if (n%3 == 0)
             11667        return (n == 3);
                        if (n%5 == 0)
              4667        return (n == 5);
            767154      for (i = 7; i*i <= n; i = i+2)
                          if (n % i == 0)
                            return 0;
              6932      return 1;
                      }

The right number of primes (6935) is printed. prime() reaches its final return 1 only 6932 times; the other three primes (2, 3 and 5) are reported by the (n == 2), (n == 3) and (n == 5) returns.

Some Observations
- The divisibility tests and for-loop in prime() work as in p4.c:
  - the tests filter out about 73% of the numbers
  - the for-loop executes about 55% less often than in p3.c

8. Care with Timings

The five primes programs have these total execution times (each speed-up is relative to the previous version):

    Program   Time (secs)   Speed-up
    p1.c
    p2.c      0.56          ~50 times
    p3.c      0.15          ~4 times
    p4.c      0.13          ~1.2 times
    p5.c      0.12          ~1.1 times

What can be concluded?
- the optimisations speed things up, but there is a trade-off between speed and coding complexity

The figures are based on only one run of each program; the programs should be run many times, and averages taken.

The optimisation techniques should be tested on a range of inputs; in this case, we should run the programs for different ranges, not just 2..70,000.

Timing values are affected by the machine load, so timings should be collected at different times of day and night.

Timings are affected by the machine type:
- e.g. SparcStation, Pentium 100

Timings are affected by the OS version:
- e.g. BSD, Solaris, Linux
- e.g. older UNIXes implemented the maths library differently

Timings are affected by the accuracy of the clock:
- very small execution times will be measured as 0.00 secs
- yet the code behind them could still be important for larger or different data

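9. Quick Timings

To obtain the total CPU time used by the program, add the following at the end of the program (clock() and CLOCKS_PER_SEC need #include <time.h>):

    printf("%f secs running\n", (double) clock() / CLOCKS_PER_SEC);

The cast is applied to clock() before dividing, so the fractional part of the seconds is kept.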

Bash contains a built-in time command:

    $ time ./p2
    /* p2's output */
    real    0m1.366s
    user    0m1.360s
    sys     0m0.000s
    $

- real: total elapsed time (wall-clock time)
- user: user CPU time, i.e. time spent executing the program's own code and the library code it calls
- sys: system CPU time, i.e. time spent inside the UNIX kernel on the program's behalf

There is a more detailed time command, which also prints information on other system resources:

    $ /usr/bin/time ./p2
    /* p2's output */
    1.40user 0.02system 0:01.80elapsed 78%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (101major+12minor)pagefaults 0swaps
    $

10. Function Call Trees

A function call tree shows which functions call which others:
- it is good for showing the structure of a large program
- it shows which functions will be affected by a change to a function

An inverted flow graph is a related idea: for each function, it shows which other functions call it.

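Example: curl.c

    $ cflow curl.c
    1 main {curl.c 429}
    2 init_strs {curl.c 3214}
    3 strsbld {curl.c 3155}
    4 strs_hash {curl.c 3160}
    5 tolower {}
    6 strs_insert {curl.c 3172}
    7 strcmp {}
    8 mk_strs {curl.c 3200}
    9 malloc {}
    10 strlen {}
    11 strcpy {}
    12 process_args {curl.c 486}
    13 fopen {}
    :

Each function is followed by the functions it calls. The braces give the file and line where a function is defined, and are empty for functions not defined in curl.c, such as strcmp() and malloc().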

Inverted Flow Graph for curl.c

    $ cflow -i curl.c
    1 _IO_getc {}
    2 get_gifnm {curl.c 1600}
    3 get_link {curl.c 1562}
    4 mod_newfile {curl.c 3081}
    5 mod_picfile {curl.c 2984}
    :
    14 a_or_p {curl.c 2774}
    15 delete_lineref {curl.c 2601}
    16 add_path {curl.c 896}
    17 announce_old {curl.c 2217}
    18 nodup_kids {curl.c 1108}
    19 str_add_path {curl.c 905}
    20 add_path_html {curl.c 917}
    21 next_cnt {curl.c 1920}
    :

Here each function is followed by the functions that call it.