July 10, 2016ISA's, Compilers, and Assembly1 CS232 roadmap In the first 3 quarters of the class, we have covered 1.Understanding the relationship between.

Slides:



Advertisements
Similar presentations
Code Tuning Strategies and Techniques
Advertisements

Computer Science 2212a/b - UWO1 Structural Testing Motivation The gcov Tool An example using gcov How does gcov do it gcov subtleties Further structural.
GNU gprof Profiler Yu Kai Hong Department of Mathematics National Taiwan University July 19, 2008 GNU gprof 1/22.
Intel® performance analyze tools Nikita Panov Idrisov Renat.
Tools for applications improvement George Bosilca.
Algorithm Analysis (Big O) CS-341 Dick Steflik. Complexity In examining algorithm efficiency we must understand the idea of complexity –Space complexity.
1 Lecture 6 Performance Measurement and Improvement.
Incremental Path Profiling Kevin Bierhoff and Laura Hiatt Path ProfilingIncremental ApproachExperimental Results Path profiling counts how often each path.
27-Jun-15 Profiling code, Timing Methods. Optimization Optimization is the process of making a program as fast (or as small) as possible Here’s what the.
Performance Improvement
CSc 352 Performance Tuning Saumya Debray Dept. of Computer Science The University of Arizona, Tucson
1 1 Profiling & Optimization David Geldreich (DREAM)
Spring 2014 SILICON VALLEY UNIVERSITY CONFIDENTIAL 1 Introduction to Embedded Systems Dr. Jerry Shiao, Silicon Valley University.
With RTAI, MPICH2, MPE, Jumpshot, Sar and hopefully soon OProfile or VTune Dawn Nelson CSC523.
Chocolate Bar! luqili. Milestone 3 Speed 11% of final mark 7%: path quality and speed –Some cleverness required for full marks –Implement some A* techniques.
Tuning Tophat2 Belinda Giardine. Tophat2 Aligns reads from RNA to the genome Ribonucleic acid (RNA) is a ubiquitous family of large biological molecules.
Multi-core Programming VTune Analyzer Basics. 2 Basics of VTune™ Performance Analyzer Topics What is the VTune™ Performance Analyzer? Performance tuning.
GNU gcov (1/4) [from Wikipedia] gcov is a source code coverage analysis and statement- by-statement profiling tool. gcov generates exact counts of the.
Lecture 8. Profiling - for Performance Analysis - Prof. Taeweon Suh Computer Science Education Korea University COM503 Parallel Computer Architecture &
Timing and Profiling ECE 454 Computer Systems Programming Topics: Measuring and Profiling Cristiana Amza.
CS1104 – Computer Organization PART 2: Computer Architecture Lecture 12 Overview and Concluding Remarks.
1 Components of the Virtual Memory System  Arrows indicate what happens on a lw virtual address data physical address TLB page table memory cache disk.
Application Profiling Using gprof. What is profiling? Allows you to learn:  where your program is spending its time  what functions called what other.
CSE 303 Concepts and Tools for Software Development Richard C. Davis UW CSE – 12/6/2006 Lecture 24 – Profilers.
CS 591x Profiling Parallel Programs Using the Portland Group Profiler.
Debugging and Profiling With some help from Software Carpentry resources.
Timing Programs and Performance Analysis Tools for Analysing and Optimising advanced Simulations.
CS 3500 L Performance l Code Complete 2 – Chapters 25/26 and Chapter 7 of K&P l Compare today to 44 years ago – The Burroughs B1700 – circa 1974.
Adv. UNIX: Profile/151 Advanced UNIX v Objectives –introduce profiling based on execution times and line counts Special Topics in Comp.
1 Announcements  Homework 4 out today  Dec 7 th is the last day you can turn in Lab 4 and HW4, so plan ahead.
1. 2 Pipelining vs. Parallel processing  In both cases, multiple “things” processed by multiple “functional units” Pipelining: each thing is broken into.
Introduction to ECE 454 Computer Systems Programming Topics: Lecture topics and assignments Profiling rudiments Lab schedule and rationale Cristiana Amza.
Outline Announcements: –HW III due Friday! –HW II returned soon Software performance Architecture & performance Measuring performance.
Performance Improvement
Profiling Tools Introduction to Computer System, Fall (PPI, FDU) Vtune & GProfile.
Chapter 1 Introduction. Chapter 1 -- Introduction2  Def: Compiler --  a program that translates a program written in a language like Pascal, C, PL/I,
1 Performance Issues CIS*2450 Advanced Programming Concepts.
Searching Topics Sequential Search Binary Search.
Copyright 2014 – Noah Mendelsohn Performance Analysis Tools Noah Mendelsohn Tufts University Web:
Performance* Objective: To learn when and how to optimize the performance of a program. “ The first principle of optimization is don ’ t. ” –Knowing how.
Announcements You will receive your scores back for Assignment 2 this week. You will have an opportunity to correct your code and resubmit it for partial.
Outline Announcements: –HW II Idue Friday! Validating Model Problem Software performance Measuring performance Improving performance.
© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Operating Systems Overview: Using Hardware.
Measuring Performance Based on slides by Henri Casanova.
Two notions of performance
Profiling with GNU GProf
Measuring Where CPU Time Goes
Atomic Operations in Hardware
Chapter 8 Arrays Objectives
CS2100 Computer Organisation
CSCI1600: Embedded and Real Time Software
Big-Oh and Execution Time: A Review
Algorithm Efficiency, Big O Notation, and Javadoc
GNU gcov (1/4) [from Wikipedia]
CSc 352: Testing and Code Coverage
GNU gcov gcov is a test coverage program running with gcc.
GNU gcov (1/4) [from Wikipedia]
Min Heap Update E.g. remove smallest item 1. Pop off top (smallest) 3
Data Structures Introduction
Chapter 1 Introduction.
slides created by Ethan Apter
Complexity Based on slides by Ethan Apter
slides created by Ethan Apter
C. M. Overstreet Old Dominion University Fall 2005
CSCI1600: Embedded and Real Time Software
Chapter 2: Performance CS 447 Jason Bakos Fall 2001 CS 447.
What Are Performance Counters?
CSc 352 Performance Tuning
Chapter 13: I/O Systems “The two main jobs of a computer are I/O and [CPU] processing. In many cases, the main job is I/O, and the [CPU] processing is.
CS2100 Computer Organisation
Presentation transcript:

July 10, 2016ISA's, Compilers, and Assembly1 CS232 roadmap In the first 3 quarters of the class, we have covered 1.Understanding the relationship between HLL and assembly code 2.Processor design, pipelining, and performance 3.Memory systems, caches, virtual memory, I/O, and ECC The next major topic is: performance tuning  How can I, as a programmer, make my programs run fast?  The first step is figuring out where/why the program is slow? —Program profiling  How does one go about optimizing a program? —Use better algorithms (do this first!) —Exploit the processor better (3 ways) 1.Write hand-tuned assembly versions of hot spots 2.Getting more done with every instruction 3.Using more than one processor

July 10, 2016ISA's, Compilers, and Assembly2 Performance Optimization  Until you are an expert, first write a working version of the program  Then, and only then, begin tuning, first collecting data, and iterate —Otherwise, you will likely optimize what doesn’t matter “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” -- Sir Tony Hoare

July 10, 2016ISA's, Compilers, and Assembly3 Building a benchmark  You need something to gauge your progress. —Should be representative of how the program will be used

July 10, 2016ISA's, Compilers, and Assembly4 Instrumenting your program  We can do this by hand. Consider: test.c --> test2.c —Let’s us know where the program is spending its time. —But implementing it is tedious; consider instrumenting 130k lines of code

July 10, 2016ISA's, Compilers, and Assembly5 Using tools to do instrumentation  Two GNU tools integrated into the GCC C compiler  Gprof: The GNU profiler —Compile with the -pg flag This flag causes gcc to keep track of which pieces of source code correspond to which chunks of object code and links in a profiling signal handler. —Run as normal; program requests the operating system to periodically send it signals; the signal handler records what instruction was executing when the signal was received in a file called gmon.out —Display results using gprof command Shows how much time is being spent in each function. Shows the calling context (the path of function calls) to the hot spot.

July 10, 2016ISA's, Compilers, and Assembly6 Example gprof output Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name cache_access sim_main update_way_list dl1_access_fn dl2_access_fn yylex Over 80% of time spent in one function index % time self children called name /1 main [2] [1] sim_main [1] / cache_access [4] /10 sys_syscall [9] /2967 mem_translate [16] /2824 mem_newpage [18] Provides calling context ( main calls sim_main calls cache_access ) of hot spot

July 10, 2016ISA's, Compilers, and Assembly7 Using tools for instrumentation (cont.)  Gprof didn’t give us information on where in the function we were spending time. ( cache_access is a big function; still needle in haystack)  Gcov: the GNU coverage tool —Compile/link with the -fprofile-arcs -ftest-coverage options Adds code during compilation to add counters to every control flow edge (much like our by hand instrumentation) to compute how frequently each block of code gets executed. —Run as normal —For each xyz.c file an xyz.gdna and xyz.gcno file are generated —Post-process with gcov xyz.c Computes execution frequency of each line of code Marks with ##### any lines not executed  Useful for making sure that you tested your whole program

July 10, 2016ISA's, Compilers, and Assembly8 Example gcov output : 540: if (cp->hsize) { #####: 541: int hindex = CACHE_HASH(cp, tag); -: 542: #####: 543: for (blk=cp->sets[set].hash[hindex]; -: 544: blk; -: 545: blk=blk->hash_next) -: 546: { #####: 547: if (blk->tag == tag && (blk->status & CACHE_BLK_VALID)) #####: 548: goto cache_hit; -: 549: } -: 550: } else { -: 551: /* linear search the way list */ : 552: for (blk=cp->sets[set].way_head; -: 553: blk; -: 554: blk=blk->way_next) { : 555: if (blk->tag == tag && (blk->status & CACHE_BLK_VALID)) : 556: goto cache_hit; -: 557: } -: 558: } Loop executed over 50 interations on average ( / ) Code never executed

July 10, 2016ISA's, Compilers, and Assembly9 Conclusion  The second step to making a fast program is finding out why it is slow —The first step is making a working program —Your intuition where it is slow is probably wrong So don’t guess, collect data!  Many tools already exist for automatically instrumenting your code —Identify the “hot spots” in your code where time is being spent —Two example tools: Gprof: periodically interrupts program Gcov: inserts counters into code —We’ll see Vtune in section, which explains why the code is slow  If you’ve never tuned your program, there is probably “low hanging fruit” —Most of the time is spent in one or two functions —Try using better data structures (225) or algorithms (473) to speed these up