Performance Analysis & Code Profiling It’s 2:00AM -- do you know where your program counter is?

Code is tooooo slooooooowwwwww....
- Real world: performance matters
  - Often an app killer
  - Perceived performance usually bottom line for consumer apps
- “In academia, the constant in the O() is nothing. In industry, it’s the only thing.”
- Good algorithms/data structures are crucial starting points
- After that, different implementations can have huge impact

Where’s the bottleneck?
- Assuming same data structs/algs, why is one implementation slower than another?
- Where is code spending most of its time?
- Related question: where is all of the memory?
  - Are objects going away when they should?
- Rule of thumb: 80/20 rule
  - In most programs, 80% of the time is spent in 20% of the code
  - Problem: humans are very bad at finding the 20%!
  - Even worse at predicting where the 20% will be when writing/designing program!
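To make the question "are objects going away when they should?" concrete, here is a minimal sketch of one way a Java program can hold on to memory it no longer needs. The class `LeakyCache` and its methods are hypothetical names invented for this illustration. The static list stays reachable for the life of the class, so every buffer added to it survives garbage collection.

```java
import java.util.ArrayList;
import java.util.List;

public class LeakyCache {
    // Hypothetical "history" that is appended to but never cleared.
    // Everything it references stays reachable, so the GC cannot reclaim it.
    private static final List<byte[]> HISTORY = new ArrayList<>();

    /** Simulates handling one request; returns the size of the buffer used. */
    static int handleRequest(int payloadBytes) {
        byte[] buffer = new byte[payloadBytes];
        HISTORY.add(buffer); // the leak: the reference outlives the request
        return buffer.length;
    }

    /** Number of buffers still reachable through the static list. */
    static int retainedBuffers() {
        return HISTORY.size();
    }

    /** Total bytes still reachable through the static list. */
    static long retainedBytes() {
        long total = 0;
        for (byte[] b : HISTORY) {
            total += b.length;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handleRequest(1024);
        }
        System.out.println(retainedBuffers() + " buffers, "
                + retainedBytes() + " bytes still reachable");
    }
}
```

Nothing here is slow on any single call; the cost only shows up as the heap grows over a long run, which is why this kind of problem is hard to spot by reading the code.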

Non-solutions
- Blame the language. Write in C/FORTRAN/etc.
  - Some languages do impose runtime penalties
  - Mostly, small compared to choice of alg, etc.
- Use foreign language calls (assembly, C, etc.)
  - Sub-case of above
  - Can be useful for critical chunks of code
  - Still stuck with -- which chunks? (80/20)
- Micro-optimize while writing code
  - Lot of pain
  - Makes code hard to read/follow
  - Typically doesn’t help

More non-solutions
- Wait for HW to get faster
  - Solution used in practice :-P
  - Encourages sloppy design/programming
  - Some problems won’t go away with time
- Let the compiler handle it
  - Good for micro-optimizations (esp. block local)
  - Compiler is smarter than you, usually
  - Doesn’t handle design choices, overuse of function calls, poor data structs, indescribable invariants, data-dependent performance, etc.

Watching the code run...
- Ultimate answer: look and see
  - Instrument and monitor code; see where most time is being spent
  - Must load program under real data conditions!
- You’ve already done some of this
  - Program timing
  - Counting get()/put()/remove() calls, etc.
- In principle, could get everything you need that way
  - Massive pain in the rear...
  - 80/20 rule strikes again
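The hand-rolled approach mentioned above (program timing plus counting get()/put()/remove() calls) might look roughly like the sketch below. `CountingMap` and `timeNanos` are hypothetical names chosen for this example, not part of any library; the wrapper simply delegates to a `HashMap` and bumps a counter on each call.

```java
import java.util.HashMap;
import java.util.Map;

public class CountingMap<K, V> {
    private final Map<K, V> delegate = new HashMap<>();
    private long gets;
    private long puts;
    private long removes;

    // Each operation increments its counter, then forwards to the real map.
    public V get(K key) { gets++; return delegate.get(key); }
    public V put(K key, V value) { puts++; return delegate.put(key, value); }
    public V remove(K key) { removes++; return delegate.remove(key); }

    public long getCount() { return gets; }
    public long putCount() { return puts; }
    public long removeCount() { return removes; }

    /** Wall-clock time of one run of the task, in nanoseconds. */
    public static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        CountingMap<Integer, String> map = new CountingMap<>();
        long elapsed = timeNanos(() -> {
            for (int i = 0; i < 1000; i++) map.put(i, "v" + i);
            for (int i = 0; i < 500; i++) map.get(i);
            for (int i = 0; i < 100; i++) map.remove(i);
        });
        System.out.println("puts=" + map.putCount()
                + " gets=" + map.getCount()
                + " removes=" + map.removeCount()
                + " elapsedNanos=" + elapsed);
    }
}
```

This works, but every class you care about needs its own wrapper and every region its own timer, which is the "massive pain" that automated profilers are meant to remove.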

Profiling tools to the rescue...
- Automated instrumentation of code
- Use external tool to monitor execution
- Run under realistic conditions
- Post-mortem examine results for critical 20%
- Typically work by:
  - Instrumenting the executable at compile time (gcc -p)
  - Monitoring the runtime system/JVM (java -Xrunhprof)

Hprof: the Java profiler
- Runs JVM in special mode; watches code as it runs. Tracks:
  - Subroutine calls/stacks
  - Object allocations
  - Thread execution
  - CPU usage
- Invoke with:
    java -Xrunhprof:[hprof opt list] ClassToProf
- Produces text summary of run, post-mortem
- Note: Javasoft demo tool; NOT professional quality, industrial strength tool (but free!)
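As a stand-in for `ClassToProf`, the sketch below is a small hypothetical program (`HotSpotDemo` is an invented name) with one deliberately expensive method and one trivial one. On a JVM that still ships the hprof agent (it was removed in JDK 9), running it as `java -Xrunhprof:cpu=samples HotSpotDemo` should attribute nearly all samples to the prime-counting code.

```java
public class HotSpotDemo {
    /** Naive trial-division primality test: the intended hot spot. */
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    /** Counts primes strictly below limit; spends almost all its time in isPrime. */
    static int countPrimes(int limit) {
        int count = 0;
        for (int n = 0; n < limit; n++) {
            if (isPrime(n)) {
                count++;
            }
        }
        return count;
    }

    /** Cheap by comparison: called once, does a little string work. */
    static String formatResult(int limit, int count) {
        return count + " primes below " + limit;
    }

    public static void main(String[] args) {
        int limit = 2_000_000; // arbitrary size, chosen only to give the profiler something to see
        System.out.println(formatResult(limit, countPrimes(limit)));
    }
}
```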

Hprof options
- file=fname : set output/dump file
- cpu=samples|times : set profiling method for CPU utilization, method calls, stack trace, etc.
- heap=dump|sites|all : set tracing of heap (dynamically allocated) objects
- depth=# : set depth of stack traces to report (max # of nested calls)
- thread=y|n : report thread IDs?
- Example:
    java -Xrunhprof:file=hprof.txt,cpu=samples,depth=6 \
         Analyzer -u 8 -a 4 -x 842 -r results.txt -m model.dat

Problems with hprof
- Unreadable output (ugh)
- Static analysis
  - Only gives you snapshot of results at end of run
- Uses sampling
  - Only checks state of JVM periodically
  - Can miss very short/infrequent calls
- Doesn’t check many things
  - Dynamic heap state; memory leaks (yes, even in Java)
  - File I/O
  - Strange data access patterns
  - Multi-thread accesses
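To see why sampling can miss short or infrequent calls, it helps to look at how a sampler works in principle. The sketch below is not how hprof is implemented; it is a toy illustration using the standard `Thread.getStackTrace()` call, and `TinySampler`, `busyLoop`, and the 10 ms interval are all assumptions made for this example. Any method that starts and finishes between two samples never appears in the tally.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TinySampler {
    // Tally of "class.method" -> number of samples that found it on top of the stack.
    private final Map<String, Integer> hits = new ConcurrentHashMap<>();

    /** Records the method currently at the top of the target thread's stack. */
    public void sampleOnce(Thread target) {
        StackTraceElement[] stack = target.getStackTrace();
        if (stack.length == 0) {
            return; // thread has not started yet, or has already finished
        }
        StackTraceElement top = stack[0];
        hits.merge(top.getClassName() + "." + top.getMethodName(), 1, Integer::sum);
    }

    /** Samples the target at a fixed interval until it terminates. */
    public void sampleWhileAlive(Thread target, long intervalMillis)
            throws InterruptedException {
        while (target.isAlive()) {
            sampleOnce(target);
            Thread.sleep(intervalMillis);
        }
    }

    public Map<String, Integer> hits() {
        return hits;
    }

    // Written to at the end of busyLoop so the JIT cannot discard the loop.
    static volatile long sink;

    /** Burns CPU for roughly the given number of milliseconds. */
    static void busyLoop(long millis) {
        long end = System.nanoTime() + millis * 1_000_000L;
        long acc = 0;
        while (System.nanoTime() < end) {
            acc += acc * 31 + 7;
        }
        sink = acc;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> busyLoop(300), "worker");
        TinySampler sampler = new TinySampler();
        worker.start();
        sampler.sampleWhileAlive(worker, 10);
        sampler.hits().forEach((method, count) ->
                System.out.println(count + "\t" + method));
    }
}
```

The counts are statistical: the exact numbers vary from run to run, and a shorter sampling interval trades more overhead for a better chance of catching brief calls.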