1 TCSS 360, Winter 2005 Lecture Notes Optimization and Profiling.

Slides:



Advertisements
Similar presentations
Eclipse TPTP TPTP Heap and Thread Profilers High-level Design Rev 1.0 Asaf Yaffe July, 2006.
Advertisements

SE-292 High Performance Computing Profiling and Performance R. Govindarajan
Instruction Set Design
P3 / 2004 Register Allocation. Kostis Sagonas 2 Spring 2004 Outline What is register allocation Webs Interference Graphs Graph coloring Spilling Live-Range.
Intel® performance analyze tools Nikita Panov Idrisov Renat.
Overview Motivations Basic static and dynamic optimization methods ADAPT Dynamo.
Information and Control in Gray-Box Systems Arpaci-Dusseau and Arpaci-Dusseau SOSP 18, 2001 John Otto Wi06 CS 395/495 Autonomic Computing Systems.
CompSci Applets & Video Games. CompSci Applets & Video Games The Plan  Applets  Demo on making and running a simple applet from scratch.
The Path to Multi-core Tools Paul Petersen. Multi-coreToolsThePathTo 2 Outline Motivation Where are we now What is easy to do next What is missing.
Race Conditions. Isolated & Non-Isolated Processes Isolated: Do not share state with other processes –The output of process is unaffected by run of other.
Memory Allocation. Three kinds of memory Fixed memory Stack memory Heap memory.
3/17/2008Prof. Hilfinger CS 164 Lecture 231 Run-time organization Lecture 23.
Chapter 14 Chapter 14: Server Monitoring and Optimization.
Previous finals up on the web page use them as practice problems look at them early.
JVM-1 Introduction to Java Virtual Machine. JVM-2 Outline Java Language, Java Virtual Machine and Java Platform Organization of Java Virtual Machine Garbage.
An Adaptive, Region-based Allocator for Java Feng Qian & Laurie Hendren 2002.
Code Generation CS 480. Can be complex To do a good job of teaching about code generation I could easily spend ten weeks But, don’t have ten weeks, so.
1 Software Testing and Quality Assurance Lecture 31 – SWE 205 Course Objective: Basics of Programming Languages & Software Construction Techniques.
Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access memory.
Page 1 © 2001 Hewlett-Packard Company Tools for Measuring System and Application Performance Introduction GlancePlus Introduction Glance Motif Glance Character.
1 1 Profiling & Optimization David Geldreich (DREAM)
1 CSE 403 Performance Profiling These lecture slides are copyright (C) Marty Stepp, They may not be rehosted, sold, or modified without expressed.
1 TCSS 360, Spring 2005 Lecture Notes Performance Testing: Optimization and Profiling.
Embedded Java Research Geoffrey Beers Peter Jantz December 18, 2001.
Chapter 3.1:Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access.
Chocolate Bar! luqili. Milestone 3 Speed 11% of final mark 7%: path quality and speed –Some cleverness required for full marks –Implement some A* techniques.
Applets & Video Games 1 Last Edited 1/10/04CPS4: Java for Video Games Applets &
University of Maryland parseThat: A Robust Arbitrary-Binary Tester for Dyninst Ray Chen.
Multi-core Programming VTune Analyzer Basics. 2 Basics of VTune™ Performance Analyzer Topics What is the VTune™ Performance Analyzer? Performance tuning.
- Tausief Shaikh (Senior Server developer). Introduction Covers sense of responsibility towards Project development in IT Focusing on memory and CPU utilizations.
Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.
Lecture 10 : Introduction to Java Virtual Machine
CSc 453 Runtime Environments Saumya Debray The University of Arizona Tucson.
Chapter 3.5 Memory and I/O Systems. 2 Memory Management Memory problems are one of the leading causes of bugs in programs (60-80%) MUCH worse in languages.
JBOSS Profiler Performance Profiling. Contents ● Motivation – The problem ● Profiling ● Profiling Tools ● Java and Profiling ● JBoss Profiler ● Example.
1 CSE 403 Performance Testing, Profiling, Optimization Reading: McConnell, Code Complete, Ch These lecture slides are copyright (C) Marty Stepp,
Martin Schulz Center for Applied Scientific Computing Lawrence Livermore National Laboratory Lawrence Livermore National Laboratory, P. O. Box 808, Livermore,
Hardware process When the computer is powered up, it begins to execute fetch-execute cycle for the program that is stored in memory at the boot strap entry.
Lecture 8 February 29, Topics Questions about Exercise 4, due Thursday? Object Based Programming (Chapter 8) –Basic Principles –Methods –Fields.
Application Profiling Using gprof. What is profiling? Allows you to learn:  where your program is spending its time  what functions called what other.
© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Memory: Relocation.
CS 2130 Lecture 5 Storage Classes Scope. C Programming C is not just another programming language C was designed for systems programming like writing.
Eclipse Simple Profiler Ben Xu Mar 7,2011. About Eclipse simple profiler is a open source project to analyze your plug-ins/RCPs performance.
CSE 403 Lecture 18 Performance Testing Reading: Code Complete, Ch (McConnell) slides created by Marty Stepp
Performance Analysis & Code Profiling It’s 2:00AM -- do you know where your program counter is?
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University IWPSE 2003 Program.
Processes CS 6560: Operating Systems Design. 2 Von Neuman Model Both text (program) and data reside in memory Execution cycle Fetch instruction Decode.
Lecture 5: Threads process as a unit of scheduling and a unit of resource allocation processes vs. threads what to program with threads why use threads.
Core Java Introduction Byju Veedu Ness Technologies httpdownload.oracle.com/javase/tutorial/getStarted/intro/definition.html.
© 2006, National Research Council Canada © 2006, IBM Corporation Solving performance issues in OTS-based systems Erik Putrycz Software Engineering Group.
Thread basics. A computer process Every time a program is executed a process is created It is managed via a data structure that keeps all things memory.
Hardware process When the computer is powered up, it begins to execute fetch-execute cycle for the program that is stored in memory at the boot strap entry.
Software Engineering Prof. Dr. Bertrand Meyer March 2007 – June 2007 Chair of Software Engineering Lecture #20: Profiling NetBeans Profiler 6.0.
Profile, HAT, Wireless Toolkit’s Profile Sookmyung Women’s Univ. PSLAB Choi yoonjeong.
Beyond Application Profiling to System Aware Analysis Elena Laskavaia, QNX Bill Graham, QNX.
Introduction to Performance Tuning Chia-heng Tu PAS Lab Summer Workshop 2009 June 30,
Java Flight Recorder and Java Mission Control
Lecture 1b- Introduction
Chapter 3: Process Concept
Cork: Dynamic Memory Leak Detection with Garbage Collection
Java programming lecture one
Introduction Enosis Learning.
May 23-24, 2012 Microsoft.
Introduction Enosis Learning.
PerfView Measure and Improve Your App’s Performance for Free
ColdFusion Performance Troubleshooting and Tuning
Adaptive Code Unloading for Resource-Constrained JVMs
Interpreting Java Program Runtimes
M S COLLEGE ART’S, COMM., SCI. & BMS
Java Virtual Machine Profiling. Agenda Introduction JVM overview Performance concepts Monitoring Profiling VisualVM demo Tuning Conclusions.
Presentation transcript:

1 TCSS 360, Winter 2005 Lecture Notes Optimization and Profiling

2 Optimization metrics Runtime / CPU Usage What lines of code the program is spending the most time in What call/invocation paths were used to get to these lines Naturally represented as tree structures Memory Usage What kinds of objects are sitting on the heap Where were they allocated Who is pointing to them now "Memory leaks" A huge graph

3 Profiling profiling: measuring relative statistics on your application where is the most time being spent? ("classical" profiling) which method takes the most time? which method is called the most? how is memory being used? what kind of objects are being created? this in especially applicable in OO, GCed environments profiling is not the same as benchmarking or optimizing! benchmarking: measuring the performance of your app on a particular platform optimization: applying refactoring and enhancements to speed up code

4 Profiling motivation Why use a profiler? your intuition about what's slow is often wrong performance is a major aspect of program acceptance Profiler advantages: accuracy completeness solid, statistical information When should I profile my code?

5 What profiling tells you Basic information: How much time is spent in each method? ("flat" profiling) How many objects of each type are allocated? Beyond the basics: Program flow ("hierarchical" profiling) do calls to method A cause method B to take too much time? Per-line information which line(s) in a given method are the most expensive? Which methods created which objects? Visualization aspects Is it easy to use the profiler to get to the information you're interested in?

6 Types of profilers There are a variety of ways that profilers work There is usually a trade-off in terms of: accuracy speed granularity of information intrusiveness

7 Insertion profiling insertion: placing special profiling measurement code into your program automatically at compile-time pros: can be used across a variety of platforms very accurate (in some ways) cons: requires recompilation or relinking of the app profiling code may affect performance difficult to calculate exact impact

8 Sampling profiling sampling: monitoring the processor or VM at regular intervals and saving a snapshot of the processor and/or memory state this data is then compared with the program's layout in memory to get an idea of where the program was at each sample pros: no modification of app is necessary cons: less accurate; varying sample interval leads to a time/accuracy trade-off a high sample rate is accurate, but takes a lot of time very small methods will almost always be missed if a small method is called frequently and you have are unlucky, small but expensive methods may never show up sampling cannot easily monitor memory usage

9 Instrumented VM profiling instrumenting the Java VM: modifying the Java Virtual Machine's code so that it records profiling information Using this technique each and every VM instruction can be monitored Pros: The most accurate technique Can monitor memory usage data as well as time data Can easily be extended to allow remote profiling Cons: The instrumented VM is platform-specific

10 Garbage collection A memory manager that reclaims objects that are not reachable from a root-set root set: all objects with an immediate reference all local reference variables in each frame of every thread's stack all static reference fields in all loaded classes JNI references to Java objects stored in the C heap

11 Java's memory model

12 Java's memory model 2

13 Heap and garbage collection Size Heap Size Time Max Occupancy GC Total size of reachable objects Total size of allocated objects

14 Optimization tools Many free Java optimization tools available: Extensible Java Profiler (EJP) - open source, CPU tracing only Eclipse Profiler plugin Java Memory Profiler (JMP) Mike's Java Profiler (MJP) JProbe Profiler - uses an instrumented VM hprof ( java -Xrunhprof ) Comes with JDK from Sun, free Good enough for anything I've ever needed

15 Xprof But first, a very simple tool: java -Xprof MyClass Not helpful Flat profile of secs (11807 total ticks): main Interpreted + native Method 0.5% java.io.FileInputStream.readBytes 0.1% java.io.UnixFileSystem.getBooleanAttributes0 0.1% java.io.FileOutputStream.writeBytes 1.5% Total interpreted (including elided) Compiled + native Method 43.7% java.util.HashMap.get 10.5% edu.stanford.nlp.ie.BigramSequenceModel. conditionalProbOfWord 6.1% java.util.HashMap.hash 5.2% edu.stanford.nlp.util.GeneralizedCounter. getCount 4.6% edu.stanford.nlp.util.Counter.getCount

16 Using hprof usage: java -Xrunhprof:[help]|[ =,...] Option Name and Value Description Default heap=dump|sites|all heap profiling all cpu=samples|times|old CPU usage off monitor=y|n monitor contention n format=a|b text(txt) or binary output a file= write data to file off depth= stack trace depth 4 interval= sample interval in ms 10 cutoff= output cutoff point lineno=y|n line number in traces? Y thread=y|n thread in traces? N doe=y|n dump on exit? Y msa=y|n Solaris micro state accounting n force=y|n force output to y verbose=y|n print messages about dumps y

17 Sample hprof usage To optimize CPU usage, try the following: java -Xrunhprof:cpu=samples,depth=6,heap=sites Settings: Takes samples of CPU execution Record call traces that include the last 6 levels on the stack Only record the sites used on the heap (to keep the output file small) After execution, open the text file java.hprof.txt in the current directory with a text editor

18 java.hprof.txt organization THREAD START/END: mark the lifetime of Java threads TRACE: represents a Java stack trace. Each trace consists of a series of stack frames. Other records refer to TRACEs to identify (1) where object allocations have taken place, (2) the frames in which GC roots were found, and (3) frequently executed methods. HEAP DUMP: a complete snapshot of all live objects in the heap. SITES: a sorted list of allocation sites. This identifies the most heavily allocated object types, and the TRACE at which those allocations occurred. CPU SAMPLES: a statistical profile of program execution. The VM periodically samples all running threads, and assign a quantum to active TRACEs in those threads. Entries in this record are TRACEs ranked by percentage. CPU TIME: a profile of program execution obtained by measuring the time spent in individual methods (excluding the time spent in callees), as well as by counting the number of times each method is called.

19 CPU samples Located at the bottom of the file Lists the same method multiple times if the traces are different The "trace" number references traces that are described in detail in the trace section CPU SAMPLES BEGIN (total = 3170) Mon Nov 22 09:10: rank self accum count trace method % 2.15% java.io.FileInputStream.readBytes % 3.91% java.util.AbstractList.iterator % 5.08% edu.stanford.nlp.parser. lexparser.ExhaustivePCFGParser. initializeChart % 6.18% java.util.AbstractList.iterator % 7.10% java.lang.String.substring % 7.92% java.lang.String.. CPU SAMPLES END

20 Stack traces. TRACE 1636: java.util.AbstractList.iterator(AbstractList.java:336) java.util.AbstractList.hashCode(AbstractList.java:626) java.util.HashMap.hash(HashMap.java:261) java.util.HashMap.put(HashMap.java:379) edu.stanford.nlp.util.Counter.incrementCount(Counter.java:199) edu.stanford.nlp.util.Counter.incrementCount(Counter.java:220). TRACE 1772: java.util.AbstractList.iterator(AbstractList.java:336) java.util.AbstractList.hashCode(AbstractList.java:626) java.util.HashMap.hash(HashMap.java:261) java.util.HashMap.get(HashMap.java:317) edu.stanford.nlp.util.Counter.getCount(Counter.java:114) edu.stanford.nlp.util.Counter.totalCount(Counter.java:75).

21 Heap sites Very limited information: how much space allocated for objects of each class, at what stack trace SITES BEGIN (ordered by live bytes) Mon Nov 22 09:10: percent live alloc'ed stack class rank self accum bytes objs bytes objs trace name % 20.70% [F % 22.33% edu.stanford.nlp.parser. lexparser.IntTaggedWord % 23.65% [C % 24.87% java.lang.Object % 25.99% java.util.HashMap$Entry % 27.02% java.util.HashMap$KeyIterator % 27.96% [C. SITES END

22 Visualization tools CPU samples critical to see the traces to modify code hard to read - far from the traces in the file good tool called PerfAnal to build and navigate the invocation tree HP's HPjmeter also analyzes java.hprof.txt visually Heap dump critical to see what objects are there, and who points to them very hard to navigate in a text file! good tool called HAT to navigate heap objects

23 PerfAnal helps you navigate the information contained in java.hprof.txt file creates 4 views: CPU inclusive by caller CPU inclusive by callee CPU inclusive by line number CPU exclusive Plus some thread stuff easy to use: download PerfAnal.jar, put in same folder as your program java -jar PerfAnal.jar./java.hprof.txt

24 Warnings CPU profiling really slows down your code! Design your profiling tests to be very short CPU Samples is better than CPU Time CPU samples don't measure everything Doesn't record object creation and garbage collection time, which can be significant! Output files are very large, especially if there is a heap dump

25 What to do with profiler results observe which methods are being called the most these may not necessarily be the "slowest" methods! observe which methods are taking the most time relative to the others common problem: inefficient unbuffered I/O common problem: poor choice of data structure common problem: recursion call overhead common problem: unnecessary re-computation of expensive information, or unnecessary multiple I/O of same data

26 What to do after optimization Sometimes performance bottlenecks exist in Sun's Java APIs, not in your code. What to do? if GUI related, maybe add code to inform the user that things are actually happening (ex. progress bars) can you use an alternative call, substitute another algorithm or component? maybe reduce the frequency of calls to a method are you using the framework / API effectively?