Profiling: Software Performance
Suyog K Sethia

Motivation
Software stack, top to bottom: Application, Application Server, Virtual Machine, Operating System, Hardware
– Application: number of features and usage scenarios; large blocks of code
– Hardware: heterogeneous; pre-fetching, caching

Motivation
To capture run-time behavior:
– Static analysis fails. Why?
  - Software and hardware are complex (they solve complex problems).
  - Software-hardware interaction is difficult to comprehend.
  - Application usage scenarios are hard to predict.
– Dynamic analysis: based on run-time information

Profiling
– Profiling: analysis of program behavior based on run-time data
– Profiler: conceptual module whose purpose is to collect or analyse run-time data
– Profile: a set of frequencies associated with run-time events
Profile usage examples:
  - Debugging and bug isolation
  - Feedback-based optimization
  - Coverage testing
  - Understanding program/architecture interaction

Profiling
Desired qualities of a profiler:
– Accurate
– Low overhead
– Fast convergence
– Flexible
– Portable
– Transparent
– Low storage overhead

Outline
– Types of profiles
– Profiling techniques and implementations
– Hybrid profiling systems
– Software profiling systems
– Path profiling

Types of profiles
Point profile
  - Events are simple
  - Events are independent
  Example: basic blocks, cache misses, etc.
Context profile
  - Events are composed of simpler, ordered events
  Examples:
    call context: sequence of method invocations
    execution path: sequence of edges in a CFG

Profiling techniques
A spectrum from most accurate to most lightweight:
– Exhaustive instrumentation (accuracy)
– Instrumentation sampling, temporary instrumentation, …
– Sampling (lightweight)

Implementation of Profiler
Hybrid (HW-assisted)
  1. Hardware Performance Monitors (HPMs)
  2. Dedicated HW collectors that deliver data to a SW module
  => Fixed, low overhead
Software
– Pure software implementations
  => Portable, flexible, high overhead

Hybrid profiling systems
HPM-based system-wide profilers
– Event-based sampling
– Support multiple events
– Profile unmodified binaries
– Low overhead
– May be used in a production environment

Hybrid profiling systems
HPM-sampling for dynamic compilation
– HPM-sampling for JikesRVM
  - More accurate, converges faster, less overhead
– Speed-up by %
Online optimizations using HPMs
– Cache-miss profiling
– Co-allocation of objects
– O.H. < 1%, speed-up 14%
Figure: a parent object (Class A) and its child object (Class B).

Hybrid profiling systems
Rapid profiling via stratified sampling
– Data stream is divided into disjoint strata
– Reduces size of output stream and improves accuracy
– O.H. 4.5%, accuracy 97%
Figure: a periodic sampler (P) on the events stream versus stratified sampling, where a hasher feeds several samplers (P).
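
The idea of dividing the event stream into disjoint strata can be illustrated with a small software sketch. It is not the hardware design the slide refers to: the function names, the toy hash, and the `num_strata` and `period` parameters are all assumptions made for illustration. Each event is hashed into a stratum, and every stratum keeps its own counter and emits every `period`-th event it sees.

```python
from collections import Counter, defaultdict


def stratified_sample(events, num_strata=4, period=3):
    """Emit every `period`-th event of each stratum (hypothetical parameters)."""
    seen = defaultdict(int)  # events observed so far, per stratum
    samples = []
    for ev in events:
        # Deterministic toy hash; a real design would use a hardware hasher.
        stratum = sum(ev.encode()) % num_strata
        seen[stratum] += 1
        if seen[stratum] % period == 0:
            samples.append(ev)
    return samples


def estimate(samples, period=3):
    """Scale sample counts back up to estimated event frequencies."""
    return {ev: n * period for ev, n in Counter(samples).items()}


if __name__ == "__main__":
    stream = ["a"] * 9 + ["b"] * 3
    print(estimate(stratified_sample(stream)))
```

Because similar events land in the same stratum, a rare event is sampled at the rate of its own stratum and is not drowned out by frequent events in other strata.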

Hybrid profiling systems
Phase-aware profiling
– Programs exhibit repeating patterns of execution (phases)
– Profiling a representative of each phase approximates the full profile
– Phase-change detector is in HW
– O.H. reduction of 58% over periodic sampling, accuracy 95%

Software profiling systems
Dynamic call tree
– Fully describes method invocations during execution
Calling context tree (CCT)
– Aggregates calls with the same context
Figure: a dynamic call tree and the corresponding calling context tree.
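
A minimal sketch of how a CCT aggregates calls with the same context, assuming the execution is available as a trace of call and return events. The `CCTNode` class, the trace encoding, and `build_cct` are hypothetical names chosen for this example.

```python
class CCTNode:
    """One calling context: a method reached through a specific chain of callers."""

    def __init__(self, method, parent=None):
        self.method = method
        self.parent = parent
        self.count = 0
        self.children = {}

    def child(self, method):
        # Reuse the node if this method was already called from this context.
        if method not in self.children:
            self.children[method] = CCTNode(method, self)
        return self.children[method]


def build_cct(trace):
    """trace: list of ("call", name) and ("ret", None) events in execution order."""
    root = CCTNode("<root>")
    cur = root
    for kind, name in trace:
        if kind == "call":
            cur = cur.child(name)
            cur.count += 1
        else:
            cur = cur.parent
    return root


if __name__ == "__main__":
    trace = [("call", "M"), ("call", "A"), ("call", "B"), ("ret", None),
             ("ret", None), ("call", "A"), ("call", "B"), ("ret", None),
             ("ret", None), ("ret", None)]
    m = build_cct(trace).children["M"]
    print(m.children["A"].count, m.children["A"].children["B"].count)
```

The dynamic call tree for this trace would have five nodes (M, two A nodes, two B nodes); the CCT keeps three nodes and records the repetition in the counters.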

Software profiling systems
Calling-context profiling
– Goal: build an approximate calling context tree (CCT)
– Expensive to collect via instrumentation
Figure: a dynamic call tree and an approximate CCT built from it.

Software profiling systems
Sampling-based approach
– Builds a partial call-context tree (PCCT)
– Walks the stack on each interrupt, up to k frames
– Overhead 2-4%, precision more than 90%
– Disadv.:
  - Precision is the correlation of PCCTs for different runs of a benchmark (not accuracy)
  - Relies on the O.S. timer interrupt (low frequency, relatively slower on faster architectures)
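
The bounded stack walk can be sketched as follows, assuming each interrupt yields a snapshot of the call stack ordered from outermost to innermost frame. The function name and the flat `Counter` representation are assumptions; the approach described on the slide builds a tree, not a flat table.

```python
from collections import Counter


def sample_contexts(stack_snapshots, k):
    """Count calling contexts truncated to the k innermost frames per sample."""
    profile = Counter()
    for stack in stack_snapshots:  # each snapshot: outermost -> innermost
        profile[tuple(stack[-k:])] += 1
    return profile


if __name__ == "__main__":
    snapshots = [["M", "A", "B", "C"], ["M", "D", "B", "C"], ["M", "A"]]
    print(sample_contexts(snapshots, k=2))
```

With k = 2 the first two snapshots collapse into the same context (B, C) even though they were reached through different callers, which is the cost of bounding the walk.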

Software profiling systems
Approximating CCT
– Uses event-based sampling (method counter)
– No bound on the stack depth sampled
– High accuracy, O.H. not analyzed (expected to be high)
Timer/event-based sampling
– Stride-enabled (mix of timer-based and event-based sampling), O.H. < 0.3%, Accuracy ~ L 60%
Probabilistic calling context (PCC)
– Usage: residual testing, debugging, anomaly detection
– Instrumentation-based (sampling is not suitable)
– No CCT is maintained => O.H. 3%
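
Why PCC needs no CCT can be shown with a small sketch: the calling context is summarized by one integer that is updated at every call site. The update rule used here, f(V, cs) = 3·V + cs truncated to 32 bits, is stated as an assumption based on the published PCC work and is not given on the slide; the function names and the integer call-site identifiers are hypothetical.

```python
MASK_32 = 0xFFFFFFFF


def pcc_step(value, call_site_id):
    """Fold one call site into the running context value (assumed update rule)."""
    return (3 * value + call_site_id) & MASK_32


def pcc_value(call_site_ids):
    """Context value for a chain of call sites, outermost first."""
    value = 0
    for cs in call_site_ids:
        value = pcc_step(value, cs)
    return value


if __name__ == "__main__":
    print(pcc_value([1, 2]), pcc_value([2, 1]))
```

The rule is order-sensitive, so the chains [1, 2] and [2, 1] produce different values. Distinct contexts can still collide, which is why the result is probabilistic.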

Path profiling
– Measures the frequency of execution paths in the CFG
– The number of paths is exponential in the number of CFG nodes
– Hard to collect by sampling
– A path profile is NOT an edge profile

Path profiling
Efficient path profiling
– Compact enumeration of paths (path IDs)
– O.H. average 31%, maximum 97%
– Disadv.: high overhead, suitable only for an offline setting
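
Compact path enumeration can be sketched with the well-known Ball-Larus style numbering: each edge of an acyclic CFG gets an increment so that summing the increments along any entry-to-exit path gives a unique ID in the range 0 to N-1. The CFG below is an assumption (a two-diamond graph over nodes A to G, chosen to match the path names used later in the deck), and the function names are hypothetical.

```python
def number_paths(cfg, entry):
    """Return (num_paths, inc) for an acyclic CFG given as {node: [successors]}."""
    num_paths, inc = {}, {}

    def visit(v):
        if v in num_paths:
            return num_paths[v]
        if not cfg[v]:  # exit node: exactly one (empty) path to the exit
            num_paths[v] = 1
            return 1
        total = 0
        for w in cfg[v]:
            paths_below = visit(w)
            inc[(v, w)] = total  # paths through earlier successors come first
            total += paths_below
        num_paths[v] = total
        return total

    visit(entry)
    return num_paths, inc


def path_id(path, inc):
    """Sum the edge increments along a path given as a list of nodes."""
    return sum(inc[(a, b)] for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    # Assumed CFG: A branches to B or C, both join at D, D branches to E or F.
    cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"],
           "D": ["E", "F"], "E": ["G"], "F": ["G"], "G": []}
    _, inc = number_paths(cfg, "A")
    for p in ("ABDEG", "ABDFG", "ACDEG", "ACDFG"):
        print(p, path_id(list(p), inc))
```

At run time only the edges with a non-zero increment need an add instruction, and a single counter array indexed by path ID holds the profile.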

Path profiling
Online hot path prediction scheme
– Only the path head is instrumented instead of the full path
– Next Executing Tail (NET) is predicted as hot
– Hit rate average: 97% when 10% of execution is profiled
– Disadv.: cannot distinguish hot from warm paths (false positives)
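
A minimal sketch of head-only counting, assuming the execution is given as a sequence of (head, path) pairs and that a head becomes hot after a fixed number of executions. The `threshold` parameter and the function name are assumptions for illustration.

```python
from collections import Counter


def net_select(executions, threshold):
    """Pick, per path head, the path that runs when the head counter hits threshold."""
    head_counts = Counter()
    hot = {}
    for head, path in executions:  # in execution order
        if head in hot:
            continue  # prediction already made; profiling for this head stops
        head_counts[head] += 1
        if head_counts[head] >= threshold:
            hot[head] = path  # the next executing tail is taken as the hot path
    return hot


if __name__ == "__main__":
    runs = [("L1", "p1")] * 2 + [("L1", "p2")] + [("L1", "p1")] * 5
    print(net_select(runs, threshold=3))
```

In this run the scheme selects p2 even though p1 executes far more often, which illustrates the false positives listed as a disadvantage.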

Path profiling
Selective path profiling (SPP)
– Instruments only paths of interest
– Usage: residual testing, profiling different subsets over multiple copies of deployed software
– Disadv.: the set of paths is not totally arbitrary
Preferential path profiling
– Similar to SPP, but paths can be specified arbitrarily
– O.H. average 15%

Path profiling
Variational path profiling
– Offline profiling scheme
– Collects execution time variability for paths
– Paths with high variability are good candidates for optimization
– O.H. average: 5%, speed-up via simple optimizations: 8.5%
– Disadv.: does not report consistently slow paths

Path profiling
From an edge profile we can find
– the total flow (frequency)
– an upper bound on the flow of each path
– a lower bound on the flow of each path (definite flow)
Figure: CFG with nodes A, B, C, D, E, F, G.
Example: consider path ABDEG.
– Max flow = 50
– Let ACDFG = 0, ACDEG = 30, ABDFG = 20 -> Min flow = 30 (definite flow)
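
The bounds in the ABDEG example can be reproduced with a short calculation. The original figure is lost, so the edge counts used here (A->B = 50, A->C = 30, D->E = 60, D->F = 20, total flow 80) are assumptions chosen to be consistent with the numbers on the slide. The formula is only valid for a CFG that is a chain of branch stages, each crossed exactly once per execution; the function name is hypothetical.

```python
def path_flow_bounds(branch_edge_freqs, total_flow):
    """Bounds on a path's frequency from the frequencies of its branch edges.

    Assumes a chain of branch stages that every execution crosses exactly once.
    """
    upper = min(branch_edge_freqs)
    # Flow that cannot avoid this path, whatever the other paths absorb.
    lower = max(0, sum(branch_edge_freqs) - (len(branch_edge_freqs) - 1) * total_flow)
    return lower, upper


if __name__ == "__main__":
    # Path ABDEG uses branch edges A->B (assumed 50) and D->E (assumed 60).
    print(path_flow_bounds([50, 60], total_flow=80))
```

The upper bound is the smallest edge count on the path. The lower bound follows because at most 30 of the 80 executions leave through A->C, and at most 20 through D->F, so at least 80 - 30 - 20 = 30 must take both A->B and D->E.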

Path profiling
Targeted path profiling (TPP)
– Suitable for staged dynamic optimizers
– Relies on the edge profile to:
  - Remove obvious paths
  - Remove cold edges
– O.H. 14%, accuracy > 97%
– Disadv.: compilation O.H. 74%

Path profiling
Practical path profiling (PPP)
– Uses less instrumentation than TPP
– Reduces instrumentation overhead
  - Smart path numbering (avoids instrumenting hot edges)
– O.H. average 5%, accuracy average 96%
– Disadv.: compilation overhead not reported (expected to be high)

Path profiling
Two types of instrumentation (figure not included in the transcript).

Path profiling
Path and edge profiling (PEP) [Bond’05]
– Orthogonal approach that samples profile updates
– Piggybacks on the JikesRVM sampling profiler
– Edge profile guides instrumentation placement
– O.H. average 1.2%, accuracy 94%
– Disadv.: VM-specific

Conclusion
– Program behavior is hard to understand statically
– Dynamic analysis based on run-time data is needed
– Performance profiling investigates run-time behavior
– Profiling techniques range from exhaustive instrumentation to sampling
– Implementation can be in hardware, software, or hybrid
– Context and path profiling techniques

Reducing profiling cost
Ephemeral instrumentation
– Temporary instrumentation
Shadow profiling
– O.H. < 1%, accuracy 94% (value profiling)
Profile over adaptive ranges [Mysore ’06]
– Adaptively reduces profile storage overhead

Usage examples
– Value profiling and optimization
– Code coverage testing
– Bug isolation
– Understanding OO applications
– Offline/online feedback-directed optimization

Path profiling usage example (figure not included in the transcript).

Sampling: Triggering mechanism
Timer-based
– Easy to implement; relies on the O.S. timer interrupt (e.g. 4 ms on Linux 2.6 = 250 samples/sec)
– Disadv.: low sampling frequency, independent of processor speed, cannot always correlate with program events
Event-based
– Relies on event counting in SW or HW (e.g. HPMs)
– Correlates with program events
– Adapts to processor speed

Sampling: Collection mechanism
Polling-based
– Samples are taken only at polling points (e.g. method entries, back-edges, etc.)
– Disadv.:
  - Not timely
  - Overhead: flag checking
  - Limited accuracy: biased sampling (e.g. polling points after long I/O operations)
Immediate
– Sample is taken right after the interrupt is raised

Sampling: Collection mechanism (2)
– Problem with polling sampling: oversampled polling point
– Arnold-Grove sampling [Arnold ‘05]
Figure: timeline showing timer interrupts and the samples taken after each one, with Burst = 3, Stride = 3.
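
The burst and stride parameters can be illustrated with a simplified simulation. It assumes that after each timer interrupt the profiler takes `burst` samples, one at every `stride`-th polling point; the real scheme has details (such as how the first skip is chosen) that are not shown on the slide. The function name and the input encoding are hypothetical.

```python
def burst_stride_samples(polling_points, timer_ticks, burst=3, stride=3):
    """Simulate sampling at polling points after timer interrupts.

    polling_points: names of polling points in execution order.
    timer_ticks: set of indices at which a timer interrupt is pending.
    """
    samples = []
    remaining = 0  # samples still to take in the current burst
    countdown = 0  # polling points to pass before the next sample
    for i, point in enumerate(polling_points):
        if i in timer_ticks:
            remaining, countdown = burst, stride
        if remaining:
            countdown -= 1
            if countdown == 0:
                samples.append(point)
                remaining -= 1
                countdown = stride
    return samples


if __name__ == "__main__":
    points = [f"e{i}" for i in range(12)]
    print(burst_stride_samples(points, timer_ticks={0}))
```

Plain polling would sample only the first polling point after each interrupt, so a point that tends to follow a long operation is oversampled. Spreading several samples over later polling points reduces that bias.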

Thank You