Beyond Application Profiling to System Aware Analysis
Elena Laskavaia, QNX; Bill Graham, QNX

Agenda
- Introduction to application profiling
- Profiling techniques
- Visualization
- Application-centric profiling
- System-aware analysis
- Summary

Application profiling
- Application: in QNX terms, an OS process with one or more threads
- Application profiling:
  - Measuring execution time per function and, optionally, per line of code
  - Visualizing the results
- Several techniques are available, each with its own strengths and weaknesses:
  - Sampling
  - Call count instrumentation
  - Function instrumentation
  - Kernel event tracing

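Sampling: how it works

```c
#include <math.h>
#include <stdio.h>

void func1(void) { /* some work; the body is not shown on the slide */ }

int func2(int var) {
    int p = (int)sqrt((double)var);  /* integer square root (truncated) */
    return var - p * p;              /* distance above the nearest lower perfect square */
}

void test(void) {
    int var, sum = 0;                /* sum must be initialized before accumulating */
    func1();
    for (var = 10; var < 15; ++var) {
        sum += func2(var);
    }
    printf("result=%d\n", sum);
}

int main(void) {
    test();
    return 0;
}
```

(Diagram: source and execution path test() → func1() → func2() → printf(); the instruction pointer samples taken along the path are aggregated into an annotated source histogram.)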

Sampling technique
Sampling gives a rough estimate of where a process spends its time. A target agent periodically records the target process's instruction pointer (a sample). The IDE gathers all samples, aggregates them, and presents them to the user as a table or as annotated source code.
Strengths:
- No special compilation is required for the binary
- Very low overhead
- Per-instruction granularity
Weaknesses:
- Gives reliable results only for long-running applications
- Can give incorrect results for timer-based applications (because sampling is itself timer-based)
- Cannot tell where a function was called from (mitigated by combining it with call count instrumentation)
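To make the aggregation step concrete, here is a minimal C sketch of how collected instruction pointer samples could be turned into a per-function histogram. It assumes a hypothetical symbol table of function address ranges; `Symbol` and `aggregate_samples()` are illustrative names, not part of the QNX IDE or its target agent.

```c
#include <stddef.h>

/* Hypothetical symbol table entry: the address range [start, end) of one function. */
typedef struct {
    const char *name;
    unsigned long start;
    unsigned long end;
} Symbol;

/* Attribute each sampled instruction pointer to the function whose address
   range contains it and count hits per function (counts[i] belongs to syms[i]).
   Returns the number of samples that fell outside every known function. */
size_t aggregate_samples(const unsigned long *samples, size_t nsamples,
                         const Symbol *syms, size_t nsyms, unsigned counts[]) {
    size_t unknown = 0;
    for (size_t i = 0; i < nsyms; ++i)
        counts[i] = 0;
    for (size_t s = 0; s < nsamples; ++s) {
        size_t i;
        for (i = 0; i < nsyms; ++i) {
            if (samples[s] >= syms[i].start && samples[s] < syms[i].end) {
                counts[i]++;
                break;
            }
        }
        if (i == nsyms)
            unknown++;  /* e.g. a sample inside a library without symbols */
    }
    return unknown;
}
```

A real profiler would sort the symbol table and use binary search, and would also map samples to source lines for the annotated editor; the linear scan only keeps the idea visible.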

Call counts: how it works
Source: the same func2()/test() example as on the sampling slide.
(Diagram: call tree and call graph for the execution path test() → func1() → func2() → printf(), annotated with call counts; the counter for func2() reaches 5 over the loop, and printf() is called once.)

Call counting technique
Call counting provides a precise count of calls to every function, and to every caller/callee function pair, in instrumented code. The IDE visualizes the results as a call graph with call counts.
Strengths:
- Precise call count information
- Provides call pair information, aggregated as a call graph
- Relatively low overhead
- Can augment sampling-based profiling
Weaknesses:
- Requires instrumentation (special compiler and linker options)
- Provides no information for non-instrumented libraries
- Records call pairs, but not full stack frames
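The call pair idea can be sketched in C as a table of caller/callee edges with a counter on each. The hypothetical `record_call()` stands in for whatever an instrumentation hook would do on each call; the fixed capacity and names are assumptions, not the QNX implementation.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PAIRS 64  /* hypothetical fixed capacity for the sketch */

typedef struct {
    const char *caller;
    const char *callee;
    unsigned long count;
} CallPair;

typedef struct {
    CallPair pairs[MAX_PAIRS];
    size_t npairs;
} CallGraph;

/* Count one call from caller to callee. Returns 0 on success, -1 if the table is full. */
int record_call(CallGraph *g, const char *caller, const char *callee) {
    for (size_t i = 0; i < g->npairs; ++i) {
        if (strcmp(g->pairs[i].caller, caller) == 0 &&
            strcmp(g->pairs[i].callee, callee) == 0) {
            g->pairs[i].count++;  /* existing edge: bump its counter */
            return 0;
        }
    }
    if (g->npairs == MAX_PAIRS)
        return -1;
    g->pairs[g->npairs].caller = caller;  /* new edge in the call graph */
    g->pairs[g->npairs].callee = callee;
    g->pairs[g->npairs].count = 1;
    g->npairs++;
    return 0;
}

/* Total calls into a function: the sum over all edges that end at it. */
unsigned long calls_to(const CallGraph *g, const char *callee) {
    unsigned long total = 0;
    for (size_t i = 0; i < g->npairs; ++i)
        if (strcmp(g->pairs[i].callee, callee) == 0)
            total += g->pairs[i].count;
    return total;
}
```

Because each edge stores only the immediate caller, the table can be drawn as a call graph but cannot reconstruct full stack frames, which is the weakness noted for this technique.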

Function instrumentation: how it works
Source: the same func2()/test() example as on the sampling slide.
(Diagram: execution path test() → func1() → func2() → printf(), with _func_enter and _func_exit hooks around each call producing high-resolution function timings.)

Function instrumentation technique
Function instrumentation records precise function execution times and the runtime call graph. It requires building the binary with the -finstrument-functions compiler option, which inserts hooks on entry to and exit from each function. It supports all visualization modes: function table, threads tree, call graph, call tree, and annotated editor.
Strengths:
- Complete runtime call graph, including call counts and full-depth stack frames for each call
- Precise (aggregated) function execution time
Weaknesses:
- Requires instrumentation (special compiler and linker options)
- Higher overhead (the IDE removes the overhead from the data it shows)
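With GCC (and Clang), `-finstrument-functions` makes each compiled function call `__cyg_profile_func_enter(void *this_fn, void *call_site)` on entry and `__cyg_profile_func_exit(void *this_fn, void *call_site)` on exit. Below is one possible pair of hooks, sketched under the assumption of a single-threaded program: it keeps a shadow stack of entry timestamps and accumulates call counts and inclusive time per function address. The table sizes and helper names are assumptions, and this is not the QNX profiler's own runtime library.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

#define MAX_DEPTH 256  /* assumed maximum call depth */
#define MAX_FUNCS 128  /* assumed number of distinct functions */

typedef struct { void *fn; unsigned long calls; long long total_ns; } FuncStat;

/* Shadow stack: the current call chain plus the entry time of each frame. */
static struct { void *fn; long long start_ns; } stack_[MAX_DEPTH];
static int depth_ = 0;
static FuncStat stats_[MAX_FUNCS];
static size_t nstats_ = 0;

/* Helpers are excluded from instrumentation so the hooks do not recurse. */
static long long __attribute__((no_instrument_function)) now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static FuncStat *__attribute__((no_instrument_function)) stat_for(void *fn) {
    for (size_t i = 0; i < nstats_; ++i)
        if (stats_[i].fn == fn) return &stats_[i];
    if (nstats_ == MAX_FUNCS) return NULL;
    stats_[nstats_].fn = fn;
    stats_[nstats_].calls = 0;
    stats_[nstats_].total_ns = 0;
    return &stats_[nstats_++];
}

void __attribute__((no_instrument_function))
__cyg_profile_func_enter(void *this_fn, void *call_site) {
    (void)call_site;
    if (depth_ < MAX_DEPTH) {
        stack_[depth_].fn = this_fn;
        stack_[depth_].start_ns = now_ns();
    }
    depth_++;  /* keep counting past MAX_DEPTH so enter/exit stay balanced */
}

void __attribute__((no_instrument_function))
__cyg_profile_func_exit(void *this_fn, void *call_site) {
    (void)call_site;
    depth_--;
    if (depth_ >= 0 && depth_ < MAX_DEPTH) {
        FuncStat *s = stat_for(this_fn);
        if (s != NULL) {
            s->calls++;
            s->total_ns += now_ns() - stack_[depth_].start_ns;  /* inclusive time */
        }
    }
}
```

The application itself would be built with something like `gcc -finstrument-functions app.c hooks.c`. Unlike the IDE, this sketch does not subtract its own overhead: time spent inside the hooks is included in the totals, and a multi-threaded program would need one shadow stack per thread.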

Kernel event tracing: how it works
(Diagram: IDE view of the execution path test() → func1() → func2() → printf(), with _func_enter and _func_exit events.)

Kernel event tracing technique
Visualization of the kernel trace log.
Strengths:
- System-wide perspective of target behaviour
- Precise information on context switches
Weaknesses:
- Covers only a relatively short time window
- Higher overhead while the trace is being captured
- Requires an instrumented kernel running on the target
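To illustrate why a kernel trace gives precise context-switch information, the sketch below assumes a hypothetical, simplified single-CPU trace in which each record means "at time t the CPU switched to thread tid". Summing the gaps between consecutive switches yields exact per-thread run time within the captured window. The record layout and function name are invented for illustration; this is not the QNX trace event format.

```c
#include <stddef.h>

/* Hypothetical simplified trace record: at t_ns the CPU switched to thread tid. */
typedef struct { long long t_ns; int tid; } SwitchEvent;

/* Accumulate per-thread running time from time-sorted context-switch events.
   run_ns[tid] receives the total for each thread id in [0, max_tid).
   The last interval is closed at end_ns, the end of the trace window. */
void thread_run_time(const SwitchEvent *ev, size_t n, long long end_ns,
                     long long run_ns[], int max_tid) {
    for (int t = 0; t < max_tid; ++t)
        run_ns[t] = 0;
    for (size_t i = 0; i < n; ++i) {
        long long next = (i + 1 < n) ? ev[i + 1].t_ns : end_ns;
        if (ev[i].tid >= 0 && ev[i].tid < max_tid)
            run_ns[ev[i].tid] += next - ev[i].t_ns;  /* ran until the next switch */
    }
}
```

Anything that happens before the first or after the last captured event is invisible here, which mirrors the short time window listed as a weakness.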

Visualization

The key to understanding your system and the problems you are trying to solve is visualization. Alternative views of the same data provide insight:
- Call trees and graphs
- Comparison of results from different scenarios
- Filtering, searching, grouping
- Source traceability and source annotation

Function call trees and graphs
- Call tree: top-down
- Reverse call tree: bottom-up
- Call graph
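The difference between the top-down and bottom-up views can be sketched from recorded call stacks. Assuming each sample is a stack of small integer function ids ordered from root to leaf (a hypothetical representation, as are the names below), inclusive counts drive the top-down call tree, while leaf (self) counts become the roots of the reverse call tree.

```c
#include <stddef.h>

#define MAX_FN 16  /* hypothetical: function ids are small integers in [0, MAX_FN) */

/* One recorded call stack, ordered from the thread root to the leaf function. */
typedef struct { int frames[8]; size_t depth; } Stack;

/* inclusive[f]: samples in which f appears anywhere on the stack (top-down totals).
   self[f]: samples in which f is the leaf (roots of the bottom-up tree). */
void aggregate_stacks(const Stack *st, size_t n,
                      unsigned inclusive[MAX_FN], unsigned self[MAX_FN]) {
    for (int f = 0; f < MAX_FN; ++f)
        inclusive[f] = self[f] = 0;
    for (size_t i = 0; i < n; ++i) {
        int seen[MAX_FN] = {0};  /* count a recursive function once per sample */
        for (size_t d = 0; d < st[i].depth; ++d) {
            int f = st[i].frames[d];
            if (!seen[f]) {
                seen[f] = 1;
                inclusive[f]++;
            }
        }
        if (st[i].depth > 0)
            self[st[i].frames[st[i].depth - 1]]++;
    }
}
```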

Comparing results
- Compare previous profiling sessions
- Snapshot comparison

Grouping, search and filtering
- Search
- Group
- Filters

Source code annotation

Application-centric profiling

Isolating performance issues
- We suspect a performance problem in our application
- Find out per-process CPU usage using a process monitor
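A process monitor typically derives CPU usage from the CPU time a process consumed over a sampling interval, divided by the elapsed wall time. The sketch below shows that arithmetic for the calling process only, using POSIX `getrusage()`; the helper names are hypothetical, and a real process monitor reads other processes' times through OS-specific interfaces not shown here.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Total user + system CPU time consumed so far by the calling process, in microseconds.
   Returns -1 if getrusage() fails. */
long long self_cpu_us(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return (long long)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000LL
         + ru.ru_utime.tv_usec + ru.ru_stime.tv_usec;
}

/* CPU usage over an interval: CPU time consumed divided by elapsed wall time, in percent.
   A multi-threaded process on a multi-core system can exceed 100%. */
double cpu_percent(long long cpu_before_us, long long cpu_after_us, long long wall_us) {
    if (wall_us <= 0)
        return 0.0;
    return 100.0 * (double)(cpu_after_us - cpu_before_us) / (double)wall_us;
}
```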

Quick peek using sampling
- Get an overview of CPU time consumption
- Attach to a running process, or launch it with profiling enabled
- Sampling is good for an overview but not detailed enough to find the actual problem, so we need to instrument the binary and run it again

Instrumented run
- Compile and run the application with function instrumentation
- The top-down call tree shows how much time the process spent in each node, starting from the thread and going down to individual functions
- Expand a node to see which functions it calls; expand the nodes with the biggest contribution first
- Drill down to a particular function to see its aggregated results
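The "expand the biggest contributor first" strategy amounts to a greedy walk down the call tree. A minimal sketch, assuming a hypothetical `Node` type that stores each node's inclusive time (its own time plus that of all descendants):

```c
#include <stddef.h>

/* Hypothetical call tree node; inclusive_ns covers the node and all its descendants. */
typedef struct Node {
    const char *name;
    long long inclusive_ns;
    struct Node **children;
    size_t nchildren;
} Node;

/* Follow the child with the largest inclusive time at each level, the order in
   which you would expand nodes by hand. Stores up to max nodes in path and
   returns the number stored. */
size_t hot_path(const Node *root, const Node **path, size_t max) {
    size_t len = 0;
    const Node *n = root;
    while (n != NULL && len < max) {
        path[len++] = n;
        const Node *best = NULL;
        for (size_t i = 0; i < n->nchildren; ++i)
            if (best == NULL || n->children[i]->inclusive_ns > best->inclusive_ns)
                best = n->children[i];
        n = best;  /* NULL at a leaf, which ends the walk */
    }
    return len;
}
```

A greedy walk can miss cost spread thinly across many small children, so it is a starting point for drilling down rather than a replacement for looking at the whole tree.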

Instrumented run, cont.
- Traverse the call tree to look for anomalies
- In this example, most of the time is consumed by memory allocation functions
- For example, optimize this code by replacing heap memory allocation with static memory
- Re-run the application and compare the results
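Here is a minimal sketch of the kind of change described above, using a hypothetical checksum function: the first version allocates a heap buffer on every call, the second reuses one statically allocated buffer. The function names and `RECORD_SIZE` are assumptions for illustration only.

```c
#include <stdlib.h>
#include <string.h>

#define RECORD_SIZE 256  /* hypothetical maximum record size */

/* Before: a fresh heap buffer on every call, so malloc()/free() show up in a hot loop. */
int checksum_heap(const char *data, size_t len) {
    char *buf = malloc(RECORD_SIZE);
    if (buf == NULL)
        return -1;
    size_t n = len < RECORD_SIZE ? len : RECORD_SIZE;
    memcpy(buf, data, n);
    int sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += (unsigned char)buf[i];
    free(buf);
    return sum;
}

/* After: one static buffer reused across calls; no allocator calls at all. */
int checksum_static(const char *data, size_t len) {
    static char buf[RECORD_SIZE];
    size_t n = len < RECORD_SIZE ? len : RECORD_SIZE;
    memcpy(buf, data, n);
    int sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += (unsigned char)buf[i];
    return sum;
}
```

The static version trades reentrancy for speed: it is no longer safe to call from several threads at once. A caller-provided buffer is an alternative that avoids both the allocation and the shared state.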

Compare results
- Recompile with the changes and re-run the instrumented application
- Select the old and new sessions and run “Compare session”
- The compare tree shows the difference between the old and new sessions
- Tooltips show the absolute values from the old and new runs
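Conceptually, a session comparison computes a per-function delta between two result sets. A sketch under the assumption that each session has been reduced to a hypothetical name/time table; a function missing from one session counts as zero there.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-function result from one profiling session. */
typedef struct { const char *name; long long time_us; } FuncTime;

/* Delta for one function: time in the new session minus time in the old one.
   Negative values mean the function got cheaper. */
long long session_delta(const FuncTime *old_s, size_t n_old,
                        const FuncTime *new_s, size_t n_new, const char *name) {
    long long before = 0, after = 0;
    for (size_t i = 0; i < n_old; ++i)
        if (strcmp(old_s[i].name, name) == 0)
            before = old_s[i].time_us;
    for (size_t i = 0; i < n_new; ++i)
        if (strcmp(new_s[i].name, name) == 0)
            after = new_s[i].time_us;
    return after - before;
}
```

The absolute values stay available in the two input tables, which is the information a tooltip would show next to the delta.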

System-aware analysis

System and application profiling
- An application-centric view works for isolating local performance problems:
  - Algorithms
  - CPU-intensive code
- Embedded, real-time applications never work in isolation:
  - We need to consider interacting processes
  - Temporary client-server relationships are typical, as processes request data and services from each other
(Diagram: Process 1, Process 2 and a Device interacting.) Traditionally, application profiling looks at processes in isolation.

Example: client
- Investigation first leads to the client process
- We see that the majority of the time is spent in the getServerValue() function
- Where is the CPU time really being spent?
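One way to probe "where is the CPU time really being spent?" from inside the client is to compare elapsed wall-clock time with the CPU time charged to the client process around a call. The sketch below uses the POSIX clocks `CLOCK_MONOTONIC` and `CLOCK_PROCESS_CPUTIME_ID`; `blocking_request()` is a hypothetical stand-in for getServerValue() that simply sleeps, the way a client sits blocked while another process does the work.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Read a POSIX clock in nanoseconds. */
static long long clock_ns(clockid_t id) {
    struct timespec ts;
    clock_gettime(id, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical stand-in for a blocking request: the caller waits 50 ms
   without using CPU, as if a server or driver were doing the work. */
static void blocking_request(void) {
    struct timespec delay = {0, 50 * 1000000L};
    nanosleep(&delay, NULL);
}

/* Measure one call: elapsed wall time versus CPU time charged to this process. */
void measure_call(long long *wall_ns, long long *cpu_ns) {
    long long w0 = clock_ns(CLOCK_MONOTONIC);
    long long c0 = clock_ns(CLOCK_PROCESS_CPUTIME_ID);
    blocking_request();
    *wall_ns = clock_ns(CLOCK_MONOTONIC) - w0;
    *cpu_ns = clock_ns(CLOCK_PROCESS_CPUTIME_ID) - c0;
}
```

A large gap between the two numbers means the client was waiting, so the cost lives elsewhere (in a server, a driver, or the hardware), and only a system-wide view can show where.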

Observe CPU usage

Example: client, server and device!
- Looking at the system profile tells us that the server is using block I/O (the EIDE disk driver)
- We can see as much detail as we need, down to individual interactions with the driver
- In this example, we can clearly see where time is spent in the application versus the driver

System awareness
- System profiling visualizes the behavior of, and interactions among, processes, including devices and the OS
- Function instrumentation adds process and function details to the system event trace
- Combine the function call information with system awareness to understand what is really going on
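At its core, combining function instrumentation with the system trace means merging two time-ordered event streams into one timeline. This sketch assumes both streams are already sorted and timestamped with the same clock; `Event` and `merge_timelines()` are hypothetical names, not the IDE's data model.

```c
#include <stddef.h>

/* Hypothetical unified event: a timestamp plus a short label, from either source. */
typedef struct { long long t_ns; const char *what; } Event;

/* Merge two time-ordered streams (e.g. function enter/exit records from the
   instrumented process and kernel events from the system trace) into one
   time-ordered timeline. out must have room for na + nb events. On equal
   timestamps, the event from stream a comes first. */
size_t merge_timelines(const Event *a, size_t na,
                       const Event *b, size_t nb, Event *out) {
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (a[i].t_ns <= b[j].t_ns) ? a[i++] : b[j++];
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
    return k;
}
```

On the merged timeline, kernel events such as message passing or disk interrupts appear between the enter and exit of the application function that triggered them.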

Example: server
- Investigation leads to the server process, where most of the time is spent in the getProperty() function
- Eventually, I/O is required to satisfy the requests
(Diagram: File, Device, Network.)

Summary

- Various options exist for application profiling: sampling, call counts, function instrumentation
- The technique to use depends on the need
- Visualization tools are essential for analysis
- Problem solving and optimization require a system view
- Leveraging both system and application profiling is key

Thank you! Questions & Answers