INTROPERF: TRANSPARENT CONTEXT-SENSITIVE MULTI-LAYER PERFORMANCE INFERENCE USING SYSTEM STACK TRACES. Chung Hwan Kim*, Junghwan Rhee, Hui Zhang, Nipun Arora, Guofei Jiang, Xiangyu Zhang*, Dongyan Xu*.


IntroPerf: Transparent Context-Sensitive Multi-layer Performance Inference Using System Stack Traces
Chung Hwan Kim*, Junghwan Rhee, Hui Zhang, Nipun Arora, Guofei Jiang, Xiangyu Zhang*, Dongyan Xu*
NEC Laboratories America, *Purdue University and CERIAS

Performance Bugs
Performance bugs are software defects where relatively simple source-code changes can significantly speed up software while preserving its functionality [Jin et al., PLDI '12]. They are common in most software projects, and these defects are hard for compilers to optimize away because they stem from software logic. Many performance bugs escape the development stage and cause cost and inconvenience to software users.

Diagnosis of Performance Bugs is Hard
Root causes are diverse: input/workload, configuration, resources, bugs, and others. Performance overhead also propagates across layers, so performance analysis is needed in a global scope. "Performance problems require understanding all system layers" (Hauswirth et al., OOPSLA '04).
Example: in user space, main() calls do(input), which loops with some latency, and fwrite(input), which invokes write(input); the latency of write() is incurred in kernel space.

Diagnosis of Performance Bugs
Development stage:
- Source code is available.
- Developers have knowledge of the programs.
- Testing workload.
- Heavy-weight tools such as profilers and dynamic binary instrumentation are often tolerable.
Post-development stage:
- Many users do not have source code.
- Third-party code and external modules come as binaries.
- Realistic workload at deployment.
- Low overhead is required for diagnosis tools.
Q: How can we analyze performance bugs and find their root causes in the post-development stage with low overhead?

OS Tracers and System Stack Traces
Many modern OSes provide tracing tools as Swiss-army knives. These tools trace OS events; examples include SystemTap, DTrace, and Microsoft ETW. Advanced OS tracers also capture stack traces; we call the combination of OS events and stack traces "system stack traces" (examples: Microsoft ETW, DTrace).
Challenges:
- Stack traces are captured only at OS events.
- Application function latency is missing: how do we know which program functions are slow?
[Figure: timestamped OS events (t1..t4), each paired with a captured call stack spanning user code and the OS kernel, across applications.]

IntroPerf
IntroPerf is a diagnosis tool for performance introspection based on system stack traces.
Key ideas:
- Function latency inference based on the continuity of a calling context (transparent inference of application performance).
- Context-sensitive performance analysis.
Workflow: system stack traces → function latency inference → dynamic calling context indexing → top-down latency breakdown → performance-annotated calling context ranking → a report of performance bugs.

Inference of Function Latencies
The inference is based on the continuity of a function in its calling context: the algorithm captures a period of a function's execution as long as the function stays in the call stack without a disruption of its context. This is a conservative estimate of the span between the (unobserved) call and return.
Walkthrough with stack-trace events A→B→D at t1, A→B→D at t2, A→C→D at t3, and A→C at t4:
- t1: A, B, and D appear in a new context; register instances A(t1-t1), B(t1-t1), D(t1-t1).
- t2: the same stack recurs; extend the instances to A(t1-t2), B(t1-t2), D(t1-t2).
- t3: the stack is now A→C→D; A continues (t1-t3), but B's context is disrupted, so B(t1-t2) and its child D(t1-t2) are finalized, and new instances C(t3-t3) and D(t3-t3) are registered.
- t4: the stack is A→C; D(t3-t3) is finalized, and A and C extend to A(t1-t4) and C(t3-t4).
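The walkthrough above can be sketched as a small algorithm. This is an illustrative Python sketch under my own naming; IntroPerf itself is a C++ tool operating on Windows ETW traces, so treat this as a reconstruction of the idea, not the authors' code:

```python
def infer_latencies(events):
    """Infer conservative function lifetimes from stack-trace samples.

    events: list of (timestamp, stack) pairs, where stack is a tuple of
    function names ordered from the root caller to the leaf callee.
    Returns (function, start, end) instances: a function instance lives
    as long as it stays on the stack in the same calling context.
    """
    live = []        # (function, start, last_seen); index 0 = stack root
    instances = []   # finalized (function, start, end)

    for ts, stack in events:
        # Longest prefix of the previous stack that matches this one:
        # those frames keep the same calling context and stay alive.
        k = 0
        while k < len(live) and k < len(stack) and live[k][0] == stack[k]:
            k += 1
        # Frames below the match point lost their context: finalize them,
        # deepest first, ending at the last time they were observed.
        for fn, start, last in reversed(live[k:]):
            instances.append((fn, start, last))
        # Surviving frames extend to the current timestamp.
        live = [(fn, start, ts) for fn, start, _ in live[:k]]
        # Frames new to this context start now (conservative estimate).
        live += [(fn, ts, ts) for fn in stack[k:]]

    # End of trace: finalize whatever is still live.
    for fn, start, last in reversed(live):
        instances.append((fn, start, last))
    return instances
```

Running it on the walkthrough's four events reproduces the captured instances from the slides: A(t1-t4), C(t3-t4), B(t1-t2), D(t1-t2), and D(t3-t3).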

Dynamic Calling Context Tree
A calling context is a distinct sequence of function calls starting from the "main" function (i.e., a call path). We use a calling context tree as the model of application performance, organizing the inferred latencies in a structured way. A unique and concise index for each dynamic context is necessary for analysis, so we adopt a variant of the calling context tree data structure [Ammons et al. '97] and assign a unique number to the pointer to the end node of each path. For example, the stack traces A→B→D (t1, t2), A→C→D (t3), and A→C (t4) yield a tree rooted at A whose paths through B→D and C→D receive indices 1 and 2.
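A minimal sketch of the path-indexing idea (hypothetical names; the paper adopts a variant of the calling context tree of Ammons et al., and this only illustrates how a leaf pointer can serve as a concise index for a whole context):

```python
class CCTNode:
    """Node in a dynamic calling context tree."""
    def __init__(self, name):
        self.name = name
        self.children = {}   # function name -> CCTNode
        self.index = None    # unique number if this node ends an observed path


def index_stack(root, stack, table):
    """Walk (and extend) the CCT along one stack trace; return the path's index.

    The end node of each distinct path gets a unique number, so that node
    (or its index) concisely identifies the entire calling context.
    """
    node = root
    for fn in stack:
        node = node.children.setdefault(fn, CCTNode(fn))
    if node.index is None:
        node.index = len(table) + 1
        table.append(node)
    return node.index
```

Repeated occurrences of the same stack map to the same index, which is what makes per-context latency accumulation cheap.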

Performance-annotated Calling Context Tree
Top-down latency normalization: because latency inference is performed at all layers of the stack, the inferred latencies overlap across layers. Latency is therefore normalized by recursively subtracting the latencies of children functions in the calling context tree. The calling context tree is then extended by annotating each node with its normalized inferred latency, yielding the performance-annotated calling context tree.
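The top-down normalization can be illustrated as follows. This is a sketch assuming inclusive latencies have already been inferred and attached to tree nodes; the tree shape and names are mine:

```python
def annotate_exclusive(inclusive, children, node, out=None):
    """Top-down latency normalization over a calling context tree.

    inclusive: node -> inferred (inclusive) latency
    children:  node -> list of child nodes
    A node's own (exclusive) latency is its inclusive latency minus the
    inclusive latencies of its children, applied recursively so that
    latency overlapping across layers is counted exactly once.
    """
    if out is None:
        out = {}
    kids = children.get(node, [])
    out[node] = inclusive[node] - sum(inclusive[c] for c in kids)
    for c in kids:
        annotate_exclusive(inclusive, children, c, out)
    return out
```

After normalization, the exclusive latencies of all nodes sum back to the root's inclusive latency, which is a handy sanity check.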

Context-sensitive Performance Analysis
Context-aware performance analysis involves diverse program states because function call behavior is context sensitive. Manual analysis would consume significant time and effort, so ranking function call paths by latency lets users focus on the sources of performance bug symptoms.

Ranking Calling Contexts and Functions
We calculate the cost of each calling context (i.e., each call path from the root) by accumulating the inferred function latencies along the path. The top N calling contexts ranked by latency (i.e., hot calling contexts) are listed for examination. Furthermore, within each hot calling context, function nodes are ranked by their latencies, and the hot functions inside the path are determined. Ranked contexts span from high-level application functions (e.g., main) down to low-level system layers (e.g., system calls).
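The two-level ranking might look like this (an illustrative sketch; context indices stand in for full call paths, and the input format is my assumption):

```python
from collections import defaultdict


def rank_contexts(instances, top_n):
    """Rank calling contexts by total inferred latency, then rank the
    functions inside each hot context by their own latencies.

    instances: list of (context_index, function, latency) records.
    Returns [(context_index, total_cost, [(function, latency), ...])],
    hottest context first, hottest function first within each context.
    """
    cost = defaultdict(float)
    per_ctx = defaultdict(list)
    for ctx, fn, lat in instances:
        cost[ctx] += lat
        per_ctx[ctx].append((fn, lat))
    hot = sorted(cost, key=cost.get, reverse=True)[:top_n]
    return [(ctx, cost[ctx],
             sorted(per_ctx[ctx], key=lambda p: p[1], reverse=True))
            for ctx in hot]
```

The inner ranking is what points a developer from a hot path to the specific function worth patching.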

Implementation
IntroPerf is built on top of a production tracer, Event Tracing for Windows (ETW). We use the stack traces generated on system calls and context switch events. The parser of ETW events and the performance analyzer comprise 42K lines of Windows code in Visual C++.
Experiment machine: Intel Core i CPU, 8 GB RAM, Windows Server 2008 R2.

Evaluation
Q1: How effective is IntroPerf at diagnosing performance bugs?
Q2: What is the coverage of program execution captured by system stack traces?
Q3: What is the runtime overhead of IntroPerf?

Evaluation – Performance Bugs
Q1: How effective is IntroPerf at diagnosing performance bugs? Ranking of calling contexts and function instances allows developers to understand where and how performance bugs occur and to determine the suitable code to fix.
Evaluation setup:
- Server programs (Apache, MySQL), desktop software (7zip), and system utilities (Process Hacker, similar to the task manager).
- Reproduced cases of known performance bugs; the ground truth for the root causes is the patched functions.
- Bug-injection cases; the root causes are the injected functions.
- Two categories depending on the location of the bugs: internal bugs and external bugs.

Evaluation – Performance Bugs
Internal bugs are performance bugs inside the main binary; external bugs are performance bugs outside the main binary. In the MySQL case, the ranked calling contexts span from high-level application functions (e.g., main) down to low-level system layers (e.g., system calls), and the most costly function within each hot path is identified.

Evaluation – Performance Bugs: Summary
The root causes of all our evaluation cases are caught within the top 11 costly calling contexts. The distance between the costly functions and the patched functions differs depending on the type of bug and the application semantics. IntroPerf assists the patching process by presenting the top-ranked costly calling contexts and functions.

Evaluation – Coverage
Q2: What is the coverage of program execution captured by system stack traces? We measured how much dynamic program state is covered by stack traces using two criteria: dynamic calling contexts and function call instances. We used a dynamic program instrumentation tool, Pin, to track all function calls, returns, and system calls and obtain the ground truth. Context switch events are simulated based on the scheduling policies of Windows systems [Buchanan97]. Three configurations are evaluated:
1. System calls only.
2. System calls plus low-rate context switch events (120 ms).
3. System calls plus high-rate context switch events (20 ms).

Evaluation – Coverage
Coverage analysis of three applications: Apache, MySQL, and 7zip. System call rates: 0.33~2.78% for Apache, 0.21~1.48% for MySQL, 0.11~5.03% for 7zip.
Coverage over all functions: calling contexts 5.3~49.4%; function instances 0.6~31.2%.
Coverage for the top 1% slowest functions: calling contexts 34.7~100%; function instances 16.6~100%.
Summary: there is a high likelihood of capturing high-latency functions, which are the ones that matter for performance diagnosis.

Evaluation – Performance
Q3: What is the runtime overhead of IntroPerf? We evaluated the overhead of Windows ETW when generating stack traces for three applications: Apache, MySQL, and 7zip.
- Stack traces on system calls: 1.37~8.2%.
- Stack traces on system calls and context switch events: 2.4~9.11%.
This overhead is reasonable for use in a post-development stage.

Conclusion
IntroPerf provides a transparent performance introspection technique based on the inference of function latencies from system stack traces. We evaluated IntroPerf on a set of widely used open-source software and automatically found the root causes of real-world performance bugs and delay-injected cases. The results show the effectiveness and practicality of IntroPerf as a lightweight performance diagnosis tool for the post-development stage.

Thank you