Building a Better Backtrace: Techniques for Postmortem Program Analysis Ben Liblit & Alex Aiken.

A Few Grim Realities
Programs fail post-deployment
  – Ship with known bugs
  – Users discover new bugs
Users are lousy testers
  – Never do the same thing twice
  – Wild variation in execution environment
  – Poor bug reporting, if any
Users’ bugs are the ones that really matter

Program Analysis for Pessimists
Assume & prepare for postmortem analysis
  – Compile-time analysis, stashed away for later
  – Lightweight (deployable) instrumentation
Analyze failed program instances
  – Mix of automated / interactive tools
  – Not quite static analysis, not quite dynamic
Help humans find and fix bugs that matter

This Talk: Reconstructing Execution Chronologies
Control flow decision history captures important properties
Fundamental questions
  – “How in the world did it get here?”
  – “What happened just before this point?”
  – “How can I make this happen again?”
Broader interest than just crashes

Striking a Compromise
Heavyweight approaches
  – Replay debugging
  – Program tracing
Lightweight approaches
  – Examine stack trace in debugger
  – printf() debugging
Middleweight (our) approach
  – “How might we have gotten here, given …?”

The Big Idea: “Gotten Here” is Control Flow Reachability
Interested in paths
  – “How”, not just “yes/no”
Transitive paths within one function
Multiple functions?
  – Matched call/return paths
  – This is a form of context-free language (CFL) reachability

Global Control Flow Graph
[Figure: functions with entry and exit nodes, connected by call and return edges labeled with matching brackets such as ( ) and [ ]]

Variations in Matching Grammar
Complete execution
  – All calls & returns must be matched
  – Example: {()(){()}[{}(())]}

Variations in Matching Grammar
Aborted execution
  – Some calls without returns
  – We use a variant of this
  {()(){()}[{}(())]}
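The two grammar variants can be sketched as a small bracket checker. This is an illustrative sketch, not code from the talk: the function names are hypothetical, and the abort point chosen for the example (dropping the last three brackets of the slide's string) is an assumption.

```python
# Each closing bracket (a return) must match the most recent open bracket (a call).
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def unmatched_calls(trace):
    """Return the list of calls left open by `trace`, or None if the
    trace contains a return that does not match the pending call."""
    stack = []
    for symbol in trace:
        if symbol in OPENERS:
            stack.append(symbol)          # call: push
        elif symbol in PAIRS:
            if not stack or stack[-1] != PAIRS[symbol]:
                return None               # mismatched or stray return
            stack.pop()                   # matched return: pop
        else:
            return None                   # not a bracket at all
    return stack


def is_complete_execution(trace):
    """Complete execution: every call has its matching return."""
    return unmatched_calls(trace) == []


def is_aborted_execution(trace):
    """Aborted execution: returns must match, calls may stay open."""
    return unmatched_calls(trace) is not None


full = "{()(){()}[{}(())]}"
aborted = full[:-3]  # assumed abort point, chosen for illustration
print(is_complete_execution(full))   # the whole string is balanced
print(unmatched_calls(aborted))      # the calls still open at the abort
```

The list returned for the aborted prefix is the call stack at the point of failure, read from outermost to innermost frame.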

CFL Reachability Algorithm
Similar to transitive graph search
  – Use a work list to incrementally extend frontier
  – Forward from α or backward from ω
  – Transitively adding flow edges is one case
  – Several additional cases for calls/returns
Complexity
  – O(N^3) for arbitrary grammar and graph
  – O(E) for our analyses (and many others)
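A backward work-list search with call/return matching might look like the sketch below. It is a naive illustration, not the O(E) algorithm from the talk: it tracks an explicit tuple of pending returns per state, so it assumes a non-recursive program to stay finite. The edge encoding and the function name `cfl_reachable` are hypothetical.

```python
from collections import deque


def cfl_reachable(edges, start, target):
    """Can `start` reach `target` along a path whose returns all match
    their calls, with unmatched calls allowed (aborted execution)?

    edges: (src, dst, label) triples; label is None for a flow edge,
    ("call", i) for call edge i, or ("ret", i) for its return edge.
    """
    incoming = {}
    for src, dst, label in edges:
        incoming.setdefault(dst, []).append((src, label))

    first = (target, ())                 # (node, returns awaiting a call)
    seen = {first}
    work = deque([first])                # FIFO work list
    while work:
        node, pending = work.popleft()
        if node == start and not pending:
            return True
        for src, label in incoming.get(node, []):
            if label is None:            # flow edge: same context
                nxt = (src, pending)
            elif label[0] == "ret":      # backward over a return: expect its call
                nxt = (src, pending + (label[1],))
            elif pending:                # backward over a call: must match
                if pending[-1] != label[1]:
                    continue
                nxt = (src, pending[:-1])
            else:                        # call with nothing pending: unmatched call
                nxt = (src, ())
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return False


# Hypothetical program: main calls f twice, through call edges 1 and 2.
EDGES = [
    ("m_entry", "m_call1", None),
    ("m_call1", "f_entry", ("call", 1)),
    ("f_entry", "f_exit", None),
    ("f_exit", "m_ret1", ("ret", 1)),
    ("m_ret1", "m_call2", None),
    ("m_call2", "f_entry", ("call", 2)),
    ("f_exit", "m_ret2", ("ret", 2)),
    ("m_ret2", "m_exit", None),
]
print(cfl_reachable(EDGES, "m_entry", "m_ret2"))
print(cfl_reachable(EDGES, "m_call2", "m_ret1"))
```

The second query is the interesting one: plain graph reachability would connect `m_call2` to `m_ret1` by entering `f` through call 2 and leaving through return 1, a path that the matching rule excludes.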

Reconstruction With Crash Site Only
Work backward from crash site
Remember why each path is extended
  – Record justifications in route map
  – route(x, z) = {r_1, …, r_n}
  – r_i = cross from x to y, then see route(y, z)
  – x and y must be “adjacent”: one of four cases
route(α, ω) defines possible chronologies
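The route map idea can be illustrated with a reduced sketch that handles only one of the four adjacency cases, the plain flow edge. Everything here is an assumption made for illustration: the names `build_route_map` and `chronologies`, the choice to key the map by x alone (z is fixed at the crash site), and the restriction to loop-free paths when enumerating.

```python
from collections import deque


def build_route_map(edges, crash):
    """Work backward from `crash`. route[x] holds every y such that
    x -> y is an edge and y can reach the crash site, i.e. the
    justification "cross from x to y, then see route[y]"."""
    preds = {}
    for x, y in edges:
        preds.setdefault(y, []).append(x)

    route = {crash: set()}
    work = deque([crash])
    while work:
        y = work.popleft()
        for x in preds.get(y, []):
            if x not in route:           # first time x is shown to reach crash
                route[x] = set()
                work.append(x)
            route[x].add(y)              # record why x reaches the crash
    return route


def chronologies(route, start, crash, path=()):
    """Enumerate loop-free paths from `start` to `crash` by following
    the recorded justifications."""
    path = path + (start,)
    if start == crash:
        yield path
        return
    for y in sorted(route.get(start, ())):
        if y not in path:                # skip revisits to keep paths finite
            yield from chronologies(route, y, crash, path)


# Hypothetical flow graph: a diamond a -> {b, c} -> d, then d -> e.
FLOW = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
route = build_route_map(FLOW, crash="d")
print(list(chronologies(route, "a", "d")))
```

Node `e` never enters the map because it lies after the crash site, so the backward search does not visit it.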

Reconstruction With Crash Site Only
One case, unmatched call, determines stack
  – Unmatched parens: {()(){()}[{}(())]}
  – Stack trace: {[(
But we probably have a specific stack trace in mind…

Reconstruction With Crash Site + Stack Trace
S ::= vector of call edges
Build |S| + 1 clones of global flow graph
Two types of call edge
  – (_i must match )_i
    Stays on same layer
  – c_i must be unmatched
    Only way to next layer
    Determined by S
[Figure: call edges c_6, c_3, c_14]

Reconstruction With Crash Site + Stack Trace
Possible histories
  – Start at α on top layer
  – End at ω on bottom layer
  – route(⟨α, 0⟩, ⟨ω, |S|⟩)
Backward, not forward
  – More deterministic
Complexity
  – O(E) work, |S| + 1 times
[Figure: call edges c_6, c_3, c_14]
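The layered construction can be sketched as a product of the flow graph with the stack trace. This is a simplified illustration under stated assumptions: matched calls are taken to be already summarized as ordinary flow edges from call site to return site, so the only call edges modeled are the unmatched ones named in S. The function names and the example program are hypothetical.

```python
from collections import deque


def layered_edges(flow, calls, stack):
    """Clone the flow graph len(stack) + 1 times. The call edge stack[k]
    is the only way from layer k to layer k + 1."""
    edges = []
    for layer in range(len(stack) + 1):
        edges += [((x, layer), (y, layer)) for x, y in flow]
    for layer, name in enumerate(stack):
        site, entry = calls[name]
        edges.append(((site, layer), (entry, layer + 1)))
    return edges


def reach(edges, start):
    """Plain work-list reachability from `start`."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    seen = {start}
    work = deque([start])
    while work:
        node = work.popleft()
        for nxt in succ.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen


def history_nodes(flow, calls, stack, alpha, omega):
    """Layered nodes on some path from (alpha, 0) to (omega, len(stack))."""
    edges = layered_edges(flow, calls, stack)
    forward = reach(edges, (alpha, 0))
    backward = reach([(b, a) for a, b in edges], (omega, len(stack)))
    return forward & backward


# Hypothetical program: main branches to call site m1 or m2, both call f.
FLOW = [("m0", "m1"), ("m0", "m2"), ("m1", "m3"), ("m2", "m3"), ("f0", "f1")]
CALLS = {"c1": ("m1", "f0"), "c2": ("m2", "f0")}
print(sorted(history_nodes(FLOW, CALLS, ["c2"], alpha="m0", omega="f1")))
```

With the stack trace naming `c2`, call site `m1` drops out of every possible history even though it can reach `f` in the unlayered graph.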

Reconstruction With Crash Site + Event Trace
V ::= vector of trace nodes
Use |V| + 1 layered clones, as before
Must report event when crossing trace node
  – On each layer, knock out all trace nodes but one
  – On bottommost layer, no trace nodes at all!
  – Further restricts set of possible paths
Complexity: O(E|V|)
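The same layering works for an event trace. The sketch below is an illustration under assumptions: "crossing" a trace node is modeled as entering it, the starting node α is assumed not to be a trace node, and call/return matching is left out. The function names and the example graph are hypothetical.

```python
from collections import deque


def event_layered_edges(flow, trace_nodes, events):
    """Layer k means "k events reported so far". On layer k the only
    trace node that may be entered is events[k], and entering it moves
    to layer k + 1. The last layer admits no trace node at all."""
    edges = []
    for layer in range(len(events) + 1):
        for x, y in flow:
            if y in trace_nodes:
                if layer < len(events) and y == events[layer]:
                    edges.append(((x, layer), (y, layer + 1)))
                # every other trace node is knocked out on this layer
            else:
                edges.append(((x, layer), (y, layer)))
    return edges


def reach(edges, start):
    """Plain work-list reachability from `start`."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    seen = {start}
    work = deque([start])
    while work:
        node = work.popleft()
        for nxt in succ.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen


def consistent_nodes(flow, trace_nodes, events, alpha, omega):
    """Layered nodes on some path from (alpha, 0) to (omega, len(events))."""
    edges = event_layered_edges(flow, trace_nodes, events)
    forward = reach(edges, (alpha, 0))
    backward = reach([(b, a) for a, b in edges], (omega, len(events)))
    return forward & backward


# Hypothetical diamond with instrumented branch targets b and c.
FLOW = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
print(sorted(consistent_nodes(FLOW, {"b", "c"}, ["c"], "a", "e")))
```

An empty event vector yields no consistent history here, since every path from `a` to `e` has to cross one of the two trace nodes.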

Reconstruction With …
Stack trace + event trace
Multiple event traces
Ambiguous traces
Incomplete event trace
  – Recent-branch registers
Program counter sampling
Finite state machine of your choosing…

Practical Considerations
Dynamic dispatch / function pointers
  – Usual static techniques (points-to, receiver-class, etc.)
  – Event tracing can help
  – Note: stack trace is never dynamic
Interactivity
  – Backward analysis is best: most bugs are close to crash
  – FIFO work list, demand-driven search
  – Deterministic versus non-deterministic state machines

Areas For Future Exploration
Sparsity of trace information
  – Identify state-preserving regions
  – Explore such regions only once
Summarization / visualization
  – Basis: dominator tree walk-back
  – Opportunity for novel algorithms here

Areas For Future Exploration
Adaptive Gap Reduction
  – Programmer inquiries guide future annotation
    “Which way did this branch really go?”
    “How many times did this loop really execute?”
  – Identification of key inflection points
  – Insert lightweight event tracing nodes
    Related work in efficient path profiling
  – More evidence for future reconstructions

Summary and Conclusions
Program analysis in an imperfect world
  – Post-crash: unique challenges / leverage points
CFL path recovery as basis for analysis
  – Efficient, demand-driven, adaptable
Future work
  – Adaptive annotation to fill in gaps
  – Leveraging multiple runs
  – Data value modeling