This One Time, at PL Camp... Summer School on Language-Based Techniques for Integrating with the External World University of Oregon Eugene, Oregon July 2007
Checking Type Safety of Foreign Function Calls Jeff Foster, University of Maryland. Ensure type safety across languages (OCaml–C, JNI). Representational types; SAFFIRE, a multi-lingual type inference system.
Dangers of FFIs In most FFIs, programmers write “glue code” that translates data between the host and foreign languages, typically written in one of the two languages. Unfortunately, FFIs are often easy to misuse: little or no checking is done at the language boundary, and mistakes can silently corrupt memory. One solution: interface generators.
Example: “Pattern Matching”

type t = A of int | B | C of int * int | D

if (Is_long(x)) {
  if (Int_val(x) == 0) /* B */ ...
  if (Int_val(x) == 1) /* D */ ...
} else {
  if (Tag_val(x) == 0) /* A */ Field(x, 0) = Val_int(0);
  if (Tag_val(x) == 1) /* C */ Field(x, 1) = Val_int(0);
}
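To make the boundary concrete, here is a minimal OCaml-side sketch of how such C code might be bound; the reset/reset_t names are hypothetical, only the type t comes from the slide.

type t = A of int | B | C of int * int | D

(* The C stub sees this type through its runtime representation:
   B and D are immediate integers (0 and 1, tested with Is_long),
   while A and C are heap blocks with tags 0 and 1. *)
external reset : t -> unit = "reset_t"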
Garbage Collection C FFI functions need to play nice with the GC: pointers from C to the OCaml heap must be registered.

value bar(value list) {
  CAMLparam1(list);        /* register the parameter as a GC root */
  CAMLlocal1(temp);        /* register the local before allocating */
  temp = alloc_tuple(2);   /* may trigger a collection */
  CAMLreturn(Val_unit);    /* unregisters the roots */
}

Easy to forget, and difficult to find this error with testing.
Multi-Lingual Types Representational Types Embed OCaml types in C types and vice versa
SAFFIRE Static Analysis of Foreign Function InteRfacEs
Programming Models for Distributed Computing Yannis Smaragdakis University of Oregon NRMI: Natural programming model for distributed computing. J-Orchestra: Execute unsuspecting programs over a network, using program rewriting. Morphing: High-level language facility for safe program transformation.
NRMI call sequence (the talk animates this on a heap structure t whose subtrees are shared via aliases alias1 and alias2, crossing the network between client and server sides): 1. Identify all reachable objects. 2. Execute the remote procedure. 3. Send back all reachable objects. 4. Match the reachability maps. 5. Update the original objects. 6. Adjust links out of the original objects. 7. Adjust links out of the new objects. 8. Garbage collect.
J-Orchestra Automatic partitioning system. Works as a bytecode compiler; lots of indirection using proxies, interfaces, and local and remote objects. The partitioned program is equivalent to the original.
Morphing Ensure program generators are safe: statically check the generator to determine the safety of any generated program, under all inputs; ensure that generated programs compile. Early approach: SafeGen, using theorem provers. MJ: using types.
Fault Tolerant Computing David August and David Walker Princeton University Processors are becoming more susceptible to intermittent faults. Moore’s Law, radiation Alter computation or state, resulting in incorrect program execution. Goal: Build reliable systems from unreliable components.
Topics Transient faults and mechanisms designed to protect against them (HW). The role languages and compilers may play in creating radiation-hardened programs. New opportunities made possible by languages that embrace potentially incorrect behavior.
Causes
Software/Compiler Duplicate instructions and check at important locations (stores) [SWIFT, EDDI]
λ zap A λ calculus with fault tolerance; an intermediate language for compilers. Models a single fault; based on replication; the semantics model the type of faults.

let x = 2 in
let y = x + x in
out y

becomes

let x1 = 2 in let x2 = 2 in let x3 = 2 in
let y1 = x1 + x1 in let y2 = x2 + x2 in let y3 = x3 + x3 in
out [y1, y2, y3]
Testing
Typing Ad Hoc Data Kathleen Fisher, AT&T Labs. PADS project: Data Description Language (DDL); Data Description Calculus (DDC); automatic inference of PADS descriptions.
PADS Declarative description of a data source: physical format information and semantic constraints.

type responseCode = { x : Int | 99 < x < 600 }

Pstruct webRecord {
  Pip ip; " - - [";
  Pdate(':') date; ":";
  Ptime(']') time; "]";
  httpMeth meth; " ";
  Puint8 code; " ";
  Puint8 size; " ";
};
Parray webLog { webRecord[] };
Raw data (ASCII log files, binary traces, …) plus a data description yield standard formats & schema (XML, CSV), visual information, and end-user tools. Problem: producing useful tools for ad hoc data takes a lot of time. Solution: a learning system to generate data descriptions and tools automatically.
Format Inference Engine: input file(s) → chunked data → tokenization → structure discovery → format refinement (guided by a scoring function) → IR-to-PADS printer → PADS description.
Multi-Staged Programming Walid Taha, Rice University. Writing generic programs that do not pay a runtime overhead: use program generators; ensure generated code is syntactically well-formed and well-typed. MetaOCaml.
The Abstract View A batch program P consumes inputs I1 and I2 together; staged, P1 consumes the early input I1 and generates P2, which is then run repeatedly on late inputs I2.
MetaOCaml Brackets .< e >. delay execution of an expression. Escape .~e combines smaller delayed values to construct larger ones. Run .! e compiles and executes the dynamically generated code.
Power Example

let rec power (n, x) =
  match n with 0 → 1 | n → x * (power (n-1, x));;
let power2 (x) = power (2, x);;
let power2 = fun x → power (2, x);;    (* still calls power at run time *)
(* want instead: *) let power2 (x) = 1*x*x;;

(* staged version: the recursion happens at code-generation time *)
let rec power (n, x) =
  match n with 0 → .<1>. | n → .<.~x * .~(power (n-1, x))>.;;
let power2 = .! .<fun x → .~(power (2, .<x>.))>.;;
Scalable Defect Detection Manuvir Das, Daniel Wang, Zhe Yang, Microsoft Research. Program analysis at Microsoft scale: scalability vs. accuracy. Combines a weak global analysis with a slow, precise local one (for some regions of code). Programmers are required to add interface annotations; some automatic inference is available.
Web and Database Application Security Zhendong Su, University of California, Davis. Static analyses for enforcing correctness of dynamically generated database queries; runtime checking mechanisms for detecting SQL injection attacks; static analyses for detecting SQL injection and cross-site scripting vulnerabilities.
XML and Web Application Programming Anders Møller, University of Aarhus. Formal models of XML schemas: expressiveness of DTD, XML Schema, Relax NG. Type checking XML transformation languages: “Assuming that x is valid according to S_in, is T(x) valid according to S_out?” Web application frameworks: Java Servlets and JSP, JWIG, GWT.
Types for Safe C-Level Programming Dan Grossman, University of Washington. Cyclone, a safe dialect of C, designed to prevent safety violations (buffer overflows, memory management errors, …). Mostly underlying theory: types, expressions, memory regions.
Analyzing and Debugging Software Understanding Multilingual Software [Foster] Parlez-vous OCaml? Statistical Debugging [Liblit] you are my beta tester, and there are lots of you Scalable Defect Detection [Das, Wang, Yang] Microsoft programs have no bugs
Programming Models Types for Safe C-Level Programming [Grossman] C without the ick factor Staged Programming [Taha] Programs that produce programs that produce programs... Prog. Models for Dist. Comp. [Smaragdakis] We’ve secretly replaced your centralized program with a distributed application. Can you tell the difference?
The Web Web and Database Application Security [Su] How not to be pwn3d by 1337 haxxors XML and Web Application Programming [Møller] X is worth 8 points in scrabble...let’s use it a lot
Other Really Important Stuff Fault Tolerant Computing [August, Walker] Help, I’ve been hit by a cosmic ray! Typing Ad Hoc Data [Fisher] Data, data, everywhere, but what does it mean?
Statistical Debugging Ben Liblit University Of Wisconsin-Madison
What’s This All About? Statistical Debugging & Cooperative Bug Isolation: observe deployed software in the hands of real end users; build statistical models of success & failure; guide programmers to the root causes of bugs. Make software suck less.
Motivation “There are no significant bugs in our released software that any significant number of users want fixed.” Bill Gates, quoted in FOCUS Magazine
Software Releases in the Real World [Disclaimer: this may be a caricature.]
Software Releases in the Real World 1. Coders & testers in tight feedback loop: detailed monitoring, high repeatability; testing approximates reality. 2. Testers & management declare “Ship it!” Perfection is not an option; developers don’t decide when to ship.
Software Releases in the Real World 3. Everyone goes on vacation: congratulate yourselves on a job well done! What could possibly go wrong? 4. Upon return, hide from tech support: much can go wrong, and you know it. Users define reality, and it’s not pretty (where “not pretty” means “badly approximated by testing”).
Testing as Approximation of Reality Microsoft’s Watson error reporting system: crash reports from 500,000 separate programs. x% of software causes 50% of bugs; care to guess what x is? 1% of software errors causes 50% of user crashes. Small mismatch ➙ big problems (sometimes); big mismatch ➙ small problems? (sometimes!). Perfection is not an economically viable option.
Real Engineers Measure Things; Are Software Engineers Real Engineers?
Instrumentation Framework “The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong, it usually turns out to be impossible to get at or repair.” Douglas Adams, Mostly Harmless
Bug Isolation Architecture: Program Source → Compiler (predicates injected) → Shipping Application with Sampler → counts & success/failure labels → Statistical Debugging → top bugs with likely causes.
Model of Behavior Each behavior is expressed as a predicate P on program state at a particular program point. Count how often “P observed true” and “P observed” using sparse but fair random samples of complete behavior.
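As a minimal sketch (the representation is an assumption, not from the talk), the two counts per predicate could look like:

(* Two counters per predicate: how often P was observed at all,
   and how often it was observed to be true. *)
type pred_counts = { mutable observed : int; mutable observed_true : int }

let observe c truth =
  c.observed <- c.observed + 1;
  if truth then c.observed_true <- c.observed_true + 1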
Predicate Injection: Guessing What’s Interesting (the Compiler stage of the pipeline above)
Branch Predicates Are Interesting if (p) … else …
Branch Predicate Counts if (p) // p was true (nonzero) else // p was false (zero) Syntax yields instrumentation site. Site yields predicates on program behavior. Exactly one predicate true per visit to site.
Returned Values Are Interesting n = fprintf(…); Did you know that fprintf() returns a value? Do you know what the return value means? Do you remember to check it?
Returned Value Predicate Counts n = fprintf(…); // return value < 0 ? == 0 ? > 0 ? Syntax yields instrumentation site. Site yields predicates on program behavior. Exactly one predicate true per visit to site.
Pair Relationships Are Interesting int i, j, k; … i = …;
Pair Relationship Predicate Counts int i, j, k; … i = …; // compare new value of i with… //other vars: j, k, … //old value of i //“important” constants
Many Other Behaviors of Interest Assert statements Perhaps automatically introduced, e.g. by CCured Unusual floating point values Did you know there are nine kinds? Coverage of modules, functions, basic blocks, … Reference counts: negative, zero, positive, invalid Kinds of pointer: stack, heap, null, … Temporal relationships: x before/after y More ideas? Toss them all into the mix!
Summarization and Reporting Observation stream ⇒ observation count: how often is each predicate observed true? Removes the time dimension, for good or ill. Bump exactly one counter per observation; infer additional predicates (e.g. ≤, ≠, ≥) offline. Feedback report is: 1. vector of predicate counters; 2. success/failure outcome label. Still quite a lot to measure. What about performance?
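Concretely, a feedback report might be represented as below; the field names are assumptions, but the content (one counter per predicate plus an outcome label) is exactly what the slide lists.

(* One feedback report per run: a counter per injected predicate,
   plus the run's success/failure outcome label. *)
type report = { counts : int array; failed : bool }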
Fair Sampling Transformation (the Sampler stage of the pipeline above)
Sampling the Bernoulli Way Decide to examine or ignore each site… randomly, independently, dynamically. Cannot be periodic: unfair temporal aliasing. Cannot toss a coin at each site: too slow.
Amortized Coin Tossing Randomized global countdown; small countdown ⇒ upcoming sample. Countdown selected from a geometric distribution: the inter-arrival time for a biased coin toss (how many tails before the next head?). Mean sampling rate is a tunable parameter.
Geometric Distribution D = mean of distribution; expected sample density = 1/D.
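A minimal OCaml sketch of the countdown mechanism, assuming a mean sampling rate rho; all names here are mine, not the tool's:

let rho = 0.01  (* tunable mean sampling rate, e.g. 1/100 *)

(* Draw the countdown to the next sample from a geometric
   distribution by inversion; its mean is 1/rho. *)
let next_countdown () =
  let u = Random.float 1.0 in
  1 + int_of_float (log (1.0 -. u) /. log (1.0 -. rho))

let countdown = ref (next_countdown ())

(* Called at each instrumentation site: true only on the rare
   visits that should record a full observation. *)
let maybe_sample () =
  decr countdown;
  if !countdown <= 0 then (countdown := next_countdown (); true)
  else false

This is what makes the sampling fair: every dynamic visit to a site has the same chance of being sampled, with no periodic aliasing.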
Weighing Acyclic Regions Break the CFG into acyclic regions. Each region has a finite number of paths and a finite max number of instrumentation sites. Compute the max weight in a bottom-up pass.
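A sketch of the bottom-up pass over one acyclic region; the node type is a stand-in for the real CFG representation, and a production version would memoize rather than recurse naively:

type node = { sites : int; succs : node list }

(* Max weight = the largest number of instrumentation sites any
   single path through the region can visit. *)
let rec max_weight n =
  n.sites + List.fold_left (fun acc s -> max acc (max_weight s)) 0 n.succs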
Weighing Acyclic Regions Clone each acyclic region into a “fast” variant and a “slow” variant; choose at run time by comparing the global countdown to the region’s weight (e.g., > 4?). Retain decrements on the fast path for now; stay tuned…
Path Balancing Optimization Decrements on fast path are a bummer Goal: batch them up But some paths are shorter than others Idea: add extra “ghost” instrumentation sites Pad out shorter paths All paths now equal
Path Balancing Optimization Fast path is faster One bulk counter decrement on entry Instrumentation sites have no code at all Slow path is slower More decrements Consume more randomness
Optimizations Identify and ignore “weightless” functions / cycles Cache global countdown in local variable Avoid cloning Static branch prediction at region heads Partition sites among several binaries Many additional possibilities…
What Does This Give Us? Absolutely certain of what we do see Subset of dynamic behavior Success/failure label for entire run Uncertain of what we don’t see Given enough runs, samples ≈ reality Common events seen most often Rare events seen at proportionate rate
Playing the Numbers Game Program Source Compiler Shipping Application Sampler Predicates Counts & / Statistical Debugging Top bugs with likely causes
Isolating a Deterministic Bug Hunt for crashing bug in ccrypt-1.2. Sample function return values: triple of counters per call site (< 0, == 0, > 0). Use process of elimination: look for predicates true on some bad runs, but never true on any good run.
Elimination Strategies Universal falsehood: disregard P if |P| = 0 for all runs; likely a predicate that can never be true. Lack of failing coverage: disregard all predicates at site S if |S| = 0 for all failed runs; the site is not reached in failing executions. Lack of failing example: disregard P if |P| = 0 for all failed runs; P need not be true for a failure to occur. Successful counterexample: disregard P if |P| > 0 on at least one successful run; P can be true without causing failure.
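For the deterministic case the whole process of elimination is one filter over the feedback reports; this is my paraphrase over the report type sketched earlier, not the tool's code:

(* Keep only predicates (indices into counts) that are true on at
   least one failing run and never true on any successful run. *)
let deterministic_predictors (runs : report list) (num_preds : int) =
  List.filter
    (fun p ->
       List.exists (fun r -> r.failed && r.counts.(p) > 0) runs
       && not (List.exists (fun r -> not r.failed && r.counts.(p) > 0) runs))
    (List.init num_preds (fun p -> p))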
Winnowing Down the Culprits 1710 counters = 3 × 570 call sites. 1569 are zero on all runs; 141 remain. 139 are nonzero on at least one successful run. Not much left! file_exists() > 0 and xreadline() == 0.
Multiple, Non-Deterministic Bugs Strict process of elimination won’t work Can’t assume program will crash when it should No single common characteristic of all failures Look for general correlation, not perfect prediction Warning! Statistics ahead!
Ranked Predicate Selection Consider each predicate P one at a time Include inferred predicates (e.g. ≤, ≠, ≥) How likely is failure when P is true? (technically, when P is observed to be true) Multiple bugs yield multiple bad predicates
Some Definitions F(P) = number of failing runs in which P was observed true; S(P) = number of successful runs in which P was observed true; Bad(P) = F(P) / (F(P) + S(P)), the chance of failure when P is observed true (following the published CBI formulation).
Are We Done? Not Exactly! Bad(f = NULL) = 1.0
Are We Done? Not Exactly! Predicate (x = 0) is an innocent bystander: the program is already doomed. Bad(f = NULL) = 1.0; Bad(x = 0) = 1.0
Crash Probability Identify unlucky sites on the doomed path: Context(P) = F(P observed) / (F(P observed) + S(P observed)), the background risk of failure for reaching this site, regardless of predicate truth/falsehood.
Isolate the Predictive Value of P Does P being true increase the chance of failure over the background rate? Increase(P) = Bad(P) − Context(P). Formal correspondence to likelihood ratio testing.
Increase Isolates the Predictor Increase(f = NULL) = 1.0; Increase(x = 0) = 0.0
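In code, with f_true/s_true counting failing/successful runs where P was observed true and f_obs/s_obs counting runs where P was observed at all (a sketch; the helper names are mine):

let bad ~f_true ~s_true =
  float_of_int f_true /. float_of_int (f_true + s_true)

(* Background chance of failure merely for reaching P's site. *)
let context ~f_obs ~s_obs =
  float_of_int f_obs /. float_of_int (f_obs + s_obs)

(* How much does P being true raise failure above the background? *)
let increase ~f_true ~s_true ~f_obs ~s_obs =
  bad ~f_true ~s_true -. context ~f_obs ~s_obs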
It Works! …for programs with just one bug. Need to deal with multiple bugs How many? Nobody knows! Redundant predictors remain a major problem Goal: isolate a single “best” predictor for each bug, with no prior knowledge of the number of bugs.
Multiple Bugs: Some Issues A bug may have many redundant predictors Only need one, provided it is a good one Bugs occur on vastly different scales Predictors for common bugs may dominate, hiding predictors of less common problems
Bad Idea #1: Rank by Increase(P) High Increase but very few failing runs These are all sub-bug predictors Each covers one special case of a larger bug Redundancy is clearly a problem
Bad Idea #2: Rank by F(P) Many failing runs but low Increase Tend to be super-bug predictors Each covers several bugs, plus lots of junk
A Helpful Analogy In the language of information retrieval Increase(P) has high precision, low recall F(P) has high recall, low precision Standard solution: Take the harmonic mean of both Rewards high scores in both dimensions
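A sketch of the combined score; the exact normalization of the failure count is an assumption (the published work uses a log-scaled recall term):

(* Harmonic mean of two scores in [0,1]: high only when both are high. *)
let harmonic a b =
  if a = 0.0 || b = 0.0 then 0.0
  else 2.0 /. (1.0 /. a +. 1.0 /. b)

(* Combine Increase (precision-like) with the fraction of all
   failing runs the predicate covers (recall-like). *)
let importance ~incr ~f ~total_failures =
  harmonic incr (float_of_int f /. float_of_int total_failures)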
Rank by Harmonic Mean Definite improvement Large increase, many failures, few or no successes But redundancy is still a problem
Redundancy Elimination One predictor for a bug is interesting Additional predictors are a distraction Want to explain each failure once Similar to minimum set-cover problem Cover all failed runs with subset of predicates Greedy selection using harmonic ranking
Simulated Iterative Bug Fixing 1. Rank all predicates under consideration. 2. Select the top-ranked predicate P. 3. Add P to the bug predictor list. 4. Discard P and all runs where P was true (simulates fixing the bug predicted by P; reduces the rank of similar predicates). 5. Repeat until out of failures or predicates.
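The loop itself is a greedy set cover; a sketch over the earlier report type, with rank standing for any scoring function such as importance above:

(* Pick the top-ranked predicate, record it, discard it together
   with every run in which it was true, then re-rank and repeat. *)
let rec isolate preds runs rank acc =
  match preds with
  | [] -> List.rev acc
  | _ when not (List.exists (fun r -> r.failed) runs) -> List.rev acc
  | p0 :: rest ->
    let best =
      List.fold_left (fun b p -> if rank p runs > rank b runs then p else b)
        p0 rest
    in
    isolate
      (List.filter (fun p -> p <> best) preds)
      (List.filter (fun r -> r.counts.(best) = 0) runs)
      rank (best :: acc)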
Not Covered Today Visualization of bug predictors: simple visualization may help reveal trends, plotting quantities such as Increase(P) with its error bound, S(P), log(F(P) + S(P)), and Context(P).
Not Covered Today Reconstruction of failing paths: the bug predictor is often the smoking gun, but not always; want a short, feasible path that exhibits the bug. “Just because it’s undecidable doesn’t mean we don’t need an answer.”