This One Time, at PL Camp... Summer School on Language-Based Techniques for Integrating with the External World University of Oregon Eugene, Oregon July 2007
Checking Type Safety of Foreign Function Calls Jeff Foster, University of Maryland. Ensure type safety across languages (OCaml–C, JNI). Representational types; SAFFIRE, a multi-lingual type inference system.
Dangers of FFIs In most FFIs, programmers write “glue code” that translates data between the host and foreign languages, typically written in one of the two languages. Unfortunately, FFIs are often easy to misuse: little or no checking is done at the language boundary, and mistakes can silently corrupt memory. One solution: interface generators.
Example: “Pattern Matching”

type t = A of int | B | C of int * int | D

if (Is_long(x)) {
  if (Int_val(x) == 0) /* B */ ...
  if (Int_val(x) == 1) /* D */ ...
} else {
  if (Tag_val(x) == 0) /* A */ Field(x, 0) = Val_int(0);
  if (Tag_val(x) == 1) /* C */ Field(x, 1) = Val_int(0);
}
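To make the boundary concrete, here is a minimal OCaml-side sketch of how such C code might be bound; the reset/reset_t names are hypothetical, only the type t comes from the slide.

type t = A of int | B | C of int * int | D

(* The C stub sees this type through its runtime representation:
   B and D are immediate integers (0 and 1, tested with Is_long),
   while A and C are heap blocks with tags 0 and 1. *)
external reset : t -> unit = "reset_t"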
Garbage Collection C FFI functions need to play nice with the GC: pointers from C to the OCaml heap must be registered.

value bar(value list) {
  CAMLparam1(list);        /* register the parameter as a GC root */
  CAMLlocal1(temp);        /* register the local before allocating */
  temp = alloc_tuple(2);   /* may trigger a collection */
  CAMLreturn(Val_unit);    /* unregisters the roots */
}

Easy to forget, and difficult to find this error with testing.
Multi-Lingual Types Representational Types Embed OCaml types in C types and vice versa
SAFFIRE Static Analysis of Foreign Function InteRfacEs
Programming Models for Distributed Computing Yannis Smaragdakis University of Oregon NRMI: Natural programming model for distributed computing. J-Orchestra: Execute unsuspecting programs over a network, using program rewriting. Morphing: High-level language facility for safe program transformation.
NRMI call sequence (the talk animates this on a heap structure t whose subtrees are shared via aliases alias1 and alias2, crossing the network between client and server sides): 1. Identify all reachable objects. 2. Execute the remote procedure. 3. Send back all reachable objects. 4. Match the reachability maps. 5. Update the original objects. 6. Adjust links out of the original objects. 7. Adjust links out of the new objects. 8. Garbage collect.
J-Orchestra Automatic partitioning system. Works as a bytecode compiler; lots of indirection using proxies, interfaces, and local and remote objects. The partitioned program is equivalent to the original.
Morphing Ensure program generators are safe: statically check the generator to determine the safety of any generated program, under all inputs; ensure that generated programs compile. Early approach: SafeGen, using theorem provers. MJ: using types.
Fault Tolerant Computing David August and David Walker Princeton University Processors are becoming more susceptible to intermittent faults. Moore’s Law, radiation Alter computation or state, resulting in incorrect program execution. Goal: Build reliable systems from unreliable components.
Topics Transient faults and mechanisms designed to protect against them (HW). The role languages and compilers may play in creating radiation-hardened programs. New opportunities made possible by languages that embrace potentially incorrect behavior.
Causes
Software/Compiler Duplicate instructions and check at important locations (stores) [SWIFT, EDDI]
λ zap A λ calculus with fault tolerance; an intermediate language for compilers. Models a single fault; based on replication; the semantics model the type of faults.

let x = 2 in
let y = x + x in
out y

becomes

let x1 = 2 in let x2 = 2 in let x3 = 2 in
let y1 = x1 + x1 in let y2 = x2 + x2 in let y3 = x3 + x3 in
out [y1, y2, y3]
Testing
Typing Ad Hoc Data Kathleen Fisher, AT&T Labs. PADS project: Data Description Language (DDL); Data Description Calculus (DDC); automatic inference of PADS descriptions.
PADS Declarative description of a data source: physical format information and semantic constraints.

type responseCode = { x : Int | 99 < x < 600 }

Pstruct webRecord {
  Pip ip; " - - [";
  Pdate(':') date; ":";
  Ptime(']') time; "]";
  httpMeth meth; " ";
  Puint8 code; " ";
  Puint8 size; " ";
};
Parray webLog { webRecord[] };
Raw data (ASCII log files, binary traces, …) plus a data description yield standard formats & schema (XML, CSV), visual information, and end-user tools. Problem: producing useful tools for ad hoc data takes a lot of time. Solution: a learning system to generate data descriptions and tools automatically.
Format Inference Engine: input file(s) → chunked data → tokenization → structure discovery → format refinement (guided by a scoring function) → IR-to-PADS printer → PADS description.
Multi-Staged Programming Walid Taha, Rice University. Writing generic programs that do not pay a runtime overhead: use program generators; ensure generated code is syntactically well-formed and well-typed. MetaOCaml.
The Abstract View A batch program P consumes inputs I1 and I2 together; staged, P1 consumes the early input I1 and generates P2, which is then run repeatedly on late inputs I2.
MetaOCaml Brackets .< e >. delay execution of an expression. Escape .~e combines smaller delayed values to construct larger ones. Run .! e compiles and executes the dynamically generated code.
Power Example

let rec power (n, x) =
  match n with 0 → 1 | n → x * (power (n-1, x));;
let power2 (x) = power (2, x);;
let power2 = fun x → power (2, x);;    (* still calls power at run time *)
(* want instead: *) let power2 (x) = 1*x*x;;

(* staged version: the recursion happens at code-generation time *)
let rec power (n, x) =
  match n with 0 → .<1>. | n → .<.~x * .~(power (n-1, x))>.;;
let power2 = .! .<fun x → .~(power (2, .<x>.))>.;;
Scalable Defect Detection Manuvir Das, Daniel Wang, Zhe Yang, Microsoft Research. Program analysis at Microsoft scale: scalability vs. accuracy. Combines a weak global analysis with a slow, precise local one (for some regions of code). Programmers are required to add interface annotations; some automatic inference is available.
Web and Database Application Security Zhendong Su, University of California, Davis. Static analyses for enforcing correctness of dynamically generated database queries; runtime checking mechanisms for detecting SQL injection attacks; static analyses for detecting SQL injection and cross-site scripting vulnerabilities.
XML and Web Application Programming Anders Møller, University of Aarhus. Formal models of XML schemas: expressiveness of DTD, XML Schema, Relax NG. Type checking XML transformation languages: “Assuming that x is valid according to S_in, is T(x) valid according to S_out?” Web application frameworks: Java Servlets and JSP, JWIG, GWT.
Types for Safe C-Level Programming Dan Grossman, University of Washington. Cyclone, a safe dialect of C, designed to prevent safety violations (buffer overflows, memory management errors, …). Mostly underlying theory: types, expressions, memory regions.
Analyzing and Debugging Software Understanding Multilingual Software [Foster] Parlez-vous OCaml? Statistical Debugging [Liblit] you are my beta tester, and there are lots of you Scalable Defect Detection [Das, Wang, Yang] Microsoft programs have no bugs
Programming Models Types for Safe C-Level Programming [Grossman] C without the ick factor Staged Programming [Taha] Programs that produce programs that produce programs... Prog. Models for Dist. Comp. [Smaragdakis] We’ve secretly replaced your centralized program with a distributed application. Can you tell the difference?
The Web Web and Database Application Security [Su] How not to be pwn3d by 1337 haxxors XML and Web Application Programming [Møller] X is worth 8 points in scrabble...let’s use it a lot
Other Really Important Stuff Fault Tolerant Computing [August, Walker] Help, I’ve been hit by a cosmic ray! Typing Ad Hoc Data [Fisher] Data, data, everywhere, but what does it mean?
Statistical Debugging Ben Liblit University Of Wisconsin-Madison
What’s This All About? Statistical Debugging & Cooperative Bug Isolation: observe deployed software in the hands of real end users; build statistical models of success & failure; guide programmers to the root causes of bugs. Make software suck less.
Motivation “There are no significant bugs in our released software that any significant number of users want fixed.” Bill Gates, quoted in FOCUS Magazine
Software Releases in the Real World [Disclaimer: this may be a caricature.]
Software Releases in the Real World 1. Coders & testers in tight feedback loop: detailed monitoring, high repeatability; testing approximates reality. 2. Testers & management declare “Ship it!” Perfection is not an option; developers don’t decide when to ship.
Software Releases in the Real World 3. Everyone goes on vacation: congratulate yourselves on a job well done! What could possibly go wrong? 4. Upon return, hide from tech support: much can go wrong, and you know it. Users define reality, and it’s not pretty (where “not pretty” means “badly approximated by testing”).
Testing as Approximation of Reality Microsoft’s Watson error reporting system: crash reports from 500,000 separate programs. x% of software causes 50% of bugs; care to guess what x is? 1% of software errors causes 50% of user crashes. Small mismatch ➙ big problems (sometimes); big mismatch ➙ small problems? (sometimes!). Perfection is not an economically viable option.
Real Engineers Measure Things; Are Software Engineers Real Engineers?
Instrumentation Framework “The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong, it usually turns out to be impossible to get at or repair.” Douglas Adams, Mostly Harmless
Bug Isolation Architecture: Program Source → Compiler (predicates injected) → Shipping Application with Sampler → counts & success/failure labels → Statistical Debugging → top bugs with likely causes.
Model of Behavior Each behavior is expressed as a predicate P on program state at a particular program point. Count how often “P observed true” and “P observed” using sparse but fair random samples of complete behavior.
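As a minimal sketch (the representation is an assumption, not from the talk), the two counts per predicate could look like:

(* Two counters per predicate: how often P was observed at all,
   and how often it was observed to be true. *)
type pred_counts = { mutable observed : int; mutable observed_true : int }

let observe c truth =
  c.observed <- c.observed + 1;
  if truth then c.observed_true <- c.observed_true + 1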
Predicate Injection: Guessing What’s Interesting (the Compiler stage of the pipeline above)
Branch Predicates Are Interesting if (p) … else …
Branch Predicate Counts if (p) // p was true (nonzero) else // p was false (zero) Syntax yields instrumentation site. Site yields predicates on program behavior. Exactly one predicate true per visit to site.
Returned Values Are Interesting n = fprintf(…); Did you know that fprintf() returns a value? Do you know what the return value means? Do you remember to check it?
Returned Value Predicate Counts n = fprintf(…); // return value < 0 ? == 0 ? > 0 ? Syntax yields instrumentation site. Site yields predicates on program behavior. Exactly one predicate true per visit to site.
Pair Relationships Are Interesting int i, j, k; … i = …;
Pair Relationship Predicate Counts int i, j, k; … i = …; // compare new value of i with… //other vars: j, k, … //old value of i //“important” constants
Many Other Behaviors of Interest Assert statements Perhaps automatically introduced, e.g. by CCured Unusual floating point values Did you know there are nine kinds? Coverage of modules, functions, basic blocks, … Reference counts: negative, zero, positive, invalid Kinds of pointer: stack, heap, null, … Temporal relationships: x before/after y More ideas? Toss them all into the mix!
Summarization and Reporting Observation stream ⇒ observation count: how often is each predicate observed true? Removes the time dimension, for good or ill. Bump exactly one counter per observation; infer additional predicates (e.g. ≤, ≠, ≥) offline. Feedback report is: 1. vector of predicate counters; 2. success/failure outcome label. Still quite a lot to measure. What about performance?
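Concretely, a feedback report might be represented as below; the field names are assumptions, but the content (one counter per predicate plus an outcome label) is exactly what the slide lists.

(* One feedback report per run: a counter per injected predicate,
   plus the run's success/failure outcome label. *)
type report = { counts : int array; failed : bool }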
Fair Sampling Transformation (the Sampler stage of the pipeline above)
Sampling the Bernoulli Way Decide to examine or ignore each site… randomly, independently, dynamically. Cannot be periodic: unfair temporal aliasing. Cannot toss a coin at each site: too slow.
Amortized Coin Tossing Randomized global countdown; small countdown ⇒ upcoming sample. Countdown selected from a geometric distribution: the inter-arrival time for a biased coin toss (how many tails before the next head?). Mean sampling rate is a tunable parameter.
Geometric Distribution D = mean of distribution; expected sample density = 1/D.
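A minimal OCaml sketch of the countdown mechanism, assuming a mean sampling rate rho; all names here are mine, not the tool's:

let rho = 0.01  (* tunable mean sampling rate, e.g. 1/100 *)

(* Draw the countdown to the next sample from a geometric
   distribution by inversion; its mean is 1/rho. *)
let next_countdown () =
  let u = Random.float 1.0 in
  1 + int_of_float (log (1.0 -. u) /. log (1.0 -. rho))

let countdown = ref (next_countdown ())

(* Called at each instrumentation site: true only on the rare
   visits that should record a full observation. *)
let maybe_sample () =
  decr countdown;
  if !countdown <= 0 then (countdown := next_countdown (); true)
  else false

This is what makes the sampling fair: every dynamic visit to a site has the same chance of being sampled, with no periodic aliasing.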
Weighing Acyclic Regions Break the CFG into acyclic regions. Each region has a finite number of paths and a finite max number of instrumentation sites. Compute the max weight in a bottom-up pass.
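A sketch of the bottom-up pass over one acyclic region; the node type is a stand-in for the real CFG representation, and a production version would memoize rather than recurse naively:

type node = { sites : int; succs : node list }

(* Max weight = the largest number of instrumentation sites any
   single path through the region can visit. *)
let rec max_weight n =
  n.sites + List.fold_left (fun acc s -> max acc (max_weight s)) 0 n.succs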
Weighing Acyclic Regions Clone each acyclic region into a “fast” variant and a “slow” variant; choose at run time by comparing the global countdown to the region’s weight (e.g., > 4?). Retain decrements on the fast path for now; stay tuned…
Path Balancing Optimization Decrements on fast path are a bummer Goal: batch them up But some paths are shorter than others Idea: add extra “ghost” instrumentation sites Pad out shorter paths All paths now equal
Path Balancing Optimization Fast path is faster One bulk counter decrement on entry Instrumentation sites have no code at all Slow path is slower More decrements Consume more randomness
Optimizations Identify and ignore “weightless” functions / cycles Cache global countdown in local variable Avoid cloning Static branch prediction at region heads Partition sites among several binaries Many additional possibilities…
What Does This Give Us? Absolutely certain of what we do see Subset of dynamic behavior Success/failure label for entire run Uncertain of what we don’t see Given enough runs, samples ≈ reality Common events seen most often Rare events seen at proportionate rate
Playing the Numbers Game Program Source Compiler Shipping Application Sampler Predicates Counts & / Statistical Debugging Top bugs with likely causes
Isolating a Deterministic Bug Hunt for crashing bug in ccrypt-1.2. Sample function return values: triple of counters per call site (< 0, == 0, > 0). Use process of elimination: look for predicates true on some bad runs, but never true on any good run.
Elimination Strategies Universal falsehood: disregard P if |P| = 0 for all runs; likely a predicate that can never be true. Lack of failing coverage: disregard all predicates at site S if |S| = 0 for all failed runs; the site is not reached in failing executions. Lack of failing example: disregard P if |P| = 0 for all failed runs; P need not be true for a failure to occur. Successful counterexample: disregard P if |P| > 0 on at least one successful run; P can be true without causing failure.
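For the deterministic case the whole process of elimination is one filter over the feedback reports; this is my paraphrase over the report type sketched earlier, not the tool's code:

(* Keep only predicates (indices into counts) that are true on at
   least one failing run and never true on any successful run. *)
let deterministic_predictors (runs : report list) (num_preds : int) =
  List.filter
    (fun p ->
       List.exists (fun r -> r.failed && r.counts.(p) > 0) runs
       && not (List.exists (fun r -> not r.failed && r.counts.(p) > 0) runs))
    (List.init num_preds (fun p -> p))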
Winnowing Down the Culprits 1710 counters = 3 × 570 call sites. 1569 are zero on all runs; 141 remain. 139 are nonzero on at least one successful run. Not much left! file_exists() > 0 and xreadline() == 0.
Multiple, Non-Deterministic Bugs Strict process of elimination won’t work Can’t assume program will crash when it should No single common characteristic of all failures Look for general correlation, not perfect prediction Warning! Statistics ahead!
Ranked Predicate Selection Consider each predicate P one at a time Include inferred predicates (e.g. ≤, ≠, ≥) How likely is failure when P is true? (technically, when P is observed to be true) Multiple bugs yield multiple bad predicates
Some Definitions F(P) = number of failing runs in which P was observed true; S(P) = number of successful runs in which P was observed true; Bad(P) = F(P) / (F(P) + S(P)), the chance of failure when P is observed true (following the published CBI formulation).
Are We Done? Not Exactly! Bad(f = NULL) = 1.0
Are We Done? Not Exactly! Predicate (x = 0) is an innocent bystander: the program is already doomed. Bad(f = NULL) = 1.0; Bad(x = 0) = 1.0
Crash Probability Identify unlucky sites on the doomed path: Context(P) = F(P observed) / (F(P observed) + S(P observed)), the background risk of failure for reaching this site, regardless of predicate truth/falsehood.
Isolate the Predictive Value of P Does P being true increase the chance of failure over the background rate? Increase(P) = Bad(P) − Context(P). Formal correspondence to likelihood ratio testing.
Increase Isolates the Predictor Increase(f = NULL) = 1.0; Increase(x = 0) = 0.0
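In code, with f_true/s_true counting failing/successful runs where P was observed true and f_obs/s_obs counting runs where P was observed at all (a sketch; the helper names are mine):

let bad ~f_true ~s_true =
  float_of_int f_true /. float_of_int (f_true + s_true)

(* Background chance of failure merely for reaching P's site. *)
let context ~f_obs ~s_obs =
  float_of_int f_obs /. float_of_int (f_obs + s_obs)

(* How much does P being true raise failure above the background? *)
let increase ~f_true ~s_true ~f_obs ~s_obs =
  bad ~f_true ~s_true -. context ~f_obs ~s_obs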
It Works! …for programs with just one bug. Need to deal with multiple bugs How many? Nobody knows! Redundant predictors remain a major problem Goal: isolate a single “best” predictor for each bug, with no prior knowledge of the number of bugs.
Multiple Bugs: Some Issues A bug may have many redundant predictors Only need one, provided it is a good one Bugs occur on vastly different scales Predictors for common bugs may dominate, hiding predictors of less common problems
Bad Idea #1: Rank by Increase(P) High Increase but very few failing runs These are all sub-bug predictors Each covers one special case of a larger bug Redundancy is clearly a problem
Bad Idea #2: Rank by F(P) Many failing runs but low Increase Tend to be super-bug predictors Each covers several bugs, plus lots of junk
A Helpful Analogy In the language of information retrieval Increase(P) has high precision, low recall F(P) has high recall, low precision Standard solution: Take the harmonic mean of both Rewards high scores in both dimensions
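A sketch of the combined score; the exact normalization of the failure count is an assumption (the published work uses a log-scaled recall term):

(* Harmonic mean of two scores in [0,1]: high only when both are high. *)
let harmonic a b =
  if a = 0.0 || b = 0.0 then 0.0
  else 2.0 /. (1.0 /. a +. 1.0 /. b)

(* Combine Increase (precision-like) with the fraction of all
   failing runs the predicate covers (recall-like). *)
let importance ~incr ~f ~total_failures =
  harmonic incr (float_of_int f /. float_of_int total_failures)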
Rank by Harmonic Mean Definite improvement Large increase, many failures, few or no successes But redundancy is still a problem
Redundancy Elimination One predictor for a bug is interesting Additional predictors are a distraction Want to explain each failure once Similar to minimum set-cover problem Cover all failed runs with subset of predicates Greedy selection using harmonic ranking
Simulated Iterative Bug Fixing 1. Rank all predicates under consideration. 2. Select the top-ranked predicate P. 3. Add P to the bug predictor list. 4. Discard P and all runs where P was true (simulates fixing the bug predicted by P; reduces the rank of similar predicates). 5. Repeat until out of failures or predicates.
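The loop itself is a greedy set cover; a sketch over the earlier report type, with rank standing for any scoring function such as importance above:

(* Pick the top-ranked predicate, record it, discard it together
   with every run in which it was true, then re-rank and repeat. *)
let rec isolate preds runs rank acc =
  match preds with
  | [] -> List.rev acc
  | _ when not (List.exists (fun r -> r.failed) runs) -> List.rev acc
  | p0 :: rest ->
    let best =
      List.fold_left (fun b p -> if rank p runs > rank b runs then p else b)
        p0 rest
    in
    isolate
      (List.filter (fun p -> p <> best) preds)
      (List.filter (fun r -> r.counts.(best) = 0) runs)
      rank (best :: acc)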
Not Covered Today Visualization of bug predictors: simple visualization may help reveal trends, plotting quantities such as Increase(P) with its error bound, S(P), log(F(P) + S(P)), and Context(P).
Not Covered Today Reconstruction of failing paths: the bug predictor is often the smoking gun, but not always; want a short, feasible path that exhibits the bug. “Just because it’s undecidable doesn’t mean we don’t need an answer.”