Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee.

Presentation transcript:

Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee IMDEA Software

Problem: Can we automatically infer which routines in a program are sources, sinks, and sanitizers?
Technology: Static analysis + probabilistic inference
Applications:
– Lowers false errors from tools
– Enables more complete flow checking
Results:
– Over 300 new vulnerabilities discovered in 10 deployed ASP.NET applications

Web application vulnerabilities are a serious threat!

$username = $_REQUEST['username'];
$sql = "SELECT * FROM Students WHERE username = '$username'";
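To make the threat on this slide concrete, here is a minimal Python sketch of the same pattern using the standard `sqlite3` module. The table contents, the helper names (`find_student_unsafe`, `find_student_safe`, `make_demo_db`), and the attack string are assumptions made for illustration; they are not from the slides.

```python
import sqlite3

def find_student_unsafe(conn, username):
    # Vulnerable: user input is spliced directly into the SQL text,
    # mirroring the string interpolation on the slide.
    sql = "SELECT * FROM Students WHERE username = '%s'" % username
    return conn.execute(sql).fetchall()

def find_student_safe(conn, username):
    # Parameterized query: the driver passes username as data, not as SQL.
    return conn.execute(
        "SELECT * FROM Students WHERE username = ?", (username,)
    ).fetchall()

def make_demo_db():
    # Hypothetical in-memory table used only for this demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Students (username TEXT, grade TEXT)")
    conn.executemany(
        "INSERT INTO Students VALUES (?, ?)",
        [("alice", "A"), ("bob", "B")],
    )
    return conn

conn = make_demo_db()
attack = "x' OR '1'='1"                      # tainted input from the request
leaked = find_student_unsafe(conn, attack)   # every row comes back
blocked = find_student_safe(conn, attack)    # no row has this username
```

In the vocabulary used later in the talk, the request parameter plays the role of a source and the query execution plays the role of a sink.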

Propagation graph: m1 → m2 iff information flows “explicitly” from m1 to m2

void ProcessRequest() {
    string s1 = ReadData1("name");
    string s2 = ReadData2("encoding");
    string s11 = Prop1(s1);
    string s22 = Prop2(s2);
    string s111 = Cleanse(s11);
    string s222 = Cleanse(s22);
    WriteData("Parameter " + s111);
    WriteData("Header " + s222);
}

– Source: returns tainted data
– Sink: it is an error to pass tainted data to it
– Sanitizer: cleanses or untaints its input
– Regular node: propagates its input to its output
– Every path from a source to a sink should go through a sanitizer
– Any source-to-sink path without a sanitizer is an information flow vulnerability
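The rule above (every source-to-sink path must pass through a sanitizer) can be checked mechanically once a propagation graph and a specification are given. The following is a minimal Python sketch, not Merlin's or CAT.NET's implementation: the graph is hand-transcribed from the `ProcessRequest` example, and the function name `unsanitized_paths` is hypothetical.

```python
# Propagation graph of ProcessRequest(), transcribed by hand (assumption):
# an edge m1 -> m2 means data returned by m1 flows explicitly into m2.
GRAPH = {
    "ReadData1": ["Prop1"],
    "ReadData2": ["Prop2"],
    "Prop1": ["Cleanse"],
    "Prop2": ["Cleanse"],
    "Cleanse": ["WriteData"],
    "WriteData": [],
}

def unsanitized_paths(graph, sources, sinks, sanitizers):
    """Return every acyclic source-to-sink path that avoids all sanitizers."""
    found = []

    def walk(node, path):
        if node in sanitizers:
            return                      # taint is removed here; stop exploring
        path = path + [node]
        if node in sinks and len(path) > 1:
            found.append(path)          # tainted data reached a sink
        for nxt in graph.get(node, []):
            if nxt not in path:         # keep paths acyclic
                walk(nxt, path)

    for src in sorted(sources):
        walk(src, [])
    return found

sources = {"ReadData1", "ReadData2"}
sinks = {"WriteData"}

# With Cleanse known to be a sanitizer, no vulnerability is reported.
with_spec = unsanitized_paths(GRAPH, sources, sinks, {"Cleanse"})
# With an incomplete specification (sanitizer unknown), both flows are flagged.
without_spec = unsanitized_paths(GRAPH, sources, sinks, set())
```

The second call illustrates why a missing sanitizer entry in the specification produces false reports, which is the kind of gap the talk sets out to fill.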

Does ProcessRequest() contain vulnerabilities?

[Figure: ProcessRequest() with its routines annotated as two sources, a sanitizer, and a sink]

Assumption: most flow paths in the propagation graph are secure.
Given a propagation graph, can we infer a specification or ‘complete’ a partial specification?

Merlin pipeline: program + initial specification → propagation graph construction → factor graph construction → probabilistic inference → final specification → vulnerabilities

[Figure: propagation graph of ProcessRequest() under a partial specification; some nodes are labeled source, sanitizer, or sink, and nodes with unknown roles are marked "?"]

For every acyclic path $m_1, m_2, \ldots, m_n$, the probability that $m_1$ is a source, $m_n$ is a sink, and $m_2, \ldots, m_{n-1}$ are not sanitizers is very low.
Exponential number of path constraints: $O(2^{|V|})$!
[Figure: path ReadData1 → Prop1 → Cleanse → WriteData, with ReadData1 marked as source and WriteData marked as sink]
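To see why one constraint per path does not scale, the sketch below counts acyclic paths in a small family of graphs. The "diamond chain" graph and the helper names are hypothetical, chosen only because the path count is easy to predict: each diamond doubles the number of paths while adding just three nodes.

```python
def acyclic_paths(graph, start, end):
    """Enumerate all acyclic paths from start to end by depth-first search."""
    paths = []

    def walk(node, path):
        path = path + [node]
        if node == end:
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            if nxt not in path:
                walk(nxt, path)

    walk(start, [])
    return paths

def diamond_chain(k):
    """k diamonds in a row: j0 -> {a0, b0} -> j1 -> {a1, b1} -> ... -> jk."""
    graph = {}
    for i in range(k):
        graph[f"j{i}"] = [f"a{i}", f"b{i}"]
        graph[f"a{i}"] = [f"j{i + 1}"]
        graph[f"b{i}"] = [f"j{i + 1}"]
    graph[f"j{k}"] = []
    return graph

# Number of j0 -> jk paths for k = 1..5 diamonds (3k + 1 nodes each).
counts = [
    len(acyclic_paths(diamond_chain(k), "j0", f"j{k}"))
    for k in range(1, 6)
]
```

Here the node count grows linearly in `k` while the number of path constraints doubles at each step, which is the blow-up the slide warns about.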

For every triple $\langle m_1, m_i, m_n \rangle$ such that $m_i$ is on a path from $m_1$ to $m_n$, the probability that $m_1$ is a source, $m_n$ is a sink, and $m_i$ is not a sanitizer is very low.
Cubic number of triple constraints: $O(|V|^3)$!
[Figure: triples (ReadData1, Prop1, WriteData) and (ReadData1, Cleanse, WriteData), each with ReadData1 marked as source and WriteData marked as sink]
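A minimal sketch of how the triples could be enumerated, assuming an acyclic propagation graph: in a DAG, $m_i$ lies on a path from $m_1$ to $m_n$ exactly when $m_i$ is reachable from $m_1$ and $m_n$ is reachable from $m_i$. The graph is hand-transcribed from `ProcessRequest`, the function names are hypothetical, and unlike the slide this version does not filter by potential sources and sinks.

```python
# Propagation graph of ProcessRequest(), transcribed by hand (assumption).
GRAPH = {
    "ReadData1": ["Prop1"],
    "ReadData2": ["Prop2"],
    "Prop1": ["Cleanse"],
    "Prop2": ["Cleanse"],
    "Cleanse": ["WriteData"],
    "WriteData": [],
}

def reachable(graph):
    """Map each node to the set of nodes reachable from it in >= 1 step."""
    reach = {}
    for start in graph:
        seen = set()
        stack = list(graph[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, []))
        reach[start] = seen
    return reach

def triples(graph):
    """All (m1, mi, mn) with mi on some path from m1 to mn (DAG assumed)."""
    reach = reachable(graph)
    return sorted(
        (m1, mi, mn)
        for m1 in graph
        for mi in reach[m1]
        for mn in reach[mi]
    )

TRIPLES = triples(GRAPH)
```

Each of the three positions ranges over at most $|V|$ nodes, so the number of triples is bounded by $|V|^3$ no matter how many paths the graph contains.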


For every pair of nodes $m_1, m_2$ such that $m_1$ and $m_2$ lie on the same path from a potential source to a potential sink, the probability that both $m_1$ and $m_2$ are sanitizers is low.
[Figure: path ReadData1 → Prop1 → Cleanse → WriteData, with ReadData1 marked as source and WriteData marked as sink]

Example with nodes a, b, c, d:
Triple constraints
– ¬(a ∧ ¬b ∧ d)
– ¬(a ∧ ¬c ∧ d)
Avoid double sanitizers
– ¬(b ∧ c)
Equivalently:
– a ∧ d ⇒ b
– a ∧ d ⇒ c
– ¬(b ∧ c)
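The formulas on this slide are small enough to check by brute force. The sketch below enumerates all 16 truth assignments, reading `a` as "a is a source", `b` and `c` as "is a sanitizer", and `d` as "d is a sink"; that reading and the function name `satisfies` are assumptions made for illustration.

```python
from itertools import product

def satisfies(a, b, c, d):
    """True when the assignment meets all three constraints from the slide."""
    triple_b = not (a and not b and d)   # equivalently: a and d implies b
    triple_c = not (a and not c and d)   # equivalently: a and d implies c
    no_double = not (b and c)            # avoid double sanitizers
    return triple_b and triple_c and no_double

# Every assignment (a, b, c, d) that satisfies the constraints.
models = [v for v in product([False, True], repeat=4) if satisfies(*v)]

# Assignments in which a is a source and d is a sink at the same time.
source_and_sink = [v for v in models if v[0] and v[3]]
```

Taken as hard Boolean constraints, these formulas leave no assignment in which `a` is a source and `d` is a sink, since that would require both `b` and `c` to be sanitizers. This is consistent with the slides phrasing each constraint as a statement about low probability instead of a strict rule.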

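Factor graph example: $(x_1 \lor x_2) \land (x_1 \lor \lnot x_3)$, with clauses $C_1 = x_1 \lor x_2$ and $C_2 = x_1 \lor \lnot x_3$

$f(x_1, x_2, x_3) = f_{C_1}(x_1, x_2) \land f_{C_2}(x_1, x_3)$
$f_{C_1}(x_1, x_2) = 1$ if $x_1 \lor x_2 = \text{true}$, $0$ otherwise
$f_{C_2}(x_1, x_3) = 1$ if $x_1 \lor \lnot x_3 = \text{true}$, $0$ otherwise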

The factors $f_{C_1}$ and $f_{C_2}$ define a probability distribution:

$p(x_1, x_2, x_3) = f_{C_1}(x_1, x_2) \times f_{C_2}(x_1, x_3) / Z$
$Z = \sum_{x_1, x_2, x_3} f_{C_1}(x_1, x_2) \times f_{C_2}(x_1, x_3)$
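For a formula this small, the normalization constant and the distribution can be computed by direct enumeration. The sketch below does exactly that for the two-clause example; it is a brute-force illustration, not how a factor graph engine works, and the Python names (`f_c1`, `f_c2`, `unnormalized`, `p`) are chosen here for readability.

```python
from fractions import Fraction
from itertools import product

def f_c1(x1, x2):
    # Factor for clause C1: 1 if (x1 or x2) holds, 0 otherwise.
    return 1 if (x1 or x2) else 0

def f_c2(x1, x3):
    # Factor for clause C2: 1 if (x1 or not x3) holds, 0 otherwise.
    return 1 if (x1 or not x3) else 0

def unnormalized(x1, x2, x3):
    # Product of the factors; with 0/1 factors this equals their conjunction.
    return f_c1(x1, x2) * f_c2(x1, x3)

ASSIGNMENTS = list(product([False, True], repeat=3))

# Z sums the factor product over all 2^3 assignments.
Z = sum(unnormalized(*x) for x in ASSIGNMENTS)

def p(x1, x2, x3):
    """Joint probability of one assignment, as an exact fraction."""
    return Fraction(unnormalized(x1, x2, x3), Z)
```

Because both factors take only the values 0 and 1, `Z` is simply the number of satisfying assignments, and `p` is uniform over them.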

Step 1: choose the $x_i$ with the highest $p_i(x_i)$ and set $x_i = \text{true}$ if $p_i(x_i)$ is greater than a threshold, false otherwise.
Step 2: recompute the marginals and repeat Step 1 until all variables have been assigned.

Marginal: $p_i(x_i) = \sum_{x_1} \cdots \sum_{x_{i-1}} \sum_{x_{i+1}} \cdots \sum_{x_N} p(x_1, \ldots, x_N)$
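The two steps can be sketched on the three-variable example as follows. This is a brute-force illustration under stated assumptions: marginals are computed by enumeration, "recompute marginals" is interpreted as conditioning on the variables already assigned, and the threshold of 1/2 is a hypothetical value not given on the slide.

```python
from fractions import Fraction
from itertools import product

N = 3  # variables x1, x2, x3 are stored at indices 0, 1, 2

FACTORS = [
    lambda x: 1 if (x[0] or x[1]) else 0,        # clause C1: x1 or x2
    lambda x: 1 if (x[0] or not x[2]) else 0,    # clause C2: x1 or not x3
]

def weight(x):
    """Product of all factors for one full assignment."""
    w = 1
    for f in FACTORS:
        w *= f(x)
    return w

def marginals(fixed):
    """p_i(x_i = true) for each unassigned i, conditioned on `fixed`."""
    worlds = [
        x for x in product([False, True], repeat=N)
        if all(x[i] == v for i, v in fixed.items())
    ]
    z = sum(weight(x) for x in worlds)
    if z == 0:
        raise ValueError("assigned values contradict the constraints")
    return {
        i: Fraction(sum(weight(x) for x in worlds if x[i]), z)
        for i in range(N) if i not in fixed
    }

def greedy_assign(threshold=Fraction(1, 2)):
    fixed = {}
    while len(fixed) < N:
        m = marginals(fixed)            # Step 2: recompute marginals
        i = max(m, key=m.get)           # Step 1: pick the highest marginal
        fixed[i] = m[i] > threshold     # true only if strictly above threshold
    return fixed

initial = marginals({})
assignment = greedy_assign()
```

On this example the initial marginals are 4/5, 3/5, and 2/5 for $x_1$, $x_2$, and $x_3$. Fixing $x_1$ first satisfies both clauses, so the remaining marginals drop to exactly 1/2 and fall on the false side of the assumed threshold.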


[Table: per-routine probabilities of being a Source, Sanitizer, or Sink for ReadData, ReadData2, Cleanse, WriteData, …, shown before and after probabilistic inference]

Theorem: Path refines Triple

– Merlin is implemented in C#
– Uses CAT.NET for building the propagation graph
– Uses Infer.NET for probabilistic inference

10 line-of-business applications written in C# using ASP.NET

[Chart legend: Original, With Merlin, False positives eliminated]

– 10 large Web apps in .NET
– Time taken per app: < 4 minutes
– New specs: 167
– New vulnerabilities: 322
– False positives removed: 13
– Final false positive rate for CAT.NET after Merlin: < 1%

– Merlin is the first practical approach to inferring explicit information flow specifications
– Its design is based on a formal characterization of an approximate probabilistic constraint system
– It successfully and efficiently infers explicit information flow specifications in large applications, which results in the detection of new vulnerabilities