Automatic Repair for Input Validation and Sanitization Bugs 1.

Slides:



Advertisements
Similar presentations
PHP Form and File Handling
Advertisements

Hossain Shahriar Mohammad Zulkernine. One of the worst vulnerabilities in web applications It involves the generation of dynamic HTML contents with invalidated.
Deterministic Finite Automata (DFA)
Finite Automata CPSC 388 Ellen Walker Hiram College.
Differential String Analysis Tevfik Bultan (Joint work with Muath Alkhalaf, Fang Yu and Abdulbaki Aydin) 1 Verification Lab Department.
Finite Automata Great Theoretical Ideas In Computer Science Anupam Gupta Danny Sleator CS Fall 2010 Lecture 20Oct 28, 2010Carnegie Mellon University.
1 FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY (For next time: Read Chapter 1.3 of the book)
CS5371 Theory of Computation
1 Approximate string matching using factor automata Jan Holub and Borivoj Melichar Theoretical Computer Science vol.249 p Speaker: L. C. Chen Advisor:
61 Nondeterminism and Nodeterministic Automata. 62 The computational machine models that we learned in the class are deterministic in the sense that the.
Finite State Machines Data Structures and Algorithms for Information Processing 1.
A String Constraint Solver for Detecting Web Application Vulnerability Xiang Fu Hofstra University Chung-Chih Li Illinois State University 07/03/2010SEKES.
Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,
Javascript II Expressions and Data Types. 2 JavaScript Review programs executed by the web browser programs embedded in a web page using the script element.
1 Pushdown Stack Automata Zeph Grunschlag. 2 Agenda Pushdown Automata Stacks and recursiveness Formal Definition.
Finite Automata Chapter 5. Formal Language Definitions Why need formal definitions of language –Define a precise, unambiguous and uniform interpretation.
Topics Automata Theory Grammars and Languages Complexities
Automating Construction of Lexers. Example in javacc TOKEN: { ( | | "_")* > | ( )* > | } SKIP: { " " | "\n" | "\t" } --> get automatically generated code.
Regular Expressions & Automata Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,
Muath Alkhalaf 1 Shauvik Roy Choudhary 2 Mattia Fazzini 2 Tevfik Bultan 1 Alessandro Orso 2 Christopher Kruegel 1 1 UC Santa Barbara 2 Georgia Tech.
Great Theoretical Ideas in Computer Science.
Automatic Creation of SQL Injection and Cross-Site Scripting Attacks 2nd-order XSS attacks 1st-order XSS attacks SQLI attacks Adam Kiezun, Philip J. Guo,
Regular Languages A language is regular over  if it can be built from ;, {  }, and { a } for every a 2 , using operators union ( [ ), concatenation.
CS 290C: Formal Models for Web Software Lectures 17: Analyzing Input Validation and Sanitization in Web Applications Instructor: Tevfik Bultan.
Web Application Access to Databases. Logistics Test 2: May 1 st (24 hours) Extra office hours: Friday 2:30 – 4:00 pm Tuesday May 5 th – you can review.
Eliminating Bugs In MVC-Style Web Applications Tevfik Bultan Verification Lab (Vlab),
Minimization of Symbolic Automata Presented By: Loris D’Antoni Joint work with: Margus Veanes 01/24/14, POPL14.
Regular Expressions and Finite State Automata  Themes  Finite State Automata (FSA)  Describing patterns with graphs  Programs that keep track of state.
Regular Expressions and Finite State Automata Themes –Finite State Automata (FSA) Describing patterns with graphs Programs that keep track of state –Regular.
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Automating Construction of Lexers. Example in javacc TOKEN: { ( | | "_")* > | ( )* > | } SKIP: { " " | "\n" | "\t" } --> get automatically generated code.
FSE 2014 Tutorial String Analysis Tevfik Bultan University of California, Santa Barbara, USA Fang Yu National Chengchi University, Taiwan.
Lecture 2: Lexical Analysis
Lexical Analysis I Specifying Tokens Lecture 2 CS 4318/5531 Spring 2010 Apan Qasem Texas State University *some slides adopted from Cooper and Torczon.
Internet Information Systems Writing to Databases and Amending Data.
TRANSITION DIAGRAM BASED LEXICAL ANALYZER and FINITE AUTOMATA Class date : 12 August, 2013 Prepared by : Karimgailiu R Panmei Roll no. : 11CS10020 GROUP.
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2.
Lecture # 4 Chapter 1 (Left over Topics) Chapter 3 (continue)
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
Detecting and Repairing Security Vulnerabilities in Web Applications Tevfik Bultan (Joint work with Muath Alkhalaf, Fang Yu and Abdulbaki Aydin) 1
Differential String Analysis Tevfik Bultan (Joint work with Muath Alkhalaf, Fang Yu and Abdulbaki Aydin) 1 Verification Lab Department.
Strings Basic data type in computational biology A string is an ordered succession of characters or symbols from a finite set called an alphabet Sequence.
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
CS 267: Automated Verification Lectures 16: String Analysis Instructor: Tevfik Bultan.
Great Theoretical Ideas in Computer Science for Some.
Given a string manipulating program, string analysis determines all possible values that a string expression can take during any program execution Using.
Relational String Verification Using Multi-track Automata.
Finite Automata Great Theoretical Ideas In Computer Science Victor Adamchik Danny Sleator CS Spring 2010 Lecture 20Mar 30, 2010Carnegie Mellon.
Saner: Composing Static and Dynamic Analysis to Validate Sanitization in Web Applications Davide Balzarotti, Marco Cova, Vika Felmetsger, Nenad Jovanovic,
Expressions and Data Types Professor Robin Burke.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
Session 11: Cookies, Sessions ans Security iNET Academy Open Source Web Development.
Set, Alphabets, Strings, and Languages. The regular languages. Clouser properties of regular sets. Finite State Automata. Types of Finite State Automata.
Akram Salah ISSR Basic Concepts Languages Grammar Automata (Automaton)
Copyright © The OWASP Foundation Permission is granted to copy, distribute and/or modify this document under the terms of the OWASP License. The OWASP.
SQL Injection.
String Analysis for Dependable Input Validation
Reasoning about code CSE 331 University of Washington.
Automata Based String Analysis for Vulnerability Detection
Two issues in lexical analysis
Jaya Krishna, M.Tech, Assistant Professor
Introduction to Data Structures
String Analysis Tevfik Bultan Verification Lab
NFAs and Transition Graphs
Basics.
CSC312 Automata Theory Transition Graphs Lecture # 9
Data Structures and Algorithms for Information Processing
NFAs and Transition Graphs
Presentation transcript:

Automatic Repair for Input Validation and Sanitization Bugs 1

Classification of Input Validation and Sanitization Functions PureValidatorPureValidator Input Yes (valid) No (invalid) PureSanitizerPureSanitizer Input Output ValidatingSanitizerValidatingSanitizer Input Output No (invalid) 2

Overview Sanitizer Function Symbolic Forward Fix-Point Computation Symbolic Backward Fix-Point Computation String Analysis Post-Image (Post-Condition) Pre-Image (Pre-Condition) Negative Pre-Image (Pre-Condition for reject) 3

sanitizer(x){ if (x != “aa” && x != “bb” && x != “ab”) reject; x = replace(/^ab$/, “ba”,x); return x; } Sanitizers aa bb ab aab bbb aa bb T ba Σ* Σ* ∪ T rejecting invalid inputs Σ = {a, b}

aa bb ab aab bbb aa bb T ba Σ* ∪ (Non) Preferred Output Pre-image Negative Pre-Image Reject Possible output (Post Image) Σ* Post-Image, Pre-Image and Negative Pre-Image b a, T sanitizer(x){ if (x != “aa” && x != “bb” && x != “ab”) reject; x = replace(/^ab$/, “ba”,x); return x; } 5

Attack Patterns For example: for detecting XSS and SQLI An attack pattern is negation of a Max policy –Attack patterns specify bad strings Example: –/. ∗ <script. ∗ / (for XSS) –Any string that contains the substring <script is bad 6

XSS Vulnerability Example 1:<?php 2: $www = $_GET[”www”]; 3: $l_otherinfo = ”URL”; 4: $www = ””,$www); 5: echo ” ”. $l_otherinfo. ”: ”. $www. ” ”; 6:?> 7  means all characters from.  This includes  XSS Vulnerability  means all characters from.  This includes  XSS Vulnerability

Vulnerability Detection Σ* Σ* ∪ ∩ Post-Image (output) URL: foo Attack Pattern U R L : foo foo<bar Attack Pattern foo<bar T T Bad Output URL: < = 8 /.*<.*/

Vulnerability Signature Generation and Vulnerability Repair Σ* T Bad Output URL: < Vulnerability Signature All inputs that can exploit the vulnerability (or an over approximation of that set) Pre-Image (Bad Input) X min-cut is {<} reject Σ* ∪ T 9

Generated Patch 1: <?php P: if(preg match(’/[^ <]*<.*/’,$_GET[”www”])) $_GET[”www”] = preg replace(’<’,””,$_GET[”www”]); 2: $www = $_GET[”www”]; 3: $l_otherinfo = ”URL”; 4: $www = 5: echo ” ”. $l_otherinfo. ”: ”.$www. ” ”; 6: ?> min-cut is {<} 10 InputOriginal OutputNew Output FoobarURL: Foobar Foo<barURL: Foo<barURL: Foobar a<b<c<da<b<c<dURL: a<b<c<dURL: abcd

Patches from Vulnerability Signatures Ideally, we want to modify the input (as little as possible) so that it does not match the vulnerability signature Given a DFA, an alphabet cut is –a set of characters that after ”removing” the edges that are associated with the characters in the set, the modified DFA does not accept any non-empty string Finding a minimal alphabet cut of a DFA is an NP-hard problem (one can reduce the vertex cover problem to this problem) –We use a min-cut algorithm instead –The set of characters that are associated with the edges of the min cut is an alphabet cut but not necessarily the minimum alphabet cut 11

Experiments We evaluated our approach on five vulnerabilities from three open source web applications: (1)MyEasyMarket-4.1: A shopping cart program (2) BloggIT-1.0: A blog engine (3) proManager-0.72: A project management system We used the following XSS attack pattern: Σ ∗ <script Σ ∗ 12

Forward Analysis Results The dependency graphs of these benchmarks are simplified based on the sinks –Unrelated parts are removed using slicing InputResults #nodes#edges#sinks#inputsTime(s)Mem (kb)#states/# bdds / / / / /

Backward Analysis Results We use the backward analysis to generate the vulnerability signatures –Backward analysis starts from the vulnerable sinks identified during forward analysis InputResults #nodes#edges#sinks#inputsTime(s)Mem (kb)#states/# bdds / / /302, 20/ / /302 14

Alphabet Cuts We generate cuts from the vulnerability signatures using a min- cut algorithm Problem: When there are two user inputs the patch will block everything and delete everything –Overlooks the relations among input variables (e.g., the concatenation of two inputs contains < SCRIPT) InputResults #nodes#edges#sinks#inputsAlphabet Cut {<} 29 11{S,’,”} Σ, Σ {<,’,”} 25 11{<,’,”} Vulnerability signature depends on two inputs 15

Relational Vulnerability Signature Perform forward analysis using multi-track automata to generate relational vulnerability signatures Each track represents one user input –An auxiliary track represents the values of the current node –We intersect the auxiliary track with the attack pattern upon termination 16

Relational Vulnerability Signature Consider a simple example having multiple user inputs <?php 1: $www = $_GET[”www”]; 2: $url = $_GET[”url”]; 3: echo $url. $www; ?> Let the attack pattern be Σ ∗ < Σ ∗ 17

Relational Vulnerability Signature A multi-track automaton: ($url, $www, aux) Identifies the fact that the concatenation of two inputs contains < (a, λ, a), (b, λ, b), … (<, λ,<) ( λ, a,a), ( λ, b,b), … ( λ, a,a), ( λ, b,b), … ( λ,<,<) ( λ, a,a), ( λ, b,b), … ( λ, a,a), ( λ, b,b), … (a, λ, a), (b, λ, b), … 18

Relational Vulnerability Signature Project away the auxiliary variable Find the min-cut This min-cut identifies the alphabet cuts {<} for the first track ($url) and {<} for the second track ($www) (a, λ ), (b, λ ), … (<, λ ) ( λ, a), ( λ, b), … ( λ, a), ( λ, b), … ( λ,<) ( λ, a), ( λ, b), … ( λ, a), ( λ, b), … (a, λ ), (b, λ ), … min-cut is {<},{<} 19

Patch for Multiple Inputs Patch: If the inputs match the signature, delete its alphabet cut <?php if (preg match(’/[^ <]*<.*/’, $ GET[”url”].$ GET[”www”])) { $ GET[”url”] = preg replace(<,””,$ GET[”url”]); $ GET[”www”] = preg replace(<,””,$ GET[”www”]); } 1: $www = $ GET[”www”]; 2: $url = $ GET[”url”]; 3: echo $url. $www; ?> 20

Differential Analysis and Repair ?=?= Yes No Target Sanitizer Generate Patch String Analysis Reference Sanitizer 21

Why Differential? Submit DB unsupscribe.php php 22

A Javascript/Java Input Validation Function 23 function validate (form) { var Str = form[" "].value; if( Str.length == 0) { return true; } var r1 = new RegExp("( var r2 = new [\\w]{2,4})$"); if(!r1.test( Str) && r2.test( Str)) { return true; } return false; } public boolean validate (Object bean, Field f,..) { String val = ValidatorUtils.getValueAsString(bean, f); Perl5Util u = new Perl5Util(); if (!(val == null || val.trim().length == 0)) { if ((!u.match("/( val)) && val)){ return true; } else { return false; } return true; } public boolean validate (Object bean, Field f,..) { String val = ValidatorUtils.getValueAsString(bean, f); Perl5Util u = new Perl5Util(); if (!(val == null || val.trim().length == 0)) { if ((!u.match("/( val)) && val)){ return true; } else { return false; } return true; } “

1 st Step: Find Inconsistency Σ* T Σ* ∪ T Σ* T Σ* ∪ T Target Reference 24 Output difference: Strings returned by target but not by reference ?=?=

Differential Analysis Evaluation Analyzed a number of Java EE web applications –Only looking for differences (inconsistencies) 25 NameURL JGOSSIP VEHICLE MEODIST MYALUMNI CONSUMER TUDU JCRBIB

Analysis Phase Time Performance & Inconsistencies That We Found 26

Analysis Phase Memory Usage 27

2 nd Step: Differential Repair Σ* T Σ* ∪ T Σ* T Σ* ∪ T Target Reference Σ* T Σ* ∪ T Repaired Function 28 ≠

Composing Sanitizers? Can we run the two sanitizers one after the other? Does not work due to lack of Idempotency – Both sanitizers escape ’ with \ – Input ab’c – 1 st sanitizer  ab\’c – 2 nd sanitizer  ab\\’c Security problem (double escaping) We need to find the difference 29

How to repair? Σ* T Σ* ∪ T Σ* T Σ* ∪ T Target Reference X 30

function reference($x){ $x = preg_replace(“<“, “”, $x); if (strlen($x) < 4) return $x; else die(“error”); } function target($x){ $x = preg_replace(“’”, “\’”, $x);return $x; } Σ* T Σ* ∪ T Σ* T Σ* ∪ T X Output difference: Strings returned by target but not by reference 31

function reference($x){ $x = preg_replace(“<“, “”, $x); if (strlen($x) < 4) return $x; else die(“error”); } function target($x){ $x = preg_replace(“’”, “\’”, $x);return $x; } InputTargetReferenceDiff Type “<““<““<““<““”Sanitization “ ’’ ”“ \’\’ ”“ ’’ ”Sanitization + Length “ abcd ” Validation 32 Set of input strings that resulted in the difference ‘ ‘ ‘ T

Mincut results in deleting everything “foo”  “” Why? You can not remove a validation difference using a sanitization patch function reference($x){ $x = str_replace(“<“, “”, $x); if (strlen($x) < 4) return $x; else die(“error”); } function target($x){ $x = str_replace(“’”, “\’”, $x);return $x; } 33

(1) Validation Patch function reference($x){ $x = str_replace(“<“, “”, $x); if (strlen($x) < 4) return $x; else die(“error”); } function target($x){ $x = str_replace(“’”, “\’”, $x);return $x; } Σ* T Σ* ∪ T Σ* T Σ* ∪ T function valid_patch($x){ if (stranger_match1($x)) die(“error”); } Validation patch DFA 34

function reference($x){ $x = str_replace(“<“, “”, $x); if (strlen($x) < 4) return $x; else die(“error”); } function valid_patch($x){ if (stranger_match1($x)) die(“error”); } Σ* T Σ* ∪ T Σ* T Σ* ∪ T X MinCut = { ‘, <} “fo ’ ”  “fo\ ’ ” function target($x){ $x = str_replace(“’”, “\’”, $x);return $x; } 35

function reference($x){ $x = str_replace(“<“, “”, $x); if (strlen($x) < 4) return $x; else die(“error”); } Σ* T Σ* ∪ T Σ* T Σ* ∪ T function target($x){ $x = str_replace(“’”, “\’”, $x);return $x; } function valid_patch($x){ if (stranger_match1($x)) die(“error”); } function length_patch($x){ if (stranger_match2($x)) die(“error”); } function valid_patch($x){ if (stranger_match1($x)) die(“error”); } Length of Reference DFA Unwanted length in target caused by escape (2) Length Patch 36 Post-image R = {a, foo, baar} Len = Σ 1 ∪ Σ 3 ∪ Σ 4 Post-image T = {bb, car} Diff = {bb}

function reference($x){ $x = str_replace(“<“, “”, $x); if (strlen($x) < 4) return $x; else die(“error”); } Σ* T Σ* ∪ T Σ* T Σ* ∪ T function target($x){ $x = str_replace(“’”, “\’”, $x);return $x; } function valid_patch($x){ if (stranger_match1($x)) die(“error”); } function length_patch($x){ if (stranger_match2($x)) die(“error”); } X Length of Reference DFA Unwanted length in target caused by escape Target Restricted Length Target Restricted Length Reference Post-image (3) Sanitization Patch 37 Sanitization difference

function reference($x){ $x = str_replace(“<“, “”, $x); if (strlen($x) < 4) return $x; else die(“error”); } function target($x){ $x = preg_replace(‘”’, ‘\”’, $x);return $x; } function valid_patch($x){ if (stranger_match1($x)) die(“error”); } function length_patch($x){ if (stranger_match2($x)) die(“error”); } MinCut = {<} function target($x){ $x = str_replace(“’”, “\’”, $x);return $x; } function sanit_patch($x){ $x = str_replace(“<“, “”, $x);return $x; } (3) Sanitization Patch 38

MinCut Heuristics We use two heuristics for mincut Trim: – Only if mincut contain space character – Test if reference Post-Image is does not have space at the beginning and end – Assume it is trim () Escape: – Test if reference Post-Image escapes the mincut characters 39

Differential Repair Evaluation We ran the differential patching algorithm on 5 PHP web applications NameDescription PHPNews v1.3.0 News publishing software UseBB v forum software Snipe Gallery v3.1.5 Image management system MyBloggie v2.1.6 Weblog system Schoolmate v1.5.4 School administration software 40

Number of Patches Generated 41

Sanitization Patch Results 42

Time and Memory Performance of Differential Repair Algorithm 43