Automata Based String Analysis for Vulnerability Detection

Slides:



Advertisements
Similar presentations
Hossain Shahriar Mohammad Zulkernine. One of the worst vulnerabilities in web applications It involves the generation of dynamic HTML contents with invalidated.
Advertisements

Non-Deterministic Finite Automata
Lecture 24 MAS 714 Hartmut Klauck
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 2 Mälardalen University 2005.
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
Pushdown Systems Koushik Sen EECS, UC Berkeley Slide Source: Sanjit A. Seshia.
1 Introduction to Computability Theory Lecture12: Decidable Languages Prof. Amos Israeli.
Languages. A Language is set of finite length strings on the symbol set i.e. a subset of (a b c a c d f g g g) At this point, we don’t care how the language.
Fall 2006Costas Busch - RPI1 Deterministic Finite Automata And Regular Languages.
1 Languages and Finite Automata or how to talk to machines...
A String Constraint Solver for Detecting Web Application Vulnerability Xiang Fu Hofstra University Chung-Chih Li Illinois State University 07/03/2010SEKES.
Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,
Topics Automata Theory Grammars and Languages Complexities
Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,
Regular Languages A language is regular over  if it can be built from ;, {  }, and { a } for every a 2 , using operators union ( [ ), concatenation.
CS 290C: Formal Models for Web Software Lectures 17: Analyzing Input Validation and Sanitization in Web Applications Instructor: Tevfik Bultan.
Regular Model Checking Ahmed Bouajjani,Benget Jonsson, Marcus Nillson and Tayssir Touili Moran Ben Tulila
CSC-682 Cryptography & Computer Security Sound and Precise Analysis of Web Applications for Injection Vulnerabilities Pompi Rotaru Based on an article.
Automating Construction of Lexers. Example in javacc TOKEN: { ( | | "_")* > | ( )* > | } SKIP: { " " | "\n" | "\t" } --> get automatically generated code.
FSE 2014 Tutorial String Analysis Tevfik Bultan University of California, Santa Barbara, USA Fang Yu National Chengchi University, Taiwan.
Overview of Previous Lesson(s) Over View  An NFA accepts a string if the symbols of the string specify a path from the start to an accepting state.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
Brian Mitchell - Drexel University MCS680-FCS 1 Patterns, Automata & Regular Expressions int MSTWeight(int graph[][], int size)
Differential String Analysis Tevfik Bultan (Joint work with Muath Alkhalaf, Fang Yu and Abdulbaki Aydin) 1 Verification Lab Department.
Strings Basic data type in computational biology A string is an ordered succession of characters or symbols from a finite set called an alphabet Sequence.
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
Lecture 2 Overview Topics What I forgot from last lecture Proof techniques continued Alphabets, strings, languages Automata June 2, 2015 CSCE 355 Foundations.
CS 267: Automated Verification Lectures 16: String Analysis Instructor: Tevfik Bultan.
Automatic Repair for Input Validation and Sanitization Bugs 1.
using Deterministic Finite Automata & Nondeterministic Finite Automata
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
Given a string manipulating program, string analysis determines all possible values that a string expression can take during any program execution Using.
Chapter 5 Finite Automata Finite State Automata n Capable of recognizing numerous symbol patterns, the class of regular languages n Suitable for.
Relational String Verification Using Multi-track Automata.
Saner: Composing Static and Dynamic Analysis to Validate Sanitization in Web Applications Davide Balzarotti, Marco Cova, Vika Felmetsger, Nenad Jovanovic,
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen.
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
  An alphabet is any finite set of symbols.  Examples: ASCII, Unicode, {0,1} ( binary alphabet ), {a,b,c}. Alphabets.
Lecture #2 Advanced Theory of Computation. Languages & Grammar Before discussing languages & grammar let us deal with some related issues. Alphabet: is.
Costas Busch - LSU1 Deterministic Finite Automata And Regular Languages.
Topic 3: Automata Theory 1. OutlineOutline Finite state machine, Regular expressions, DFA, NDFA, and their equivalence, Grammars and Chomsky hierarchy.
Theory of Computation Automata Theory Dr. Ayman Srour.
CIS 262 Automata, Computability, and Complexity Fall Instructor: Aaron Roth
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
Alphabet, String, Language. 2 Alphabet and Strings An alphabet is a finite, non-empty set of symbols. –Denoted by  –{ 0, 1 } is a binary alphabet. –{
CS314 – Section 5 Recitation 2
Formal Methods in software development
Chapter 2 Scanning – Part 1 June 10, 2018 Prof. Abdelaziz Khamis.
Lexical analysis Finite Automata
String Analysis for Dependable Input Validation
Natural Language Processing - Formal Language -
Theory of Computation Lecture # 9-10.
Two issues in lexical analysis
Arithmetic Constraints and Automata
REGULAR LANGUAGES AND REGULAR GRAMMARS
String Analysis Tevfik Bultan Verification Lab
Deterministic Finite Automata And Regular Languages Prof. Busch - LSU.
Review: Compiler Phases:
4. Properties of Regular Languages
Non-Deterministic Finite Automata
NFAs and Transition Graphs
4b Lexical analysis Finite Automata
Widening Automata.
Compiler Construction
Instructor: Aaron Roth
Instructor: Aaron Roth
CSCI 2670 Introduction to Theory of Computing
Instructor: Aaron Roth
CSCE 355 Foundations of Computation
Presentation transcript:

Automata Based String Analysis for Vulnerability Detection

Computer Trouble at School

SQL Injection A PHP example Access students’ data by $name (from a user input). 1:<?php 2: $name = $GET[”name”]; 3: $user data = $db->query(“SELECT * FROM students WHERE name = ‘$name’”); 4:?>

SQL Injection A PHP Example: Access students’ data by $name (from a user input). 1:<?php 2: $name = $GET[”name”]; 3: $user data = $db->query(“SELECT * FROM students WHERE name = ‘Robert ’); DROP TABLE students; - -”); 4:?>

What is a String? Given alphabet Σ, a string is a finite sequence of alphabet symbols <c1, c2, …, cn> for all i, ci is a character from Σ Σ = English = {a,…,z, A,…Z} Σ = {a} Σ = {a, b}, Σ = ASCII = {NULL, …, !, “, …, 0, …, 9, …, a, …, z, …} We only consider Σ = ASCII (can be extended) “Foo” “Ldkh#$klj54” “123” Σ = ASCII Σ = English Σ = {a} Σ = {a,b} “Hello” “Welcome” “good” “a” “aa” “aaa” “aaaa” “aaaaa” “a” “aba” “bbb” “ababaa” “aaa”

String Manipulation Operations Concatenation “1” + “2”  “12” “Foo” + “bAaR”  “FoobAaR” Replacement replace(“a”, “A”) replace (“2”,””) toUpperCase bAaR  bAAR 234  34 abC  ABC

String Filtering Operations Branch conditions length < 4 ? “Foo” “bAaR” match(/^[0-9]+$/) ? “234” “a3v%6” substring(2, 4) == “aR” ? ”bAaR”

A Simple Example Another PHP Example: <script ... 2: $www = $_GET[”www”]; 3: $l_otherinfo = ”URL”; 4: echo ”<td>” . $l_otherinfo . ”: ” . $www . ”</td>”; 5:?> The echo statement in line 4 is a sensitive function It contains a Cross Site Scripting (XSS) vulnerability <script ... 8

Is It Vulnerable? A simple taint analysis can report this segment vulnerable using taint propagation 1:<?php 2: $www = $_GET[”www”]; 3: $l_otherinfo = ”URL”; 4: echo ”<td>” . $l_otherinfo . ”: ” .$www. ”</td>”; 5:?> echo is tainted → script is vulnerable tainted 9

How to Fix it? To fix the vulnerability we added a sanitization routine at line s Taint analysis will assume that $www is untainted and report that the segment is NOT vulnerable 1:<?php 2: $www = $_GET[”www”]; 3: $l_otherinfo = ”URL”; s: $www = ereg_replace(”[^A-Za-z0-9 .-@://]”,””,$www); 4: echo ”<td>” . $l_otherinfo . ”: ” .$www. ”</td>”; 5:?> tainted untainted 10

Is It Really Sanitized? <script …> <script …> 1:<?php 2: $www = $_GET[”www”]; 3: $l_otherinfo = ”URL”; s: $www = ereg_replace(”[^A-Za-z0-9 .-@://]”,””,$www); 4: echo ”<td>” . $l_otherinfo . ”: ” .$www. ”</td>”; 5:?> <script …> <script …> 11

Sanitization Routines can be Erroneous The sanitization statement is not correct! ereg_replace(”[^A-Za-z0-9 .-@://]”,””,$www); Removes all characters that are not in { A-Za-z0-9 .-@:/ } .-@ denotes all characters between “.” and “@” (including “<” and “>”) “.-@” should be “.\-@” This example is from a buggy sanitization routine used in MyEasyMarket-4.1 (line 218 in file trans.php) 12

String Analysis String analysis determines all possible values that a string expression can take during any program execution Using string analysis we can identify all possible input values of the sensitive functions Then we can check if inputs of sensitive functions can contain attack strings How can we characterize attack strings? Use regular expressions to specify the attack patterns Attack pattern for XSS: Σ∗<scriptΣ∗ 13

Vulnerabilities Can Be Tricky Input <!sc+rip!t ...> does not match the attack pattern but it matches the vulnerability signature and it can cause an attack 1:<?php 2: $www = $_GET[”www”]; 3: $l_otherinfo = ”URL”; s: $www = ereg_replace(”[^A-Za-z0-9 .-@://]”,””,$www); 4: echo ”<td>” . $l_otherinfo . ”: ” .$www. ”</td>”; 5:?> <!sc+rip!t …> <script …> 14

Automata-based String Analysis Finite State Automata can be used to characterize sets of string values Automata based string analysis Associate each string expression in the program with an automaton The automaton accepts an over approximation of all possible values that the string expression can take during program execution Using this automata representation we symbolically execute the program, only paying attention to string manipulation operations 15

Dependency Graphs Extract dependency graphs from sanitizer functions 1:<?php 2: $www = $ GET[”www”]; 3: $l_otherinfo = ”URL”; 4: $www = ereg_replace( ”[^A-Za-z0-9 .-@://]”,””,$www ); 5: echo $l_otherinfo . ”: ” .$www; 6:?> $_GET[www], 2 “URL”, 3 [^A-Za-z0-9 .-@://], 4 “”, 4 $www, 2 $l_otherinfo, 3 “: “, 5 preg_replace, 4 str_concat, 5 $www, 4 str_concat, 5 echo, 5 Dependency Graph 16

Forward Analysis Using the dependency graph conduct vulnerability analysis Automata-based forward symbolic analysis that identifies the possible values of each node Each node in the dependency graph is associated with a DFA DFA accepts an over-approximation of the strings values that the string expression represented by that node can take at runtime The DFAs for the input nodes accept Σ∗ Intersecting the DFA for the sink nodes with the DFA for the attack pattern identifies the vulnerabilities 17

Forward Analysis Need to implement post-image computations for string operations: postConcat(M1, M2) returns M, where M=M1.M2 postReplace(M1, M2, M3) returns M, where M=replace(M1, M2, M3) Need to handle many specialized string operations: regmatch, substring, indexof, length, contains, trim, addslashes, htmlspecialchars, mysql_real_escape_string, tolower, toupper 18

L(URL: [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]*) Forward Analysis Forward = Σ* Attack Pattern = Σ*<Σ* $_GET[www], 2 [^A-Za-z0-9 .-@://], 4 “”, 4 “URL”, 3 $www, 2 Forward = URL Forward = [^A-Za-z0-9 .-@/] Forward = ε Forward = Σ* “: “, 5 preg_replace, 4 $l_otherinfo, 3 Forward = : Forward = URL Forward = [A-Za-z0-9 .-@/]* str_concat, 5 $www, 4 Forward = URL: Forward = [A-Za-z0-9 .-@/]* str_concat, 5 Forward = URL: [A-Za-z0-9 .-@/]* echo, 5 ∩ L(Σ*<Σ*) Forward = URL: [A-Za-z0-9 .-@/]* L(URL: [A-Za-z0-9 .-@/]*) = L(URL: [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]*) ≠ Ø 19

URL: [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]* Result Automaton U R L : [A-Za-z0-9 .-;=-@/] [A-Za-z0-9 .-@/] Space < URL: [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]* 20

Symbolic Automata Representation MONA DFA Package for automata manipulation [Klarlund and Møller, 2001] Compact Representation: Canonical form and Shared BDD nodes Efficient MBDD Manipulations: Union, Intersection, and Emptiness Checking Projection and Minimization Cannot Handle Nondeterminism: Use dummy bits to encode nondeterminism 21

Symbolic Automata Representation Symbolic DFA representation Explicit DFA representation 22

Backward Analysis A vulnerability signature is a characterization of all malicious inputs that can be used to generate attack strings Identify vulnerability signatures using an automata-based backward symbolic analysis starting from the sink node Need to implement Pre-image computations on string operations: preConcatPrefix(M, M2) returns M1 and where M = M1.M2 preConcatSuffix(M, M1) returns M2, where M = M1.M2 preReplace(M, M2, M3) returns M1, where M=replace(M1, M2, M3) 23

Vulnerability Signature = [^<]*<Σ* Backward Analysis Forward = Σ* Backward = [^<]*<Σ* $_GET[www], 2 node 3 node 6 “URL”, 3 [^A-Za-z0-9 .-@://], 4 “”, 4 $www, 2 Forward = URL Forward = [^A-Za-z0-9 .-@/] Forward = ε Forward = Σ* Backward = Do not care Backward = Do not care Backward = Do not care Backward = [^<]*<Σ* Vulnerability Signature = [^<]*<Σ* “: “, 5 preg_replace, 4 $l_otherinfo, 3 Forward = : Forward = [A-Za-z0-9 .-@/]* Forward = URL Backward = Do not care Backward = [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]* Backward = Do not care node 10 str_concat, 5 $www, 4 Forward = URL: Forward = [A-Za-z0-9 .-@/]* node 11 Backward = Do not care Backward = [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]* str_concat, 5 Forward = URL: [A-Za-z0-9 .-@/]* Backward = URL: [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]* node 12 echo, 5 Forward = URL: [A-Za-z0-9 .-@/]* Backward = URL: [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]* 24

Vulnerability Signature Automaton Σ < [^<] Non-ASCII [^<]*<Σ* 25

Widening String verification problem is undecidable The forward fixpoint computation is not guaranteed to converge in the presence of loops and recursion Compute a sound approximation During fixpoint compute an over approximation of the least fixpoint that corresponds to the reachable states Use an automata based widening operation to over-approximate the fixpoint Widening operation over-approximates the union operations and accelerates the convergence of the fixpoint computation 26

Widening Given a loop such as 1:<?php 2: $var = “head”; 3: while (. . .){ 4: $var = $var . “tail”; 5: } 6: echo $var 7:?> Our forward analysis with widening would compute that the value of the variable $var in line 6 is (head)(tail)* 27

Recap Given an automata-based string analyzer, Vulnerability Analysis: We can do a forward analysis to detect all the strings that reach the sink and that match the attack pattern We can compute an automaton that accepts all such strings If there is any such string the application might be vulnerable to the type of attack specified by the attack pattern Vulnerability Signature: We can do a backward analysis to compute the vulnerability signature Vulnerability signature is the set of all input strings that can generate a string value at the sink that matches the attack pattern

Forward Analysis Results The dependency graphs of these benchmarks are simplified based on the sinks Unrelated parts are removed using slicing Input Results #nodes #edges #sinks #inputs Time(s) Mem (kb) #states/#bdds 21 20 1 0.08 2599 23/219 29 0.53 13633 48/495 25 2 0.12 1955 125/1200 23 22 4022 133/1222 3387 29

Backward Analysis Results We use the backward analysis to generate the vulnerability signatures Backward analysis starts from the vulnerable sinks identified during forward analysis Input Results #nodes #edges #sinks #inputs Time(s) Mem (kb) #states/#bdds 21 20 1 0.46 2963 9/199 29 41.03 1859767 811/8389 25 2 2.35 5673 20/302, 20/302 23 22 2.33 32035 91/1127 5.02 14958 30