Given a string manipulating program, string analysis determines all possible values that a string expression can take during any program execution Using.

Slides:



Advertisements
Similar presentations
CS2303-THEORY OF COMPUTATION Closure Properties of Regular Languages
Advertisements

Automata Theory Part 1: Introduction & NFA November 2002.
1 Lecture 32 Closure Properties for CFL’s –Kleene Closure construction examples proof of correctness –Others covered less thoroughly in lecture union,
Finite Automata CPSC 388 Ellen Walker Hiram College.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 2 Mälardalen University 2005.
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou.
Pushdown Automata Chapter 12. Recognizing Context-Free Languages Two notions of recognition: (1) Say yes or no, just like with FSMs (2) Say yes or no,
Introduction to Computability Theory
1 Introduction to Computability Theory Lecture12: Decidable Languages Prof. Amos Israeli.
Finite Automata Great Theoretical Ideas In Computer Science Anupam Gupta Danny Sleator CS Fall 2010 Lecture 20Oct 28, 2010Carnegie Mellon University.
1 Introduction to Computability Theory Lecture2: Non Deterministic Finite Automata Prof. Amos Israeli.
Languages. A Language is set of finite length strings on the symbol set i.e. a subset of (a b c a c d f g g g) At this point, we don’t care how the language.
1 Introduction to Computability Theory Lecture2: Non Deterministic Finite Automata (cont.) Prof. Amos Israeli.
Validating Streaming XML Documents Luc Segoufin & Victor Vianu Presented by Harel Paz.
1 Languages and Finite Automata or how to talk to machines...
79 Regular Expression Regular expressions over an alphabet  are defined recursively as follows. (1) Ø, which denotes the empty set, is a regular expression.
Data Flow Analysis Compiler Design Nov. 8, 2005.
Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,
1 Module 31 Closure Properties for CFL’s –Kleene Closure construction examples proof of correctness –Others covered less thoroughly in lecture union, concatenation.
Normal forms for Context-Free Grammars
CS5371 Theory of Computation Lecture 4: Automata Theory II (DFA = NFA, Regular Language)
CS Chapter 2. LanguageMachineGrammar RegularFinite AutomatonRegular Expression, Regular Grammar Context-FreePushdown AutomatonContext-Free Grammar.
Regular Languages A language is regular over  if it can be built from ;, {  }, and { a } for every a 2 , using operators union ( [ ), concatenation.
CS 290C: Formal Models for Web Software Lectures 17: Analyzing Input Validation and Sanitization in Web Applications Instructor: Tevfik Bultan.
Regular Model Checking Ahmed Bouajjani,Benget Jonsson, Marcus Nillson and Tayssir Touili Moran Ben Tulila
Finite-State Machines with No Output
DECIDABILITY OF PRESBURGER ARITHMETIC USING FINITE AUTOMATA Presented by : Shubha Jain Reference : Paper by Alexandre Boudet and Hubert Comon.
Pushdown Automata (PDA) Intro
Introduction to CS Theory Lecture 3 – Regular Languages Piotr Faliszewski
1 Chapter 1 Introduction to the Theory of Computation.
Automating Construction of Lexers. Example in javacc TOKEN: { ( | | "_")* > | ( )* > | } SKIP: { " " | "\n" | "\t" } --> get automatically generated code.
Grammars CPSC 5135.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 2 Mälardalen University 2006.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
Regular Expressions and Languages A regular expression is a notation to represent languages, i.e. a set of strings, where the set is either finite or contains.
Strings and Languages CS 130: Theory of Computation HMU textbook, Chapter 1 (Sec 1.5)
Copyright © Curt Hill Finite State Automata Again This Time No Output.
Discrete Mathematical Structures 4 th Edition Kolman, Busby, Ross © 2000 by Prentice-Hall, Inc. ISBN
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
Chapter 3 Regular Expressions, Nondeterminism, and Kleene’s Theorem Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction.
Strings Basic data type in computational biology A string is an ordered succession of characters or symbols from a finite set called an alphabet Sequence.
Lecture 2 Overview Topics What I forgot from last lecture Proof techniques continued Alphabets, strings, languages Automata June 2, 2015 CSCE 355 Foundations.
Quantified Data Automata on Skinny Trees: an Abstract Domain for Lists Pranav Garg 1, P. Madhusudan 1 and Gennaro Parlato 2 1 University of Illinois at.
CS 267: Automated Verification Lectures 16: String Analysis Instructor: Tevfik Bultan.
Strings and Languages Denning, Section 2.7. Alphabet An alphabet V is a finite nonempty set of symbols. Each symbol is a non- divisible or atomic object.
Transparency No. 2-1 Formal Language and Automata Theory Homework 2.
Automatic Repair for Input Validation and Sanitization Bugs 1.
Nondeterministic Finite Automata (NFAs). Reminder: Deterministic Finite Automata (DFA) q For every state q in Q and every character  in , one and only.
Chapter 5 Finite Automata Finite State Automata n Capable of recognizing numerous symbol patterns, the class of regular languages n Suitable for.
Relational String Verification Using Multi-track Automata.
Finite Automata Great Theoretical Ideas In Computer Science Victor Adamchik Danny Sleator CS Spring 2010 Lecture 20Mar 30, 2010Carnegie Mellon.
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen.
Exercise: (aa)* | (aaa)* Construct automaton and eliminate epsilons.
Pushdown Automata Chapter 12. Recognizing Context-Free Languages Two notions of recognition: (1) Say yes or no, just like with FSMs (2) Say yes or no,
Week 14 - Friday.  What did we talk about last time?  Simplifying FSAs  Quotient automata.
Lecture #5 Advanced Computation Theory Finite Automata.
Topic 3: Automata Theory 1. OutlineOutline Finite state machine, Regular expressions, DFA, NDFA, and their equivalence, Grammars and Chomsky hierarchy.
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
Fixpoints and Reachability
String Analysis for Dependable Input Validation
Automata Based String Analysis for Vulnerability Detection
Jaya Krishna, M.Tech, Assistant Professor
Chapter 2 FINITE AUTOMATA.
CSE322 CONSTRUCTION OF FINITE AUTOMATA EQUIVALENT TO REGULAR EXPRESSION Lecture #9.
4. Properties of Regular Languages
Instructor: Aaron Roth
Chapter 1 Introduction to the Theory of Computation
COMPILER CONSTRUCTION
CSCE 355 Foundations of Computation
Presentation transcript:

Given a string manipulating program, string analysis determines all possible values that a string expression can take during any program execution Using string analysis we can verify properties of string manipulating programs For example, we can identify all possible input values of sensitive functions in a web application and then check whether inputs of sensitive functions can contain attack strings

 Configurations/Transitions are represented using word equations  Word equations are represented/approximated using (aligned) multi-track DFAs which are closed under intersection, union, complement and projection  Operations required for reachability analysis (such as equivalence checking) are computed on DFAs

 Let X (the first track), Y (the second track), be two string variables  λ : a padding symbol that appears only on the tail of each track (aligned)  A multi-track automaton that encodes X = Y.txt (a,a), (b,b) … (t, λ )(x, λ )(t, λ )

 Compute the post-conditions of statements Given a multi-track automata M and an assignment statement: X := sexp Post(M, X := sexp) denotes the post-condition of X := sexp with respect to M Post(M, X := sexp) = (  X, M ∩ CONSTRUCT(X’ = sexp, +))[X/X’]

 We implement a symbolic forward reachability computation using the post-condition operations  The forward fixpoint computation is not guaranteed to converge in the presence of loops and recursion  We use an automata based widening operation to over-approximate the fixpoint  Widening operation over-approximates the union operations and accelerates the convergence of the fixpoint computation

 The alphabet of an n-track automaton is Σ n  The size of multi-track automata could be huge during computations  On the other hand, we may carry more information than we need to verify the property  More Abstractions:  We propose alphabet abstraction to reduce Σ  We propose relation abstraction to reduce n

 Select a subset of alphabet characters (Σ’) to analyze distinctly and merge the remaining alphabet characters into a special symbol (  )  For example: Let Σ={<, a, b, c} and Σ’={<}, L(M) = a<b +, we have: α Σ,Σ’ (M) = M α and γ Σ,Σ’ (M α ) = M γ, where L(M α )=  <  +, and L(M γ ) = (a|b|c)<(a|b|c) +

 We use an alphabet transducer M Σ,Σ’ to construct abstract automata  α denotes any character in Σ’  β denotes any character in Σ\Σ’ (β,  ) (α,α) (λ,λ)

b a b < (a,  ), (b,  ), (c,  ) (<,<) (λ,λ) (b,*) (a,*) (b,*) (<,*)    < (b,  ) (a,  ) (b,  ) (<,<) α α M MαMα M Σ,Σ’

a,b,c < (a,  ), (b,  ), (c,  ) (<,<) (λ,λ) (a,  ), (b,  ), (c,  ) (<,<)    < (*,  ) (*,<) MγMγ MαMα M Σ,Σ’ (a,  ), (b,  ), (c,  ) (a,  ), (b,  ), (c,  ) γ γ

1:<?php 2: $www = $_GET[”www”]; 3: $l_otherinfo = ”URL”; 4: $www = str_replace(<,””,$www); 5: echo ” ”. $l_otherinfo. ”: ”. $www. ” ”; 6:?>  Consider the above example, choosing Σ’={<, s} (instead of all ASCII characters) is sufficient to conclude that the echo string does not contain any substring that matches “<script”

 Consider the following abstraction: We map all the symbols in the alphabet to a single symbol  The automaton we generate with this abstraction will be a unary automaton (an automaton with a unary alphabet)  The only information that this automaton will give us will be the length of the strings  So alphabet abstraction corresponds to length abstraction

 Select sets of string variables to analyze relationally (using multi-track automata), and analyze the rest independently (using single-track automata) For example, consider three string variables n 1, n 2, n 3.  Let χ={{n 1,n 2 }, n 3 } and χ’={{n 1 }, {n 2 }, {n 3 }}  Let M = {M 1,2, M 3 } that consists of a 2-track automaton for n 1 and n 2 and a single track automaton for n 3  We have α χ,χ’ (M) = M α γ χ,χ’ (M α ) = M γ, where

 M α = {M 1, M 2, M 3 } such that M 1 and M 2 are constructed by the projection of M 1,2 to the first track and the second track respectively  M Υ = {M’ 1,2, M 3 } such that M’ 1,2 is constructed by the intersection of M 1,* and M *,2, where  M 1,* is the two-track automaton extended from M1 with arbitrary values in the second track  M *,2 is the two-track automaton extended from M2 with arbitrary values in the first track

(a,a) (b,b) (c,c) a b c (a,*) (b,*) (c,*) (*,a) (*,b) (*,c) (a,a) (b,b) (c,c) (b,a) (a,b) M 1,2 M 1, M 2 M 1,* M *,2 M’ 1,2 α α γ γ

1:<?php 2: $usr = $_GET[“usr”]; 3: $passwd = $_GET[“passwd”]; 4: $key = $usr.$passwd; 5: if($key = “admin1234”) 6: echo $usr; 7:?>  Consider the above example, choosing χ’={{$usr, $key}, {$passwd}} is sufficient to identify the echo string is a prefix of “admin1234” and does not contain any substring that matches “<script”

 Both alphabet and relation abstractions form abstraction lattices, which allow different levels of abstractions  Combining these abstractions leads a product lattice, where each point is an abstraction class that corresponds to a particular alphabet abstraction and a relation abstraction  The top is a non relational analysis using unary alphabet  The bottom is a complete relational analysis using full alphabet

Some abstraction from the abstraction lattice and the corresponding analyses

 Select an abstraction class  Ideally, the choice should be as abstract as possible while remaining precise enough to prove the property in question  Heuristics  Let the property guide the choice  Collect constants and relations from assertions and their dependency graphs ▪ It forms the lower bound of the abstraction class ▪ Select an initial abstraction class, e.g., characters and relations appearing in assertions ▪ Refine the abstraction class toward the lower bound