Xiang Fu, Hofstra University; Chung-Chih Li, Illinois State University. NFM 2010, 04/13/2010.


Background
A hacker submits malicious scripts to a server, which then delivers them to other visitors inside an ordinary-looking page ("Cool page!"). The underlying problem is the lack of sufficient sanitation of text inputs.

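One Typical Error
This PHP code tries to strip script elements from a message before storing it:

    <?php
    $msg = $_POST['msg'];
    $sanitized = preg_replace('/<script.*?>.*?<\/script.*?>/i', '', $msg);
    save_to_db($sanitized);
    ?>

Attacker's input: <<script></script>script>alert(a)</script>
Stored result: <script>alert(a)</script>
The culprit is the reluctant Kleene star (.*?): the pattern removes only the short match <script></script>, and the characters left around it form a new script element.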
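The bypass can be reproduced with a short Python sketch, assuming Python's `re` module treats the reluctant `.*?` the same way PHP's PCRE-based `preg_replace` does for this pattern (both pick the leftmost match and make a single pass). The helper name `sanitize` is hypothetical.

```python
import re

# Same sanitizing pattern as the PHP snippet, with reluctant stars, case-insensitive.
SCRIPT_TAG = re.compile(r"<script.*?>.*?</script.*?>", re.IGNORECASE)


def sanitize(msg: str) -> str:
    """Remove every non-overlapping leftmost match in one pass, like preg_replace."""
    return SCRIPT_TAG.sub("", msg)


attack = "<<script></script>script>alert(a)</script>"
print(sanitize(attack))  # <script>alert(a)</script>
```

The first (and only) match is `<script></script>` starting at the second character; after it is removed, the leading `<` and the trailing `script>alert(a)</script>` join into a script element that the single pass never rescans.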

Bigger Picture
Objective: automatic discovery of vulnerabilities.
Components: symbolic execution of bytecode, attack patterns, the SUSHI string constraint solver, and a test replayer.

Our Contribution
We model atomic replacement constraints under two semantics, greedy and reluctant, using finite state transducers (FSTs), give a compact representation of FSTs, and apply the model to security analysis.

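Finite State Transducer
An FST accepts a regular relation, i.e., a set of (input, output) string pairs. Regular relations are closed under union, concatenation, and composition, but not under intersection or complement. FSTs have been used to model rewriting rules [Kaplan94, Karttunen96].
Example: a transducer A whose transitions are labeled ε:1, a:2, b:3 in sequence accepts the pair (ab, 123), i.e., $(ab, 123) \in L(A)$.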
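A minimal sketch of how an FST relates an input string to an output string, using a hypothetical edge-list encoding in which each edge carries an input label and an output label and the empty string stands for ε. It searches for an accepting path that consumes the whole input while producing the whole output.

```python
from typing import List, Set, Tuple

# Edge: (source state, input label, output label, target state); "" stands for ε.
Edge = Tuple[int, str, str, int]


def accepts(edges: List[Edge], start: int, finals: Set[int], inp: str, out: str) -> bool:
    """Return True if some path reads all of `inp`, writes all of `out`, and ends in a final state."""
    seen = set()
    stack = [(start, 0, 0)]  # (state, chars of inp consumed, chars of out produced)
    while stack:
        config = stack.pop()
        if config in seen:  # visited set also guards against ε:ε cycles
            continue
        seen.add(config)
        state, i, j = config
        if state in finals and i == len(inp) and j == len(out):
            return True
        for src, a, b, dst in edges:
            if src == state and inp.startswith(a, i) and out.startswith(b, j):
                stack.append((dst, i + len(a), j + len(b)))
    return False


# The example transducer A: ε:1, then a:2, then b:3.
A = [(0, "", "1", 1), (1, "a", "2", 2), (2, "b", "3", 3)]
print(accepts(A, 0, {3}, "ab", "123"))  # True
print(accepts(A, 0, {3}, "ab", "12"))   # False
```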

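Hierarchical FST & Modeling Declarative Semantics
Goal: model regular search-pattern replacement, i.e., rewriting matches of a regular pattern r into a string ω.
$\mathrm{Id}(\Sigma^* - \Sigma^* r \Sigma^*)$ denotes the identical (identity) relation on strings that do not contain a match of r. The hierarchical FST alternates such identity segments with transitions labeled r:ω, each rewriting one match of r into ω, and joins them with ε:ε transitions.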
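To see why the declarative relation alone does not fix a single result, here is a brute-force sketch (not the FST construction) that enumerates every output it allows for a short input, using Python's `re` to test membership in L(r). It assumes r does not match the empty string; the function name is hypothetical.

```python
import re
from functools import lru_cache
from typing import Set


def declarative_outputs(s: str, r: str, omega: str) -> Set[str]:
    """All outputs u0 ω u1 ... ω uk where s = u0 m1 u1 ... mk uk, each mi matches r,
    and no ui contains a match of r. Assumes r does not match the empty string."""
    pat = re.compile(r)

    @lru_cache(maxsize=None)
    def from_pos(i: int) -> frozenset:
        results = set()
        for j in range(i, len(s) + 1):
            segment = s[i:j]  # candidate identity segment u
            if pat.search(segment):
                break  # every longer segment also contains a match of r
            if j == len(s):
                results.add(segment)
            for k in range(j + 1, len(s) + 1):
                if pat.fullmatch(s[j:k]):  # s[j:k] is one match m, rewritten to omega
                    for rest in from_pos(k):
                        results.add(segment + omega + rest)
        return frozenset(results)

    return set(from_pos(0))


print(sorted(declarative_outputs("aa", "a+", "x")))  # ['x', 'xx']
```

For the input aa and r = a+, both x and xx are admitted, so the declarative relation is not a function; choosing one output is what the reluctant and greedy semantics add.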

Modeling Reluctant Semantics
Two steps: (1) mark the beginning of each possible match of the pattern; (2) perform the replacement. Key: left-most matching.

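Example: input word aabbcdabcabd, search pattern a+b+c, replacement x.
Step 1 (begin marker) inserts # before every position where a match of a+b+c starts: #a#abbcd#abcabd.
Step 2 uses an FST with states s1, s2, and f1. It copies unmarked characters through an identity relation; when it reaches a marker that is not inside an earlier match, it erases the marker (#:ε) and rewrites the shortest following match, reluc(r), into ω, discarding any markers inside that match.
Output: xdxabd.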
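The two steps can be sketched at the string level in Python, with the `re` module standing in for the automata. This is a reference model one could test an FST against, not the transducer construction itself; the function names are hypothetical, and the marker character is assumed not to occur in the input.

```python
import re


def begin_markers(s: str, r: str, marker: str = "#") -> str:
    """Step 1: insert `marker` before every position where a match of r starts."""
    pat = re.compile(r)
    out = []
    for i, ch in enumerate(s):
        if pat.match(s, i):  # is there a match of r anchored at position i?
            out.append(marker)
        out.append(ch)
    return "".join(out)


def reluctant_step(marked: str, r: str, omega: str, marker: str = "#") -> str:
    """Step 2: at each marker not inside an earlier match, replace the shortest
    match of r (ignoring markers inside it) with omega; copy other characters."""
    pat = re.compile(r)
    out, i = [], 0
    while i < len(marked):
        if marked[i] != marker:
            out.append(marked[i])
            i += 1
            continue
        text, j = "", i + 1
        while j < len(marked):
            if marked[j] != marker:
                text += marked[j]
            j += 1
            if pat.fullmatch(text):
                break  # shortest match found: reluctant semantics
        out.append(omega)  # step 1 guarantees a match starts at this marker
        i = j
    return "".join(out)


marked = begin_markers("aabbcdabcabd", "a+b+c")
print(marked)                                # #a#abbcd#abcabd
print(reluctant_step(marked, "a+b+c", "x"))  # xdxabd
```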

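The Challenge: Begin Marker
For the input word aabbcdabcabd and search pattern a+b+c replaced by x, placing # requires knowing, before reading further, whether a match starts at the current position. An FST has no look-ahead capability and could only guess nondeterministically. The construction therefore proceeds in three steps: (1) end marker, (2) generic end marker, (3) begin marker.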

Preliminary End Marker
Idea: start with an end marker for the reverse of the search pattern. For a+b+c the reversed pattern is cb+a+, and transducer A1 copies its characters (c:c, b:b, a:a) and emits the end marker $ (ε:$) after a match.
Problem: the input tape of A1 accepts only cb+a+, not arbitrary words.

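Generic End Marker
Transducer A2 reads any input word and inserts $ after every position where a match of cb+a+ ends. Its states combine states of the pattern automaton (labels such as 3,1, 4,1 and 5,1), and the resulting transducer is deterministic.
Example: input word ccbaa, output word ccba$a$.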
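The determinism can be illustrated with a subset-simulation sketch: a hand-built NFA for cb+a+ (an assumption; its state numbers do not correspond to the slide's) runs with its start state kept active at every step, which behaves like a DFA for Σ*·cb+a+, and $ is emitted whenever the accepting state is active. Each set of active states is one deterministic state, so the word is processed in one pass without backtracking.

```python
from typing import Dict, Set, Tuple

# Hypothetical hand-built NFA for the reversed pattern cb+a+ (state 3 accepts).
DELTA: Dict[Tuple[int, str], int] = {
    (0, "c"): 1,
    (1, "b"): 2, (2, "b"): 2,
    (2, "a"): 3, (3, "a"): 3,
}
START, ACCEPT = 0, 3


def generic_end_marker(word: str, marker: str = "$") -> str:
    """Copy `word`, inserting `marker` after every position where a match of cb+a+ ends."""
    active: Set[int] = {START}
    out = []
    for ch in word:
        # Keeping START active lets a new match begin at every position (the Σ* prefix).
        active = {START} | {DELTA[(q, ch)] for q in active if (q, ch) in DELTA}
        out.append(ch)
        if ACCEPT in active:
            out.append(marker)
    return "".join(out)


print(generic_end_marker("ccbaa"))  # ccba$a$
```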

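Finally, the Begin Marker
For the search pattern a+b+c replaced by x, transducer A3 is built from the end-marker transducer for cb+a+ by reversal: it uses the transitions c:c, b:b, a:a, emits # (ε:#) instead of $, and adds a new initial state 0 joined by ε:ε transitions. A match of a+b+c begins wherever a match of the reversed pattern cb+a+ ends in the reversed word, so A3 places # before every position where a match of a+b+c begins.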
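The reversal idea can be checked with a small Python sketch that works on strings rather than transducers: mark match ends of the reversed pattern in the reversed word, then reverse the result. The function names are hypothetical, and `reversed_pattern` is assumed to be written by hand, since Python cannot reverse a regular expression for us.

```python
import re


def end_markers(word: str, pattern: str, marker: str) -> str:
    """Insert `marker` after every position where some match of `pattern` ends."""
    ends_here = re.compile(f"(?:{pattern})\\Z")
    out = []
    for i, ch in enumerate(word):
        out.append(ch)
        if ends_here.search(word[: i + 1]):  # does a match end right after ch?
            out.append(marker)
    return "".join(out)


def begin_markers_by_reversal(word: str, reversed_pattern: str, marker: str = "#") -> str:
    """A match of r starts at position i of `word` exactly when a match of reverse(r)
    ends at the mirrored position of the reversed word: mark there, then reverse back."""
    return end_markers(word[::-1], reversed_pattern, marker)[::-1]


print(end_markers("ccbaa", "cb+a+", "$"))                  # ccba$a$
print(begin_markers_by_reversal("aabbcdabcabd", "cb+a+"))  # #a#abbcd#abcabd
```

Because the marker is a single character written after each matched position, reversing the marked word moves every marker in front of the character it followed, which is exactly where a begin marker belongs.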


Greedy Semantics
Goal: model greedy replacement, which uses the longest match. Challenge: finding the longest match needs look-ahead.

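Greedy replacement proceeds in six steps: (1) begin marker; (2) nondeterministic end marker; (3) pairing markers; (4) checking each match; (5) checking that the match is the longest; (6) replacement.
Example with search pattern a+ replaced by x: the input word aabab becomes #a#ab#ab after step 1, passes through intermediate words such as #a#a$b#ab, #a$#a$b#a$b, #a#a$b#a$b, #a#ab#a$b, #aa$b#a$b and #aaba$b, and ends as xbxb.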
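As a string-level reference model (not the six-step FST construction), leftmost-longest and leftmost-shortest replacement differ only in which match end is chosen. The function name and the `longest` flag are hypothetical; r is assumed not to match the empty string.

```python
import re


def replace_all(s: str, r: str, omega: str, longest: bool = True) -> str:
    """Scan left to right; at the leftmost position where r matches, replace the
    longest (greedy) or shortest (reluctant) match with omega, then continue after it."""
    pat = re.compile(r)
    out, i = [], 0
    while i < len(s):
        ends = range(len(s), i, -1) if longest else range(i + 1, len(s) + 1)
        end = next((k for k in ends if pat.fullmatch(s, i, k)), None)
        if end is None:
            out.append(s[i])  # no match starts here: copy the character
            i += 1
        else:
            out.append(omega)
            i = end
    return "".join(out)


print(replace_all("aabab", "a+", "x"))                 # xbxb  (greedy)
print(replace_all("aabab", "a+", "x", longest=False))  # xxbxb (reluctant)
```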

Applications: Solving String Constraints
Example: a login servlet whose input is a user name, constrained after filtering single quotes and applying a length restriction.

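Solving an Atomic Constraint
Goal: compose the replacement transducer A with $\mathrm{Id}(P)$, the identity relation on the target pattern P, i.e., form $A \circ \mathrm{Id}(P)$, then project onto the input tape. The projection is an automaton describing the solution.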
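Composition with Id(P) followed by projection can be mimicked by brute force over a small alphabet and a length bound: keep every input whose replacement output lies in P. This is only an illustration under those bounds, whereas the automaton-based approach yields all solutions symbolically; the alphabet, the bound, and the function name are assumptions. For the pattern a+, Python's greedy `re.sub` coincides with leftmost-longest replacement.

```python
import itertools
import re
from typing import List


def solve_bounded(r: str, omega: str, target: str, alphabet: str, max_len: int) -> List[str]:
    """Return every input x with len(x) <= max_len whose replacement output matches `target` (P)."""
    pattern, goal = re.compile(r), re.compile(target)
    solutions = []
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            x = "".join(letters)
            if goal.fullmatch(pattern.sub(omega, x)):  # output tape restricted to P
                solutions.append(x)                  # keep only the input tape
    return solutions


print(solve_bounded("a+", "x", "xb", "ab", 3))  # ['ab', 'aab']
```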

SUSHI Constraint Solver
SUSHI solves simple linear string constraints (SISE). It relies on dk.brics.automaton for FSA operations and on a self-made Java package for FST operations, and it supports 16-bit Unicode through a compact transition representation.
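One common way to keep transitions compact over the 16-bit Unicode alphabet, and the one dk.brics.automaton uses for its FSA transitions, is to label each edge with a character interval instead of a single character. Whether SUSHI's FST package uses exactly this encoding is an assumption here; the sketch shows interval-labeled transitions of a single state, looked up by binary search, with a hypothetical class name.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


class IntervalTransitions:
    """Outgoing transitions of one state, each labeled by an inclusive
    code-point range [lo, hi] (hypothetical layout, ranges assumed disjoint)."""

    def __init__(self, edges: List[Tuple[int, int, int]]):
        self.edges = sorted(edges)  # (lo, hi, target), sorted by lo
        self.los = [lo for lo, _, _ in self.edges]

    def step(self, ch: str) -> Optional[int]:
        """Return the target state for `ch`, or None if no interval covers it."""
        code = ord(ch)
        idx = bisect_right(self.los, code) - 1  # last interval starting at or before code
        if idx >= 0:
            lo, hi, target = self.edges[idx]
            if lo <= code <= hi:
                return target
        return None


# Three interval edges stand in for 65,536 single-character edges:
# the single quote (0x27) goes to state 2, every other 16-bit character to state 1.
q0 = IntervalTransitions([(0x0000, 0x0026, 1), (0x0027, 0x0027, 2), (0x0028, 0xFFFF, 1)])
print(q0.step("a"), q0.step("'"), q0.step("\u4e2d"))  # 1 2 1
```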

Efficiency of Solver
Benchmark equations: the login servlet equation is solved in 1.4 seconds on a 2 GHz PC; Flex SDK XSS attack equation: size …, solved in … seconds.

Related Work
Forward string analysis: Christensen & Møller [SAS03]; Wassermann & Su [PLDI07, ICSE08]; Bjørner & Tillmann [TACAS09].
Backward string analysis: Kiezun & Ganesh [ISSTA09]; Yu & Bultan [SPIN08, ASE09]; Fu [COMPSAC07, TAVWEB08].
Natural language processing: Kaplan and Kay [CL1994].
Our contribution: precise modeling of various regular substitution semantics.

Limitations
SISE string constraints: every variable appears on the left-hand side, and only once. There is no easy solution for systems of equations yet, and string length is not supported.
Future directions: encoding string length in automata; finite models on bit-vectors.

Questions?