Xiang Fu, Hofstra University
Chung-Chih Li, Illinois State University
NFM 2010, 04/13/2010
Background
  [Figure: Hacker, Server, malicious scripts, "Cool page!"]
  Problem? Lack of Sufficient Sanitization of Text Inputs
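One Typical Error
    <?php
    $msg = $_POST["msg"];
    // Strip <script>...</script> blocks; *? is the reluctant Kleene star.
    $sanitized = preg_replace(
        "/\<script.*?\>.*?\<\/script.*?\>/i",
        "",
        $msg);
    save_to_db($sanitized);
    ?>
  Attacker's Input: <<script></script>script>alert('a')</script>
  Result: <script>alert('a')</script>
  Reluctant Kleene Star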
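The bypass on this slide can be reproduced outside PHP. The following is a minimal sketch in Python, assuming the filter removes `<script ...>...</script ...>` blocks with a reluctant pattern; the names `RELUCTANT`, `GREEDY`, and `sanitize` are illustrative and do not come from the slides. A greedy variant of the same pattern is included only for contrast.

```python
import re

# Assumed filter: remove <script ...>...</script ...> blocks, ignoring case.
RELUCTANT = re.compile(r"<script.*?>.*?</script.*?>", re.IGNORECASE)
# Same pattern with greedy stars, for comparison only.
GREEDY = re.compile(r"<script.*>.*</script.*>", re.IGNORECASE)


def sanitize(msg, pattern=RELUCTANT):
    """Delete every match of `pattern` from `msg` in a single pass."""
    return pattern.sub("", msg)


attack = "<<script></script>script>alert('a')</script>"

# The reluctant match removes only the inner "<script></script>",
# which splices the surrounding pieces into a working script tag.
print(sanitize(attack))          # <script>alert('a')</script>

# The greedy match runs to the last ">" and leaves a harmless "<".
print(sanitize(attack, GREEDY))  # <
```

The single-pass filter never re-examines its own output, which is why a precise model of the replacement semantics matters when searching for such inputs.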
Bigger Picture
  Objective: Automatic Discovery of Vulnerabilities
  [Figure: Symbolic Execution, Test Replayer, Bytecode, Attack Pattern, String Constraint Solver, SUSHI]
Our Contribution
  Atomic Replacement Constraints
  Consider Two Semantics: Greedy, Reluctant
  Modeling Using Finite State Transducer (FST)
  Compact Representation of FST
  Security Analysis
Finite State Transducer
  Accepts Regular Relation
  Closed under Union, Concat, Composition
  Not closed, in general, under Intersection, Complement
  Used for Modeling Rewriting Rules [Kaplan94, Karttunen96]
  [Figure: transducer A with transitions ε:1, a:2, b:3; (ab, 123) ∈ L(A)]
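To make the notion of a regular relation concrete, here is a small sketch of an FST simulator in Python. The class `FST`, its transition encoding (the empty string stands for ε), and the state numbers are assumptions made for illustration; the example transducer mirrors the one on the slide, under the reading that its ε-transition outputs 1. The sketch assumes there are no cycles of ε-input transitions.

```python
from collections import deque


class FST:
    """Nondeterministic finite state transducer; "" encodes epsilon."""

    def __init__(self, start, finals, transitions):
        self.start = start
        self.finals = set(finals)
        # Each transition is (source, input_symbol, output_symbol, target).
        self.transitions = list(transitions)

    def transduce(self, word):
        """Return every output string that the FST relates to `word`."""
        results = set()
        seen = set()
        queue = deque([(self.start, 0, "")])
        while queue:
            config = queue.popleft()
            if config in seen:
                continue
            seen.add(config)
            state, pos, out = config
            if state in self.finals and pos == len(word):
                results.add(out)
            for src, a, b, dst in self.transitions:
                if src != state:
                    continue
                if a == "":
                    # Epsilon input: emit output without reading a symbol.
                    queue.append((dst, pos, out + b))
                elif pos < len(word) and word[pos] == a:
                    queue.append((dst, pos + 1, out + b))
        return results


# Transducer A: epsilon:1, then a:2, then b:3.
A = FST(start=0, finals=[3],
        transitions=[(0, "", "1", 1), (1, "a", "2", 2), (2, "b", "3", 3)])

print(A.transduce("ab"))  # {'123'}
```

Because the result is a set, the same simulator also covers transducers that relate one input to several outputs.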
Hierarchical FST & Modeling Declarative Semantics
  Goal: Replace Regular Search Pattern r with Replacement ω
  [Figure: Id(Σ* - Σ* r Σ*), r : ω, Id(Σ* - Σ* r Σ*), linked by ε:ε transitions]
  Id: Identical Relation
  Σ* - Σ* r Σ*: Any String not Containing Pattern r
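One way to read the declarative semantics is as a relation: the input is cut into segments that contain no match of r, interleaved with matches of r that are each rewritten to ω. The sketch below enumerates that relation by brute force in Python rather than by building the transducer. The function name `declarative_replace` is hypothetical, and the code assumes r does not match the empty string.

```python
import re
from functools import lru_cache


def declarative_replace(s, r, omega):
    """All outputs of the form v1 + omega + v2 + omega + ... + vn, where the
    input is v1 m1 v2 m2 ... vn, no vi contains a match of r, and each mi
    matches r. Assumes r never matches the empty string."""
    pat = re.compile(r)
    n = len(s)

    def clean(i, j):
        # True when s[i:j] contains no substring matching r.
        return pat.search(s[i:j]) is None

    @lru_cache(maxsize=None)
    def outputs_from(i):
        results = set()
        if clean(i, n):
            results.add(s[i:])            # final unmatched segment
        for j in range(i, n):
            if not clean(i, j):
                break                     # longer prefixes stay unclean
            for k in range(j + 1, n + 1):
                if pat.fullmatch(s, j, k):
                    for tail in outputs_from(k):
                        results.add(s[i:j] + omega + tail)
        return frozenset(results)

    return set(outputs_from(0))


print(sorted(declarative_replace("aa", "a+", "x")))  # ['x', 'xx']
```

The example shows why this semantics is a relation and not a function: both the greedy result `x` and the reluctant result `xx` are included.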
Modeling Reluctant Semantics
  Goal: Model Reluctant Replacement
  Key: Left-Most Matching
  2 Steps:
    (1) Mark the beginning of pattern
    (2) Do the replacement
Example (Search Pattern: a+b+c → x)
  Input Word: aabbcdabcabd
  After Begin Marker: #a#abbcd#abcabd
  After Replacement: xdxabd
  [Figure: states s1, s2, f1; transition labels #:ε, reluc(r), #:ω, ε:ε, Id()]
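The two steps of this example can be imitated procedurally. The following Python sketch is not the transducer construction itself: it uses direct regex checks in place of the FSTs, the helper names `mark_begin` and `replace_reluctant` are invented for illustration, and it assumes r does not match the empty string.

```python
import re

MARK = "#"


def mark_begin(s, r):
    """Step 1: put MARK before every position where some match of r starts."""
    pat = re.compile(r)
    out = []
    for i, ch in enumerate(s):
        if pat.match(s, i):
            out.append(MARK)
        out.append(ch)
    return "".join(out)


def replace_reluctant(marked, r, omega):
    """Step 2: at each marker reached, consume the shortest match of r
    (skipping markers inside it) and emit omega in its place."""
    pat = re.compile(r)
    out = []
    i = 0
    while i < len(marked):
        if marked[i] != MARK:
            out.append(marked[i])
            i += 1
            continue
        consumed = []
        j = i + 1
        # A marker guarantees that a match starts here, so this loop
        # always reaches the break for correctly marked input.
        while j < len(marked):
            if marked[j] != MARK:
                consumed.append(marked[j])
                if pat.fullmatch("".join(consumed)):
                    break
            j += 1
        out.append(omega)
        i = j + 1
    return "".join(out)


word = "aabbcdabcabd"
marked = mark_begin(word, "a+b+c")
print(marked)                                   # #a#abbcd#abcabd
print(replace_reluctant(marked, "a+b+c", "x"))  # xdxabd
```

On this input the result agrees with `re.sub("a+?b+?c", "x", word)`, Python's own reluctant replacement.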
The Challenge: Begin Marker
  Search Pattern: a+b+c → x
  Input Word: aabbcdabcabd
  Placing the # markers: Look-ahead Capability? Non-determinism
  3 Steps: (1) End marker (2) Generic end marker (3) Begin marker
Preliminary End Marker
  Search Pattern: a+b+c → x
  Idea: Start with End Marker for Reverse of Search Pattern
  Reversed Pattern: cb+a+
  [Figure: transducer A1; transition labels c:c, b:b, a:a, ε:$, b:b, a:a]
  Problem: Input tape accepts cb+a+ only!
Generic End Marker
  Pattern: cb+a+
  Input Word: ccbaa
  Output Word: ccba$a$
  Deterministic!
  [Figure: transducer A2; transition labels c:c, b:b, a:a, ε:$]
Finally, the Begin Marker
  Search Pattern: a+b+c → x
  [Figure: transducer A3 with state 0 and ε:ε transitions; transition labels c:c, b:b, a:a, ε:#]
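The route from end marker to begin marker can be sketched by reversal: mark the ends of matches of the reversed pattern on the reversed word, then reverse the result. The Python below illustrates that idea with brute-force substring checks in place of the transducers A1 to A3. The function names `end_mark` and `begin_mark` are hypothetical, and the reversed pattern is supplied by hand because Python's `re` module has no regex-reversal function.

```python
import re


def end_mark(s, r, mark="$"):
    """Insert `mark` after every position where some match of r ends."""
    pat = re.compile(r)
    out = []
    for k in range(1, len(s) + 1):
        out.append(s[k - 1])
        # Does any substring s[i:k] match r in full?
        if any(pat.fullmatch(s, i, k) for i in range(k)):
            out.append(mark)
    return "".join(out)


def begin_mark(s, reversed_r, mark="#"):
    """Mark match beginnings by end-marking the reversed word with the
    reversed pattern and reversing the output."""
    return end_mark(s[::-1], reversed_r, mark)[::-1]


print(end_mark("ccbaa", "cb+a+"))           # ccba$a$
print(begin_mark("aabbcdabcabd", "cb+a+"))  # #a#abbcd#abcabd
```

The end marker needs no look-ahead because it only inspects symbols already read, which is what makes the reversed formulation attractive.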
Greedy Semantics
  Goal: Model Greedy Replacement
  Challenge: Look-ahead for the longest match
Steps (Search Pattern: a+ → x, Input Word: aabab)
  Step 1: Begin Marker
  Step 2: ND (nondeterministic) End Marker
  Step 3: Pairing Markers
  Step 4: Checking Match
  Step 5: Check Longest
  Step 6: Replacement
  Strings shown: #a#ab#ab, #a#a$b#ab, #a$#a$b#a$b, #a#a$b#a$b, #aa$b#a$b, #a#ab#a$b, #aaba$b
  Result: xbxb
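The effect of these steps, leftmost-longest replacement, can be compared with the reluctant variant using a plain scan. This Python sketch does not follow the six transducer steps; it only computes the same end result on small inputs. The function `replace_leftmost` and its `longest` flag are names chosen for illustration, and r is assumed not to match the empty string.

```python
import re


def replace_leftmost(s, r, omega, longest=True):
    """Scan left to right; at each position replace the longest
    (or, with longest=False, the shortest) match of r by omega."""
    pat = re.compile(r)
    out = []
    i, n = 0, len(s)
    while i < n:
        # Candidate end positions, tried in order of preference.
        ends = range(n, i, -1) if longest else range(i + 1, n + 1)
        for k in ends:
            if pat.fullmatch(s, i, k):
                out.append(omega)
                i = k
                break
        else:
            # No match starts here: copy the symbol unchanged.
            out.append(s[i])
            i += 1
    return "".join(out)


print(replace_leftmost("aabab", "a+", "x"))                 # xbxb
print(replace_leftmost("aabab", "a+", "x", longest=False))  # xxbxb
```

Longest match is not always what a backtracking engine returns: for the pattern `a|ab` on input `ab`, Python's `re.sub` yields `xb`, while the scan above with `longest=True` yields `x`.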
Applications
  Solve String Constraints
  Login Servlet
    Input: user name
    After filtering single quote and length restriction
Solving Atomic Constraint
  Goal: A1 ∘ Id(P)
  Project to Input Tape
  Solution
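What the solution of an atomic constraint means can be shown on a toy instance: find every input x whose replaced form lands in a target language P. The Python sketch below swaps in bounded brute-force enumeration for the FST composition and projection named on the slide, so it is only practical for tiny alphabets and lengths. The function `solve_atomic` and its parameters are hypothetical, and Python's `re.sub` stands in for the replacement operation.

```python
import re
from itertools import product


def solve_atomic(r, omega, target, alphabet, max_len):
    """Enumerate all x over `alphabet` with len(x) <= max_len such that
    replacing r by omega in x gives a string matching `target` in full."""
    pat = re.compile(r)
    target_pat = re.compile(target)
    solutions = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            x = "".join(chars)
            # A callable replacement keeps omega literal (no escape handling).
            replaced = pat.sub(lambda m: omega, x)
            if target_pat.fullmatch(replaced):
                solutions.append(x)
    return solutions


# Which inputs still contain "ab" after every "ab" has been deleted?
print(solve_atomic("ab", "", "ab", "ab", 4))  # ['aabb']
```

The single solution `aabb` has the same shape as the nested script tag attack: the removed match sits inside the string that survives.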
SUSHI Constraint Solver
  Solves Simple Linear String Constraints (SISE)
  Relies on dk.brics.automaton for FSA operations
  Self-made Java package for FST operations
    Supports 16-bit Unicode
    Compact Transition Representation
Efficiency of Solver
  Benchmark Equations
    Login Servlet: 1.4 Seconds on 2 GHz PC
    Flex SDK XSS Attack: Equation Size, Seconds; Shorter than Security Track #
Related Work
  Forward String Analysis
    Christensen & Møller [SAS03]
    Wasserman & Su [PLDI07, ICSE08]
    Bjørner & Tillmann [TACAS09]
  Backward String Analysis
    Kiezun & Ganesh [ISSTA09]
    Yu & Bultan [SPIN08, ASE09]
    Fu [COMPSAC07, TAVWEB08]
  Natural Language Processing
    Kaplan and Kay [CL1994]
  Our Contribution: Precise Modeling of Various Regular Substitution Semantics
Limitations
  SISE String Constraints
    All Variables Appear on LHS (Once)
    No Easy Solution for Equation System Yet
  No string length
Future Directions
  Encoding string length in automata
  Finite model on bit-vector
Questions?