Equivalence of Extended Symbolic Finite Transducers Presented By: Loris D’Antoni Joint work with: Margus Veanes.


Outline
1. Symbolic Automata and Transducers
2. Extended Symbolic Automata and Transducers
   – Some negative results
   – Some positive results
3. A friendlier restriction with decidable equivalence

Motivations
– Automata and transducers are great! They are used in many applications (NLP, XML, program analysis, regex matching, …)
– But classical models can only handle finite alphabets
– And they do not scale when the alphabet is very big (UTF16 has 2^16 elements)

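Symbolic Finite Automata (SFA) [POPL12]
– Set of states
– Initial state
– Final states
– Symbolic transition function: each transition is labeled with a predicate
[Figure: two-state SFA over the integers with states p and q; the transitions are labeled with the predicates λx. x mod 2 = 0 and λx. x mod 2 = 1.]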

Symbolic Finite Automata (SFA) [POPL12]
Execution example
[Figure: the same SFA reading an input word; the run visits the states p, p, q, p, p.]
p is final → accept the input
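The execution example can be replayed with a minimal sketch of an SFA in Python. The wiring of the transitions and the input word are assumptions (the figure did not survive extraction); they are chosen so that the run visits p, p, q, p, p as on the slide. The class name `SFA` and its methods are hypothetical, not part of any library.

```python
from typing import Callable, Dict, Hashable, List, Optional, Set, Tuple

Predicate = Callable[[int], bool]


class SFA:
    """Deterministic symbolic finite automaton: guards are predicates on symbols."""

    def __init__(self, initial: Hashable, finals: Set[Hashable],
                 transitions: Dict[Hashable, List[Tuple[Predicate, Hashable]]]):
        self.initial = initial
        self.finals = finals
        self.transitions = transitions  # state -> list of (guard, target)

    def run(self, word: List[int]) -> Optional[List[Hashable]]:
        """Return the visited states, or None if no guard matches some symbol."""
        state = self.initial
        visited = [state]
        for symbol in word:
            for guard, target in self.transitions.get(state, []):
                if guard(symbol):
                    state = target
                    break
            else:
                return None  # stuck: no transition is enabled
            visited.append(state)
        return visited

    def accepts(self, word: List[int]) -> bool:
        visited = self.run(word)
        return visited is not None and visited[-1] in self.finals


def is_even(x: int) -> bool:
    return x % 2 == 0


def is_odd(x: int) -> bool:
    return x % 2 == 1


# Assumed wiring: odd symbols lead to p, even symbols lead to q; p is final.
parity_sfa = SFA(
    initial="p",
    finals={"p"},
    transitions={
        "p": [(is_odd, "p"), (is_even, "q")],
        "q": [(is_odd, "p"), (is_even, "q")],
    },
)

print(parity_sfa.run([1, 2, 5, 3]))      # ['p', 'p', 'q', 'p', 'p']
print(parity_sfa.accepts([1, 2, 5, 3]))  # True
```

The finite graph and the predicates are kept separate: the automaton only ever asks a guard whether it holds for a symbol, so the alphabet can be infinite.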

Symbolic Finite Transducers (SFT) [POPL12]
[Figure: a transition between p and q labeled λx. x mod 2 = 0 / [λx. x+1, λx. x+2].]
– Input guard = predicate (here int → bool)
– Output = sequence of functions from the input theory to the output theory (here int → int)

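Symbolic Finite Transducers (SFT) [POPL12]
[Figure: two-state SFT with states p and q and transitions labeled x mod 2 = 0 / [x, x], x mod 2 = 1 / [x-1], x mod 2 = 0 / [ ], and x mod 2 = 1 / [x-1]. The run p, p, q, p, p reads the input tape 1 2 5 3; output tape: 02 42.]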
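A minimal sketch of how such an SFT produces its output, assuming a particular wiring of the four transition labels (the figure itself was lost): odd symbols go to p and emit [x-1], an even symbol at p goes to q and emits [x, x], and an even symbol at q stays in q and emits nothing. The names `SFT` and `transduce` are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Optional, Set, Tuple

Guard = Callable[[int], bool]
OutputFn = Callable[[int], int]
Rule = Tuple[Guard, List[OutputFn], Hashable]  # guard / [output functions] -> target


class SFT:
    """Deterministic symbolic finite transducer over integers."""

    def __init__(self, initial: Hashable, finals: Set[Hashable],
                 rules: Dict[Hashable, List[Rule]]):
        self.initial = initial
        self.finals = finals
        self.rules = rules

    def transduce(self, word: List[int]) -> Optional[List[int]]:
        """Return the output tape, or None if the input is not in the domain."""
        state = self.initial
        output: List[int] = []
        for symbol in word:
            for guard, functions, target in self.rules.get(state, []):
                if guard(symbol):
                    # Every output function is applied to the symbol just read.
                    output.extend(f(symbol) for f in functions)
                    state = target
                    break
            else:
                return None
        return output if state in self.finals else None


def even(x: int) -> bool:
    return x % 2 == 0


def odd(x: int) -> bool:
    return x % 2 == 1


def identity(x: int) -> int:
    return x


def decrement(x: int) -> int:
    return x - 1


# Assumed wiring of the four labels from the slide; p is assumed to be final.
parity_sft = SFT(
    initial="p",
    finals={"p"},
    rules={
        "p": [(even, [identity, identity], "q"), (odd, [decrement], "p")],
        "q": [(even, [], "q"), (odd, [decrement], "p")],
    },
)

print(parity_sft.transduce([1, 2, 5, 3]))  # [0, 2, 2, 4, 2]
```

Under this assumed wiring the run on 1 2 5 3 visits p, p, q, p, p, which matches the run shown on the slide.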

Closure and Decidability Properties
– All closure properties and decidability results from classical automata theory still hold
– The alphabet theory is required to be
  – A Boolean algebra (closed under Boolean operations)
  – Decidable (we can check for satisfiability)
Example: SFA intersection
[Figure: a transition q1 → q2 with guard x > 5 and a transition p1 → p2 with guard x < 10 yield the product transition (q1,p1) → (q2,p2) with guard x > 5 ∧ x < 10.]
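The intersection example can be sketched as a product construction in which guards are conjoined and unsatisfiable product transitions are dropped. As a stand-in for a general decidable Boolean algebra, this sketch assumes guards are closed integer intervals, for which conjunction and satisfiability are trivial; the function names are hypothetical.

```python
import math
from typing import Hashable, List, Tuple

Interval = Tuple[float, float]  # inclusive bounds; +-math.inf for "unbounded"
Transition = Tuple[Hashable, Interval, Hashable]


def conjunction(a: Interval, b: Interval) -> Interval:
    """Conjunction of two interval guards is the intersection of the intervals."""
    return (max(a[0], b[0]), min(a[1], b[1]))


def satisfiable(guard: Interval) -> bool:
    """An interval guard is satisfiable iff it contains at least one integer."""
    return guard[0] <= guard[1]


def product_transitions(a: List[Transition], b: List[Transition]) -> List[Transition]:
    """Pair up transitions of two SFAs, keeping only satisfiable conjunctions."""
    result: List[Transition] = []
    for q1, guard_a, q2 in a:
        for p1, guard_b, p2 in b:
            guard = conjunction(guard_a, guard_b)
            if satisfiable(guard):
                result.append(((q1, p1), guard, (q2, p2)))
    return result


# x > 5 over the integers is [6, +inf); x < 10 is (-inf, 9].
automaton_a = [("q1", (6, math.inf), "q2")]
automaton_b = [("p1", (-math.inf, 9), "p2")]

print(product_transitions(automaton_a, automaton_b))
# [(('q1', 'p1'), (6, 9), ('q2', 'p2'))]
```

The satisfiability check is what keeps the product free of dead transitions; with a richer alphabet theory it would be delegated to a solver for that theory.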

Applications
– Analysis of .NET regular expressions (using the theory of bit-vectors for the input alphabet)
– Automatic password generation
– Analysis of string sanitizers (BEK)

A limitation of Symbolic Transducers
BASE64 encoder: 3 bytes → 4 Base64 characters
Reading one input symbol at a time causes a blowup in the number of states, because the transducer has to remember the bytes it has already read in its state!
[Table: text content "Man" → bytes → bit pattern → index → Base64 encoded "TWFu".]


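Extended Symbolic Finite Automaton
[Figure: ESFA transition at state p with guard x1 > 0 ∧ (x2 < x3); the run reads the input 1783… three symbols at a time.]
Reads sequences of 3 consecutive symbols [x1, x2, x3]

Extended Symbolic Finite Transducers
[Figure: ESFT transition at state p labeled x1 ≤ 0xFF ∧ x2 ≤ 0xFF ∧ x3 ≤ 0xFF / [x1 >> 2, ((x1 & 3) << 4) | (x2 >> 4), ((x2 & 0xF) << 2) | (x3 >> 6), x3 & 0x3F]; the run reads Man… and outputs TWFu….]
Each output symbol can be a function of all 3 input symbols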
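A minimal sketch of the single ESFT transition for Base64, assuming the standard Base64 bit layout for the four output functions and the standard Base64 alphabet for turning the resulting 6-bit indices into characters. Padding for inputs whose length is not a multiple of 3 is left out; the function names are hypothetical.

```python
from typing import List

BASE64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def guard(x1: int, x2: int, x3: int) -> bool:
    """Conjunction of unary predicates: every symbol is a byte."""
    return 0 <= x1 <= 0xFF and 0 <= x2 <= 0xFF and 0 <= x3 <= 0xFF


def outputs(x1: int, x2: int, x3: int) -> List[int]:
    """Four 6-bit indices, each a function of the three symbols read together."""
    return [
        x1 >> 2,
        ((x1 & 3) << 4) | (x2 >> 4),
        ((x2 & 0xF) << 2) | (x3 >> 6),
        x3 & 0x3F,
    ]


def encode_blocks(data: bytes) -> str:
    """Apply the one-state ESFT repeatedly; no padding is handled in this sketch."""
    if len(data) % 3 != 0:
        raise ValueError("this sketch only handles inputs made of whole 3-byte blocks")
    encoded: List[str] = []
    for start in range(0, len(data), 3):
        x1, x2, x3 = data[start:start + 3]
        if not guard(x1, x2, x3):
            raise ValueError("input symbol outside the byte range")
        encoded.extend(BASE64_ALPHABET[index] for index in outputs(x1, x2, x3))
    return "".join(encoded)


print(encode_blocks(b"Man"))  # TWFu
```

With a lookahead of 3 the encoder needs a single state, whereas a transducer that reads one byte at a time would have to store the pending bits of earlier bytes in its states.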

A common misconception
All the results in classical automata theory trivially extend to the symbolic setting…
While most results do extend to the symbolic setting for the previous models (SFAs, SFTs), this is not the case for the extended models.

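In the finite case they do not add expressiveness
In the finite-alphabet setting, reading multiple input symbols at a time does not matter.
[Figure: a transition labeled ab/[cde] between states 1 and 0 is simulated by two transitions labeled a/[ ] and b/[cde] that pass through an extra state 2.]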

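ESFAs are more expressive than SFAs
In the symbolic case, reading multiple symbols at a time does matter.
[Figure: an ESFA transition with the binary guard x1 > x2 has no equivalent SFA (shown as "?").]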

Emptiness of ESFA Intersection: UNDECIDABLE
Given two ESFAs A and B, is there an input accepted by both A and B?
The problem is undecidable:
– Given a two-counter machine M, we construct two ESFAs A and B such that A ∩ B is empty iff M does not halt on any input

Proof that Emptiness of ESFA Intersection is undecidable (1/2)
Machine M
1. Inc(a)
2. Dec(a)
3. Inc(b)
4. if (a=0) goto 3 else goto 5
5. Dec(b)
6. Halt

Encode M's run as the following sequence of configurations:
a:  0 1 0 0 0 0 …
b:  0 0 0 1 1 2 …
PC: 1 2 3 4 3 4 …

Proof that Emptiness of ESFA Intersection is undecidable (2/2)
Machine M and its run (rows a, b, PC) are as on the previous slide.
[Figure: ESFA with states 0 and 1 whose transition reads two consecutive configurations x1, x2 under the guard
x1.pc=1 ∧ x2.pc=2 ∧ x2.a=x1.a+1 ∧ x1.b=x2.b
∨ …
∨ x1.pc=4 ∧ x2.pc=3 ∧ x1.a=x2.a ∧ x1.a=0 ∧ x1.b=x2.b
∨ x1.pc=4 ∧ x2.pc=5 ∧ x1.a=x2.a ∧ ¬(x1.a=0) ∧ x1.b=x2.b
∨ …
and a further transition whose guard starts with x1.pc= …]
– The intersection is empty if the two-counter machine does not halt
– A single ESFA checks only half of the consecutive configuration pairs, which is why two ESFAs are intersected
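The idea that each ESFA checks only half of the consecutive configuration pairs can be sketched as follows. The step guard has one disjunct per instruction of M. Three things are assumptions of this sketch and are not stated on the slides: `Dec` is only enabled when the counter is positive, checker A looks at pairs starting at even positions while checker B looks at pairs starting at odd positions, and the checks for the initial and halting configurations are left out.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    pc: int
    a: int
    b: int


def step_ok(x1: Config, x2: Config) -> bool:
    """Binary guard: x2 is the successor configuration of x1 under machine M."""
    same_a = x2.a == x1.a
    same_b = x2.b == x1.b
    return (
        (x1.pc == 1 and x2.pc == 2 and x2.a == x1.a + 1 and same_b)                 # 1. Inc(a)
        or (x1.pc == 2 and x2.pc == 3 and x1.a > 0 and x2.a == x1.a - 1 and same_b)  # 2. Dec(a)
        or (x1.pc == 3 and x2.pc == 4 and same_a and x2.b == x1.b + 1)               # 3. Inc(b)
        or (x1.pc == 4 and x2.pc == 3 and same_a and x1.a == 0 and same_b)           # 4. a=0
        or (x1.pc == 4 and x2.pc == 5 and same_a and x1.a != 0 and same_b)           # 4. a!=0
        or (x1.pc == 5 and x2.pc == 6 and same_a and x1.b > 0 and x2.b == x1.b - 1)  # 5. Dec(b)
    )


def pairs_ok(run: List[Config], offset: int) -> bool:
    """Check the pairs (run[i], run[i+1]) for i = offset, offset+2, offset+4, ..."""
    return all(step_ok(run[i], run[i + 1]) for i in range(offset, len(run) - 1, 2))


def accepted_by_a(run: List[Config]) -> bool:
    return pairs_ok(run, 0)  # pairs (0,1), (2,3), ...


def accepted_by_b(run: List[Config]) -> bool:
    return pairs_ok(run, 1)  # pairs (1,2), (3,4), ...


# Prefix of the run shown on the slide (rows a, b, PC).
good = [Config(1, 0, 0), Config(2, 1, 0), Config(3, 0, 0),
        Config(4, 0, 1), Config(3, 0, 1), Config(4, 0, 2)]

# The jump from a=1 to a=5 between positions 1 and 2 is invisible to checker A.
bad = [Config(1, 0, 0), Config(2, 1, 0), Config(3, 5, 0), Config(4, 5, 1)]

print(accepted_by_a(good), accepted_by_b(good))  # True True
print(accepted_by_a(bad), accepted_by_b(bad))    # True False
```

Only a sequence accepted by both checkers has every consecutive pair validated, which is why emptiness of the intersection, rather than of a single ESFA, encodes the behaviour of M.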

Other Negative Results
– Universality of ESFAs is undecidable
– ESFA equivalence is undecidable
– ESFAs are not closed under intersection
– ESFAs are not closed under complement
– Nondeterministic ESFAs are strictly more expressive than deterministic ESFAs
– ESFT equivalence is undecidable
– ESFTs are not closed under composition
Symbolic automata are not so trivial after all

Some Positive Results
– Emptiness (reachability) is decidable for both ESFAs and ESFTs
– Nondeterministic ESFAs are closed under union
Not quite satisfactory, and very limited…
– Can we do better?


A Simpler Model: Cartesian ESFAs and ESFTs
– Most negative results rely on binary predicates in the guards
– We can restrict the model to avoid this issue: Cartesian ESFAs and Cartesian ESFTs only allow guards that are conjunctions of unary predicates
– It is decidable whether an ESFT (ESFA) is Cartesian
[Figure: a transition between q and p with guard x1 = x2+1 (not Cartesian) and a transition between q and p labeled x1 > 5 ∧ x2 = 1 / [x1+x2, x2, x1] (Cartesian).]
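The Cartesian condition can be illustrated semantically: a guard is Cartesian when its set of satisfying tuples equals the product of its per-position projections. The sketch below swaps the solver-based decision procedure for brute force over a small finite domain, which is an assumption made only to keep the example self-contained; `is_cartesian` is a hypothetical name.

```python
from itertools import product
from typing import Callable, Iterable


def is_cartesian(guard: Callable[..., bool], arity: int, domain: Iterable[int]) -> bool:
    """True iff, over `domain`, the guard equals the product of its projections."""
    values = list(domain)
    models = {t for t in product(values, repeat=arity) if guard(*t)}
    # Projection i: the values that position i takes in some satisfying tuple.
    projections = [{t[i] for t in models} for i in range(arity)]
    return models == set(product(*projections))


def successor_guard(x1: int, x2: int) -> bool:
    return x1 == x2 + 1  # binary constraint linking the two symbols


def unary_guard(x1: int, x2: int) -> bool:
    return x1 > 5 and x2 == 1  # conjunction of unary predicates


print(is_cartesian(successor_guard, 2, range(10)))  # False
print(is_cartesian(unary_guard, 2, range(10)))      # True
```

For x1 = x2+1 the projections allow pairs such as (1, 8) that the guard itself rejects, so the guard cannot be written as a conjunction of unary predicates.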

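Cartesian ESFA = SFA
Cartesian ESFAs are equivalent to SFAs (but more succinct)
[Figure: the ESFA transition x1 > 5 ∧ x2 = 1 between states 1 and 0 corresponds to two SFA transitions, x > 5 and then x = 1, through an extra state 2.]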
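A minimal sketch of the translation suggested by the figure: every Cartesian ESFA transition with guards on x1, …, xk is unrolled into a chain of k SFA transitions through fresh intermediate states. The representation of transitions as tuples and the function names are assumptions of this sketch.

```python
from typing import Callable, Hashable, List, Set, Tuple

Predicate = Callable[[int], bool]
EsfaTransition = Tuple[Hashable, List[Predicate], Hashable]  # src, one guard per position, dst
SfaTransition = Tuple[Hashable, Predicate, Hashable]


def cartesian_esfa_to_sfa(transitions: List[EsfaTransition]) -> List[SfaTransition]:
    """Unroll each k-symbol transition into a chain of k one-symbol transitions."""
    sfa: List[SfaTransition] = []
    for index, (src, guards, dst) in enumerate(transitions):
        current = src
        for position, guard in enumerate(guards):
            is_last = position == len(guards) - 1
            # Fresh intermediate state, unique per transition and position.
            target = dst if is_last else ("mid", index, position)
            sfa.append((current, guard, target))
            current = target
    return sfa


def sfa_accepts(transitions: List[SfaTransition], initial: Hashable,
                finals: Set[Hashable], word: List[int]) -> bool:
    """Nondeterministic SFA acceptance by tracking the set of reachable states."""
    states = {initial}
    for symbol in word:
        states = {dst for src, guard, dst in transitions
                  if src in states and guard(symbol)}
    return bool(states & finals)


# The transition from the slide: x1 > 5 and x2 = 1, from state 1 to state 0.
esfa = [(1, [lambda x: x > 5, lambda x: x == 1], 0)]
sfa = cartesian_esfa_to_sfa(esfa)

print(len(sfa))                             # 2
print(sfa_accepts(sfa, 1, {0}, [7, 1]))     # True
print(sfa_accepts(sfa, 1, {0}, [7, 2]))     # False
```

The unrolling works only because no guard relates two positions; a binary guard such as x1 > x2 would have to remember x1 in the intermediate state, which is impossible over an infinite alphabet.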

Cartesian ESFTs > SFTs
Cartesian ESFTs are strictly more expressive than SFTs!!
[Figure: the Cartesian ESFT transition x1 > 5 ∧ x2 = 1 / [x1+x2, x2, x1] between states 1 and 0 has no SFT counterpart (shown as "?").]

Equivalence of Cartesian ESFTs
Given two Cartesian ESFTs A and B, A is equivalent to B if
– A and B have the same domain
  – The domain of a Cartesian ESFT is a Cartesian ESFA (just drop the outputs)
  – Cartesian ESFAs are equivalent to SFAs
  – Equivalence of SFAs is decidable [POPL12]
– For every input in the intersection of the domains, A and B produce the same output (one-equality)

One-Equality of Cartesian ESFTs
[Figure: A has a transition q0 → q1 labeled x1 < 5, x2 > 2 / [x1+x2]; B has a transition p0 → p1 labeled x1 < 10, x2 > 0, x3 = 1 / [x1, x2, x3].
Align inputs: (q0,p0) → (q_t1,p_t1) with guard x1 < 5 ∧ x1 < 10 and outputs [x1+x2], [x1, x2, x3]; (q_t1,p_t1) → (q1,p_t1) with guard x2 > 2 ∧ x2 > 0 and outputs [ ], [ ]; (q1,p_t1) → (?,p1) with guard ?? ∧ x3 = 1 and outputs ??, [ ].
Align outputs: the first step now carries the outputs [x1+x2], [x1, x2] and the last step carries the outputs ??, [x3].]

Result Summary
– A theoretical analysis of ESFAs and ESFTs
– A new model: Cartesian ESFAs and ESFTs (can model BASE64)
– Clear line for decidability of equivalence: ESFTs vs Cartesian ESFTs
– This and other algorithms at (still in Beta)

Applications
– Analysis of string encoders: proved correctness of BASE64, UTF8, etc.
– Succinct representation of regex pattern matching
– Fast code generation

Future Work
– Analysis of composition of ESFTs
  – Partially discussed in [VMCAI13]
– Use ESFAs to compute the range of symbolic transducers
  – The range of an SFT is not an SFA, but maybe it is an ESFA?
  – Use the range for synthesizing program inversion

Thank you
Loris D’Antoni

Symbolic Finite Automaton (SFA) [POPL12]
– Classical acceptor modulo a rich alphabet
  – The alphabet is an effective Boolean algebra
– Core idea: represent labels with predicates
  – Separation of concerns: finite graph / algebra of labels
[Figure: the concrete transitions from p to q labeled a, b, …, z are replaced by one symbolic transition from p to q labeled with the 16-bit bitvector predicate 0x61 ≤ x ≤ 0x7A.]

Symbolic Finite Transducers Example
Utf8 encoder
– Input: valid utf16 encoded string
– Output: equivalent utf8 encoded string
For example, utf8encode(“\uFF28\uFF29”) = “\xEF\xBC\xA8\xEF\xBC\xA9”
– 5 states & 11 transitions
– The equivalent classical transducer has 2^16 transitions
Dagstuhl Seminar
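The example output can be reproduced with a small sketch that encodes a single UTF-16 code unit using the standard UTF-8 bit layout. This is not the 5-state transducer from the slide: surrogate pairs, which need the extra states, are deliberately rejected here, and the function name is hypothetical.

```python
def utf8_encode_code_unit(code_unit: int) -> bytes:
    """Encode one UTF-16 code unit that is not part of a surrogate pair."""
    if not 0 <= code_unit <= 0xFFFF:
        raise ValueError("not a 16-bit code unit")
    if 0xD800 <= code_unit <= 0xDFFF:
        raise ValueError("surrogates need pair handling, which this sketch omits")
    if code_unit < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([code_unit])
    if code_unit < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_unit >> 6), 0x80 | (code_unit & 0x3F)])
    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (code_unit >> 12),
        0x80 | ((code_unit >> 6) & 0x3F),
        0x80 | (code_unit & 0x3F),
    ])


def utf8_encode(text: str) -> bytes:
    """Encode a string whose characters all fit in one UTF-16 code unit."""
    return b"".join(utf8_encode_code_unit(ord(ch)) for ch in text)


print(utf8_encode("\uFF28\uFF29"))  # b'\xef\xbc\xa8\xef\xbc\xa9'
```

Each branch is a guard on the code unit followed by a short list of output functions, which is why a symbolic transducer needs only a handful of transitions where a classical one needs one transition per code unit.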

Complete R utf8

One-Equality of Cartesian ESFTs
1. We incrementally build a product ESFT using a depth-first search
[Figure: A has a transition q0 → q1 labeled x1 < 5, x2 > 2 / [x1+1, x2]; B has a transition p0 → p1 labeled x1 < 10, x2 > 0, x3 = 1 / [x1, x2, x3].
Build early product: (q0,p0) → (q_t1,p_t1) with guard x1 < 5 ∧ x1 < 10 and outputs [x1+1, x2], [x1, x2, x3]; (q_t1,p_t1) → (q1,p_t1) with guard x2 > 2 ∧ x2 > 0 and outputs _, _; (q1,p_t1) → (?,p1) with guard ?? ∧ x3 = 1 and outputs _, _.
Try aligning: the first step carries the outputs [x1+1], [x1]; the second step carries [x2], [x2]; the last step carries ??, [x3].
Found inequivalence: the outputs x1+1 and x1 of the first step differ.
Continue with every possible state.]

One-Equality of Cartesian ESFTs
Case with predicates that can’t be completely shifted
[Figure: A has a transition q0 → q1 labeled x1 < 5, x2 > 2 / [x1+x2]; B has a transition p0 → p1 labeled x1 < 10, x2 > 0, x3 = 1 / [x1, x2, x3].
First attempt: (q0,p0) → (q_t1,p_t1) with guard x1 < 5 ∧ x1 < 10 and outputs [x1+x2], [x1]; (q_t1,p_t1) → (q1,p_t1) with guard x2 > 2 ∧ x2 > 0 and outputs [ ], [x2]; (q1,p_t1) → (?,p1) with guard ?? ∧ x3 = 1 and outputs ??, [x3].
After shifting: the first step carries the outputs [x1+x2], [x1, x2] and the last step carries ??, [x3].]

One-Equality of Cartesian ESFTs
Case with predicates that can’t be shifted at all
[Figure: A has a transition q0 → q1 labeled x1 < 5, x2 > 2 / [x1+x2]; B has a transition p0 → p1 labeled x1 < 10, x2 > 0, x3 = 1 / [x1, x2+x3].
Product: (q0,p0) → (q_t1,p_t1) with guard x1 < 5 ∧ x1 < 10 and outputs [x1+x2], [x1]; (q_t1,p_t1) → (q1,p_t1) with guard x2 > 2 ∧ x2 > 0 and outputs [ ], [x2+x3]; (q1,p_t1) → (?,p1) with guard ?? ∧ x3 = 1 and outputs ??, [ ].]
Alignment not possible! It is easy to generate a witness for inequivalence in this case.