Equivalence of Extended Symbolic Finite Transducers Presented By: Loris D’Antoni Joint work with: Margus Veanes
Outline
1. Symbolic Automata and Transducers
2. Extended Symbolic Automata and Transducers
   – Some negative results
   – Some positive results
3. A friendlier restriction with decidable equivalence
Motivations
Automata and transducers are great!
Used in many applications (NLP, XML, program analysis, regex matching, …)
Can only handle finite alphabets
Do not scale when the alphabet is very large (UTF-16 has 2^16 elements)
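Symbolic Finite Automata (SFA) [POPL12]
[Figure: SFA with two states p and q; the transitions are guarded by the predicates λx. x mod 2 = 0 and λx. x mod 2 = 1]
Components: set of states, initial state, final states, and a symbolic transition function in which each transition is labeled with a predicate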
Symbolic Finite Automata (SFA) [POPL12]: Execution Example
[Figure: the same SFA over states p and q reading an input word]
The run visits the states p, p, q, p, p; the last state p is final, so the input is accepted
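The execution example can be replayed with a small sketch. The transition topology below (odd symbols lead to p, even symbols lead to q) and the input word 1 2 5 3 are assumptions chosen so that the run matches p, p, q, p, p; the slide text does not spell them out. The names `TRANSITIONS`, `run` and `accepts` are illustrative only.

```python
from typing import Callable, List, Optional, Set, Tuple

Predicate = Callable[[int], bool]
Transition = Tuple[str, Predicate, str]  # (source state, guard, target state)


def is_even(x: int) -> bool:
    return x % 2 == 0


def is_odd(x: int) -> bool:
    return x % 2 == 1


# Assumed topology: the slide only shows the two predicates and the states p, q.
TRANSITIONS: List[Transition] = [
    ("p", is_odd, "p"),
    ("p", is_even, "q"),
    ("q", is_odd, "p"),
    ("q", is_even, "q"),
]
INITIAL = "p"
FINALS: Set[str] = {"p"}


def run(word: List[int]) -> Optional[List[str]]:
    """Return the sequence of visited states, or None if no guard matches."""
    state = INITIAL
    trace = [state]
    for symbol in word:
        targets = [dst for src, guard, dst in TRANSITIONS
                   if src == state and guard(symbol)]
        if not targets:
            return None
        state = targets[0]  # the guards of each state are disjoint here
        trace.append(state)
    return trace


def accepts(word: List[int]) -> bool:
    """A word is accepted if its run ends in a final state."""
    trace = run(word)
    return trace is not None and trace[-1] in FINALS


print(run([1, 2, 5, 3]))      # ['p', 'p', 'q', 'p', 'p']
print(accepts([1, 2, 5, 3]))  # True
```

Because guards are ordinary predicates, the same four transitions cover every integer; no transition is enumerated per symbol.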
Symbolic Finite Transducers (SFT) [POPL12]
Example transition between p and q: λx. x mod 2 = 0 / [λx. x+1, λx. x+2]
Input guard = predicate (here int → bool)
Output = sequence of functions from the input theory to the output theory (here int → int)
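A minimal sketch of how such a transition is applied, assuming the transition goes from p to q, that p is initial and q is final (the slide does not state which states are initial or final). The helper `run_sft` is a hypothetical name, not part of any library.

```python
from typing import Callable, List, Optional, Set, Tuple

Guard = Callable[[int], bool]
OutputFn = Callable[[int], int]
SftTransition = Tuple[str, Guard, List[OutputFn], str]

# The single transition from the slide: x mod 2 = 0 / [x+1, x+2].
TRANSITIONS: List[SftTransition] = [
    ("p", lambda x: x % 2 == 0, [lambda x: x + 1, lambda x: x + 2], "q"),
]


def run_sft(transitions: List[SftTransition], initial: str,
            finals: Set[str], word: List[int]) -> Optional[List[int]]:
    """Apply the transducer to a word; None means the word is rejected."""
    state = initial
    output: List[int] = []
    for symbol in word:
        for src, guard, outputs, dst in transitions:
            if src == state and guard(symbol):
                # Each output function is applied to the symbol just read.
                output.extend(f(symbol) for f in outputs)
                state = dst
                break
        else:
            return None  # no transition is enabled for this symbol
    return output if state in finals else None


print(run_sft(TRANSITIONS, "p", {"q"}, [4]))  # [5, 6]
print(run_sft(TRANSITIONS, "p", {"q"}, [3]))  # None (guard fails on odd input)
```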
Symbolic Finite Transducers (SFT) [POPL12]
[Figure: SFT over states p and q with transitions labeled x mod 2 = 0 / [x, x], x mod 2 = 1 / [x-1], x mod 2 = 0 / [ ] and x mod 2 = 1 / [x-1]]
Input tape: 1 2 5 3
Run: p p q p p
Output tape: 02 42
Closure and Decidability Properties
All closure properties and decidability results from classical automata theory still hold
The alphabet theory is required to be
– A Boolean algebra (closed under Boolean operations)
– Decidable (we can check satisfiability)
Example: SFA intersection
[Figure: a transition q1 → q2 guarded by x>5 and a transition p1 → p2 guarded by x<10 yield the product transition (q1,p1) → (q2,p2) guarded by x>5 ∧ x<10]
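The intersection example can be sketched as a product construction that conjoins guards and keeps only satisfiable ones. As a stand-in for a real decision procedure, this sketch assumes guards are closed integer intervals, which form a simple decidable fragment; `conj`, `is_sat` and `intersect` are hypothetical helper names.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float]            # closed interval [lo, hi] over integers
Transition = Tuple[object, Interval, object]


def conj(a: Interval, b: Interval) -> Interval:
    """Conjunction of two interval guards is their intersection."""
    return (max(a[0], b[0]), min(a[1], b[1]))


def is_sat(g: Interval) -> bool:
    """An interval guard is satisfiable iff it is non-empty."""
    return g[0] <= g[1]


def intersect(trans_a: List[Transition], trans_b: List[Transition]) -> List[Transition]:
    """Product transitions; pairs with unsatisfiable guards are pruned."""
    result: List[Transition] = []
    for q1, guard_a, q2 in trans_a:
        for p1, guard_b, p2 in trans_b:
            guard = conj(guard_a, guard_b)
            if is_sat(guard):
                result.append(((q1, p1), guard, (q2, p2)))
    return result


x_gt_5: Interval = (6, math.inf)     # x > 5 over the integers
x_lt_10: Interval = (-math.inf, 9)   # x < 10 over the integers
x_lt_3: Interval = (-math.inf, 2)    # x < 3, used to show pruning

print(intersect([("q1", x_gt_5, "q2")], [("p1", x_lt_10, "p2")]))
# [(('q1', 'p1'), (6, 9), ('q2', 'p2'))]
print(intersect([("q1", x_gt_5, "q2")], [("p1", x_lt_3, "p2")]))
# []
```

The satisfiability check is the only place where the alphabet theory is consulted, which is why decidability of the theory is the key requirement.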
Applications
Analysis of .NET regular expressions (using the theory of bit-vectors for the input alphabet)
Automatic password generation
Analysis of string sanitizers (BEK)
A limitation of Symbolic Transducers
BASE64 encoder: every 3 bytes are encoded as 4 Base64 characters
Reading one input at a time will cause a blowup in the number of states!
[Table with rows Text content, Bytes, Bit pattern, Index, Base64 encoded: the text content "Man" is encoded as "TWFu"]
Extended Symbolic Finite Automaton
Example transition at state p with guard x1>0 ∧ (x2<x3)
Reads sequences of 3 consecutive symbols [x1, x2, x3]
Extended Symbolic Finite Transducers
Example transition at state p: x1≤0xFF ∧ x2≤0xFF ∧ x3≤0xFF / [x1>>2, ((x1&3)<<4)|(x2>>4), ((x2&0xF)<<2)|(x3>>6), x3&0x3F]
Each output symbol can be a function of all the 3 symbols
[Figure: with lookahead 3, the transducer reads x1 x2 x3 = "Man…" at state p and writes "TWFu…"]
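The BASE64 transition can be tried out directly. This sketch assumes the standard Base64 bit layout for the four output functions, [x1>>2, ((x1&3)<<4)|(x2>>4), ((x2&0xF)<<2)|(x3>>6), x3&0x3F], and maps the resulting 6-bit indices through the standard alphabet. Padding of inputs whose length is not a multiple of 3 is deliberately left out; `guard`, `outputs` and `encode` are illustrative names.

```python
import base64  # used only to cross-check the sketch against the standard library
from typing import List

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def guard(x1: int, x2: int, x3: int) -> bool:
    """All three lookahead symbols must be bytes."""
    return all(0 <= x <= 0xFF for x in (x1, x2, x3))


def outputs(x1: int, x2: int, x3: int) -> List[int]:
    """Four 6-bit indices, each a function of the three symbols read."""
    return [
        x1 >> 2,
        ((x1 & 3) << 4) | (x2 >> 4),
        ((x2 & 0xF) << 2) | (x3 >> 6),
        x3 & 0x3F,
    ]


def encode(data: bytes) -> str:
    """Single-state transducer with lookahead 3 (no padding support)."""
    if len(data) % 3 != 0:
        raise ValueError("this sketch only handles inputs of length 3k")
    chars: List[str] = []
    for i in range(0, len(data), 3):
        x1, x2, x3 = data[i:i + 3]
        if not guard(x1, x2, x3):
            raise ValueError("guard failed")
        chars.extend(ALPHABET[index] for index in outputs(x1, x2, x3))
    return "".join(chars)


print(outputs(*b"Man"))  # [19, 22, 5, 46]
print(encode(b"Man"))    # TWFu
print(encode(b"Man") == base64.b64encode(b"Man").decode("ascii"))  # True
```

One state suffices because the transition sees all three bytes at once; a one-symbol-at-a-time transducer would have to remember the bits of earlier bytes in its states.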
A common misconception
"All the results in classical automata theory trivially extend to the symbolic setting"
For the previous models (SFAs, SFTs) most results do extend to the symbolic setting, but for the extended models this is no longer the case
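In the finite case they do not add expressiveness
In the finite alphabet setting, reading multiple input symbols at a time does not matter
[Figure: a transition labeled ab/[cde] is replaced by two single-symbol transitions labeled a/[ ] and b/[cde] through an intermediate state 2]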
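ESFAs are more expressive than SFAs
Reading multiple symbols at a time does matter in the symbolic case: a guard such as x1>x2 relates two consecutive symbols, and over an infinite alphabet it cannot be replaced by single-symbol guards
[Figure: an ESFA transition guarded by x1>x2, with a question mark in place of an equivalent SFA]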
Emptiness of ESFA Intersection: UNDECIDABLE
Given two ESFAs A and B, is there an input accepted by both A and B?
The problem is undecidable:
– Given a two-counter machine M, we construct two ESFAs A and B such that A ∩ B is empty iff M does not halt on any input
Proof that Emptiness of ESFA Intersection is undecidable (1/2)
Machine M
1. Inc(a)
2. Dec(a)
3. Inc(b)
4. if (a=0) goto 3 else goto 5
5. Dec(b)
6. Halt
Encode M's run as the following sequence of configurations:
a   0 1 0 0 0 0 …
b   0 0 0 1 1 2 …
PC  1 2 3 4 3 4 …
Proof that Emptiness of ESFA Intersection is undecidable (1/2)
[Machine M and its encoded run, as before]
An ESFA with states 0 and 1 reads two consecutive configurations x1, x2 and checks that x2 follows from x1, using the guard
  x1.pc=1 ∧ x2.pc=2 ∧ x2.a=x1.a+1 ∧ x1.b=x2.b
∨ …
∨ x1.pc=4 ∧ x2.pc=3 ∧ x1.a=x2.a ∧ x1.a=0 ∧ x1.b=x2.b
∨ x1.pc=4 ∧ x2.pc=5 ∧ x1.a=x2.a ∧ ¬(x1.a=0) ∧ x1.b=x2.b
∨ …
x1.pc= …
We are only checking half of the configurations: one ESFA checks every other step, so the second ESFA checks the remaining steps
The intersection is empty iff the two-counter machine does not halt
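The encoding can be illustrated with a small sketch. It assumes configurations are triples (pc, a, b), that Dec is only enabled on a positive counter, and that A checks the pairs starting at even positions while B checks the pairs starting at odd positions; these details are assumptions made for illustration, since the slide only shows part of the guard. The names `step`, `check_a`, `check_b` and `is_halting_run` are hypothetical.

```python
from typing import List, Tuple

Config = Tuple[int, int, int]  # (pc, a, b)


def step(x1: Config, x2: Config) -> bool:
    """Binary guard: x2 is the successor configuration of x1 in machine M."""
    pc1, a1, b1 = x1
    pc2, a2, b2 = x2
    return (
        (pc1 == 1 and pc2 == 2 and a2 == a1 + 1 and b2 == b1)                 # Inc(a)
        or (pc1 == 2 and pc2 == 3 and a1 > 0 and a2 == a1 - 1 and b2 == b1)   # Dec(a)
        or (pc1 == 3 and pc2 == 4 and a2 == a1 and b2 == b1 + 1)              # Inc(b)
        or (pc1 == 4 and pc2 == 3 and a2 == a1 and a1 == 0 and b2 == b1)      # a=0: goto 3
        or (pc1 == 4 and pc2 == 5 and a2 == a1 and a1 != 0 and b2 == b1)      # else: goto 5
        or (pc1 == 5 and pc2 == 6 and a2 == a1 and b1 > 0 and b2 == b1 - 1)   # Dec(b)
    )


def check_a(run: List[Config]) -> bool:
    """ESFA A: checks the pairs (0,1), (2,3), ... with lookahead 2."""
    return all(step(run[i], run[i + 1]) for i in range(0, len(run) - 1, 2))


def check_b(run: List[Config]) -> bool:
    """ESFA B: skips the first configuration, then checks (1,2), (3,4), ..."""
    return all(step(run[i], run[i + 1]) for i in range(1, len(run) - 1, 2))


def is_halting_run(run: List[Config]) -> bool:
    """Accepted by both A and B: a valid run from (1,0,0) that reaches Halt."""
    return (bool(run) and run[0] == (1, 0, 0) and run[-1][0] == 6
            and check_a(run) and check_b(run))


# The run prefix from the slide, written as (pc, a, b).
PREFIX: List[Config] = [(1, 0, 0), (2, 1, 0), (3, 0, 0), (4, 0, 1), (3, 0, 1), (4, 0, 2)]
# Same prefix with a corrupted last configuration.
BAD: List[Config] = PREFIX[:5] + [(4, 0, 5)]

print(check_a(PREFIX), check_b(PREFIX))  # True True
print(check_a(BAD), check_b(BAD))        # False True
print(is_halting_run(PREFIX))            # False
```

The corrupted run shows why both automata are needed: B alone accepts it, because the faulty step falls on a pair that only A inspects.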
Other Negative Results
Universality of ESFAs is undecidable
ESFA equivalence is undecidable
ESFAs are not closed under intersection
ESFAs are not closed under complement
Nondeterministic ESFAs are strictly more expressive than deterministic ESFAs
ESFT equivalence is undecidable
ESFTs are not closed under composition
Symbolic automata are not so trivial after all
Some Positive Results
Emptiness (reachability) is decidable for both ESFAs and ESFTs
Nondeterministic ESFAs are closed under union
Not quite satisfactory, and very limited…
– Can we do better?
A Simpler Model: Cartesian ESFAs and ESFTs
Most negative results rely on guards that contain binary predicates, i.e. predicates relating two input symbols
We can restrict the model to avoid this issue: Cartesian ESFAs and Cartesian ESFTs only allow guards that are conjunctions of unary predicates
It is decidable whether an ESFT (ESFA) is Cartesian
[Figure: a non-Cartesian ESFA transition between q and p guarded by x1=x2+1, and a Cartesian ESFT transition between q and p labeled x1>5 ∧ x2=1 / [x1+x2, x2, x1]]
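To make the Cartesian condition concrete: a guard is equivalent to a conjunction of unary predicates exactly when its set of satisfying tuples equals the product of that set's projections. The sketch below checks this by brute force over a small finite domain, which is an assumption standing in for a solver over the real alphabet theory; `is_cartesian` is a hypothetical helper name.

```python
from itertools import product
from typing import Callable, Iterable


def is_cartesian(guard: Callable[..., bool], arity: int, domain: Iterable[int]) -> bool:
    """True iff the guard's solutions form a product of per-position sets."""
    values = list(domain)
    solutions = {t for t in product(values, repeat=arity) if guard(*t)}
    # Project the solution set onto each position.
    projections = [{t[i] for t in solutions} for i in range(arity)]
    return solutions == set(product(*projections))


DOMAIN = range(16)  # small sample domain, for illustration only

print(is_cartesian(lambda x1, x2: x1 == x2 + 1, 2, DOMAIN))        # False
print(is_cartesian(lambda x1, x2: x1 > 5 and x2 == 1, 2, DOMAIN))  # True
```

For x1 = x2+1 the projections allow combinations such as (1, 5) that the guard rejects, so the guard is not Cartesian; for x1>5 ∧ x2=1 every combination of the projections is a solution.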
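Cartesian ESFA = SFA
Cartesian ESFAs are equivalent to SFAs in expressiveness, but more succinct
[Figure: the ESFA transition between states 0 and 1 guarded by x1>5 ∧ x2=1 becomes two SFA transitions guarded by x>5 and x=1 through an intermediate state 2]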
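The translation in the figure can be sketched as follows: a Cartesian transition with k unary guards is unrolled into a chain of k single-symbol transitions through k-1 fresh states. The functions `split_cartesian` and `walk` and the fresh-state numbering starting at 2 are assumptions made for illustration.

```python
from itertools import count
from typing import Callable, Iterator, List, Optional, Tuple

Guard = Callable[[int], bool]
SfaTransition = Tuple[int, Guard, int]  # (source, unary guard, target)


def split_cartesian(src: int, unary_guards: List[Guard], dst: int,
                    fresh_states: Iterator[int]) -> List[SfaTransition]:
    """Unroll one Cartesian ESFA transition into a chain of SFA transitions."""
    middle = [next(fresh_states) for _ in range(len(unary_guards) - 1)]
    states = [src] + middle + [dst]
    return [(states[i], guard, states[i + 1])
            for i, guard in enumerate(unary_guards)]


def walk(transitions: List[SfaTransition], start: int, word: List[int]) -> Optional[int]:
    """Follow the chain on a word; None means some guard failed."""
    state = start
    for symbol in word:
        targets = [dst for src, guard, dst in transitions
                   if src == state and guard(symbol)]
        if not targets:
            return None
        state = targets[0]
    return state


def greater_than_5(x: int) -> bool:
    return x > 5


def equals_1(x: int) -> bool:
    return x == 1


# ESFA transition 0 -> 1 guarded by x1>5 ∧ x2=1, split through fresh state 2.
sfa_transitions = split_cartesian(0, [greater_than_5, equals_1], 1, count(2))

print([(src, dst) for src, _, dst in sfa_transitions])  # [(0, 2), (2, 1)]
print(walk(sfa_transitions, 0, [6, 1]))                 # 1
print(walk(sfa_transitions, 0, [6, 2]))                 # None
```

The split works only because each guard mentions a single position; a guard such as x1=x2+1 offers no unary pieces to distribute along the chain.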
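Cartesian ESFTs > SFTs
Cartesian ESFTs are strictly more expressive than SFTs!
[Figure: the Cartesian ESFT transition between states 0 and 1 labeled x1>5 ∧ x2=1 / [x1+x2, x2, x1] has no equivalent SFT, because its outputs combine both input symbols]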
Equivalence of Cartesian ESFTs
Given two Cartesian ESFTs A and B, A is equivalent to B if
– A and B have the same domain
  The domain of a Cartesian ESFT is a Cartesian ESFA (just drop the outputs)
  Cartesian ESFAs are equivalent to SFAs
  Equivalence of SFAs is decidable [POPL12]
– For every input in the intersection of the domains, A and B produce the same output (one-equality)
One-Equality of Cartesian ESFTs
[Figure: A has a transition q0 → q1 with lookahead 2 labeled x1<5, x2>2 / [x1+x2]; B has a transition p0 → p1 with lookahead 3 labeled x1<10, x2>0, x3=1 / [x1, x2, x3]]
Align inputs: (q0,p0) → (q_t1,p_t1) on x1<5 ∧ x1<10 with outputs [x1+x2], [x1,x2,x3]; (q_t1,p_t1) → (q1,p_t1) on x2>2 ∧ x2>0 with outputs [ ], [ ]; (q1,p_t1) → (?,p1) on ?? ∧ x3=1 with outputs ??, [ ]
Align outputs: (q0,p0) → … on x1<5 ∧ x1<10 with outputs [x1+x2], [x1, x2]; (q1,p_t1) → (?,p1) on ?? ∧ x3=1 with outputs ??, [x3]
Result Summary
A theoretical analysis of ESFAs and ESFTs
A new model: Cartesian ESFAs and ESFTs (can model BASE64)
A clear line for decidability of equivalence: ESFTs vs Cartesian ESFTs
This and other algorithms at (still in Beta)
Applications Analysis of string encoders: Proved correctness of BASE64, UTF8, etc. Succinct representation of regex pattern matching Fast code generation
Future Work
Analysis of composition of ESFTs
– Partially discussed in [VMCAI13]
Use ESFAs to compute the range of symbolic transducers
– The range of an SFT is not an SFA, but maybe it is an ESFA?
– Use the range for synthesizing program inversion
Thank you Loris D’Antoni
Symbolic Finite Automaton (SFA) [POPL12]
Classical acceptor modulo a rich alphabet
– Alphabet is an effective Boolean algebra
Core idea: represent labels with predicates
– Separation of concerns: finite graph / algebra of labels
Concrete transitions: one transition between p and q for each of a, b, …, z
Symbolic transition: a single transition between p and q labeled with the bit-vector predicate 61_16 ≤ x ≤ 7A_16
Symbolic Finite Transducers Example
UTF-8 encoder
– Input: valid UTF-16 encoded string
– Output: equivalent UTF-8 encoded string
For example utf8encode("\uFF28\uFF29") = "\xEF\xBC\xA8\xEF\xBC\xA9"
5 states & 11 transitions
The equivalent classical transducer has 2^16 transitions
Dagstuhl Seminar
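The encoding example can be checked with a short sketch. It is not the 5-state SFT from the slide: it only spells out the arithmetic for a single UTF-16 code unit outside the surrogate range (surrogate pairs are deliberately not handled), and the function name `utf8_encode_bmp` is hypothetical.

```python
def utf8_encode_bmp(code_unit: int) -> bytes:
    """UTF-8 bytes for one non-surrogate UTF-16 code unit (U+0000..U+FFFF)."""
    if 0xD800 <= code_unit <= 0xDFFF:
        raise ValueError("surrogates are not handled in this sketch")
    if code_unit < 0x80:            # 1 byte: 0xxxxxxx
        return bytes([code_unit])
    if code_unit < 0x800:           # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_unit >> 6),
                      0x80 | (code_unit & 0x3F)])
    return bytes([0xE0 | (code_unit >> 12),           # 3 bytes: 1110xxxx
                  0x80 | ((code_unit >> 6) & 0x3F),   #          10xxxxxx
                  0x80 | (code_unit & 0x3F)])         #          10xxxxxx


text = "\uFF28\uFF29"
encoded = b"".join(utf8_encode_bmp(ord(ch)) for ch in text)

print(encoded == b"\xEF\xBC\xA8\xEF\xBC\xA9")  # True
print(encoded == text.encode("utf-8"))         # True
```

Each branch corresponds to one range guard on the input symbol with bit-vector output functions, which is why a handful of symbolic transitions can replace one classical transition per code unit.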
Complete R utf8
One-Equality of Cartesian ESFTs
1. We incrementally build a product ESFT using a depth-first search
[Figure: A has a transition q0 → q1 with lookahead 2 labeled x1<5, x2>2 / [x1+1, x2]; B has a transition p0 → p1 labeled x1<10, x2>0, x3=1 / [x1, x2, x3]]
Try aligning: (q0,p0) → (q_t1,p_t1) on x1<5 ∧ x1<10 with outputs [x1+1, x2], [x1,x2,x3]; the remaining product transitions carry the guards x2>2 ∧ x2>0 and ?? ∧ x3=1
Build early product: (q0,p0) → (q_t1,p_t1) on x1<5 ∧ x1<10 with outputs [x1+1], [x1]; then x2>2 ∧ x2>0 with outputs [x2], [x2]; then ?? ∧ x3=1 with outputs ??, [x3]
Found inequivalence
Continue with every possible state
One-Equality of Cartesian ESFTs
Case with predicates that can't be completely shifted
[Figure: A has a transition q0 → q1 with lookahead 2 labeled x1<5, x2>2 / [x1+x2]; B has a transition p0 → p1 labeled x1<10, x2>0, x3=1 / [x1, x2, x3]. One product has (q0,p0) → (q_t1,p_t1) on x1<5 ∧ x1<10 with outputs [x1+x2], [x1], then x2>2 ∧ x2>0 with outputs [ ], [x2], then ?? ∧ x3=1 with outputs ??, [x3]. The other has (q0,p0) → … on x1<5 ∧ x1<10 with outputs [x1+x2], [x1, x2], then (q1,p_t1) → (?,p1) on ?? ∧ x3=1 with outputs ??, [x3]]
One-Equality of Cartesian ESFTs
Case with predicates that can't be shifted at all
[Figure: A has a transition q0 → q1 with lookahead 2 labeled x1<5, x2>2 / [x1+x2]; B has a transition p0 → p1 labeled x1<10, x2>0, x3=1 / [x1, x2+x3]. The product has (q0,p0) → (q_t1,p_t1) on x1<5 ∧ x1<10 with outputs [x1+x2], [x1], then x2>2 ∧ x2>0 with outputs [ ], [x2+x3], then ?? ∧ x3=1 with outputs ??, [ ]]
Alignment not possible!
Easy to generate a witness for inequivalence in this case