Minimization of Symbolic Automata Presented By: Loris D’Antoni Joint work with: Margus Veanes 01/24/14, POPL14.


What is automata minimization?

Deterministic Finite Automaton
[Figure: DFA with states q0 and q and transitions labeled a and b]
A = (Q, q0, F, δ, Σ)

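Automata Minimization
Minimization = find and collapse equivalent states
[Figure: states p and q both read the same string s; one run ends in a non-final state and the other in a final state, so p and q are distinguishable]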

[Figure: example DFA over {a, b} and the minimized DFA obtained by merging equivalent states]

A simple application: random password generation
Given constraints:
– Length between 5 and 20: "^.{5,20}$"
– Contains 2 capital letters: "[A-Z].*[A-Z]"
– Contains a digit: "\d"
Generate random instances, with uniform distribution, that match all of the above conditions.
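The three constraints can be checked directly with Python's `re` module. This is only a minimal sketch of the membership test, not of the uniform generation step; the function name `satisfies_all` is hypothetical and not part of the talk.

```python
import re

# The three constraints from the slide, written as Python regexes.
LENGTH = re.compile(r"^.{5,20}$")            # between 5 and 20 characters
TWO_CAPITALS = re.compile(r"[A-Z].*[A-Z]")   # at least two capital letters
HAS_DIGIT = re.compile(r"\d")                # at least one digit


def satisfies_all(candidate: str) -> bool:
    """Return True when the candidate lies in the intersection of the three languages."""
    return all(
        pattern.search(candidate) is not None
        for pattern in (LENGTH, TWO_CAPITALS, HAS_DIGIT)
    )
```

Checking candidates one by one says nothing about how many strings satisfy the constraints, which is why the talk turns to automata for uniform sampling.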

Key idea
Intersect the automata for the three constraints: ^.{5,20}$ ∩ [A-Z].*[A-Z] ∩ \d

Problems
– Big automaton → Minimization
– Big alphabet (2^16 characters in UTF16) → Symbolic Automata

Symbolic Finite Automaton (SFA)
[Figure: SFA with states q0 and q; transitions are guarded by the predicates λx. x mod 2 = 0 and λx. x mod 2 = 1]
A = (Q, q0, F, δ, σ)
– σ is the input sort: in this case int
– Separate theory for the input alphabet (SMT solver)

Symbolic Finite Automata (SFA): execution example
[Figure: SFA with states p and q; transitions are guarded by λx. x mod 2 = 0 and λx. x mod 2 = 1]
Run: p p q p p
p is final → accept the input
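A run of an SFA can be sketched by storing each guard as a Python function. The transition structure below is an assumption (the figure did not survive in the transcript): it is one two-state automaton over integers that is consistent with the run p p q p p, and the names `TRANSITIONS`, `run`, and `accepts` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Predicate = Callable[[int], bool]

# Assumed transition structure: an even symbol keeps the state, an odd one flips it.
TRANSITIONS: Dict[str, List[Tuple[Predicate, str]]] = {
    "p": [(lambda x: x % 2 == 0, "p"), (lambda x: x % 2 == 1, "q")],
    "q": [(lambda x: x % 2 == 0, "q"), (lambda x: x % 2 == 1, "p")],
}
INITIAL: str = "p"
FINAL: Set[str] = {"p"}


def run(word: Sequence[int]) -> List[str]:
    """Return the sequence of states visited while reading the word."""
    state = INITIAL
    trace = [state]
    for symbol in word:
        for guard, target in TRANSITIONS[state]:
            if guard(symbol):          # first guard satisfied by the symbol
                state = target
                break
        else:
            raise ValueError(f"no transition from {state} on {symbol}")
        trace.append(state)
    return trace


def accepts(word: Sequence[int]) -> bool:
    """A word is accepted when its run ends in a final state."""
    return run(word)[-1] in FINAL
```

For instance, `run([2, 3, 5, 4])` visits p, p, q, p, p under these assumed guards.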

Advantages of Symbolic Automata
Alphabet is represented symbolically
– UTF16 abstracted using BDDs
– Integer using predicates over integers
Succinctness
– at most n^2 transitions
– One transition captures many symbols
BUT: do DFA algorithms generalize to SFAs?

An example: SFA intersection
A1: p1 --φ1--> q1
A2: p2 --φ2--> q2
A1 ∩ A2: (p1, p2) --φ1 ∧ φ2--> (q1, q2); delete the transition when φ1 ∧ φ2 is unsatisfiable
REQUIREMENTS: Input theory must be a Boolean algebra, and decidable
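The product step can be sketched with guards restricted to half-open integer intervals, so that conjunction is interval intersection and satisfiability is a non-emptiness check. This representation is an assumption made for brevity (intervals are closed under conjunction but not under negation, so they are not a full Boolean algebra); the names `conj`, `is_sat`, and `product_transitions` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]                 # guard lo <= x < hi
Transition = Tuple[str, Interval, str]     # (source, guard, target)
ProductTransition = Tuple[Tuple[str, str], Interval, Tuple[str, str]]


def conj(a: Interval, b: Interval) -> Interval:
    """Conjunction of two interval guards is their intersection."""
    return (max(a[0], b[0]), min(a[1], b[1]))


def is_sat(guard: Interval) -> bool:
    """An interval guard is satisfiable when it contains at least one integer."""
    return guard[0] < guard[1]


def product_transitions(t1: List[Transition], t2: List[Transition]) -> List[ProductTransition]:
    """Pair every transition of A1 with every transition of A2, dropping unsatisfiable guards."""
    result: List[ProductTransition] = []
    for p1, g1, q1 in t1:
        for p2, g2, q2 in t2:
            guard = conj(g1, g2)
            if is_sat(guard):              # delete the transition otherwise
                result.append(((p1, p2), guard, (q1, q2)))
    return result
```

This sketch pairs all transitions without restricting to reachable product states; a worklist starting from the pair of initial states would avoid building unreachable ones.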

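Moore’s algorithm
[Figure: states p and q reach p’ and q’ on the same symbol a; p’ and q’ are distinguishable by a string s, so p and q are distinguishable]
n^2 iterations over k symbols: O(kn^2)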

Symbolic Moore’s algorithm
Initially D = F × (Q\F) ∪ (Q\F) × F
for each (p’,q’) in D, (p,q) not in D
    let φ, ψ be guards of δ(p,p’), δ(q,q’)
    if (isSat(φ ∧ ψ))
        add (p,q) to D
[Figure: p and q reach the distinguishable states p’ and q’ through guards φ and ψ, with φ ∧ ψ satisfiable]
m transitions: O(m^2 f(k)), where k = size of biggest predicate in SFA
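The pseudocode above can be sketched as a fixpoint computation. The sketch assumes a deterministic and complete SFA whose guards are half-open integer intervals, with interval intersection standing in for the solver's `isSat(φ ∧ ψ)` call; all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Sequence, Set, Tuple

Interval = Tuple[int, int]  # guard lo <= x < hi


def conj(a: Interval, b: Interval) -> Interval:
    return (max(a[0], b[0]), min(a[1], b[1]))


def is_sat(guard: Interval) -> bool:
    return guard[0] < guard[1]


def distinguishable_pairs(
    states: Sequence[Hashable],
    finals: Set[Hashable],
    delta: Dict[Hashable, List[Tuple[Interval, Hashable]]],
) -> Set[FrozenSet[Hashable]]:
    """Return the unordered pairs of states that accept different languages."""
    # Initially D holds every pair made of one final and one non-final state.
    D = {
        frozenset((p, q))
        for p, q in combinations(states, 2)
        if (p in finals) != (q in finals)
    }
    changed = True
    while changed:                        # iterate until no new pair is added
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in D:
                continue
            # (p, q) is distinguishable if some common symbol leads to a pair in D.
            if any(
                frozenset((p2, q2)) in D and is_sat(conj(phi, psi))
                for phi, p2 in delta[p]
                for psi, q2 in delta[q]
            ):
                D.add(pair)
                changed = True
    return D
```

States whose pair never enters `D` are equivalent and can be merged.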

Sometimes Moore is Less
[Figure: timing result whose value is lost in the transcript: "… sec for 15 characters!"; annotations: "the culprit" and "should scale up to 128 characters!"]

Hopcroft’s algorithm: intuition
[Figure: the initial partition of Q into F and Q\F]

Hopcroft’s algorithm: intuition
[Figure: blocks R and A; the states S have a-transitions into R]

Hopcroft’s algorithm: intuition
[Figure: partition blocks P1, P2, P3, P4 and the block R, with transitions labeled b]
Keep partitioning with respect to W for every input symbol

Hopcroft’s algorithm: intuition
[Figure: blocks R and Q; R is divided into P1 and P2; transitions labeled a]
Let’s assume I already split according to R.
Do I need to consider both P1 and P2 for future splitting?
NO, I ONLY NEED ONE!

Hopcroft’s algorithm
P := {F, Q\F}
W := {if |F| < |Q\F| then F else Q\F}
while W ≠ {}
    R := pickFrom(W)
    foreach a in Σ
        S := δ^-1(R, a)
        while ∃ T ∈ P. T ∩ S ≠ {} ∧ T \ S ≠ {}
            P, W := split(P, T ∩ S, T \ S)
return partitioned DFA
log n iterations: O(kn log n)
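The pseudocode can be sketched in Python for a classical, complete DFA. The function name `hopcroft` and the dictionary encoding of δ are assumptions; the worklist update (replace a split block if it is pending, otherwise add only the smaller half) follows the standard textbook formulation of the algorithm.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple


def hopcroft(
    states: Iterable[Hashable],
    alphabet: Iterable[Hashable],
    delta: Dict[Tuple[Hashable, Hashable], Hashable],
    finals: Iterable[Hashable],
) -> Set[FrozenSet[Hashable]]:
    """Return the partition of the states into blocks of equivalent states."""
    all_states = set(states)
    symbols = list(alphabet)
    final_block = frozenset(set(finals) & all_states)
    other_block = frozenset(all_states - final_block)
    partition: List[FrozenSet[Hashable]] = [b for b in (final_block, other_block) if b]
    # W starts with the smaller of the two initial blocks.
    worklist: List[FrozenSet[Hashable]] = [min(partition, key=len)] if partition else []
    while worklist:
        R = worklist.pop()
        for a in symbols:
            # S = states that reach R on symbol a.
            S = {s for s in all_states if delta[(s, a)] in R}
            refined: List[FrozenSet[Hashable]] = []
            for T in partition:
                inside, outside = T & S, T - S
                if inside and outside:          # T is split by S
                    refined += [inside, outside]
                    if T in worklist:           # pending block: keep both halves
                        worklist.remove(T)
                        worklist += [inside, outside]
                    else:                       # otherwise the smaller half suffices
                        worklist.append(min(inside, outside, key=len))
                else:
                    refined.append(T)
            partition = refined
    return set(partition)
```

Each returned block becomes one state of the minimized DFA. The set `S` is recomputed by scanning all states here; an implementation that meets the O(kn log n) bound would precompute inverse transitions instead.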

Hopcroft’s algorithm example
[Figure: example DFA over {a, b} with blocks P1 and P2; R marks the block being processed]
PARTITION: {P1, P2}
TO ANALYZE: {P2}

Hopcroft’s algorithm example
[Figure: same DFA with blocks P11, P12, and P2; R marks the block being processed]
PARTITION: {P11, P12, P2}
TO ANALYZE: {P2, P12}

Hopcroft’s algorithm example
[Figure: same DFA with blocks P11, P12, and P2; R marks the block being processed]
PARTITION: {P11, P12, P2}
TO ANALYZE: {P12}

Hopcroft’s algorithm example
[Figure: the resulting minimized DFA]

Symbolic Hopcroft’s algorithm
P := {F, Q\F}
W := {if |F| < |Q\F| then F else Q\F}
while W ≠ {}
    R := pickFrom(W)
    foreach a in Σ
        S := δ^-1(R, a)
        while ∃ T ∈ P. T ∩ S ≠ {} ∧ T \ S ≠ {}
            P, W := split(P, T ∩ S, T \ S)
return partitioned DFA
Problem with "foreach a in Σ": the alphabet might not be finite

Finitize the alphabet
[Figure: predicates φ1, φ2, φ3 and the regions φ’1, …, φ’8 they induce]
Predicates: {x>5, x<10, x=3}
Minterms: {x=3, x≤5 ∧ x≠3, 5<x<10, x≥10}
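Minterms are the satisfiable combinations of each predicate and its negation. As a rough illustration only, the sketch below groups the elements of a finite sample domain by which predicates they satisfy; a real implementation would ask a solver whether each conjunction is satisfiable rather than enumerate values. The function name `minterms` and the sample domain are assumptions.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Predicate = Callable[[int], bool]


def minterms(
    predicates: Sequence[Predicate], domain: Iterable[int]
) -> Dict[Tuple[bool, ...], List[int]]:
    """Group domain elements by their truth signature over the predicates.

    Each non-empty group corresponds to one satisfiable minterm (within the sample domain).
    """
    groups: Dict[Tuple[bool, ...], List[int]] = {}
    for x in domain:
        signature = tuple(bool(p(x)) for p in predicates)
        groups.setdefault(signature, []).append(x)
    return groups


# The predicates from the slide: x > 5, x < 10, x = 3.
SLIDE_PREDICATES: List[Predicate] = [
    lambda x: x > 5,
    lambda x: x < 10,
    lambda x: x == 3,
]
```

On `range(0, 15)` this yields four groups, matching the four minterms x=3, x≤5 ∧ x≠3, 5<x<10, and x≥10. With m predicates there can be up to 2^m signatures, which is the source of the exponential factor in the minterm-based algorithm.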

Symbolic Hopcroft’s algorithm
P := {F, Q\F}
W := {if |F| < |Q\F| then F else Q\F}
while W ≠ {}
    R := pickFrom(W)
    foreach φ in Minterms(A)
        S := δ^-1(R, φ)
        while ∃ T ∈ P. T ∩ S ≠ {} ∧ T \ S ≠ {}
            P, W := split(P, T ∩ S, T \ S)
return partitioned DFA
log n iterations: O(2^m n log n + 2^m f(mk))
We need something better

New Algorithm: Intuition
[Figure: blocks A, R, P1, P2; states p and q with guards φ and ψ; label φ \ ψ]
What if φ ≠ ψ?

Example 1/2
[Figure: SFA with guards x<0, x≥0, -2<x<5, -5<x<3, and true, partitioned into F and Q\F; R marks the block being processed]
false ≠ -5<x<3

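Example 2/2
[Figure: states p and q with guards x<2, x≥2 and x<5, x≥5; state r with guard true; R marks the block being processed]
Both p and q go to r, but… is x≥2 equivalent to x≥5? NO
Then p is distinguishable from q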
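The reason p and q are distinguishable is that some input satisfies one guard but not the other. A small sketch of that check follows, with the caveat that it searches a finite sample range instead of calling a solver; the function name `witness` is hypothetical.

```python
from typing import Callable, Iterable, Optional

Predicate = Callable[[int], bool]


def witness(phi: Predicate, psi: Predicate, domain: Iterable[int]) -> Optional[int]:
    """Return the first sampled value on which the two guards disagree, or None."""
    for x in domain:
        if phi(x) != psi(x):
            return x
    return None
```

For the guards of the example, `witness(lambda x: x >= 2, lambda x: x >= 5, range(-10, 11))` finds a value between 2 and 4: on that input p moves to r while q does not.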


New Algorithm
P := {F, Q\F}
W := {if |F| < |Q\F| then F else Q\F}
while W ≠ {}
    R := pickFrom(W); S := δ^-1(R, true)
    while ∃ A ∈ P. A ∩ S ≠ {} ∧ ∃ p1, p2. δ^-1(p1) ≠ δ^-1(p2)
        P, W := split(P, A ∩ S, A \ S, witness(δ^-1(p1) ≠ δ^-1(p2)))
return partitioned DFA
log n iterations: O(n^2 log n f(nk))

Experiments
1. Randomly generated DFAs: SFAs using BDDs (sort = bitvec 7 bits)
2. SFAs generated from regexes: SFAs using BDDs (sort = bitvec 16 bits)
3. A corner case of Minterm generation: SFAs using BDDs (sort = bitvec 20 bits)
4. Randomly generated SFAs over string x int: SFAs using Z3 (sort = string x int)
5. Monadic second order logic to DFA transformation: SFAs using BDDs (sort = bitvec 40 bits)

1) Randomly generated DFAs
5 billion DFAs: 10 to 100 states, 2 to 50 symbols
From [Almeida, Moreira, Reis, TR05]

2) SFAs generated from regexes (regexplib.com)
3000 regexes over UTF16 alphabet (2^16 elements)
From [regexplib.com]
[Figure: both axes log scale]
More States => Moore Worse

3) A corner case of Minterm generation
This SFA has 2^k minterms!!
brics.automata.dk uses intervals instead of BDDs
[Figure: log scale]

4) Randomly generated SFAs over string x int
Randomly generated 10 SFAs over string x int and minimized all the intersections, complements, differences, and unions of such SFAs
Random generation causes many predicate overlaps → minterms

5) MSO logic to DFA transformation [IJFCS05] State of the art for MSO

Conclusion
Results
– Adapted classical minimization algorithm to the symbolic setting
– New minimization algorithm for symbolic automata (faster than previous ones)
Future work
– Extend to tree automata
– Extend classical automata problems to SFAs
  – Edit distance?
  – Regex for symbolic automata?

Future Work
Extending classical automata problems to SFAs
– Edit distance?
– Regex for symbolic automata?
– Random generation of SFAs
Using transducers (BEX) for inverting simple programs automatically. We already have some results on how to check injectivity
QUESTIONS?

Application 1: Solving Monadic Second Order logic (MSO)
MSO logic is equivalent to regular languages
For example, ∃ x,y. x<y ∧ a(x) ∧ b(y) describes the DFA
[Figure: DFA with states 0, 1, 2 and transitions labeled a, b, and a,b]
FO variables = positions
SO variables = sets of positions
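The correspondence between the example formula and a DFA can be checked by brute force on short words. The transition table below is a reconstruction (the figure is not in the transcript) and should be read as an assumption: state 0 waits for an a, state 1 waits for a later b, and state 2 accepts. All names are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple


def satisfies_formula(word: str) -> bool:
    """Evaluate ∃ x,y. x<y ∧ a(x) ∧ b(y) directly over the positions of the word."""
    return any(
        word[x] == "a" and word[y] == "b"
        for x in range(len(word))
        for y in range(x + 1, len(word))
    )


# Assumed DFA: 0 --b--> 0, 0 --a--> 1, 1 --a--> 1, 1 --b--> 2, 2 --a,b--> 2.
DELTA: Dict[Tuple[int, str], int] = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 2,
}


def dfa_accepts(word: str) -> bool:
    state = 0
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state == 2


def agree_up_to(max_length: int) -> bool:
    """Check that formula and DFA agree on every word over {a, b} up to the given length."""
    return all(
        satisfies_formula("".join(w)) == dfa_accepts("".join(w))
        for n in range(max_length + 1)
        for w in product("ab", repeat=n)
    )
```

Agreement on short words is only a sanity check, not a proof of equivalence.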

Monadic second order logic to DFA transformation
Φ := Φ ∧ Φ | ¬Φ | ∃X.Φ | S(X) | X ⊆ Y | a(X) | X<Y
For every subformula Φ we inductively compute the corresponding DFA, A(Φ).
The first two are easy using automata operations (intersection, complement).
What about ∃X.Φ?

Monadic second order logic to DFA transformation
For every formula Φ with free variables X1,…,Xn we extend the alphabet to model Φ
Now we have a formula for Φ
To compute ∃X.Φ we remove the first element of the bitvector from every transition
[Figure: a transition label made of the letter b and the bitvector 0100 over X1 X2 X3 X4]
Current position of the element being read:
- is labeled with b, and
- belongs to set X2, but not X1, X3, X4
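The projection step for ∃X.Φ can be sketched on explicit labels. The encoding of a label as a `(letter, bits)` pair, with one bit per free variable, is an assumption chosen to mirror the b / 0100 picture; the function name `project_first` is hypothetical.

```python
from typing import Hashable, Set, Tuple

Label = Tuple[str, str]                        # (letter, bitvector), e.g. ("b", "0100")
Transition = Tuple[Hashable, Label, Hashable]  # (source, label, target)


def project_first(transitions: Set[Transition]) -> Set[Transition]:
    """Drop the first bit of every label, which forgets the membership in X1."""
    return {
        (source, (letter, bits[1:]), target)
        for source, (letter, bits), target in transitions
    }
```

After projection two transitions that differed only in the dropped bit carry the same label, so the result is in general nondeterministic and has to be determinized before the next step.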

Monadic second order logic to DFA transformation
The transformation is non-elementary
n variables → n bits
Secrets to make it work in practice:
– Symbolic representation of alphabet
– Minimize at every step
We need a good representation of big alphabets and good minimization algorithms.