Minimization of Symbolic Transducers


Minimization of Symbolic Transducers
Olli Saarikivi, Margus Veanes

Motivation
Many useful stream processing computations can be represented as transducers. A pipeline of transducers can be fused into a single transducer, which reduces communication overhead and exposes opportunities for optimization. Fusion has a worst-case quadratic blowup, so the fused transducer is a target for reduction.
[Figure: example pipeline Disk/Network → Deserialize → SelectPrice → FindPriceDips → Serialize → Disk/Network, showing the raw bytes, price records, prices and price dips that flow between the stages]
CAV 2017

A Fusion Engine
Frontends for C#, Regex and XPath produce Symbolic Transducers (STs), which serve as the intermediate representation. Adjacent STs in a pipeline are fused until a single ST remains, and reductions are applied during fusion: Control State Reduction (CSR) and Reachability Based Branch Elimination (RBBE, PLDI 2017). Code generation then emits the fused pipeline as C#.
[Figure: Frontend → STs → Fusion → CSR → ST → RBBE → CodeGen → fused C#]
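To make the fusion step and its quadratic blowup concrete, here is a minimal sketch of fusing two concrete (non-symbolic) deterministic transducers by a product construction over reachable state pairs. The `FST` class, `fuse`, and the two toy transducers are hypothetical illustrations, not the Automata library's API; fusing STs would additionally conjoin guards and compose yield functions and register updates.

```python
from itertools import product
from typing import Dict, Hashable, Tuple


class FST:
    """Deterministic finite-state transducer over characters.

    delta[(state, ch)] = (output string, next state)
    final[state]       = output emitted when the input ends in `state`
    """

    def __init__(self, init: Hashable, delta: Dict, final: Dict):
        self.init, self.delta, self.final = init, delta, final

    def feed(self, state: Hashable, text: str) -> Tuple[str, Hashable]:
        """Consume `text` from `state`; return (output, state reached)."""
        out = []
        for ch in text:
            o, state = self.delta[(state, ch)]
            out.append(o)
        return "".join(out), state

    def run(self, text: str) -> str:
        out, state = self.feed(self.init, text)
        return out + self.final[state]


def fuse(t1: FST, t2: FST, alphabet: str) -> FST:
    """Product construction over reachable pairs (p, q): t1's output for each
    input character is fed straight into t2, so no intermediate stream is built."""
    init = (t1.init, t2.init)
    delta, final = {}, {}
    todo, seen = [init], {init}
    while todo:
        p, q = todo.pop()
        end_out, q_end = t2.feed(q, t1.final[p])
        final[(p, q)] = end_out + t2.final[q_end]
        for ch in alphabet:
            mid, p2 = t1.delta[(p, ch)]
            out, q2 = t2.feed(q, mid)
            delta[((p, q), ch)] = (out, (p2, q2))
            if (p2, q2) not in seen:
                seen.add((p2, q2))
                todo.append((p2, q2))
    return FST(init, delta, final)


SIGMA = "ab"
# Hypothetical T1: keep every other character (2 states).
keep_even = FST(
    "keep",
    {**{("keep", c): (c, "drop") for c in SIGMA},
     **{("drop", c): ("", "keep") for c in SIGMA}},
    {"keep": "", "drop": ""},
)
# Hypothetical T2: replace every third character with '*' (3 states).
star_third = FST(
    0,
    {(i, c): (c if i < 2 else "*", (i + 1) % 3) for i in range(3) for c in SIGMA},
    {0: "", 1: "", 2: ""},
)

fused = fuse(keep_even, star_third, SIGMA)
print(len(fused.final))       # 6 reachable states = 2 x 3
print(fused.run("abababab"))  # aa*a

# Fusion preserves the pipeline's behavior on every input up to length 6.
for n in range(7):
    for w in map("".join, product(SIGMA, repeat=n)):
        assert fused.run(w) == star_third.run(keep_even.run(w))
```

Here every pair of component states is reachable, so the fused machine has the full 2 × 3 states; this product growth is what CSR then tries to undo.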

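Symbolic Finite Automata
A classical automaton has a finite (small) alphabet and concrete transitions $p \xrightarrow{a} q$. A Symbolic Finite Automaton (SFA) takes its input type from a decidable theory and has symbolic transitions $p \xrightarrow{\varphi} q$, where the guard $\varphi$ is a predicate over the input.
[Figure: the SFA EveryOtherEven with states A and B and transitions guarded by $x \bmod 2 = 0$ and $\top$; the legend distinguishes accepting and rejecting states]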
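A minimal sketch of running an SFA whose guards are predicates rather than letters. The slide's figure does not fully survive, so the sketch assumes A is initial, A → B on $x \bmod 2 = 0$, B → A on $\top$, and that both states accept; `SFA` and `accepts` are hypothetical names. Guards here are plain Python functions; a real SFA needs a decidable theory so that guards can be checked for satisfiability.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Pred = Callable[[int], bool]


class SFA:
    """Symbolic finite automaton: transitions carry predicates over the input
    domain (here: Python ints) instead of single letters."""

    def __init__(self, init: str, moves: Dict[str, List[Tuple[Pred, str]]],
                 accepting: Set[str]):
        self.init, self.moves, self.accepting = init, moves, accepting

    def accepts(self, word: Iterable[int]) -> bool:
        state = self.init
        for x in word:
            # Guards leaving a state are assumed disjoint, so at most one fires.
            targets = [q for guard, q in self.moves.get(state, []) if guard(x)]
            if not targets:
                return False  # no guard holds: reject
            state = targets[0]
        return state in self.accepting


# EveryOtherEven as we read the slide (assumption): A --x mod 2 = 0--> B,
# B --true--> A, with A initial and both states accepting.
every_other_even = SFA(
    "A",
    {"A": [(lambda x: x % 2 == 0, "B")],
     "B": [(lambda x: True, "A")]},
    {"A", "B"},
)

print(every_other_even.accepts([2, 7, 4, 9]))  # True
print(every_other_even.accepts([2, 7, 5]))     # False: 5 sits at an even position
```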

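Symbolic (Finite) Transducers
A classical transducer has finite (small) alphabets, a finite set of control states, and concrete transitions $p \xrightarrow{a/[b_i]_{i=1}^{n}} q$. A Symbolic Finite Transducer (SFT) takes its input and output types from a decidable theory and has symbolic transitions $p \xrightarrow{\varphi/[f_i]_{i=1}^{n}} q$, where the guard $\varphi$ is a predicate over the input and the yields $f_i$ are functions of it. A Symbolic Transducer (ST) can additionally use a register for additional state; its transitions have the form $p \xrightarrow{\varphi/[f_i]_{i=1}^{n};\,g} q$, where $g$ is the register update.
Example: ParseInts (initial register value 0; each label reads guard / yields ; register update):
  1 → 1 : ¬IsDigit(x) / [] ; 0
  1 → 2 : IsDigit(x) / [] ; x − '0'
  2 → 2 : IsDigit(x) / [] ; 10∗r + x − '0'
  2 → 1 : ¬IsDigit(x) / [r] ; 0
  finalizers: 1: ⊤ / [];  2: ⊤ / [r]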
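A minimal sketch of executing ParseInts as an ST with one integer register. It assumes state 1 (not inside a number) is initial and reads x − '0' as the digit's numeric value; `PARSE_INTS`, `FINALIZERS`, and `run_st` are hypothetical names. Yields are computed from the register value before the update, matching the guard / yields ; update order of the labels.

```python
from typing import List


def is_digit(c: str) -> bool:
    return "0" <= c <= "9"


# ParseInts: state -> list of (guard, yields, register update, target).
# Guards, yields and updates all see the input x and the current register r.
PARSE_INTS = {
    1: [
        (lambda x, r: not is_digit(x), lambda x, r: [], lambda x, r: 0, 1),
        (lambda x, r: is_digit(x), lambda x, r: [], lambda x, r: ord(x) - ord("0"), 2),
    ],
    2: [
        (lambda x, r: is_digit(x), lambda x, r: [],
         lambda x, r: 10 * r + ord(x) - ord("0"), 2),
        (lambda x, r: not is_digit(x), lambda x, r: [r], lambda x, r: 0, 1),
    ],
}
# Finalizers: yields emitted at the end of the input, per final state.
FINALIZERS = {1: lambda r: [], 2: lambda r: [r]}


def run_st(text: str, init_state: int = 1, init_reg: int = 0) -> List[int]:
    state, reg, out = init_state, init_reg, []
    for x in text:
        for guard, yields, update, target in PARSE_INTS[state]:
            if guard(x, reg):
                out.extend(yields(x, reg))           # yields use the old register
                state, reg = target, update(x, reg)  # then the register is updated
                break
    return out + FINALIZERS[state](reg)


print(run_st("12, 345;6"))  # [12, 345, 6]
```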

Running Example:  Pipeline of two SFTs Equivalent to just Unsmileyfy Smileyfy changes :) to  Unsmileyfy changes  to :) Equivalent to just Unsmileyfy The fused pipeline should reduce to Unsmileyfy the beach :) See you later!  Remembe Smileyfy Smileyfy Unsmileyfy the beach  See you later!  Remembe Unsmileyfy the beach :) See you later! :) Remembe CAV 2017

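Smileyfy (states 1 and 2; each label reads guard / yields):
  1 → 2 : x = ':' / []
  1 → 1 : x ≠ ':' / [x]
  2 → 1 : x = ')' / ['☺']
  2 → 2 : x = ':' / [x]
  2 → 1 : x ≠ ':' ∧ x ≠ ')' / [':', x]
  finalizers: 1: ⊤ / [];  2: ⊤ / [':']
Unsmileyfy (state A):
  A → A : x = '☺' / [':', ')']
  A → A : x ≠ '☺' / [x]
  finalizer: A: ⊤ / []
Fusing the two gives an SFT with states 1A and 2A, whose yields are Unsmileyfy applied to Smileyfy's yields:
  1A → 2A : x = ':' / []
  1A → 1A : x ≠ ':' ∧ x ≠ '☺' / [x]
  1A → 1A : x = '☺' / [':', ')']
  2A → 1A : x = ')' / [':', ')']
  2A → 2A : x = ':' / [x]
  2A → 1A : x ≠ ':' ∧ x ≠ ')' ∧ x ≠ '☺' / [':', x]
  2A → 1A : x = '☺' / [':', ':', ')']
  finalizers: 1A: ⊤ / [];  2A: ⊤ / [':']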
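The fused SFT can be transcribed as a table of (guard, yields, target) triples and run directly. This is a sketch assuming 1A is the initial state; `FUSED`, `FINAL`, and `run_sft` are hypothetical names, and the final loop only checks agreement with Unsmileyfy on short strings over a sample alphabet.

```python
from itertools import product

SMILEY = "\u263a"  # '☺'

# The fused Smileyfy -> Unsmileyfy SFT: state -> list of (guard, yields, target).
FUSED = {
    "1A": [
        (lambda x: x == ":", lambda x: [], "2A"),
        (lambda x: x != ":" and x != SMILEY, lambda x: [x], "1A"),
        (lambda x: x == SMILEY, lambda x: [":", ")"], "1A"),
    ],
    "2A": [
        (lambda x: x == ")", lambda x: [":", ")"], "1A"),
        (lambda x: x == ":", lambda x: [x], "2A"),
        (lambda x: x not in (":", ")", SMILEY), lambda x: [":", x], "1A"),
        (lambda x: x == SMILEY, lambda x: [":", ":", ")"], "1A"),
    ],
}
FINAL = {"1A": [], "2A": [":"]}  # finalizer yields per state


def run_sft(machine, final, state, text):
    out = []
    for x in text:
        matches = [(yields, target) for guard, yields, target in machine[state] if guard(x)]
        assert len(matches) == 1, "a deterministic SFT has exactly one enabled guard"
        yields, state = matches[0]
        out.extend(yields(x))
    return "".join(out + final[state])


def unsmileyfy(text):
    return "".join(":)" if x == SMILEY else x for x in text)


# 'a' stands in for "any other character" in this exhaustive check.
for n in range(6):
    for w in map("".join, product(":)" + SMILEY + "a", repeat=n)):
        assert run_sft(FUSED, FINAL, "1A", w) == unsmileyfy(w)
```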

Control State Reduction
1. Encode the ST $A$ into an SFA, $\mathrm{SFA}(A)$, that accepts the valid transductions of $A$.
2. Minimize $\mathrm{SFA}(A)$ to produce an equivalence relation $\equiv_{\mathrm{SFA}(A)}$.
3. Use the equivalence relation to merge states in the original ST, producing the quotient $A/{\equiv_{\mathrm{SFA}(A)}}$.
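Step 3, the quotient, is the easiest step to sketch in isolation. The following assumes a concrete, register-free transducer and an equivalence given as a map from each state to a block representative; `quotient`, `run`, and the alternating copy transducer are hypothetical. The sketch raises an error if two merged states disagree locally, which would indicate the relation is not a valid choice.

```python
from typing import Dict, Hashable, Tuple

Delta = Dict[Tuple[Hashable, str], Tuple[str, Hashable]]


def quotient(init: Hashable, delta: Delta, final: Dict[Hashable, str],
             block: Dict[Hashable, Hashable]):
    """Merge states that `block` maps to the same representative.

    This is only sound when merged states really behave alike (in CSR, when
    `block` refines the equivalence computed from SFA(A)); we check that
    merged states agree on outputs and on the blocks of their targets."""
    q_delta: Delta = {}
    q_final: Dict[Hashable, str] = {}
    for (p, ch), (out, q) in delta.items():
        edge = (out, block[q])
        if q_delta.setdefault((block[p], ch), edge) != edge:
            raise ValueError(f"states in block {block[p]!r} disagree on {ch!r}")
    for p, out in final.items():
        if q_final.setdefault(block[p], out) != out:
            raise ValueError(f"states in block {block[p]!r} disagree at end of input")
    return block[init], q_delta, q_final


def run(init, delta, final, text: str) -> str:
    state, out = init, []
    for ch in text:
        o, state = delta[(state, ch)]
        out.append(o)
    return "".join(out) + final[state]


# Hypothetical example: a copying transducer that needlessly alternates two states.
SIGMA = "ab"
delta = {(s, c): (c, "odd" if s == "even" else "even") for s in ("even", "odd") for c in SIGMA}
final = {"even": "", "odd": ""}
q_init, q_delta, q_final = quotient("even", delta, final, {"even": "E", "odd": "E"})
print(len(q_final), run(q_init, q_delta, q_final, "abba"))  # 1 abba
```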

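The Encoding
Idea: the inputs of $\mathrm{SFA}(A)$ represent transitions of the ST $A$ as tuples of input × current register × outputs × new register, so that $\mathrm{SFA}(A)$ accepts the valid transductions of $A$.
  ST $A$: input type $I$, output type $O$, register type $R$, control states $Q$.
  $\mathrm{SFA}(A)$: input type $\mathbf{T}(I \times R \times O^{*} \times R) \cup \mathbf{F}(R \times O^{*})$, states $Q \cup \{q_f\}$, where $\mathbf{T}$ tags ordinary transitions and $\mathbf{F}$ tags finalizers.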
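A minimal sketch of the encoding idea: each ST transition becomes an SFA transition whose guard checks a tagged tuple letter, and each finalizer becomes an $\mathbf{F}$-guarded move into the new state $q_f$. `T`, `F`, `encode`, and `accepts` are hypothetical names, and ParseInts is assumed to start in state 1. This sketch checks each letter only against the guard it is read on; it does not model how, or whether, $\mathrm{SFA}(A)$ relates the register components of consecutive letters, which the slide does not show.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class T:
    """Letter for one ST step: input, current register, outputs, new register."""
    i: str
    r: int
    o: Tuple[int, ...]
    r2: int


@dataclass(frozen=True)
class F:
    """Letter for the end of input: register value and finalizer outputs."""
    r: int
    o: Tuple[int, ...]


def encode(st, finalizers):
    """Map each ST transition (guard, yields, update, target) to an SFA transition
    whose guard checks a T-letter; each finalizer becomes an F-guarded move to 'q_f'."""
    sfa = {}
    for p, moves in st.items():
        sfa[p] = [
            # Default arguments bind g, y, u now instead of at call time.
            (lambda t, g=g, y=y, u=u: isinstance(t, T) and g(t.i, t.r)
             and t.o == tuple(y(t.i, t.r)) and t.r2 == u(t.i, t.r), q)
            for g, y, u, q in moves
        ]
        fin = finalizers[p]
        sfa[p].append((lambda t, fin=fin: isinstance(t, F) and t.o == tuple(fin(t.r)), "q_f"))
    return sfa


def accepts(sfa, state, word):
    for letter in word:
        nxt = [q for guard, q in sfa.get(state, []) if guard(letter)]
        if not nxt:
            return False
        state = nxt[0]
    return state == "q_f"


def digit(c):
    return "0" <= c <= "9"


# ParseInts, written as (guard, yields, update, target).
PARSE_INTS = {
    1: [(lambda x, r: not digit(x), lambda x, r: [], lambda x, r: 0, 1),
        (lambda x, r: digit(x), lambda x, r: [], lambda x, r: int(x), 2)],
    2: [(lambda x, r: digit(x), lambda x, r: [], lambda x, r: 10 * r + int(x), 2),
        (lambda x, r: not digit(x), lambda x, r: [r], lambda x, r: 0, 1)],
}
SFA_PARSE_INTS = encode(PARSE_INTS, {1: lambda r: [], 2: lambda r: [r]})

# The run of ParseInts on "12,3", written as a word over T/F letters.
trace = [T("1", 0, (), 1), T("2", 1, (), 12), T(",", 12, (12,), 0),
         T("3", 0, (), 3), F(3, (3,))]
print(accepts(SFA_PARSE_INTS, 1, trace))                    # True
print(accepts(SFA_PARSE_INTS, 1, trace[:-1] + [F(3, ())]))  # False: wrong final output
```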

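The Encoding in Practice
An ST transition $x \ge 1 / [r];\ x + r$ (guard / yields ; register update) is encoded as the SFA guard
$\mathit{Is}_{\mathbf{T}}(x) \wedge x_i \ge 1 \wedge x_o = [x_r] \wedge x_r' = x_i + x_r$,
where $x_i$, $x_r$, $x_o$ and $x_r'$ are the input, current register, outputs and new register of the tuple $x$.
For Unsmileyfy, SFA(Unsmileyfy) has states A and $q_f$:
  A → A : x = '☺' / [':', ')']   is encoded as   Is_T(x) ∧ x_i = '☺' ∧ x_o = [':', ')']
  A → A : x ≠ '☺' / [x]          is encoded as   Is_T(x) ∧ x_i ≠ '☺' ∧ x_o = [x_i]
  finalizer ⊤ / [] at A          is encoded as   A → q_f : Is_F(x) ∧ x_o = []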
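The encoded guards above can be written as plain predicates over tagged tuples. In this sketch, `T`, `F`, `enc_example`, `SFA_UNSMILEYFY`, and `accepts` are hypothetical names, and because Unsmileyfy has no register, its register fields are simply `None`.

```python
from dataclasses import dataclass
from typing import Any, Tuple

SMILEY = "\u263a"  # '☺'


@dataclass(frozen=True)
class T:            # transition letter: input, register, outputs, new register
    i: Any
    r: Any
    o: Tuple
    r2: Any


@dataclass(frozen=True)
class F:            # finalizer letter: register, outputs
    r: Any
    o: Tuple


def enc_example(x) -> bool:
    """Encoding of the ST transition  x >= 1 / [r]; x + r."""
    return isinstance(x, T) and x.i >= 1 and x.o == (x.r,) and x.r2 == x.i + x.r


# SFA(Unsmileyfy): Unsmileyfy has no register, so register fields are None here.
SFA_UNSMILEYFY = {
    "A": [
        (lambda x: isinstance(x, T) and x.i == SMILEY and x.o == (":", ")"), "A"),
        (lambda x: isinstance(x, T) and x.i != SMILEY and x.o == (x.i,), "A"),
        (lambda x: isinstance(x, F) and x.o == (), "q_f"),
    ],
}


def accepts(sfa, state, word):
    for letter in word:
        nxt = [q for guard, q in sfa.get(state, []) if guard(letter)]
        if not nxt:
            return False
        state = nxt[0]
    return state == "q_f"


print(enc_example(T(3, 4, (4,), 7)), enc_example(T(3, 4, (4,), 8)))  # True False
word = [T("a", None, ("a",), None), T(SMILEY, None, (":", ")"), None), F(None, ())]
print(accepts(SFA_UNSMILEYFY, "A", word))  # True
```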

Control State Reduction
1. Minimizing $\mathrm{SFA}(A)$ gives an equivalence relation $\equiv_{\mathrm{SFA}(A)}$ over $Q$.
2. Merge $\equiv_{\mathrm{SFA}(A)}$-equivalent states in $A$ to obtain $A/{\equiv_{\mathrm{SFA}(A)}}$.
3. The merging can use any equivalence relation $\sim$ such that $\sim\, \subseteq\, \equiv_{\mathrm{SFA}(A)}$.
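The symbolic minimization algorithm used on $\mathrm{SFA}(A)$ is not shown on the slides. As a stand-in, here is the classical Moore partition refinement on a concrete, complete DFA, plus a check of the $\sim\, \subseteq\, \equiv$ condition. `moore_equivalence`, `refines`, and the example DFA are hypothetical; a symbolic version would refine over satisfiable guard combinations instead of single letters.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def moore_equivalence(states: Iterable[Hashable], alphabet: str,
                      delta: Dict[Tuple[Hashable, str], Hashable],
                      accepting: Set[Hashable]) -> Dict[Hashable, int]:
    """Classical Moore partition refinement on a complete DFA.

    Returns state -> block id; two states are equivalent iff their ids match."""
    states = list(states)
    block = {q: int(q in accepting) for q in states}
    while True:
        # A state's signature: its block and the blocks it reaches on each letter.
        sig = {q: (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet) for q in states}
        ids: Dict[Tuple, int] = {}
        new = {q: ids.setdefault(sig[q], len(ids)) for q in states}
        if len(ids) == len(set(block.values())):
            return new          # no block was split: the partition is stable
        block = new


def refines(sim: Dict[Hashable, int], eq: Dict[Hashable, int]) -> bool:
    """True if sim ⊆ eq, i.e. states merged by sim are also eq-equivalent."""
    return all(eq[p] == eq[q] for p in sim for q in sim if sim[p] == sim[q])


# Hypothetical DFA over {a, b} accepting words that end in 'a', with duplicate states.
delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 3, (1, "b"): 0,
         (2, "a"): 1, (2, "b"): 0, (3, "a"): 1, (3, "b"): 2}
eq = moore_equivalence(range(4), "ab", delta, {1, 3})
print(eq[0] == eq[2], eq[1] == eq[3], eq[0] == eq[1])  # True True False

# Merging only 0 and 2 gives a relation ~ with ~ ⊆ ≡, so it is also a valid choice.
print(refines({0: 0, 1: 1, 2: 0, 3: 2}, eq))              # True
```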

Late Yields Block Reduction
In the fused SFT, states 1A and 2A are not equivalent, even though every transition out of 2A, and its finalizer, yields ':' first. The ':' read by 1A is only yielded later, from 2A, and this late yield is what keeps the two states apart.
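A small check of the observation above, comparing only the per-letter outputs (not the targets) of the fused SFT over a sample alphabet in which 'a' stands for any other character. The dictionaries are a hypothetical concrete transcription of the fused SFT; `os.path.commonprefix` computes the longest common string prefix.

```python
import os

SMILEY = "\u263a"  # '☺'

# Per-letter outputs of the fused SFT (plus the finalizer under "end"),
# before quasi-determinization; 'a' stands in for any other character.
out_1A = {":": "", ")": ")", SMILEY: ":)", "a": "a", "end": ""}
out_2A = {":": ":", ")": ":)", SMILEY: "::)", "a": ":a", "end": ":"}

print(out_1A == out_2A)                         # False
prefix = os.path.commonprefix(list(out_2A.values()))
print(prefix)                                   # :
# With that shared ':' removed, 2A's outputs coincide with 1A's.
print({k: v[len(prefix):] for k, v in out_2A.items()} == out_1A)  # True
```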

Quasi-Determinization
Quasi-determinization moves output to be as early as possible. It is used in the minimization of classical transducers: initial work by Christian Choffrut, a more algorithmic approach by Mehryar Mohri, and a generalization to tree transducers as "Earliest Normal Form".
The classical algorithm:
1. For all states, find the longest common prefix of the outputs on outgoing transitions.
2. Push the prefixes backwards to incoming transitions.
3. Repeat until nothing can be moved.
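A minimal sketch of the classical output-pushing loop on concrete transducers with string outputs, applied to the fused running example over a sample alphabet (':', ')', '☺', and 'a' for any other character). `quasi_determinize` and the dictionary representation are hypothetical; the initial state is skipped because runs begin there without crossing an incoming transition, and `max_rounds` is only a safety bound for this sketch.

```python
import os

SMILEY = "\u263a"  # '☺'


def quasi_determinize(init, delta, final, max_rounds=100):
    """Classical output pushing (sketch).

    delta[(p, a)] = (output, q); final[p] = output at end of input.
    For each non-initial state, the longest common prefix of its outgoing outputs
    (and its final output, if any) is stripped there and appended to every
    incoming transition; this repeats until nothing moves."""
    delta, final = dict(delta), dict(final)
    states = {p for p, _ in delta} | set(final)
    for _ in range(max_rounds):
        moved = False
        for s in states:
            if s == init:
                continue  # runs start here without crossing an incoming edge
            outs = [o for (p, _), (o, _) in delta.items() if p == s]
            if s in final:
                outs.append(final[s])
            prefix = os.path.commonprefix(outs)
            if not prefix:
                continue
            moved = True
            # Strip first, then append, so a self-loop output o becomes prefix^-1 o prefix.
            for (p, a), (o, q) in list(delta.items()):
                if p == s:
                    delta[(p, a)] = (o[len(prefix):], q)
            if s in final:
                final[s] = final[s][len(prefix):]
            for (p, a), (o, q) in list(delta.items()):
                if q == s:
                    delta[(p, a)] = (o + prefix, q)
        if not moved:
            return delta, final
    raise RuntimeError("output pushing did not stabilize")


# Fused Smileyfy -> Unsmileyfy over the letters ':', ')', '☺' and 'a'.
FUSED = {
    ("1A", ":"): ("", "2A"), ("1A", ")"): (")", "1A"),
    ("1A", SMILEY): (":)", "1A"), ("1A", "a"): ("a", "1A"),
    ("2A", ":"): (":", "2A"), ("2A", ")"): (":)", "1A"),
    ("2A", SMILEY): ("::)", "1A"), ("2A", "a"): (":a", "1A"),
}
qd_delta, qd_final = quasi_determinize("1A", FUSED, {"1A": "", "2A": ":"})
same = all(qd_delta[("1A", a)][0] == qd_delta[("2A", a)][0] for a in ":)" + SMILEY + "a")
print(same, qd_final)  # True {'1A': '', '2A': ''}
```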

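Control State Reduction
1. The ST is quasi-determinized as a preprocessing step: $A \mapsto \mathrm{QD}(A)$.
2. The rest of the algorithm uses the quasi-determinized ST: encode $\mathrm{QD}(A)$ as $\mathrm{SFA}(\mathrm{QD}(A))$, minimize it to obtain $\equiv_{\mathrm{SFA}(\mathrm{QD}(A))}$, and take the quotient $\mathrm{QD}(A)/{\equiv_{\mathrm{SFA}(\mathrm{QD}(A))}}$.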
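A minimal end-to-end sketch of this pipeline on the fused running example, as a concrete, register-free transducer over a sample alphabet. Instead of building $\mathrm{SFA}(\mathrm{QD}(A))$ explicitly, it refines states directly by final output and, per letter, output and target block, which is roughly what minimizing the encoded SFA amounts to when there is no register and the alphabet is finite. All function names are hypothetical, and `push_outputs` assumes a complete transducer in which every state is final.

```python
import os

SMILEY = "\u263a"            # '☺'
SIGMA = ":)" + SMILEY + "a"  # 'a' stands in for "any other character"


def push_outputs(init, delta, final):
    """Quasi-determinize a complete transducer in which every state is final."""
    delta, final = dict(delta), dict(final)
    moved = True
    while moved:
        moved = False
        for s in sorted(final):
            prefix = os.path.commonprefix([delta[(s, a)][0] for a in SIGMA] + [final[s]])
            if s == init or not prefix:
                continue
            moved = True
            for a in SIGMA:                          # strip from outgoing edges...
                o, q = delta[(s, a)]
                delta[(s, a)] = (o[len(prefix):], q)
            final[s] = final[s][len(prefix):]
            for key, (o, q) in list(delta.items()):  # ...then append to incoming ones
                if q == s:
                    delta[key] = (o + prefix, q)
    return delta, final


def equivalence(delta, final):
    """Refine until equivalent states share the final output and, per letter,
    the output and the block of the target."""
    block = {s: final[s] for s in final}
    while True:
        sig = {s: (final[s],) + tuple((delta[(s, a)][0], block[delta[(s, a)][1]])
                                      for a in SIGMA) for s in final}
        ids = {}
        new = {s: ids.setdefault(sig[s], len(ids)) for s in sorted(final)}
        if len(ids) == len(set(block.values())):
            return new
        block = new


def quotient(init, delta, final, block):
    """Keep one representative per block and redirect targets to representatives."""
    rep = {}
    for s in sorted(final):
        rep.setdefault(block[s], s)
    q_delta = {(rep[block[s]], a): (o, rep[block[t]]) for (s, a), (o, t) in delta.items()}
    q_final = {rep[block[s]]: out for s, out in final.items()}
    return rep[block[init]], q_delta, q_final


FUSED = {
    ("1A", ":"): ("", "2A"), ("1A", ")"): (")", "1A"),
    ("1A", SMILEY): (":)", "1A"), ("1A", "a"): ("a", "1A"),
    ("2A", ":"): (":", "2A"), ("2A", ")"): (":)", "1A"),
    ("2A", SMILEY): ("::)", "1A"), ("2A", "a"): (":a", "1A"),
}
FINAL = {"1A": "", "2A": ":"}

# Without quasi-determinization the two states stay apart.
print(len(set(equivalence(FUSED, FINAL).values())))  # 2

qd_delta, qd_final = push_outputs("1A", FUSED, FINAL)
init, delta, final = quotient("1A", qd_delta, qd_final, equivalence(qd_delta, qd_final))
print(len(final))  # 1: a single state, like Unsmileyfy
```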

Quasi-Determinization of SFTs
For an SFT $A$:
1. Do constant value analysis for all yields: a yield $f_i$ on a transition with guard $\varphi$ is constant if $\forall x\, x'.\ \varphi(x) \wedge \varphi(x') \rightarrow f_i(x) = f_i(x')$.
2. Substitute constant yields with the constants.
3. Run a variant of classical quasi-determinization in which non-constant yields are blocked from being moved.
SFT minimization theorem: if $A$ is a deterministic SFT, then $\mathrm{QD}(A)/{\equiv_{\mathrm{SFA}(\mathrm{QD}(A))}}$ is minimal. The proof is in the paper.
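A sketch of steps 1 and 2 under a strong simplifying assumption: the formula is checked by enumerating a finite sample domain of characters, where a real implementation would discharge it with a decision procedure for the theory. `is_constant`, `substitute_constants`, `DOMAIN`, and the two guards are hypothetical names; the examples are the yields of state 2A in the running example.

```python
SMILEY = "\u263a"  # '☺'
# Finite stand-in for the character domain; a real implementation would ask a solver.
DOMAIN = [chr(c) for c in range(32, 127)] + [SMILEY]


def is_constant(guard, f, domain=DOMAIN) -> bool:
    """Check  forall x, x'. guard(x) and guard(x') -> f(x) == f(x')  on `domain`."""
    return len({f(x) for x in domain if guard(x)}) <= 1


def substitute_constants(guard, yields, domain=DOMAIN):
    """Replace yields that are constant under the guard by ('const', value);
    the others are kept as ('expr', f) and are not moved by quasi-determinization."""
    witnesses = [x for x in domain if guard(x)]
    return [("const", f(witnesses[0])) if witnesses and is_constant(guard, f, domain)
            else ("expr", f) for f in yields]


def is_colon(x):
    return x == ":"


def is_other(x):
    return x not in (":", ")", SMILEY)


# 2A --x = ':' / [x]: the yield x is really the constant ':'.
print(substitute_constants(is_colon, [lambda x: x]))  # [('const', ':')]
# 2A --x not in {':', ')', '☺'} / [':', x]: ':' is constant, x is not.
print([kind for kind, _ in substitute_constants(is_other, [lambda x: ":", lambda x: x])])
# ['const', 'expr']
```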

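Quasi-Determinization in Practice
Constant value analysis shows that the yield [x] on 2A → 2A : x = ':' is the constant [':'], so every yield out of 2A, including its finalizer, now has the prefix [':']. Quasi-determinization pushes this constant prefix back onto the transitions entering 2A; in [':', x] the constant ':' moves, while the non-constant x stays in place. The result:
  1A → 2A : x = ':' / [':']
  1A → 1A : x ≠ ':' ∧ x ≠ '☺' / [x]
  1A → 1A : x = '☺' / [':', ')']
  2A → 2A : x = ':' / [':']
  2A → 1A : x = ')' / [')']
  2A → 1A : x ≠ ':' ∧ x ≠ ')' ∧ x ≠ '☺' / [x]
  2A → 1A : x = '☺' / [':', ')']
  finalizers: 1A: ⊤ / [];  2A: ⊤ / []
Now states 1A and 2A are equivalent.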

Efficacy of CSR for Fusions of STs
Pipeline | Removed | |Q| | Time
Base64-delta | 10 | 18 | 39.9 s
CSV-max | 4 | 26 | 18.0 s
Base64-avg | 114 | 166 | 99.6 s
UTF8-lines | 5 (only one count given) | 0.03 s
CC-id | 2024 | 983 | 4.4 s
CHSI-cancer | 12 | 558 | 2.2 s
SBO-employees | 36 (only one count given) | 0.2 s
TPC-DI-SQL | 68 | 457 | 44.1 s
PIR-proteins | 80 | 355 | 196.1 s
DBLP-oldest | 219 (only one count given) | 9.8 s
MONDIAL-pop | 56 | 319 | 12.4 s
Huffman | 915 | 360 | 2.6 s
Benchmark groups: CSV parsing with regexes, XML parsing with XPath, and English Huffman decode + line count.

Conclusions
Our Control State Reduction algorithm provides large reductions for fused pipelines of STs and a minimization approach for deterministic SFTs.
Implementations are in the Automata library: https://github.com/OlliSaarikivi/Automata and https://github.com/AutomataDotNet/Automata
Also included in the paper: quasi-determinization of STs, strengthening STs with invariants for more reduction, and Huffman coding using SFTs.