Characterization of state merging strategies which ensure identification in the limit from complete data (II)
Cristina Bibire

Outline
- State of the art
- A new approach
- Motivation
- Results
- Further research
- Bibliography

State of the Art
- Gold was the first to formalize the process of learning formal languages.
- Trakhtenbrot and Barzdin described a polynomial-time algorithm (the TB algorithm) for constructing the smallest DFA consistent with a completely labelled training set (a set containing all words up to a certain length).
- Gold rediscovered the TB algorithm and applied it to the discipline of grammatical inference (uniformly complete samples are no longer required). He also specified how to detect indistinguishable states using so-called state characterization matrices.
- Oncina and Garcia proposed the RPNI (Regular Positive and Negative Inference) algorithm.
- Lang described the TB algorithm and generalized it to produce a (not necessarily minimum) DFA consistent with a sparsely labelled tree. The resulting algorithm (Traxbar) can deal with incomplete as well as complete data sets.

State of the Art
- Lang and Pearlmutter organized the Abbadingo One contest. The competition posed the challenge of predicting, with 99% accuracy, the labels that an unseen FSA would assign to test data, given training data consisting of positive and negative examples.
- Price won the Abbadingo One competition using an evidence-driven state merging (EDSM) algorithm. Essentially, Price realized that an effective way of choosing which pair of nodes to merge next within the APTA is simply to select the pair of nodes whose sub-trees share the most similar labels.
- As post-competition work, Lang proposed W-EDSM: to improve the running time of the EDSM algorithm, only nodes that lie within a fixed-size window from the root of the APTA are considered for merging.
- An alternative windowing method (the Blue-fringe algorithm) is also described by Lang, Pearlmutter and Price. It uses a red and blue colouring scheme to provide a simple but effective way of choosing the pool of merge candidates at each merge level in the search.

State of the Art  TB algorithm  Gold’s algorithm  RPNI  Traxbar  EDSM  W-EDSM  Blue-fringe

State of the Art
Running example: L = all strings without an odd number of consecutive 0's after an odd number of consecutive 1's. [the target DFA was shown as a diagram]

TB algorithm
[Animated trace on the prefix tree of the sample: the pairs merged are (0,λ), (11,λ), (1000,10), (1001,1), (1010,101) and (1011,101); the intermediate automata were shown as diagrams.]
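All of the algorithms in this talk operate on the prefix tree (augmented prefix tree acceptor, APTA) built from the labelled sample, as in the trace above. Below is a minimal Python sketch of that construction; the Node class, its field names and the sample at the end are illustrative assumptions, not code or data from the slides.

```python
class Node:
    """A state of the augmented prefix tree acceptor (APTA)."""
    def __init__(self):
        self.children = {}    # symbol -> child Node
        self.label = None     # True = accepting, False = rejecting, None = unlabelled

def build_apta(positive, negative):
    """One path per sample word, sharing common prefixes; only the state
    reached by a complete word receives a label."""
    root = Node()
    for word, label in [(w, True) for w in positive] + [(w, False) for w in negative]:
        node = root
        for symbol in word:
            node = node.children.setdefault(symbol, Node())
        node.label = label
    return root

# Hypothetical sample: the exact positive/negative split used in the
# slides' running example is not recoverable from the transcript.
apta = build_apta(positive=["11", "1101"], negative=["0", "10"])
```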

Gold's algorithm
Trace of the state characterization matrix on the sample (S = distinguished prefixes; SΣ-S = candidate rows; list = indistinguishable pairs found):

  S                   SΣ-S                           pairs added to the list
  {λ}                 {0,1}                          (0,λ); 1 added to S
  {λ,1}               {0,10,11}                      (none); 10 added to S
  {λ,1,10}            {0,11,100,101}                 (11,λ); 100 added to S
  {λ,1,10,100}        {0,11,101,1000,1001}           (none); 101 added to S
  {λ,1,10,100,101}    {0,11,1000,1001,1010,1011}     (1000,10), (1001,1), (1010,101), (1011,101)

Final list of indistinguishable pairs: {(0,λ), (11,λ), (1000,10), (1001,1), (1010,101), (1011,101)}. [the resulting automaton was shown as a diagram]
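The core operation in the trace above is the distinguishability test between two rows of the matrix. Here is a minimal sketch of that test, assuming the sample is represented as a dict from word to boolean label and λ as the empty string; the names are illustrative.

```python
def distinguishable(u, v, experiments, sample):
    """Rows u and v of the state characterization matrix are distinguishable
    if some experiment e gives u+e and v+e opposite labels in the sample.
    sample: dict mapping a word to True/False; unknown words are absent."""
    for e in experiments:
        lu, lv = sample.get(u + e), sample.get(v + e)
        if lu is not None and lv is not None and lu != lv:
            return True       # a conflicting column separates the two rows
    return False

# E.g. the step "(1,λ) dist? Yes" in the trace corresponds to a call like
# distinguishable("1", "", experiments, sample) returning True.
```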

RPNI
[Worked example shown as diagrams: the prefix tree with the kernel K and the frontier Fr marked.]

Traxbar
A variation of the Trakhtenbrot-Barzdin algorithm was implemented by Lang. The modifications were needed to maintain consistency with incomplete training sets: unlabelled nodes and missing transitions in the APTA have to be handled. The extensions added to the Trakhtenbrot-Barzdin algorithm are as follows. If node q2 is to be merged with node q1, then:
- labels of labelled nodes in the sub-tree rooted at q2 must be copied over their respective unlabelled nodes in the sub-tree rooted at q1;
- transitions of nodes in the sub-tree rooted at q2 that do not exist in their respective nodes in the sub-tree rooted at q1 must be copied in.
As a result of these changes, the Traxbar algorithm produces a (not necessarily minimum-size) DFA that is consistent with the training set.
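A minimal sketch of these two extensions, reusing the illustrative Node class from the APTA sketch above; the function name fold is an assumption, not the name used by Lang.

```python
def fold(q1, q2):
    """Copy labels and missing transitions from q2's sub-tree into q1's."""
    if q2.label is not None:
        if q1.label is None:
            q1.label = q2.label    # copy a label over an unlabelled node
        elif q1.label != q2.label:
            raise ValueError("inconsistent merge")  # labels clash: merge invalid
    for symbol, child in q2.children.items():
        if symbol in q1.children:
            fold(q1.children[symbol], child)   # both have the transition: recurse
        else:
            q1.children[symbol] = child        # missing transition: copy it in
```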

Traxbar
[Animated trace on the same sample, showing the merges (0,λ), (11,λ), (1000,10), (1001,1), (1010,101) and (1011,101) and the resulting state classes, such as {λ,0,11,...} and {101,1010,10100,1011,...}; the intermediate automata were shown as diagrams.]

EDSM
The general idea of the EDSM approach is to avoid bad merges by selecting the pair of nodes within the APTA that has the highest score. The expectation is that the score indicates the correctness of each merge, since, on average, a merge that survives more label comparisons is more likely to be correct. A post-competition version of the EDSM algorithm, as described by Lang, Pearlmutter and Price, is as follows:
1. Evaluate all possible pairings of nodes within the APTA.
2. Merge the pair of nodes with the highest calculated score.
3. Repeat the steps above until no two nodes within the APTA can be merged.
The score is calculated by assigning one point for each pair of identically labelled nodes that overlap when the sub-trees rooted at the two candidate nodes are superimposed.
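A minimal Python sketch of this scoring rule, again reusing the illustrative Node class from the APTA sketch; this is an assumed rendering of the rule just described, not the competition code.

```python
def edsm_score(q1, q2):
    """Score a candidate merge: one point per pair of identically labelled
    nodes that overlap when the sub-trees rooted at q1 and q2 are superimposed.
    Returns -1 if the merge would unify an accepting and a rejecting state."""
    score = 0
    if q1.label is not None and q2.label is not None:
        if q1.label != q2.label:
            return -1            # conflicting labels: the merge is invalid
        score += 1               # one surviving label comparison
    for symbol in set(q1.children) & set(q2.children):
        sub = edsm_score(q1.children[symbol], q2.children[symbol])
        if sub == -1:
            return -1            # a conflict anywhere below invalidates the merge
        score += sub
    return score
```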

Windowed-EDSM
To improve the running time of the EDSM algorithm, it is suggested that only nodes lying within a fixed-size window from the root of the APTA be considered for merging:
1. In breadth-first order, create a window of nodes starting from the root of the APTA.
2. Evaluate all possible merge pairs within the window.
3. Merge the pair of nodes with the highest number of matching labels within their sub-trees.
4. If the merge reduces the size of the window, include, in breadth-first order, as many nodes as are needed to regain the fixed window size.
5. If no merge is possible within the given window, double the size of the window.
6. Terminate when no merges are possible.
The recommended window size is twice the number of states of the target DFA.
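A sketch of the window bookkeeping, assuming breadth-first enumeration of the APTA nodes; the function names are illustrative, and the doubling rule of step 5 is left to the caller.

```python
from collections import deque

def bfs_nodes(root):
    """All APTA nodes in breadth-first order from the root."""
    queue, order = deque([root]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(node.children.values())
    return order

def merge_window(root, size):
    """The current merge window: the first `size` nodes in breadth-first order.
    If no merge succeeds within the window, the caller doubles `size`."""
    return bfs_nodes(root)[:size]
```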

Blue-Fringe
An alternative windowing method to that of W-EDSM is also described by Lang, Pearlmutter and Price. It uses a red and blue colouring scheme to provide a simple but effective way of choosing the pool of merge candidates at each merge level in the search. The Blue-fringe windowing method simplifies the implementation of the algorithm and improves its running time:
1. Colour the root of the APTA red.
2. Colour the non-red children of each red node blue.
3. Evaluate all possible pairings of red and blue nodes.
4. Promote the first blue node that is distinguishable from every red node; otherwise, merge the pair of nodes with the highest number of matching labels within their sub-trees.
5. Terminate when there are no blue nodes to promote and no possible merges to perform.
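A sketch of one level of this red-blue search, with the merge operation itself elided; edsm_score is the illustrative scoring function from the EDSM sketch above, and the function name is an assumption.

```python
def blue_fringe_step(red):
    """One merge level. red: current set of red nodes.
    Returns ('promote', node), ('merge', (red_node, blue_node)) or ('done', None)."""
    blue = {c for r in red for c in r.children.values() if c not in red}
    for b in blue:
        if all(edsm_score(r, b) == -1 for r in red):
            return 'promote', b        # b is distinguishable from every red node
    candidates = [(edsm_score(r, b), r, b) for r in red for b in blue]
    candidates = [c for c in candidates if c[0] >= 0]
    if not candidates:
        return 'done', None
    _, r, b = max(candidates, key=lambda c: c[0])  # highest-scoring compatible pair
    return 'merge', (r, b)
```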

A new approach
Goal: to design an algorithm capable of incrementally learning new information without forgetting previously acquired knowledge and without requiring access to the original set of samples.
Notation: we write A(Σ) for the set of all automata over the alphabet Σ, and let Alg be any of the algorithms defined so far (TB, Gold, RPNI, Traxbar, EDSM, W-EDSM, Blue-fringe).

A new approach
[The incremental construction itself was presented as a sequence of diagrams.]

Motivation
Q: Why do we need this algorithm?
1. In many practical applications, the acquisition of representative training data is expensive and time-consuming.
2. A new sample may be introduced after several days, months or even years.
3. We might have lost the initial database.

Results
Lemma 1. It is not always true that: [formula]
Lemma 2. Let [...] be such that [...]. It is not always true that: [formula]
Lemma 3. Let [...] be such that [...]. It is not always true that: [formula]

Results
Sketch of the proof (for Lemma 1 and Lemma 2): [counterexample diagram]
Sketch of the proof (for Lemma 3): [counterexample diagram]

Further Research
- To determine the complexity of the algorithm and to test it on large/sparse data.
- To determine how much time and how many resources are saved by using this algorithm instead of the classical ones.
- To design an algorithm that deals with newly introduced negative samples.
- To answer the question: when is the automaton created with this method weakly equivalent to the one obtained from the entire sample?
- To improve the software so that it can deal with new samples.

Bibliography
- Colin de la Higuera, José Oncina, Enrique Vidal. "Identification of DFA: Data-Dependent versus Data-Independent Algorithms"
- Rajesh Parekh, Vasant Honavar. "Learning DFA from Simple Examples"
- Michael J. Kearns, Umesh V. Vazirani. "An Introduction to Computational Learning Theory"
- J. Oncina, P. Garcia. "A polynomial algorithm to infer regular languages"
- Dana Angluin. "Inference of Reversible Languages"
- P. Garcia, A. Cano, J. Ruiz. "A comparative study of two algorithms for automata identification"
- P. Garcia, A. Cano, J. Ruiz. "Inferring subclasses of regular languages faster using RPNI and forbidden configurations"
- K.J. Lang, B.A. Pearlmutter, R.A. Price. "Results of the Abbadingo One DFA Learning Competition and a New Evidence-Driven State Merging Algorithm"

Bibliography
- Takashi Yokomori. "Grammatical Inference and Learning"
- M. Sebban, J.C. Janodet, E. Tantini. "BLUE*: a Blue-Fringe Procedure for Learning DFA with Noisy Data"
- Kevin J. Lang. "Random DFA's can be Approximately Learned from Sparse Uniform Examples"
- P. Dupont, L. Miclet, E. Vidal. "What is the search space of the regular inference?"
- Sara Porat, Jerome A. Feldman. "Learning Automata from Ordered Examples"
- Orlando Cicchello, Stefan C. Kremer. "Inducing Grammars from Sparse Data Sets: A Survey of Algorithms and Results"