Grammatical inference: an introduction Module X9IT050, Colin de la Higuera, Nantes, 2014

Outline (of this first talk)

1. What is grammatical inference about?
2. Why is it a difficult task?
3. Why is it a useful task?
4. Validation issues
5. Some criteria

What is grammatical inference about?

1. Grammatical inference is about learning a grammar given information about a language.

The information consists of strings, trees, or graphs, and is typically presented as:
- Text: only positive examples
- Informant: labelled data
- Actively sought (query learning, teaching)

[The above lists are not exhaustive]
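The difference between the two passive presentation modes can be made concrete with a toy target language. Everything below (the language {aⁿbⁿ : n ≥ 1} and the helper names) is an illustrative assumption, not something from the talk:

```python
def in_target(w: str) -> bool:
    """Membership test for the example target language a^n b^n (n >= 1)."""
    n = len(w) // 2
    return len(w) % 2 == 0 and n >= 1 and w == "a" * n + "b" * n

# Text presentation: only positive examples are available to the learner.
text_presentation = ["ab", "aabb", "aaabbb"]

# Informant presentation: every string comes with its label.
informant_presentation = [("ab", True), ("ba", False),
                          ("aabb", True), ("abab", False)]

# Sanity checks: the data is consistent with the target language.
assert all(in_target(w) for w in text_presentation)
assert all(in_target(w) == label for w, label in informant_presentation)
```

Learning from text is strictly harder than learning from an informant: with positive data only, the learner never sees a counterexample ruling out an over-general hypothesis.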

1.1 The functions/goals

- Languages and grammars from the Chomsky hierarchy
- Probabilistic automata and context-free grammars
- Hidden Markov models
- Patterns
- Transducers

1.2 The Chomsky hierarchy

- Regular languages
- Context-free languages
- Context-sensitive languages
- Recursively enumerable languages

1.3 The Chomsky hierarchy revisited

- Regular languages: recognized by DFA and NFA; generated by regular grammars; described by regular expressions
- Context-free languages: generated by context-free grammars; recognized by pushdown automata
- Context-sensitive languages: generated by CS grammars (membership is PSPACE-complete, so parsing is not believed to be in P)
- Recursively enumerable languages: all Turing machines; parsing (membership) is undecidable
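As a minimal sketch of the bottom level of the hierarchy, here is DFA membership testing in Python. The particular automaton (strings over {a, b} with an even number of a's) is a made-up example, not one from the talk:

```python
def dfa_accepts(delta, start, finals, w):
    """Run a deterministic finite automaton on string w.

    delta maps (state, symbol) pairs to the next state;
    w is accepted iff the run ends in a final state.
    """
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

# DFA for the regular language of strings over {a, b}
# containing an even number of a's.
delta = {
    ("even", "a"): "odd",  ("even", "b"): "even",
    ("odd",  "a"): "even", ("odd",  "b"): "odd",
}
print(dfa_accepts(delta, "even", {"even"}, "abab"))  # True: two a's
print(dfa_accepts(delta, "even", {"even"}, "ab"))    # False: one a
```

Each level up the hierarchy trades this constant-memory recognizer for a strictly more powerful (and more expensive) device.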

1.4 Other formalisms

- Topological formalisms
- Semilinear languages
- Hyperplanes
- Balls of strings

1.5 Distributions of strings

A probabilistic automaton defines a distribution over strings.
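The probability a probabilistic finite automaton assigns to a string can be computed by a forward pass over its states. The toy single-state automaton below is an illustrative assumption (its emission and stopping probabilities sum to 1, so it defines a proper distribution):

```python
from collections import defaultdict

def string_probability(init, trans, final, w):
    """Forward computation of Pr(w) under a probabilistic finite automaton.

    init[q] and final[q] are initial/stopping probabilities;
    trans[(q, a)] is a list of (next_state, probability) pairs.
    """
    alpha = dict(init)  # alpha[q] = prob. of reaching q after reading the prefix
    for a in w:
        nxt = defaultdict(float)
        for q, p in alpha.items():
            for q2, p2 in trans.get((q, a), []):
                nxt[q2] += p * p2
        alpha = nxt
    return sum(p * final.get(q, 0.0) for q, p in alpha.items())

# Toy PFA over {a, b}: one state emitting a with prob. 0.3,
# b with prob. 0.2, and stopping with prob. 0.5.
init = {0: 1.0}
trans = {(0, "a"): [(0, 0.3)], (0, "b"): [(0, 0.2)]}
final = {0: 0.5}
print(string_probability(init, trans, final, "ab"))  # ~0.03 = 0.3 * 0.2 * 0.5
```

Summed over all strings, these values total 1, which is exactly what makes the automaton a distribution rather than just a weighted recognizer.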

1.6 Fuzzy automata

A fuzzy automaton says that string w belongs to the language with some degree p. The differences with probabilistic automata are:
- The total sum of the degrees may differ from 1 (it may even be infinite)
- A fuzzy automaton cannot be used as a generator of strings
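One standard semantics for fuzzy automata is max-min composition: the degree of a string is the best path through the automaton, where a path is only as strong as its weakest transition. This choice of semantics, and the toy automaton below, are assumptions for illustration; note that the degrees across strings need not sum to 1:

```python
def fuzzy_degree(init, trans, final, w):
    """Degree of membership of w under a max-min fuzzy automaton.

    trans[(q, a)] maps successor states to transition weights in [0, 1];
    a path's strength is the minimum weight along it, and the string's
    degree is the maximum strength over all accepting paths.
    """
    mu = dict(init)  # mu[q] = best degree of reaching q so far
    for a in w:
        nxt = {}
        for q, d in mu.items():
            for q2, d2 in trans.get((q, a), {}).items():
                nxt[q2] = max(nxt.get(q2, 0.0), min(d, d2))
        mu = nxt
    return max((min(d, final.get(q, 0.0)) for q, d in mu.items()), default=0.0)

# Toy fuzzy automaton: a is a "strong" symbol (0.9), b a "weak" one (0.6).
init = {0: 1.0}
trans = {(0, "a"): {0: 0.9}, (0, "b"): {0: 0.6}}
final = {0: 1.0}
print(fuzzy_degree(init, trans, final, "aab"))  # 0.6: the weakest link wins
```

Unlike the probabilistic case, there is no normalization constraint here, which is why a fuzzy automaton cannot be sampled from as a string generator.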

1.7 The data: examples of strings

A string in Gaelic and its English translation:
Tha thu cho duaichnidh ri èarr àirde de a’ coisich deas damh
You are as ugly as the north end of a southward traveling ox

1.7 The data: examples of strings

Time series pose the problem of the alphabet:
- An infinite alphabet?
- Discretizing?
- An ordered alphabet?

Sinus rhythm with acquired long QT, work found via Flickr, by Popfossa, CC BY 2.0
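The discretization option can be sketched in a few lines: bin each numeric value into one of finitely many intervals and emit a symbol per bin. The thresholds and the heart-rate-like series below are made-up illustrations:

```python
def discretize(series, thresholds, alphabet="abc"):
    """Map each numeric value to a symbol according to threshold bins.

    A value falling below thresholds[0] gets alphabet[0], between
    thresholds[0] and thresholds[1] gets alphabet[1], and so on.
    """
    out = []
    for x in series:
        i = sum(x > t for t in thresholds)  # index of the bin containing x
        out.append(alphabet[i])
    return "".join(out)

# A toy numeric series mapped onto a 3-letter ordered alphabet.
series = [58, 72, 95, 61, 99, 70]
print(discretize(series, thresholds=[65, 85]))  # 'abcacb'
```

The resulting alphabet is naturally ordered (a < b < c), which is precisely the third issue the slide raises: most grammatical formalisms ignore such an order.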

1.7 The data: examples of strings

Codis profile, Chemical Science & Technology Laboratory, National Institute of Standards and Technology, work found via Wikipedia, CC BY-SA 3.0

1.7 The data: examples of strings

>A BAC=41M14 LIBRARY=CITB_978_SKB
AAGCTTATTCAATAGTTTATTAAACAGCTTCTTAAATAGGATATAAGGCAGTGCCATGTA
GTGGATAAAAGTAATAATCATTATAATATTAAGAACTAATACATACTGAACACTTTCAAT
GGCACTTTACATGCACGGTCCCTTTAATCCTGAAAAAATGCTATTGCCATCTTTATTTCA
GAGACCAGGGTGCTAAGGCTTGAGAGTGAAGCCACTTTCCCCAAGCTCACACAGCAAAGA
CACGGGGACACCAGGACTCCATCTACTGCAGGTTGTCTGACTGGGAACCCCCATGCACCT
GGCAGGTGACAGAAATAGGAGGCATGTGCTGGGTTTGGAAGAGACACCTGGTGGGAGAGG
GCCCTGTGGAGCCAGATGGGGCTGAAAACAAATGTTGAATGCAAGAAAAGTCGAGTTCCA
GGGGCATTACATGCAGCAGGATATGCTTTTTAGAAAAAGTCCAAAAACACTAAACTTCAA
CAATATGTTCTTTTGGCTTGCATTTGTGTATAACCGTAATTAAAAAGCAAGGGGACAACA
CACAGTAGATTCAGGATAGGGGTCCCCTCTAGAAAGAAGGAGAAGGGGCAGGAGACAGGA
TGGGGAGGAGCACATAAGTAGATGTAAATTGCTGCTAATTTTTCTAGTCCTTGGTTTGAA
TGATAGGTTCATCAAGGGTCCATTACAAAAACATGTGTTAAGTTTTTTAAAAATATAATA
AAGGAGCCAGGTGTAGTTTGTCTTGAACCACAGTTATGAAAAAAATTCCAACTTTGTGCA
TCCAAGGACCAGATTTTTTTTAAAATAAAGGATAAAAGGAATAAGAAATGAACAGCCAAG
TATTCACTATCAAATTTGAGGAATAATAGCCTGGCCAACATGGTGAAACTCCATCTCTAC
TAAAAATACAAAAATTAGCCAGGTGTGGTGGCTCATGCCTGTAGTCCCAGCTACTTGCGA
GGCTGAGGCAGGCTGAGAATCTCTTGAACCCAGGAAGTAGAGGTTGCAGTAGGCCAAGAT
GGCGCCACTGCACTCCAGCCTGGGTGACAGAGCAAGACCCTATGTCCAAAAAAAAAAAAA
AAAAAAAGGAAAAGAAAAAGAAAGAAAACAGTGTATATATAGTATATAGCTGAAGCTCCC
TGTGTACCCATCCCCAATTCCATTTCCCTTTTTTGTCCCAGAGAACACCCCATTCCTGAC
TAGTGTTTTATGTTCCTTTGCTTCTCTTTTTAAAAACTTCAATGCACACATATGCATCCA
TGAACAACAGATAGTGGTTTTTGCATGACCTGAAACATTAATGAAATTGTATGATTCTAT

1.7 The data: examples of strings

Cancionero de Palacio, work found via Wikipedia, CC BY-SA 3.0

1.7 The data: examples of strings

Généalogie2 Guerre de Cent Ans, work found via Wikipedia, CC BY-SA 3.0

1.7 The data: examples of strings

Phylogenetic Tree, Woese 1990, Maulucioni, work found via Wikipedia, CC BY-SA 3.0

1.7 The data: examples of strings

<book>
  <part>
    <chapter>
      <sect1/>
      <sect1>
        <orderedlist numeration="arabic">
          <listitem/>
          <f:fragbody/>
        </orderedlist>
      </sect1>
    </chapter>
  </part>
</book>

1.7 The data: examples of strings

<?xml version="1.0"?>
<?xml-stylesheet href="carmen.xsl" type="text/xsl"?>
<?cocoon-process type="xslt"?>
<!DOCTYPE pagina [
  <!ELEMENT pagina (titulus?, poema)>
  <!ELEMENT titulus (#PCDATA)>
  <!ELEMENT auctor (praenomen, cognomen, nomen)>
  <!ELEMENT praenomen (#PCDATA)>
  <!ELEMENT nomen (#PCDATA)>
  <!ELEMENT cognomen (#PCDATA)>
  <!ELEMENT poema (versus+)>
  <!ELEMENT versus (#PCDATA)>
]>
<pagina>
  <titulus>Catullus II</titulus>
  <auctor>
    <praenomen>Gaius</praenomen>
    <nomen>Valerius</nomen>
    <cognomen>Catullus</cognomen>
  </auctor>


1.7 And also

- Business processes
- Bird songs
- Images (contours and shapes)
- Robot moves
- Web services
- Malware
- …

What does learning mean?

2. What does learning mean?

Suppose we write a program that can learn grammars… are we done? A first question is: why bother? If my program works, why do anything more about it? Why should we do something when other researchers in machine learning are not?

3. Motivating reflection #1

Is 17 a random number?
Is 0110110110110101011000111101 a random sequence?
Is grammar G the correct grammar for a given sample S?
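One pragmatic way to probe the second question is compressibility: a string with an obvious pattern compresses well, while a (pseudo-)random one does not. This is only a rough proxy for Kolmogorov complexity, and the strings below are made-up examples:

```python
import random
import zlib

def compressed_ratio(s: str) -> float:
    """Size of the zlib-compressed string relative to the original."""
    raw = s.encode()
    return len(zlib.compress(raw, 9)) / len(raw)

patterned = "0110" * 250                                   # a very regular 1000-bit string
random.seed(0)
noisy = "".join(random.choice("01") for _ in range(1000))  # a pseudo-random one

# The regular string compresses far better than the noisy one.
print(compressed_ratio(patterned), "<", compressed_ratio(noisy))
```

Strings are kept long on purpose: for short inputs the compressor's fixed header overhead dominates and the ratio tells you nothing.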

3. Motivating reflection #2

In the case of languages, learning is an ongoing process. Is there a moment where we can say we have learnt a language?

3. Motivating reflection #3

The statement “I have learnt” does not make sense; the statement “I am learning” does, at least when learning over infinite spaces.

4. What is usually called “having learnt”

- That the grammar / automaton is the smallest or the best with respect to some score (a combinatorial characterisation)
- That some optimisation problem has been solved
- That the “learning” algorithm has converged (as with EM)

4. What is not said

That, having solved some complex combinatorial question, we have an Occam / compression / MDL / Kolmogorov-complexity-like argument giving us some guarantee with respect to the future. Computational learning theory has such results.

4. Why should we bother when those working in statistical machine learning do not?

Whether with numerical functions or with symbolic functions, we are all trying to do some sort of optimisation. The difference is (perhaps) that numerical optimisation works much better than combinatorial optimisation!

[They actually do bother, only differently]
