Formal Learning Theory
Michele Friend (Philosophy) and Valentina Harizanov (Mathematics)


Example
Guessing strings over the alphabet {a, b}. After each new string, the learner guesses:
a                            Guess?
a, aa                        Guess?
a, aa, aaa                   Guess?
a, aa, aaa, b                Guess?
a, aa, aaa, b, bb            Guess?
a, aa, aaa, b, bb, bbb       Guess?
a, aa, aaa, b, bb, bbb, aba  Guess?
…
This infinite process is called identification in the limit.
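A minimal sketch of this guessing game as a loop (the learner's strategy here, guessing exactly the strings seen so far, is an illustrative assumption, not from the slides):

```python
# Minimal sketch (illustrative, not from the slides): the guessing game as a
# loop. The strategy "guess exactly the strings seen so far" is an assumption
# chosen for simplicity.

def learner(data_so_far):
    """A naive learner: hypothesize the finite set of strings seen so far."""
    return set(data_so_far)

text = ["a", "aa", "aaa", "b", "bb", "bbb", "aba"]  # a finite prefix of an infinite text

data = []
for w in text:
    data.append(w)
    print(f"data: {data}  guess: {sorted(learner(data))}")
```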

Learning paradigm
- A language or class of languages to be learned
- A learner
- An environment in which a language is presented to the learner
- Hypotheses that occur to the learner about the language to be learned, on the basis of the environment

Formal language L
Given a finite alphabet, say T = {a, b}.
A sentence w is a string of symbols over the alphabet T: ε, a, aa, ab, ba, bba, … (ε = the empty string).
A language L = {w_0, w_1, w_2, …} is a set of correct sentences, say
L_1 = {a, aa, aaa, aaaa, …}
L_2 = {a, b, bab, aba, babab, …}
w_0, w_1, w_2, … is a text for L (order and repetition do not matter);
w_2, w_1, w_2, w_1, w_3, … is another text for L.

Chomsky grammar G = (T, V, S, P)
T = {a, b}, V = {S}, P = production rules

L_1 = {a, aa, aaa, …}
P_1:
1. S → aS
2. S → a
L(G_1) = L_1
Sample derivation: S ⇒ aS ⇒ aaS ⇒ aaaS ⇒ aaaa
Regular grammar; finite state automaton

L_2 = {a, b, bab, aba, babab, aabaa, …}
P_2:
1. S → aSa
2. S → bSb
3. S → a
4. S → b
L(G_2) = L_2
Context-free grammar; push-down automaton
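A minimal sketch (a naive breadth-first derivation engine; the implementation is my illustration, not the authors' construction) that enumerates the strings G_2 generates up to a given length:

```python
# Minimal sketch (naive breadth-first derivation engine; illustrative only):
# enumerate the strings generated by the context-free grammar G_2.

from collections import deque

RULES = {"S": ["aSa", "bSb", "a", "b"]}  # the productions P_2

def generate(max_len):
    """Expand the leftmost nonterminal breadth-first, collecting sentences."""
    results, queue, seen = set(), deque(["S"]), {"S"}
    while queue:
        form = queue.popleft()
        if "S" not in form:          # no nonterminal left: a sentence of L(G_2)
            results.add(form)
            continue
        i = form.index("S")
        for rhs in RULES["S"]:
            new = form[:i] + rhs + form[i + 1:]
            # A sentential form never shrinks, so prune forms already too long.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(results)

print(generate(5))  # the odd-length palindromes over {a, b} up to length 5
```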

Coding Chomsky languages
Chomsky languages = computably enumerable languages
Gödel coding: finite sequences of syntactic objects are coded by numbers.
The code of L is e: L = L_e
Algorithmic enumeration of all (unrestricted) Chomsky languages:
L_0, L_1, L_2, …, L_e, …
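One standard way to code strings by numbers, sketched via the Cantor pairing function (the slides do not fix a particular coding, so this choice is an assumption):

```python
# Minimal sketch (one standard Goedel-style coding via the Cantor pairing
# function; the particular coding is an assumption, not fixed by the slides).

def pair(x, y):
    """Cantor pairing: a bijection between N x N and N."""
    return (x + y) * (x + y + 1) // 2 + y

def code(w):
    """Injectively code a string over {a, b} as a natural number."""
    n = 0
    for ch in w:
        n = pair(n, {"a": 1, "b": 2}[ch])
    return n

for w in ["", "a", "b", "aa", "ab", "aba"]:
    print(repr(w), "->", code(w))
```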

Decidable (computable) language
A language is decidable if there is an algorithm that distinguishes correct from incorrect sentences.
A Chomsky language is decidable exactly when its incorrect sentences also form a Chomsky language.
Not every Chomsky language is decidable.
There is no algorithmic enumeration of all decidable languages.
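For instance, membership in L_1 above is settled by a terminating algorithm (a toy sketch, my example rather than the slides'):

```python
# Minimal sketch (toy example, not from the slides): a decision procedure for
# L_1 = {a, aa, aaa, ...}. The algorithm halts on every input string.

def decide_L1(w):
    """Return True iff w is a nonempty string consisting only of a's."""
    return len(w) > 0 and all(ch == "a" for ch in w)

print(decide_L1("aaa"), decide_L1("aba"), decide_L1(""))  # True False False
```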

Learning from text
An algorithmic learner is a Turing machine that is fed a text for the language L to be learned, sentence by sentence.
At each step the learner guesses a code for the language being fed:
w_0 ; e_0
w_0, w_1 ; e_1
w_0, w_1, w_2 ; e_2
…
Learning is successful if the sequence e_0, e_1, e_2, e_3, … converges to a "description" of L.

Syntactic convergence: EX-learning
EX = explanatory
For some n, we have e_0, e_1, …, e_n, e_n, e_n, e_n, … and L = L_{e_n}.
The set of all finite languages is EX-learnable from text.
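A minimal sketch of the standard strategy for this class (the code itself is my illustration): the hypothesis "the language is exactly the data seen so far" stabilizes once every sentence of the finite language has appeared in the text.

```python
# Minimal sketch (standard textbook strategy; illustrative code): an EX-learner
# for the class of all finite languages.

def ex_learn_finite(text):
    """Yield a hypothesis (a frozenset standing in for a code) after each datum."""
    data = set()
    for w in text:
        data.add(w)
        yield frozenset(data)

# A text for the finite language {a, ab}, with repetitions (they do not matter):
for hyp in ex_learn_finite(["a", "ab", "a", "ab", "a"]):
    print(sorted(hyp))
# After "ab" first appears, the hypothesis never changes: syntactic convergence.
```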

Semantic convergence: BC-learning
BC = behaviorally correct
For some n, we have e_0, e_1, …, e_n, e_{n+1}, e_{n+2}, e_{n+3}, … with L_{e_m} = L for all m ≥ n (the codes may keep changing, but from some point on they all describe L).
There are classes of languages that are BC-learnable, but not EX-learnable.

Learning from an informant
L = {w_0, w_1, w_2, w_3, …}
not-L = {u_0, u_1, u_2, u_3, …} = the incorrect sentences in the proper vocabulary
Learning steps:
w_0 ; e_0
w_0, u_0 ; e_1
w_0, u_0, w_1 ; e_2
w_0, u_0, w_1, u_1 ; e_3
…
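A minimal sketch of identification by enumeration from an informant, over a toy two-language class (the class and all names here are illustrative assumptions, not from the slides): the learner conjectures the first language in a fixed enumeration that is consistent with every label seen so far.

```python
# Minimal sketch (identification by enumeration over a toy class; the class
# and names are illustrative assumptions, not from the slides).

def is_L1(w):  # L_1 = {a, aa, aaa, ...}
    return len(w) > 0 and set(w) == {"a"}

def is_L2(w):  # L_2 = odd-length palindromes over {a, b}
    return len(w) % 2 == 1 and w == w[::-1]

CLASS = [("L1", is_L1), ("L2", is_L2)]  # a fixed enumeration of the class

def learn_from_informant(labeled):
    data = []
    for w, label in labeled:        # label True = correct, False = incorrect
        data.append((w, label))
        for name, member in CLASS:  # guess the first consistent language
            if all(member(x) == lab for x, lab in data):
                print(f"after {w!r} ({'+' if label else '-'}): guess {name}")
                break

# An informant for L_2 mixes positive and negative information:
learn_from_informant([("a", True), ("aa", False), ("aba", True), ("bab", True)])
```

Note how the negative datum ("aa", False) is what eliminates L1; positive data alone could not separate the two languages this quickly.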

Locking sequence for an EX-learner (Blum-Blum)
If a learner can learn a language L, then there is a finite sequence σ of sentences in L, called a locking sequence for L, on which the learner "locks" its correct hypothesis; that is, after that sequence the hypothesis does not change:
the learner outputs the same hypothesis on σ and on (σ, τ) for any finite sequence τ of sentences from L.

Angluin's criterion
Maximum finite fragment property
Consider a class of Chomsky languages. The class is EX-learnable from text exactly when for every language L in the class there is a finite fragment D of L (D ⊆ L) such that no language U in the class sits strictly between them: D ⊆ U ⊂ L.
Example: the class of all finite languages together with the infinite language L_1 = {a, aa, aaa, …} is not EX-learnable from text, since for every finite fragment D of L_1, the finite language U = D is itself in the class and satisfies D ⊆ U ⊂ L_1.

Problem
How can we formally define and study particular learning strategies?
Constraints on hypotheses: consistency, confidence, reliability, etc.

Consistent learning
A learner is consistent on a language L if at every step it guesses a language that includes all the data given to it up to that point.
The class of all finite languages can be identified consistently.
If a language is consistently EX-learnable by an algorithmic learner, then it must be a decidable language.
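A minimal sketch (an illustrative check, not from the slides) of what consistency demands of a sequence of hypotheses:

```python
# Minimal sketch (illustrative, not from the slides): verify that a learner's
# guesses are consistent, i.e. each hypothesis contains all data seen so far.

def is_consistent(hypotheses, text):
    """hypotheses[i] is the guessed set of sentences after seeing text[:i+1]."""
    data = set()
    for w, hyp in zip(text, hypotheses):
        data.add(w)
        if not data <= hyp:
            return False
    return True

text = ["a", "ab", "a"]
hyps = [{"a"}, {"a", "ab"}, {"a", "ab"}]  # the finite-language learner's guesses
print(is_consistent(hyps, text))          # True
```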

Popperian learning of total functions
A (total) computable function f can be tested against finite sequences of data given to the learner.
A learner is Popperian on f if, on any sequence of positive data for f, the learner guesses a computable function.
A learner is Popperian if it is Popperian on every computable function.
Not every algorithmically EX-learnable class of functions is Popperian.

Confident learning
A learner is confident when it is guaranteed to converge to some hypothesis, even if it is given text for a language that does not belong to the class to be learned. It must also be accurate on the languages in the class.
There is a class that is learnable by an algorithmic EX-learner, and is learnable by (another) confident EX-learner, but cannot be learned by a single EX-learner that is both algorithmic and confident.

Reliable learning
A learner is reliable if it is not allowed to converge incorrectly (although it might never converge on a text for a language not in the class).
Reliable EX-learnability from text implies that every language in the class must be finite.

Decisive learning
A decisive learner, once it has put out a revised hypothesis for a new language that replaces an earlier hypothesized language, never returns to the old language again.
Decisive EX-learning from text is not restrictive for general learners, nor for algorithmic learners of computable functions.
Decisiveness reduces the power of algorithmic learning for languages.

U-shaped learning (Baliga, Case, Merkle, Stephan, and Wiehagen)
A variant of non-decisive learning.
Mimics the learning-unlearning-relearning pattern.
See Overregularization in Language Acquisition, a monograph by Marcus, Pinker, Ullman, Hollander, Rosen, and Xu.

Problem
How can we develop algorithmic learning theory for languages more complicated than Chomsky languages, in particular, ones closer to natural language?
(Case and Royer) Correction grammars: L_1 − L_2, where G_1 is a Chomsky (unrestricted, type-0; or context-free, type-2) grammar generating the language L_1 and G_2 is the grammar generating the editing (corrections) L_2.
Burgin: "Grammars with prohibition and human-computer interaction"
Ershov's difference hierarchy in computability theory, for limit computable languages
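A minimal sketch of the correction-grammar idea, with toy recognizers standing in for the two grammars (everything here is an illustrative assumption):

```python
# Minimal sketch (toy recognizers standing in for grammars; illustrative only):
# a correction grammar (G_1, G_2) describes the difference L_1 - L_2. A
# sentence belongs to the described language iff G_1 generates it and G_2 does
# not "correct" it away.

def in_L1(w):   # pretend G_1 generates the strings over {a, b} starting with 'a'
    return w.startswith("a")

def in_L2(w):   # pretend G_2 generates the corrections: strings containing 'bb'
    return "bb" in w

def in_difference(w):
    """Membership in the language described by the correction grammar (G_1, G_2)."""
    return in_L1(w) and not in_L2(w)

for w in ["aab", "abba", "baa"]:
    print(w, in_difference(w))  # aab True, abba False, baa False
```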

Problem
What is the significance of negative versus positive information in the learning process?
Learning from switching the type of information (Jain and Stephan): the learner can request positive or negative information about L; once it has made its finitely many switches and keeps requesting information of the same type, it receives all of that information (in the limit).

Harizanov-Stephan's result
Consider a class of Chomsky languages. Assume that there is a language L in the family such that for every finite set of sentences D, there are languages U and U′ in the family with
U ⊂ L ⊂ U′ and D ∩ U = D ∩ U′.
Then the family cannot even be BC-learned from switching.
(U approximates L from below, U′ from above; U and U′ coincide on D.)

Problem
What are good formal frameworks that unify deduction and induction?
Martin, Sharma, and Stephan: use parametric logic (5 parameters: vocabulary, structures, language, data sentences, assumption sentences).
A model-theoretic approach, based on the Tarskian "truth-based" notion of logical consequence. The difference between deductive and inductive consequences lies in the process of deriving a consequence from the premises.

Deduction vs. induction
A sentence s is a deductive consequence of a theory T if s can be inferred from T with absolute certainty.
A sentence s is an inductive consequence of a theory T if s can be correctly (only hypothetically) inferred from T, but can also be incorrectly inferred from other theories T′ that have enough in common with T to provisionally force the inference of s.