Chap2. Language Acquisition: The Problem of Inductive Inference (2.1 ~ 2.2)
Min Su Lee
The Computational Nature of Language Learning and Evolution

Contents
- 2.1 A framework for learning
  - Remarks
- 2.2 The inductive inference approach
  - Discussion
  - Additional results

Introduction
The problem of language acquisition
- The computational difficulty of a problem that children solve so routinely
- The child is exposed to only a finite number of sentences as a result of interaction with its linguistic environment.
- The child can nevertheless generalize from them and thus infer novel sentences it has not encountered before.
Consider a language to be a set of sentences
- Σ: a finite alphabet; Σ*: the universe of all possible finite sentences
- L_t ⊂ Σ*: the target language
- The child learner ultimately has access to a sequence of sentences s_1, s_2, s_3, ..., s_n, ..., where s_i ∈ L_t is the ith example available to the learner
- The learner makes a guess about the target language L_t after each new sentence becomes available
- There are an infinite number of possible languages that contain s_1, ..., s_k (a toy illustration follows below)
- Therefore, learning in the complete absence of prior information is impossible
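To make the last point concrete, here is a minimal Python sketch (the two-letter alphabet and the particular candidate languages are illustrative assumptions, not from the book) showing that a finite positive sample is consistent with many distinct languages:

```python
# Toy illustration: many languages are consistent with a finite positive sample.
# Languages are represented by membership predicates over strings in {a, b}*.

sample = ["ab", "aabb"]  # finite positive data the learner has seen so far

candidate_languages = {
    "exactly the sample": lambda s: s in {"ab", "aabb"},
    "a^n b^n (n >= 1)": lambda s: s.count("a") >= 1
        and s == "a" * s.count("a") + "b" * s.count("a"),
    "a* b*": lambda s: s == "a" * s.count("a") + "b" * s.count("b"),
    "all strings over {a, b}": lambda s: set(s) <= {"a", "b"},
}

# Every candidate contains the sample, yet the candidates are pairwise distinct;
# infinitely many further supersets could be added to this list.
for name, member in candidate_languages.items():
    assert all(member(s) for s in sample), name
    print(f"{name}: consistent with the sample")
```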

A framework for learning
- Target language L_t ∈ L
  - A target language drawn from a class of possible target languages L
  - This is the language the learner must identify on the basis of examples
- Example sentences s_i ∈ L_t
  - Drawn from the target language and presented to the learner
  - s_i is the ith such example sentence
- Hypothesis language h ∈ H
  - Drawn from a class of possible hypothesis languages that child learners construct on the basis of exposure to example sentences in the environment
- Learning algorithm A
  - An effective procedure by which languages from H are chosen on the basis of the example sentences received by the learner (a minimal interface sketch follows below)
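As a minimal sketch (the names and representation choices are assumptions for illustration, not the book's notation), the four components can be expressed by representing a language as a membership predicate and a learning algorithm as a map from finite data streams to hypotheses:

```python
from abc import ABC, abstractmethod
from typing import Callable, Sequence

Sentence = str
Language = Callable[[Sentence], bool]   # a language L as a membership predicate on Sigma*

class LearningAlgorithm(ABC):
    """A learning algorithm A maps any finite data stream (s_1, ..., s_k),
    drawn from the target language L_t, to a hypothesis language h in H."""

    @abstractmethod
    def conjecture(self, data: Sequence[Sentence]) -> Language:
        """Return the learner's current hypothesis after seeing `data`."""
```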

A framework for learning - Remarks
1. Languages and grammars
- Recursively enumerable (r.e.) languages
  - An r.e. language may have a potentially infinite number of sentences, yet it has a finite representation in terms of a Turing machine or a phrase structure grammar
  - Languages, machines, and grammars are formally equivalent ways of specifying the same set
- G: the class of possible target grammars; L = {L_g | g ∈ G}
  - All computable phrase structure grammars may be enumerated as g_1, g_2, ...
  - Given any r.e. language L, there are infinitely many g_i's s.t. L_{g_i} = L
  - Any collection of grammars may then be defined by specifying their indices in an acceptable enumeration

A framework for learning - Remarks
2. Example sentences
- A psychologically plausible learning algorithm should converge to the target in a manner that is independent of the order of presentation of the sentences
- Examples are presented in i.i.d. fashion according to a fixed underlying probability distribution μ (a sampling sketch follows below)
- μ characterizes the relative frequency of different kinds of sentences that children are likely to encounter during language acquisition
  - e.g., children are more likely to hear short sentences than substantially embedded longer sentences
- μ might have support only over L_t, in which case only positive examples are presented to the learner
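A minimal sketch of such a presentation scheme, assuming a toy target language {a^n b^n : n ≥ 1} and a distribution μ that geometrically favors shorter sentences (the specific distribution is an illustrative assumption, not one fixed by the book):

```python
import random

def sample_sentence(rng: random.Random, p_stop: float = 0.5) -> str:
    """Draw one positive example from the toy target language {a^n b^n : n >= 1}.

    The length parameter n is geometrically distributed, so short sentences
    are more likely than long ones, loosely mimicking child-directed speech.
    """
    n = 1
    while rng.random() > p_stop:
        n += 1
    return "a" * n + "b" * n

def sample_text_prefix(k: int, seed: int = 0) -> list:
    """Draw the first k examples s_1, ..., s_k i.i.d. according to mu."""
    rng = random.Random(seed)
    return [sample_sentence(rng) for _ in range(k)]

print(sample_text_prefix(5))  # five i.i.d. positive examples, mostly short ones
```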

A framework for learning - Remarks 3. Learning algorithm  An effective procedure allowing the learning child to construct hypotheses about the identity of the target language on the basis of the examples it has received  Mapping from the set of all finite data streams to hypotheses in H  (s 1, s 2,..., s k ): a particular finite data stream of k example sentences  D k = {(s 1,..., s k )|s i ∈ Σ * } = (Σ * ) k : the set of all possible sequences of k example sentences that the learner might potentially receive  D = ∪ k>0 D k : the set of all finite data sequences  H : the enumerable set of hypothesis grammars (languages)  A : a partial recursive function s.t. A : D  H  Given a data stream t ∈ D, the learner’s hypothesis is given by A (t) and is a element of H (C) 2009, SNU Biointelligence Lab, 7

A framework for learning - Remarks 3. Learning algorithm (cont.)  The behavior of the learner for a particular data stream depends only on the data stream and can be predicted either deterministically or probabilistically if the learning algorithm is analyzable  Some kinds of learning procedure.  A consistent learner always maintains a hypothesis (call it h n after n examples) that is consistent with the entire data set it has received so far –If the data set the learner has received t n =(s 1,..., s n ), then the learner’s grammatical hypothesis h n is s.t. each s i ∈ L h n –An empirical risk minimizing learner uses the following procedures: Risk function R(h,(s 1,...,s n )) measures the fit to the empirical data consisting of the examples sentences If not unique, the learner might be conjecture the smallest or simplest grammar that fits the data (C) 2009, SNU Biointelligence Lab, 8

A framework for learning - Remarks 3. Learning algorithm (cont.)  Some kinds of learning procedure. (cont.)  A memoryless learning algorithm is one whose hypothesis at every point depends only on the current data and the previous hypothesis –A (t n ) depends only upon A (t n-1 ) and S n  Learning by enumeration –The learner enumerates all possible grammars in H in some order –Let this enumeration be h 1, h 2,.... –It then begins with the conjecture h 1 –Upon receiving new example sentences, the learner simply goes down this list and updates its conjecture to the first one that is consistent with the data seen so far (C) 2009, SNU Biointelligence Lab, 9

A framework for learning - Remarks
4. Criterion of success
- Measures how well the learner has learned at any point in the learning process
- Learnability (see the displayed condition below)
  - d: a distance measure; g_t: any target grammar; h_n: the hypothesis grammar after n example sentences
  - If d(g_t, h_n) converges to 0 as n grows, the learner's hypothesis converges to the target in the limit
  - If every g_t ∈ G can be learned in this sense by a learning algorithm A, the family G is said to be learnable by A
- Learnability implies that the generalization error eventually converges to zero as the number of examples goes to infinity
  - Generalization error: the quantity d(g_t, h_n), which measures the distance of the learner's hypothesis (after n examples) to the target
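Written out (a reconstruction consistent with Definition 2.2 later in this section), the convergence criterion is:

```latex
% Convergence criterion: the generalization error vanishes in the limit.
% Here h_n = \mathcal{A}(t_n) is the learner's hypothesis after n examples
% drawn from the target language L_{g_t}.
\lim_{n \to \infty} d(g_t, h_n) = 0
```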

A framework for learning - Remarks
5. Generality of the framework
- The basic framework presented here for the analysis of learning systems is quite general
- The target and hypothesis classes L and H might consist of grammars in the generative linguistics tradition
- The learning algorithm could in principle be a grammatical inference procedure, a gradient descent scheme, Minimum Description Length (MDL) learning, maximum-likelihood learning via the EM algorithm, and so on
In what follows, consider the case in which the hypothesis class H is equal to the target class G

The inductive inference approach
Definition 2.1
- A text t for a language L is an infinite sequence of examples s_1, ..., s_n, ... s.t.
  - (1) each s_i ∈ L
  - (2) every element of L appears at least once in t
Notation (a toy text-generation sketch follows below)
- t(k): the kth element of the text (s_k)
- t_k: the first k elements of the text (s_1, ..., s_k)
- t_k ∈ D_k (if represented as a k-tuple)
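A minimal sketch of how a text can be produced for an enumerable language, here the illustrative language {a^n b^n : n ≥ 1}: enumerate the language so that every sentence eventually appears, and expose the prefixes t_k to the learner.

```python
from itertools import count, islice

def canonical_text():
    """An infinite text for the language {a^n b^n : n >= 1}:
    each sentence of the language appears (here, exactly once)."""
    for n in count(1):
        yield "a" * n + "b" * n

def text_prefix(k: int):
    """t_k: the first k elements of the text."""
    return list(islice(canonical_text(), k))

print(text_prefix(4))  # ['ab', 'aabb', 'aaabbb', 'aaaabbbb']
```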

The inductive inference approach
Definition 2.2
- Fix a distance metric d, a target grammar g_t ∈ G, and a text t for the target language L_{g_t}.
- The learning algorithm A identifies (learns) the target g_t (L_{g_t}) on the text t in the limit if lim_{k→∞} d(A(t_k), g_t) = 0 (see the displayed form below).
- If the target grammar g_t is identified (learned) by A on all texts for L_{g_t}, the learning algorithm is said to identify (learn) g_t in the limit.
- If all grammars in G can be identified (learned) in the limit, then the class of grammars G (correspondingly the class of languages L) is said to be identifiable (learnable) in the limit.
- Thus learnability is equivalent to identification in the limit.
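In display form (reconstructed from the surrounding definitions; A(t_k) is the conjecture on the kth prefix of the text):

```latex
% A identifies g_t in the limit on the text t if its conjectures on the
% prefixes t_k converge to the target under the metric d.
\lim_{k \to \infty} d\big(\mathcal{A}(t_k),\, g_t\big) = 0
```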

The inductive inference approach
Definition 2.3
- Given a finite sequence x = s_1, s_2, ..., s_n (of length n), the length of the sequence is denoted by lh(x) = n
- For such a sequence x, write x ⊆ L when each s_i in x is contained in the language L
  - range(x): the set of all unique elements of x
  - x ⊆ L is shorthand for range(x) ⊆ L
- x ◦ y = x_1, ..., x_n, y_1, ..., y_m: the concatenation of two sequences x = x_1, ..., x_n and y = y_1, ..., y_m

The inductive inference approach
Theorem 2.1 (after Blum and Blum; ε-version)
- If A identifies a target grammar g in the limit, then for every ε > 0 there exists a locking data set l_ε ∈ D s.t.
  - (1) l_ε ⊆ L_g
  - (2) d(A(l_ε), g) < ε
  - (3) d(A(l_ε ◦ σ), g) < ε for all σ ∈ D with σ ⊆ L_g
- In other words, after encountering the locking data set, the learner remains ε-close to the target as long as it continues to be given sentences from the target language.
- Thus, if a grammar g (correspondingly, a language L_g) is identifiable (learnable) in the limit, there exists a locking data set that "locks" the learner's conjectures to within an ε-ball of the target grammar once it has been encountered.

The inductive inference approach
Proof of Theorem 2.1 (sketch): suppose, for contradiction, that for some ε > 0 no such locking data set exists. Then every finite data set l ⊆ L_g either already satisfies d(A(l), g) ≥ ε or can be extended by some σ ⊆ L_g so that d(A(l ◦ σ), g) ≥ ε. Interleaving such extensions with an enumeration of L_g produces a text for L_g on which the conjectures of A are ε-far from g infinitely often, so A does not identify g in the limit, a contradiction.

The inductive inference approach
Now consider the important and classical case of exactly identifying the target language in the limit.
Theorem 2.2 (Blum and Blum 1975)
- If A identifies a target grammar g in the limit, then there exists a locking data set l ∈ D such that
  - (1) l ⊆ L_g
  - (2) d(A(l), g) = 0
  - (3) d(A(l ◦ σ), g) = 0 for all σ ∈ D with σ ⊆ L_g

The inductive inference approach
Theorem 2.3 (Gold 1967)
- If the family L contains all the finite languages and at least one infinite language, then it is not learnable (identifiable) in the limit.
- Proof idea (via Theorem 2.2): take a locking data set l for the infinite language; the finite language range(l) also belongs to L, and any text for range(l) that begins with l keeps the learner locked on the infinite language, so range(l) is never identified.

The inductive inference approach
Corollary 2.1
- The families of languages represented by
  - (1) deterministic finite automata
  - (2) context-free grammars
  are not identifiable in the limit
  (since both the regular and the context-free languages contain all the finite languages and many infinite ones)
- Consequently, all grammatical families in the core Chomsky hierarchy are unlearnable in this sense.

The inductive inference approach
Theorem 2.4 (Angluin 1980)
- The family L is learnable (identifiable) in the limit if and only if for each L ∈ L there is a finite subset D_L ⊆ L (a "tell-tale" set) such that for every L' ∈ L, if L' contains D_L, then L' is not a proper subset of L (a brute-force check on a toy family follows below).
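A minimal sketch that checks Angluin's condition by brute force on a toy family of finite languages over a small universe (everything here, including the candidate tell-tale search, is illustrative and works only because the example languages are finite):

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of a finite set, smallest first."""
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def has_tell_tale(L, family):
    """Does L have a finite subset D_L such that no L' in the family
    containing D_L is a proper subset of L? (Angluin's condition)"""
    for D in subsets(L):
        D = set(D)
        if all(not (D <= Lp and Lp < L) for Lp in family):
            return True
    return False

# Toy family: the chain {a} < {a, b} < {a, b, c} satisfies the condition.
chain_family = [{"a"}, {"a", "b"}, {"a", "b", "c"}]
print(all(has_tell_tale(L, chain_family) for L in chain_family))   # True

# Note: every family of *finite* languages trivially passes (D_L = L works),
# so this sketch only illustrates the definition, not Gold's negative result,
# which needs an infinite language sitting above all its finite subsets.
```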

The inductive inference approach - Discussion
Remark 1: The difficulty lies in inferring a set from examples of its members
- Children are not exposed directly to the grammar
- They are exposed only to expressions of their language, and their task is to learn the grammar that provides a compact encoding of the ambient language they are exposed to.
Remark 2: The precise notion of convergence depends upon the distance metric d (a small illustration follows below)
- Case 1. d(g_1, g_2) is 0-1 valued and depends only on whether L_{g_1} = L_{g_2}
  - Under this metric the learner may converge to the correct extensional set without ever settling on one correct grammar
- Case 2. d(g_i, g_j) = |i - j| or |C(i) - C(j)|, where C(i) and C(j) are measures of grammatical complexity in some sense
  - This notion of convergence is significantly more stringent, and much less is learnable
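A minimal sketch of the two kinds of metric, using grammar indices in an assumed enumeration and a helper that decides extensional equivalence only over a finite test set, since true equivalence is undecidable in general:

```python
from typing import Callable, Iterable

Language = Callable[[str], bool]

def d_extensional(L1: Language, L2: Language, test_set: Iterable[str]) -> int:
    """Case 1 style: 0 if the two grammars generate the same set
    (approximated here by agreement on a finite test set), 1 otherwise."""
    return 0 if all(L1(s) == L2(s) for s in test_set) else 1

def d_index(i: int, j: int) -> int:
    """Case 2 style: distance between grammar indices in a fixed enumeration
    (or, more generally, between their complexity measures C(i), C(j))."""
    return abs(i - j)

# Two extensionally identical grammars for a*b*, sitting at different indices:
g3 = lambda s: s == "a" * s.count("a") + "b" * s.count("b")      # assumed index 3
g7 = lambda s: all(c in "ab" for c in s) and "ba" not in s       # assumed index 7

tests = ["", "ab", "aabb", "ba", "abab"]
print(d_extensional(g3, g7, tests))  # 0: same language, so Case 1 is satisfied
print(d_index(3, 7))                 # 4: Case 2 still counts them as far apart
```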

The inductive inference approach - Discussion
Remark 3. The proper role of language is to mediate a mapping between form and meaning, or symbol and referent
- One may reduce a language to a mapping from Σ_1* to Σ_2*, where Σ_1* is the set of all linguistic expressions (over some alphabet Σ_1) and Σ_2* is a characterization of the set of all possible meanings
- A language L is then regarded as a subset of Σ_1* × Σ_2* (form-meaning pairs)
- Possible formulations of the notion of a language:
  - 1. L ⊂ Σ*: the central and traditional notion of a language
  - 2. L ⊂ Σ_1* × Σ_2*: a subset of form-meaning pairs, in a formal sense no different from notion 1

The inductive inference approach - Discussion
Remark 3 (cont.)
- Possible formulations of the notion of a language (cont.; a type sketch follows below):
  - 3. L: Σ* → [0, 1]: a language maps every expression to a real number between 0 and 1
    - For any expression s ∈ Σ*, the number L(s) characterizes the degree of well-formedness of that expression, with L(s) = 1 denoting perfect grammaticality and L(s) = 0 denoting a complete lack of it
  - 4. L is a probability distribution μ on Σ*: the usual notion of a language in statistical language modeling
  - 5. L is a probability distribution μ on Σ_1* × Σ_2*
- These extended notions of a language make the learning problem for the child harder rather than easier
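A minimal sketch of how formulations 1, 3, and 4 differ as Python types; the concrete functions are toy assumptions over a two-letter alphabet (formulations 2 and 5 are the same ideas applied to form-meaning pairs):

```python
from typing import Set

# 1. A language as a set of expressions (the traditional notion).
L_set: Set[str] = {"ab", "aabb", "aaabbb"}

# 3. A language as a graded well-formedness map L: Sigma* -> [0, 1].
def L_graded(s: str) -> float:
    """Toy degree of well-formedness: 1 for a^n b^n, 0.5 for other strings
    over {a, b}, 0 for strings containing other symbols."""
    n = s.count("a")
    if n >= 1 and s == "a" * n + "b" * n:
        return 1.0
    if set(s) <= {"a", "b"}:
        return 0.5
    return 0.0

# 4. A language as a probability distribution mu over Sigma*
#    (here: geometric weight on a^n b^n, zero elsewhere).
def L_prob(s: str) -> float:
    n = s.count("a")
    if n >= 1 and s == "a" * n + "b" * n:
        return 0.5 ** n
    return 0.0

print("ab" in L_set, L_graded("aba"), L_prob("aabb"))  # True 0.5 0.25
```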

The inductive inference approach - Discussion
Remark 4. If the space L = H of possible languages is too large, then the family is no longer learnable
- The class of regular languages (those accepted by DFAs) is already too large for learnability
Remark 5. Is the class of infinite regular languages unlearnable? Yes:
- Consider two languages L_1 and L_2 s.t. L_1 ⊂ L_2 and L_2 \ L_1 is an infinite set.
- Clearly one may find two such languages in L = {infinite regular languages}.
- Let σ_2 be a locking sequence for L_2.
- Then L_1 ∪ range(σ_2) is a language that contains σ_2, is a proper subset of L_2, and belongs to L.
- This language will not be learned from any text whose prefix is σ_2, because the learner locks on to L_2 on such a text.

The inductive inference approach - Discussion
Remark 6. The most compelling objection to the classical inductive inference paradigm comes from statistical quarters
- It seems unreasonable to expect the learner to exactly identify the target on every single text
- The Probably Approximately Correct (PAC) learning framework instead tries to learn the target approximately, with high probability
- The quantity d(g_t, h_n) is now a random variable that must converge to 0

The inductive inference approach - Additional results
Assumption
- The text is generated in i.i.d. fashion according to a probability measure μ on the sentences of the target language L.
- We adopt this assumption in order to derive probabilistic bounds on the performance of the language learner.
Measures on the product spaces
- The product measure μ^2 on the product space L × L
  - For any text t generated from the language L by i.i.d. draws from μ, the pair t_2 of its first two elements lies in L × L
- Similarly, the product measure μ^3 on L × L × L, and so on.
- By the Kolmogorov Extension Theorem, a unique measure μ^∞ is guaranteed to exist on the set T = Π_{i=1}^∞ L_i (where L_i = L for each i).
- The set T consists of all texts that may be generated from L by i.i.d. draws according to μ.
- The measure μ^∞ is defined on T, and thus we have a measure on the set of all texts.

The inductive inference approach - Additional results
Theorem 2.5
- Let A be an arbitrary learning algorithm and g an arbitrary grammar (not necessarily the target). Then the set of texts on which the learning algorithm A converges to g is measurable.

The inductive inference approach - Additional results
Definition 2.4
- Let g be a target grammar and let texts be presented to the learner in i.i.d. fashion according to a probability measure μ on L_g. If there exists a learning algorithm A s.t. μ^∞({t | lim_{k→∞} d(A(t_k), g) = 0}) = 1, then the target is said to be learnable with measure 1 (see the displayed form below).
- The family G is said to be learnable with measure 1 if all grammars in G are learnable with measure 1 by some algorithm A.
Notes
- 1. If the measure μ is known in a certain sense, the entire family of r.e. sets becomes learnable with measure 1.
- 2. On the other hand, if μ is unknown, superfinite families (those containing all the finite languages and at least one infinite language) are not learnable. Thus the class of learnable languages is not enlarged by moving to distribution-free learning in a stochastic setting.
- 3. Computable distributions make languages learnable: any collection of computable distributions is identifiable in the limit.
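The measure-1 learnability condition in display form:

```latex
% The target grammar g is learnable with measure 1 if the set of texts on
% which the learner's conjectures converge to g has full mu^infinity measure.
\mu^{\infty}\!\left(\Big\{\, t \;\Big|\; \lim_{k \to \infty} d\big(\mathcal{A}(t_k), g\big) = 0 \,\Big\}\right) = 1
```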

The inductive inference approach - Additional results
Consider the set of r.e. languages
- L_1, L_2, L_3, ...: an enumeration of the r.e. languages
- Measure μ_i: the measure associated with L_i
- If L_i is the target language, then examples are drawn in i.i.d. fashion according to the measure μ_i
- A natural measure μ_i^∞ then exists on the set of texts for L_i
Theorem 2.6
- With strong prior knowledge about the nature of the μ_i's, the family L of r.e. languages is measure-1 learnable.

The inductive inference approach - Additional results
Proof of Theorem 2.6

The inductive inference approach - Additional results
Definition 2.5
- Consider a target grammar g and a text presented stochastically by i.i.d. draws from the target language L_g according to a measure μ.
- If a learning algorithm exists that can learn the target grammar with measure 1 for all measures μ, then g is said to be learnable in a distribution-free sense.
- A family of grammars G is learnable in a distribution-free sense if there exists a learning algorithm that can learn every grammar in the family with measure 1 in a distribution-free sense.

The inductive inference approach - Additional results
Note
- When one considers statistical learning, the distribution-free requirement is the natural one, and all statistical estimation algorithms are required to converge in a distribution-free sense.
- When this restriction is imposed, the class of learnable families is not enlarged.
Theorem 2.7 (Angluin 1988)
- If a family of grammars G is learnable with measure 1 (on almost all texts) in a distribution-free sense, then it is learnable in the limit in the Gold sense (on all texts).
Theorem 2.8 (Pitt 1989)
- If a family of grammars G is learnable with measure p > 1/2, then it is learnable in the limit in the Gold sense.

The inductive inference approach - Additional results
A few positive results on the learning of grammatical families
1. Conditions under which a family M (a collection of measures) becomes identifiable in the limit: uniformly computable distributions
- Let M = {μ_0, μ_1, μ_2, ...} be a computable family of distributions.
- Define a distance between two distributions μ_i and μ_j.
- The family M is said to be uniformly computable if there exists a total recursive function f(i, x, ε) s.t. for every i, for every x ∈ Σ*, and for every ε, f(i, x, ε) outputs a rational number p s.t. |μ_i(x) - p| < ε (a toy approximator sketch follows below).
- The learner receives a text probabilistically drawn according to an unknown target measure from the family M.
- After k examples are received, the learning algorithm guesses A(t_k) ∈ M.
- One can then construct a learning algorithm whose conjectures A(t_k) converge to the target measure in the limit.
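A minimal sketch of what such a uniform approximator f(i, x, ε) might look like for a toy family of distributions; the family itself (geometric weight on a^n b^n, shifted by the index i) is purely an illustrative assumption:

```python
from fractions import Fraction

def mu(i: int, x: str) -> Fraction:
    """Toy computable family: mu_i puts weight 2^-(n - i) on a^n b^n
    for n >= i + 1, and 0 elsewhere (the weights sum to 1)."""
    n = x.count("a")
    if n >= i + 1 and x == "a" * n + "b" * n:
        return Fraction(1, 2 ** (n - i))
    return Fraction(0)

def f(i: int, x: str, eps: Fraction) -> Fraction:
    """A total recursive f(i, x, eps) returning a rational p with
    |mu_i(x) - p| < eps; here mu_i is exactly computable, so eps is not
    even needed and p = mu_i(x) suffices."""
    return mu(i, x)

print(f(0, "aabb", Fraction(1, 100)))  # 1/4: mu_0 puts weight 2^-2 on a^2 b^2
```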

The inductive inference approach - Additional results
A few positive results on the learning of grammatical families (cont.)
2. Consider active learning on the part of the learner
- The learner is allowed to make queries about the membership of arbitrary elements x ∈ Σ* (membership queries).
- This allows the regular languages to be learned in polynomial time, though context-free grammars remain unlearnable.
- It is certainly reasonable to consider the possibility that children explore their environment, and that this active exploration facilitates learning and circumvents some of the intractability inherent in inductive inference.

The inductive inference approach - Additional results
A few positive results on the learning of grammatical families (cont.)
3. Consider further restrictions on the set of texts on which the learner is required to succeed.
- (a) Recursive texts
  - A text t is said to be recursive if t, regarded as the function n ↦ t(n), is recursive (computable).
  - If a computable map (from data sets to grammars) exists that can learn a family of languages L from recursive texts, then L is algorithmically learnable from all texts.
  - Thus restricting learnability to recursive texts does not enlarge the family of learnable languages.

The inductive inference approach - Additional results
A few positive results on the learning of grammatical families (cont.)
3. Further restrictions on the set of texts (cont.)
- (b) Ascending texts
  - A text t is said to be ascending if for all n < m, the length of t(n) is less than or equal to the length of t(m), i.e., sentences are presented in non-decreasing order of length.
  - There are language families L that are learnable from ascending texts but not learnable from all texts.
  - Superfinite families, however, remain unlearnable in this setting.

The inductive inference approach - Additional results
A few positive results on the learning of grammatical families (cont.)
3. Further restrictions on the set of texts (cont.)
- (c) Informant texts
  - A text is said to be an informant if it consists of both positive and negative examples.
  - Every element of Σ* appears in the text with a label indicating whether or not it belongs to the target language.
  - All recursively enumerable sets are learnable from informant texts.
  - However, it seems unlikely that the learning child ever gets an opportunity to sample the space of negative examples with enough coverage to obtain an unbiased estimate of the target language.

The inductive inference approach - Additional results
A few positive results on the learning of grammatical families (cont.)
4. Consider weaker convergence criteria
- e.g., the framework of anomalies, where one is required to learn the target language only up to (at most) k mistakes
5. Consider various ways to incorporate structure into the learning problem, leading to learnability results
- e.g., learning context-free grammars from structured examples, learning categorial grammars

Summary
- This chapter presented the central developments and results of the theory of inductive inference, which continues to provide the basic formal framework for reasoning about language acquisition.
- The main implication of these results is that learning in the complete absence of any prior information is infeasible.