600.465 - Intro to NLP - J. Eisner: Learning in the Limit (Gold's Theorem)

Slide 1: Learning in the Limit (Gold's Theorem)

Slide 2: Observe some values of a function

Slide 3: Guess the whole function

Slide 4: Another guess: just as good?

Slide 5: More data needed to decide

Slide 6: Poverty of the Stimulus
- Never enough input data to completely determine the polynomial ...
- Always have infinitely many possibilities ... unless you know the order of the polynomial ahead of time: 2 points determine a line, 3 points determine a quadratic, etc. (see the sketch below)
- In language learning, is it enough to know that the target language is generated by a CFG, without knowing the size of the CFG?
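A minimal sketch of this ambiguity (not from the slides; the specific points and coefficients are invented for illustration): two points pin down a unique line but leave infinitely many quadratics open.

```python
import numpy as np

# The entire "stimulus": two observed (x, y) points.
xs = np.array([0.0, 1.0])
ys = np.array([1.0, 3.0])

# A line (degree 1) is fully determined by the 2 points:
slope, intercept = np.polyfit(xs, ys, 1)
print(f"unique line: y = {slope:.1f}x + {intercept:.1f}")

# ... but infinitely many quadratics pass through the same 2 points exactly.
# Pick any leading coefficient a; the remaining coefficients are forced.
for a in [0.0, 1.0, -2.5]:
    b = (ys[1] - ys[0]) - a * (xs[1] ** 2 - xs[0] ** 2)  # valid since xs[1] - xs[0] == 1
    c = ys[0] - a * xs[0] ** 2 - b * xs[0]
    assert np.allclose(a * xs ** 2 + b * xs + c, ys)
    print(f"quadratic y = {a}x^2 + {b}x + {c} also fits both points")
```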

Slide 7: Language learning: What kind of evidence?
- Children listen to language [unsupervised]
- Children are corrected?? [supervised]
- Children observe language in context
- Children observe frequencies of language
- Remember: Language = set of strings

Slide 8: Poverty of the Stimulus (1957)
- Children listen to language
- Children are corrected??
- Children observe language in context
- Children observe frequencies of language
- Chomsky: Just like polynomials: never enough data unless you know something in advance. So kids must be born knowing what to expect in language.

Slide 9: Gold's Theorem (1967)
- Children listen to language
- Children are corrected??
- Children observe language in context
- Children observe frequencies of language
- A simple negative result along these lines: kids (or computers) can't learn much without supervision, inborn knowledge, or statistics.

Slide 10: The Idealized Situation
- Mom talks, Baby listens:
  1. Mom outputs a sentence
  2. Baby hypothesizes what the language is (given all sentences so far)
  3. Go to step 1
- Guarantee: Mom's language is in the set of hypotheses that Baby is choosing among.
- Guarantee: Any sentence of Mom's language is eventually uttered by Mom (even if there are infinitely many).
- Assumption: Vocabulary (or alphabet) is finite.
(A sketch of this protocol in code follows below.)
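A minimal sketch of the protocol above; the names `gold_protocol` and `guess` are invented here, not part of the slides.

```python
def gold_protocol(mom_enumeration, guess):
    """mom_enumeration: any iterable that eventually emits every sentence of Mom's
    language (repeats allowed).  guess: maps the evidence heard so far to one
    language in the hypothesis class C."""
    evidence = []
    for sentence in mom_enumeration:    # step 1: Mom outputs a sentence
        evidence.append(sentence)
        hypothesis = guess(evidence)    # step 2: Baby re-hypothesizes from all data so far
        yield hypothesis                # step 3: go to step 1
```

Baby learns Mom's language L in the limit if this stream of hypotheses is eventually constant and equal to L, no matter which legal enumeration Mom chooses.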

Slide 11: Can Baby learn under these conditions?
- Learning in the limit: There is some point at which Baby's hypothesis is correct and never changes again. Baby has converged!
- Baby doesn't have to know that it has reached this point; it can keep an open mind about new evidence, but if its hypothesis is right, no such new evidence will ever come along.
- A class C of languages is learnable in the limit if one could construct a perfect C-Baby that can learn any language L ∈ C in the limit from a Mom who speaks L. Baby knows the class C of possibilities, but not L.
- Is there a perfect finite-state Baby? Is there a perfect context-free Baby?
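A compact restatement of the slide's definition (following Gold 1967); the learner symbol φ and the enumeration s_1, s_2, ... are notation introduced here, not on the slide.

```latex
\textbf{Definition.} A class $\mathcal{C}$ of languages is \emph{learnable in the limit}
if there is a learner $\varphi$, mapping finite sequences of sentences to languages in
$\mathcal{C}$, such that for every $L \in \mathcal{C}$ and every enumeration
$s_1, s_2, \ldots$ of $L$ (one in which every sentence of $L$ appears at least once),
\[
  \exists N \;\; \forall n \ge N : \quad \varphi(s_1, \ldots, s_n) = L .
\]
```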

Slide 12: Languages vs. Grammars
- Does Baby have to get the right grammar? (E.g., does VP have to be called VP?) No: Baby only has to converge on some grammar that generates the right language, i.e., the right set of strings.
- Assumption: Finite vocabulary.

Slide 13: Conservative Strategy
- Baby's hypothesis should always be the smallest language consistent with the data.
- Works for finite languages? Let's try it ...
- Language 1: {aa, ab, ac}   Language 2: {aa, ab, ac, ad, ae}   Language 3: {aa, ac}   Language 4: {ab}
- Mom says:      aa  ab  ac  ab  aa  ...
  Baby guesses:  L3  L1  L1  L1  L1  ...

Slide 14: Conservative Strategy (continued)
- Baby's hypothesis should always be the smallest language consistent with the data.
- Works for finite languages? Let's try it ...
- Language 1: {aa, ab, ac}   Language 2: {aa, ab, ac, ad, ae}   Language 3: {aa, ac}   Language 4: {ab}
- Mom says:      aa  ab  ac  ab  aa  ...
  Baby guesses:  L3  L1  L1  L1  L1  ...
- The slide additionally displays the strings aa ab ac ad ae (the contents of Language 2).
(A code sketch of this trace follows below.)
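A small sketch of the trace above; the dictionary of toy languages mirrors the slide, and `conservative_guess` is a name invented here.

```python
LANGUAGES = {
    "L1": {"aa", "ab", "ac"},
    "L2": {"aa", "ab", "ac", "ad", "ae"},
    "L3": {"aa", "ac"},
    "L4": {"ab"},
}

def conservative_guess(evidence):
    """Smallest language in the class consistent with everything heard so far."""
    consistent = [name for name, lang in LANGUAGES.items() if set(evidence) <= lang]
    return min(consistent, key=lambda name: len(LANGUAGES[name]))

heard = []
for sentence in ["aa", "ab", "ac", "ab", "aa"]:   # Mom is speaking Language 1
    heard.append(sentence)
    print(sentence, "->", conservative_guess(heard))
# prints: aa -> L3, ab -> L1, ac -> L1, ab -> L1, aa -> L1  (matching the slide's trace)
```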

Slide 15: Evil Mom
- To find out whether Baby is perfect, we have to see whether it gets 100% even in the most adversarial conditions.
- Assume Mom is trying to fool Baby, although she must speak only sentences from L, and she must eventually speak each such sentence.
- Does Baby's strategy work?

Slide 16: An Unlearnable Class
- Class of languages: let Ln = the set of all strings of length < n. What is L0? What is L1? What is L∞?
- If the true language is L∞, can Mom really follow the rules? She must eventually speak every sentence of L∞. Possible? Yes: the empty string; a, b; aa, ab, ba, bb; aaa, aab, aba, abb, baa, ...
- Our class is C = {L0, L1, ..., L∞}
(An enumeration sketch follows below.)
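A sketch of such an enumeration, assuming the two-letter alphabet {a, b} used on the slide: every string over the alphabet is eventually produced, in order of increasing length.

```python
from itertools import count, product

def enumerate_all_strings(alphabet=("a", "b")):
    yield ""                                    # the empty string
    for length in count(1):                     # then length 1, 2, 3, ...
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

# First few outputs: "", "a", "b", "aa", "ab", "ba", "bb", "aaa", "aab", ...
```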

Slide 17: An Unlearnable Class (continued)
- Let Ln = the set of all strings of length < n. What is L0? What is L1? What is L∞?
- Our class is C = {L0, L1, ..., L∞}
- A perfect C-Baby will distinguish among all of these, depending on the input. But there is no perfect C-Baby ...

Slide 18: An Unlearnable Class (continued)
- Our class is C = {L0, L1, ..., L∞}
- Suppose Baby adopts the conservative strategy, always picking the smallest possible language in C. So if Mom's longest sentence so far has 75 words, Baby's hypothesis is L76.
- This won't always work: what language can't a conservative Baby learn? (A conservative Baby will never guess L∞.)

Slide 19: An Unlearnable Class (continued)
- Our class is C = {L0, L1, ..., L∞}
- Could a non-conservative Baby be a perfect C-Baby, and eventually converge to any of these?
- Claim: Any perfect C-Baby must be quasi-conservative: if the true language is L76 and Baby posits something else, Baby must still eventually come back and guess L76 (since it's perfect). So if the longest sentence so far is 75 words, and Mom keeps talking from L76, then eventually Baby must actually return to the conservative guess L76. Agreed?

Slide 20: Mom's Revenge
- If the longest sentence so far is 75 words, and Mom keeps talking from L76, then eventually a perfect C-Baby must actually return to the conservative guess L76.
- Suppose the true language is L∞. Evil Mom can prevent our supposedly perfect C-Baby from converging to it.
- If Baby ever guesses L∞, say when the longest sentence is 75 words: then Evil Mom keeps talking from L76 until Baby capitulates and revises her guess to L76, as any perfect C-Baby must. So Baby has not stayed at L∞ as required.
- Then Mom can go ahead with longer sentences. If Baby ever guesses L∞ again, she plays the same trick again.

Slide 21: Mom's Revenge (continued)
- If the longest sentence so far is 75 words, and Mom keeps talking from L76, then eventually a perfect C-Baby must actually return to the conservative guess L76.
- Suppose the true language is L∞. Evil Mom can prevent our supposedly perfect C-Baby from converging to it.
- If Baby ever guesses L∞, say when the longest sentence is 75 words: then Evil Mom keeps talking from L76 until Baby capitulates and revises her guess to L76, as any perfect C-Baby must. So Baby has not stayed at L∞ as required.
- Conclusion: There is no perfect Baby that is guaranteed to converge to L0, L1, ..., or L∞ as appropriate. If it always succeeds on finite languages, Evil Mom can trick it on the infinite language.
(A simulation sketch of this adversary follows below.)
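A simulation sketch of Evil Mom's strategy. The learner interface here (returning an int n for Ln, or float('inf') for L∞) and all function names are assumptions made for this sketch, not part of the slides.

```python
from itertools import product

def strings_shorter_than(n, alphabet=("a", "b")):
    """L_n: all strings of length < n, in length order (toy two-letter alphabet)."""
    out = [""] if n > 0 else []
    for length in range(1, n):
        out.extend("".join(t) for t in product(alphabet, repeat=length))
    return out

def evil_mom(learner, rounds=100):
    """Adversarial enumeration of L_infinity.  `learner` is any guessing strategy,
    mapping the sentences heard so far to a guess: an int n for L_n, or
    float('inf') for L_infinity."""
    cap, spoken = 1, []
    for _ in range(rounds):
        guess = learner(spoken)
        if guess != float("inf"):
            # Baby currently guesses a finite language, so Mom moves on to longer
            # sentences; a longer sentence will later force another revision.
            cap += 1
        # Otherwise Baby guessed L_infinity: Mom keeps talking from the finite
        # L_cap until Baby capitulates, as any perfect C-Baby eventually must.
        pool = strings_shorter_than(cap)
        unspoken = [s for s in pool if s not in spoken]
        spoken.append(unspoken[0] if unspoken else pool[-1])
        yield spoken[-1], guess
```

Either Baby keeps capitulating, so it changes its mind about L∞ forever, or at some point it stops capitulating, in which case Mom has in fact been enumerating the finite language Lcap and Baby fails to converge on that instead. Either way, no perfect C-Baby exists.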

Slide 22: Implications
- We found that C = {L0, L1, ..., L∞} isn't learnable in the limit.
- How about the class of finite-state languages? Not unless you limit it further (e.g., the number of states). After all, it includes all languages in C, and more, so the learner has a harder choice.
- How about the class of context-free languages? Not unless you limit it further (e.g., the number of rules).

Slide 23: Punchline
- But the class of probabilistic context-free languages is learnable in the limit!
- If Mom has to output sentences randomly with the appropriate probabilities, she's unable to be too evil.
- There are then perfect Babies that are guaranteed to converge to an appropriate probabilistic CFG.
- I.e., from hearing a finite number of sentences, Baby can correctly converge on a grammar that predicts an infinite number of sentences. Baby is generalizing! Just like real babies!
(A toy estimation sketch follows below.)
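A toy illustration (not the proof behind the result above) of how a statistical Baby can settle on a probabilistic CFG: if Baby can tally rule counts from Mom's utterances, the maximum-likelihood PCFG is just relative frequencies, which stabilize as data accumulates. The rules and counts below are invented for the example.

```python
from collections import Counter

rule_counts = Counter({
    ("S",  ("NP", "VP")): 100,
    ("NP", ("Det", "N")):  70,
    ("NP", ("N",)):        30,
    ("VP", ("V", "NP")):   60,
    ("VP", ("V",)):        40,
})

# Total count of expansions for each left-hand-side nonterminal.
lhs_totals = Counter()
for (lhs, _rhs), c in rule_counts.items():
    lhs_totals[lhs] += c

# Maximum-likelihood rule probabilities: count(A -> beta) / count(A).
pcfg = {rule: c / lhs_totals[rule[0]] for rule, c in rule_counts.items()}
for (lhs, rhs), p in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}    p = {p:.2f}")
```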

Slide 24: Perfect fit to perfect, incomplete data

Slide 25: Imperfect fit to noisy data
- Will an ungrammatical sentence ruin Baby forever? (Yes, under the conservative strategy ...)
- Or can Baby figure out which data to (partly) ignore?
- Statistics can help again ... how?