Ch 4. Language Acquisition: Memoryless Learning (4.1 ~ 4.3)
The Computational Nature of Language Learning and Evolution, Partha Niyogi, 2006
Summarized by M.-O. Heo
Biointelligence Laboratory, Seoul National University

Contents
4.1 Characterizing Convergence Times for the Markov Chain Model
- Some Transition Matrices and Their Convergence Curves
- Absorption Times
- Eigenvalue Rates of Convergence
4.2 Exploring Other Points
- Changing the Algorithm
- Distributional Assumptions
- Natural Distributions – CHILDES Corpus
4.3 Batch Learning Upper and Lower Bounds: An Aside

Markov Formulation
Parameterized grammar family with 3 parameters, giving the 8 states L1, ..., L8 of the chain.
- Target language = absorbing state
  - Loops to itself; no exit arcs
- Closed set of states
  - No arc from any state in the set leads outside the set
- An absorbing state is a closed set consisting of a single state

The Markov Chain for the Three-parameter Example

Markov Chain Criteria for Learnability
- Gold learnable ↔ every closed set of states includes the target state (a code sketch of this check follows)
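The criterion can be checked mechanically from a transition matrix. The sketch below is not from the book; the function names and the 3-state toy chain are my own, chosen only to illustrate a case where a closed set excludes the target.

```python
import numpy as np

def reachable(T, i):
    """States reachable from state i (including i) along positive-probability arcs."""
    n = T.shape[0]
    seen, frontier = {i}, [i]
    while frontier:
        s = frontier.pop()
        for t in range(n):
            if T[s, t] > 0 and t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen

def closed_classes(T):
    """Closed classes of the chain: sets of states that, once entered, are never left."""
    classes = []
    for i in range(T.shape[0]):
        R = reachable(T, i)
        # R is a closed class iff every state it contains can reach back to i
        if all(i in reachable(T, j) for j in R) and R not in classes:
            classes.append(R)
    return classes

def gold_learnable(T, target):
    """Markov chain criterion: learnable in the limit iff every closed class contains the target."""
    return all(target in C for C in closed_classes(T))

# Toy 3-state chain (not the book's 8-state chain): state 0 is the target and is
# absorbing, but state 2 is a second absorbing state, so the criterion fails.
T = np.array([[1.0, 0.0, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.0, 1.0]])
print(closed_classes(T))     # [{0}, {2}]
print(gold_learnable(T, 0))  # False: the closed set {2} excludes the target
```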

Some Transition Matrices and Their Convergence Curves (1/3)
Markov chain formulation for learning the 3-parameter grammar from degree-0 strings of the target language L5.
- The transition matrix gives the probability of moving from state Li to state Lj.
- Local maxima exist for this target.

Some Transition Matrices and Their Convergence Curves (2/3)
- The limiting probability matrix, obtained as the m-step transition matrix T^m with m going to infinity, shows where each initial state ends up:
  - If the initial state is one of L5 ~ L8, the learner converges to the target grammar L5.
  - If the initial state is L2 or L4, the learner converges to the other absorbing state, the local maximum L2.
  - If the initial state is L1 or L3, the learner can fail to converge to the target.

Some Transition Matrices and Their Convergence Curves (3/3)
An example without the local-maxima problem: every initial state eventually converges to the target.
- The convergence rate allows us to bound the sample complexity.
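As a hedged illustration of such convergence curves, here is a small numerical sketch; the matrix below is a toy stand-in, not the book's 8 × 8 transition matrix, and the function name is mine. It tabulates the probability of being in the target state after k examples from every initial state, and approximates the limiting matrix by a high matrix power.

```python
import numpy as np

def convergence_curve(T, target, k_max):
    """Row k gives P(learner is in the target state after k examples),
    one column per initial state."""
    n = T.shape[0]
    Tk, rows = np.eye(n), []
    for _ in range(k_max + 1):
        rows.append(Tk[:, target].copy())
        Tk = Tk @ T
    return np.array(rows)

# Toy absorbing chain: state 0 = target, state 1 = transient, state 2 = a local maximum.
T = np.array([[1.0, 0.0, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.0, 1.0]])

curve = convergence_curve(T, target=0, k_max=200)
print(curve[[1, 10, 50, 200]])          # approaches the limiting absorption probabilities
print(np.linalg.matrix_power(T, 1000))  # numerical stand-in for T^m as m -> infinity
```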

Absorption Times
- Given an initial state, the time taken to reach the absorbing state is a random variable.
- If the target language is L1, the transition matrix can be written in a canonical form separating the transient states from the absorbing state; the mean and variance of the absorption time follow from this form (the standard formulas are sketched below).
- Using these statistics, we can characterize the absorption time from the most unfavorable initial state of the learner.
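The slide's equations did not survive the transcript. The standard absorbing-chain formulas (Kemeny and Snell), presumably what was shown, are sketched below; the symbols Q, R, N are the usual ones and not necessarily the book's notation.

```latex
% Canonical form with the transient states first and the absorbing target state last:
T = \begin{pmatrix} Q & R \\ 0 & 1 \end{pmatrix},
\qquad N = (I - Q)^{-1} \quad\text{(fundamental matrix)}.

% Mean and variance of the absorption time, one entry per transient initial state:
\boldsymbol{\mu} = N\,\mathbf{1},
\qquad
\boldsymbol{\sigma}^{2} = (2N - I)\,\boldsymbol{\mu} - \boldsymbol{\mu}\circ\boldsymbol{\mu},
\quad\text{where } \circ \text{ denotes the elementwise product.}
```

The most unfavorable initial state is then the transient state with the largest entry of μ.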

Eigenvalue Rates of Convergence (1/5)
Viewing the transition matrix of a finite Markov chain as an eigenvalue problem:
- It is possible to show that λ = 1 is always an eigenvalue, and that no eigenvalue exceeds it in absolute value, i.e., |λ| ≤ 1 for every eigenvalue.
- The multiplicity of the eigenvalue λ = 1 equals the number of closed classes in the chain (a small numerical check follows).
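A quick numerical check of these two facts on a toy stochastic matrix (an assumed example, not the book's chain):

```python
import numpy as np

# Toy chain with two closed classes, {0} and {2}; state 1 is transient.
T = np.array([[1.0, 0.0, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.0, 1.0]])

eigvals = np.linalg.eigvals(T)
print(np.max(np.abs(eigvals)))                 # ~1.0: no eigenvalue exceeds 1 in absolute value
print(int(np.sum(np.isclose(eigvals, 1.0))))   # 2 = number of closed classes
```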

Eigenvalue Rates of Convergence (2/5)
Representation of T^k
- Let T be an m × m transition matrix with m linearly independent eigenvectors, corresponding to eigenvalues λ1, ..., λm.
- Collecting the eigenvectors into a matrix diagonalizes T (a spectral decomposition), which yields a closed form for T^k (spelled out below).
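The decomposition itself did not survive the transcript; a reconstruction in my notation, with S the matrix of right eigenvectors, is:

```latex
% Eigendecomposition of T (assuming the m eigenvectors are linearly independent):
T = S\,\Lambda\,S^{-1}, \qquad \Lambda = \mathrm{diag}(\lambda_1,\dots,\lambda_m),

% so that powers of T reduce to powers of the eigenvalues:
T^{k} = S\,\Lambda^{k}\,S^{-1} \;=\; \sum_{i=1}^{m} \lambda_i^{k}\, r_i\, l_i^{\top},

% where r_i is the i-th right eigenvector (column of S) and
% l_i^{\top} is the i-th left eigenvector (row of S^{-1}).
```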

Eigenvalue Rates of Convergence (3/5)
Initial Conditions and Limiting Distributions
- The initial condition of the learner can be quantified by a distribution on the states of the Markov chain according to which the learner picks its initial state; denote it by the row vector π.
- After k examples, the probability of the learner being in each of the states is given by the row vector π T^k.
- The limiting distribution is π T^∞ = lim_{k→∞} π T^k.

Eigenvalue Rates of Convergence (4/5)
Rate of Convergence
- The rate of convergence of the learner depends on the rate at which T^k converges to T^∞.
- We can bound this rate in terms of the second largest eigenvalue in absolute value: the transient part of T^k decays like |λ2|^k, which tends to 0 (see the bound below).
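Spelled out under the diagonalizability assumption of the previous slide (a reconstruction, not a quotation): order the eigenvalues so that λ1 = 1 and |λ2| is the second largest modulus; then the transient part of π T^k decays geometrically.

```latex
% With \lambda_1 = 1, |\lambda_2| \ge |\lambda_3| \ge \dots, and T diagonalizable:
\bigl\| \pi T^{k} - \pi T^{\infty} \bigr\| \;\le\; C\,|\lambda_2|^{k} \;\longrightarrow\; 0,
\quad\text{for a constant } C \text{ depending on } T \text{ and } \pi .
```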

Eigenvalue Rates of Convergence (5/5)
Transition Matrix Recipes

Changing the Algorithm (Variants of TLA)
Three memoryless variants of the TLA, obtained by dropping the greediness and/or single-value constraints (a code sketch of TLA and these variants follows the TLA reminder below):
- Random walk with neither greediness nor the single-value constraint
  - If a new sentence is analyzable, the learner remains in its current state.
  - If not, the learner moves uniformly at random to any of the other states and stays there, waiting for the next sentence. This is done without regard to whether the new state allows the sentence to be analyzed.
- Random walk with no greediness but with the single-value constraint
  - If a new sentence is analyzable, the learner remains in its current state.
  - If not, the learner chooses one of the parameters uniformly at random and flips it (moving to an adjacent state in the Markov structure). Again, this is done without regard to whether the new state allows the sentence to be analyzed.
- Random walk with no single-value constraint but with greediness
  - If a new sentence is analyzable, the learner remains in its current state.
  - If not, the learner picks one of the other states uniformly at random and moves there iff that state allows the sentence to be analyzed; otherwise it remains in its original state.

Reminder of TLA
Triggering Learning Algorithm (TLA): on each input sentence, if the current grammar can analyze it, the learner keeps its hypothesis; otherwise it flips a single randomly chosen parameter and adopts the new grammar only if that grammar can analyze the sentence. That is, TLA obeys both the single-value and the greediness constraints.
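A minimal Python sketch of these memoryless learners, assuming an abstract predicate analyzes(state, sentence) that says whether the grammar at a parameter setting parses a sentence; the function names and the toy languages at the end are mine, not the book's. TLA corresponds to greedy=True, single_value=True; the three variants on the previous slide switch these flags off.

```python
import random
from itertools import product

def all_states(n_params):
    """All 2**n_params parameter settings, as tuples of 0/1 values."""
    return list(product((0, 1), repeat=n_params))

def step(state, sentence, n_params, analyzes, greedy=True, single_value=True):
    """One memoryless update for TLA-style learners over n_params binary parameters."""
    if analyzes(state, sentence):                  # sentence parsed: keep the hypothesis
        return state
    if single_value:                               # flip exactly one randomly chosen parameter
        i = random.randrange(n_params)
        candidate = tuple(1 - v if j == i else v for j, v in enumerate(state))
    else:                                          # jump uniformly to any other state
        candidate = random.choice([s for s in all_states(n_params) if s != state])
    if greedy and not analyzes(candidate, sentence):
        return state                               # greediness: move only if the new grammar parses it
    return candidate

# Toy usage with 2 parameters and made-up languages (hypothetical, for illustration only):
langs = {(0, 0): {"a"}, (0, 1): {"a", "b"}, (1, 0): {"c"}, (1, 1): {"a", "b", "c"}}
analyzes = lambda s, x: x in langs[s]
state = (0, 0)
for sentence in ["b", "c", "b", "c", "b"]:         # positive examples from the target (1, 1)
    state = step(state, sentence, 2, analyzes)     # plain TLA update
print(state)
```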


Distributional Assumptions
The convergence times depend on the distribution of the example data.
- The distribution-free convergence time for the 3-parameter system is infinite.
- We can instead use a parameterized distribution: the sets A, B, C, and D contain different degree-0 sentences of L1, and the elements within each set are equally likely with respect to each other (one such parameterization is sketched below).
- The sample complexity cannot be bounded in a distribution-free manner, because by choosing a sufficiently unfavorable distribution the sample complexity can be made arbitrarily large.
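One way to write such a parameterized distribution is sketched below; this is a plausible form consistent with the slide's description (weights on the sets, uniform within each set), not necessarily the book's exact parameterization.

```latex
% Weights a, b, c, d on the sets A, B, C, D of degree-0 sentences of L1,
% uniform within each set:
P(s) \;=\;
\begin{cases}
a/|A| & \text{if } s \in A,\\
b/|B| & \text{if } s \in B,\\
c/|C| & \text{if } s \in C,\\
d/|D| & \text{if } s \in D,
\end{cases}
\qquad a + b + c + d = 1,\quad a, b, c, d \ge 0 .
```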


Natural Distributions – CHILDES Corpus
Examining the fidelity of the model using real language distributions from the CHILDES database (MacWhinney 1996):
- 43,612 English sentences
- 632 German sentences
- Input sentences are mapped to patterns such as SVO, S Aux V, and so on, as appropriate for the target language.
- Sentences not parsable into these patterns were discarded.

- Convergence falls roughly along the TLA convergence time: roughly 100 examples to asymptote.
- The feasibility of the basic model is thus confirmed on actual caretaker input, at least in this simple case, for both English and German.
- One must add patterns to cover the predominance of auxiliary inversions and wh-questions.
- As far as we can tell, a satisfactory parameter-setting account of V2 acquisition has not yet been reached.

Batch Learning Upper and Lower Bounds: An Aside
Consider upper and lower bounds for learning finite language families when the learner is allowed to remember all the strings encountered and to optimize over them.
- There are n languages over an alphabet Σ; each language can be represented as a subset of Σ*.
- The learner is provided with positive data drawn according to a distribution P on the strings of a particular target language.
- Goal: identify the target.
- Question: how many samples does the learner need to see so that, with high confidence, it can identify the target?

A lower bound on the number of samples needed to be able to identify the target.

An upper bound on the number of samples sufficient to guarantee identification with high confidence.
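The slides omit the actual bounds. As a hedged sketch of the standard elimination argument that yields bounds of this kind (not necessarily the book's exact statement): suppose the learner conjectures a minimal language consistent with the sample, and that every language L_i that does not contain the target L_t misses a set of target strings with probability p_i = P(L_t \ L_i) > 0; let p be the smallest such p_i. Then:

```latex
\Pr\bigl[\text{some such } L_i \text{ is still consistent after } m \text{ examples}\bigr]
\;\le\; \sum_{i} (1 - p_i)^{m}
\;\le\; (n-1)\, e^{-p m}
\;\le\; \delta
\quad\text{whenever}\quad
m \;\ge\; \frac{1}{p}\,\ln\frac{n-1}{\delta}.
```

Conversely, if the smallest distinguishing probability p is tiny, the learner is unlikely to have seen any distinguishing string after far fewer than 1/p examples, which is the intuition behind a matching lower bound.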