Analyzing iterated learning
Tom Griffiths, Brown University
Mike Kalish, University of Louisiana

Cultural transmission
Most knowledge is based on secondhand data
Some things can only be learned from others
–cultural objects transmitted across generations
Studying the cognitive aspects of cultural transmission provides unique insights…

Iterated learning (Kirby, 2001)
Each learner sees data, forms a hypothesis, and produces the data given to the next learner
cf. the playground game “telephone”

Objects of iterated learning
It’s not just about languages…
In the wild:
–religious concepts
–social norms
–myths and legends
–causal theories
In the lab:
–functions and categories

Outline
1. Analyzing iterated learning
2. Iterated Bayesian learning
3. Examples
4. Iterated learning with humans
5. Conclusions and open questions


Discrete generations of single learners
P_L(h|d): probability of inferring hypothesis h from data d
P_P(d|h): probability of generating data d from hypothesis h
[Diagram: a chain of learners, alternating P_L(h|d) and P_P(d|h)]

Markov chains
Variables: x^(t+1) is independent of history given x^(t)
Transition matrix: T = P(x^(t+1) | x^(t))
Converges to a stationary distribution under easily checked conditions for ergodicity
[Diagram: a sequence of states linked by transitions]

Stationary distributions
Stationary distribution: π(x) = Σ_{x′} T(x|x′) π(x′)
In matrix form, π is the first eigenvector of the matrix T
Second eigenvalue sets rate of convergence
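As a concrete sketch (the matrix below is my own toy example, not from the slides), the stationary distribution of a finite-state chain can be read off the eigenvector of T with eigenvalue 1, and the second-largest eigenvalue magnitude gives the convergence rate:

```python
import numpy as np

# Toy column-stochastic transition matrix: T[i, j] = P(x_next = i | x_current = j)
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])

eigvals, eigvecs = np.linalg.eig(T)
# The eigenvector with eigenvalue 1 is the stationary distribution pi: T @ pi = pi
idx = np.argmax(np.isclose(eigvals, 1.0))
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()  # normalize to a probability distribution

# The second-largest eigenvalue magnitude sets the rate of convergence
rate = sorted(np.abs(eigvals), reverse=True)[1]
```

For this matrix the stationary distribution is (2/3, 1/3) and the second eigenvalue is 0.7, so deviations from stationarity shrink by a factor of 0.7 per step.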

Analyzing iterated learning
The process d_0 → h_1 → d_1 → h_2 → d_2 → h_3 → … alternates P_L(h|d) and P_P(d|h), and can be read three ways:
–a Markov chain on hypotheses h_1, h_2, h_3, … with transition probability Σ_d P_P(d|h) P_L(h′|d)
–a Markov chain on data d_0, d_1, d_2, … with transition probability Σ_h P_L(h|d) P_P(d′|h)
–a Markov chain on hypothesis-data pairs (h_1, d_1), (h_2, d_2), (h_3, d_3), …

A Markov chain on hypotheses
Transition probabilities sum out data: Q(h_{t+1}|h_t) = Σ_d P_L(h_{t+1}|d) P_P(d|h_t)
Stationary distribution and convergence rate from eigenvectors and eigenvalues of Q
–can be computed numerically for matrices of reasonable size, and analytically in some cases
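A minimal numerical sketch of this step (the matrices are invented toy values, not from the slides): with P_L and P_P as arrays, summing out data is a matrix product, and the stationary distribution again comes from the eigenvector of Q with eigenvalue 1:

```python
import numpy as np

# Toy example: two hypotheses, three possible data values (all numbers made up).
# P_P[d, h] = probability that hypothesis h generates data d
P_P = np.array([[0.7, 0.1],
                [0.2, 0.2],
                [0.1, 0.7]])
# P_L[h, d] = probability of inferring hypothesis h from data d
P_L = np.array([[0.9, 0.5, 0.1],
                [0.1, 0.5, 0.9]])

# Summing out data: Q[h_next, h] = sum_d P_L[h_next, d] * P_P[d, h]
Q = P_L @ P_P

# Stationary distribution of the hypothesis chain
eigvals, eigvecs = np.linalg.eig(Q)
stationary = np.real(eigvecs[:, np.argmax(np.isclose(eigvals, 1.0))])
stationary = stationary / stationary.sum()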

Infinite populations in continuous time
“Language dynamical equation” (Nowak, Komarova, & Niyogi, 2001): dx_j/dt = Σ_i f_i(x) x_i Q_{ij} − φ(x) x_j, with average fitness φ(x) = Σ_i f_i(x) x_i
“Neutral model” (f_j(x) constant): stable equilibrium at the first eigenvector of Q (Komarova & Nowak, 2003)

Outline
1. Analyzing iterated learning
2. Iterated Bayesian learning
3. Examples
4. Iterated learning with humans
5. Conclusions and open questions

Bayesian inference
A rational procedure for updating beliefs
Foundation of many learning algorithms (e.g., MacKay, 2003)
Widely used for language learning (e.g., Charniak, 1993)
[Portrait: Reverend Thomas Bayes]

Bayes’ theorem
P(h|d) = P(d|h) P(h) / Σ_{h′} P(d|h′) P(h′)
posterior probability ∝ likelihood × prior probability, with the denominator a sum over the space of hypotheses
h: hypothesis, d: data
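In code, Bayes’ theorem over a discrete hypothesis space is one line (the prior and likelihood values below are arbitrary illustrations):

```python
import numpy as np

# Three hypotheses with made-up prior probabilities p(h)
prior = np.array([0.5, 0.3, 0.2])
# Likelihood P(d | h) of one observed data point under each hypothesis
likelihood = np.array([0.1, 0.4, 0.8])

# Posterior: likelihood times prior, normalized by the sum over hypotheses
posterior = likelihood * prior / np.sum(likelihood * prior)
```

Here the third hypothesis, despite the lowest prior, ends up most probable because it best predicts the data.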

Iterated Bayesian learning
Learners are Bayesian agents: each samples a hypothesis from the posterior, P_L(h|d) ∝ P_P(d|h) p(h)

Markov chains on h and d
Markov chain on h has stationary distribution the prior, p(h)
Markov chain on d has stationary distribution the prior predictive distribution, p(d) = Σ_h P_P(d|h) p(h)

Markov chain Monte Carlo
A strategy for sampling from complex probability distributions
Key idea: construct a Markov chain which converges to a particular distribution
–e.g., the Metropolis algorithm
–e.g., Gibbs sampling

Gibbs sampling (Geman & Geman, 1984)
For variables x = x_1, x_2, …, x_n
Draw x_i^(t+1) from P(x_i | x_{−i}), where x_{−i} = x_1^(t+1), x_2^(t+1), …, x_{i−1}^(t+1), x_{i+1}^(t), …, x_n^(t)
Converges to P(x_1, x_2, …, x_n)
(a.k.a. the heat bath algorithm in statistical physics)
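A minimal sketch of Gibbs sampling (my own example, not from the slides): drawing from a bivariate Gaussian with correlation ρ, where each conditional P(x_i | x_{−i}) is itself a simple Gaussian:

```python
import numpy as np

# Target: bivariate standard Gaussian with correlation rho.
# Conditionals: x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2 | x1.
rng = np.random.default_rng(0)
rho, n_samples = 0.8, 20000
x1, x2 = 0.0, 0.0
samples = np.empty((n_samples, 2))
for t in range(n_samples):
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))  # resample x1 given x2
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))  # resample x2 given new x1
    samples[t] = x1, x2

burned = samples[1000:]  # discard burn-in before summarizing
```

After burn-in, the samples’ marginal means, variances, and correlation match the target distribution, even though only one variable was updated at a time.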

Gibbs sampling
[Figure: illustration of successive Gibbs sampling steps, after MacKay (2003)]

Iterated learning is a Gibbs sampler
Iterated Bayesian learning is a Gibbs sampler for the joint distribution p(d, h) = P_P(d|h) p(h)
Implies:
–(h, d) converges to this distribution
–convergence rates are known (Liu, Wong, & Kong, 1995)
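A toy simulation of this claim (all probabilities below are made up): alternately producing data from the current hypothesis and inferring a new hypothesis from that data is exactly a Gibbs sweep over (d, h), so the frequency of hypotheses along the chain should match the prior p(h):

```python
import numpy as np

rng = np.random.default_rng(1)
prior = np.array([0.7, 0.3])     # p(h) over two hypotheses
P_P = np.array([[0.8, 0.3],      # P_P[d, h] = P(d | h), two data values
                [0.2, 0.7]])

def learn(d):
    """Bayesian learner: sample a hypothesis from the posterior p(h | d)."""
    post = P_P[d] * prior
    post /= post.sum()
    return rng.choice(2, p=post)

h = 0
h_counts = np.zeros(2)
for t in range(50000):
    d = rng.choice(2, p=P_P[:, h])  # current learner produces data
    h = learn(d)                    # next learner infers a hypothesis
    if t >= 1000:                   # discard burn-in
        h_counts[h] += 1

h_freq = h_counts / h_counts.sum()  # should approximate the prior
```

The empirical hypothesis frequencies settle near (0.7, 0.3) regardless of the starting hypothesis, which is the convergence-to-the-prior result in miniature.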

Outline
1. Analyzing iterated learning
2. Iterated Bayesian learning
3. Examples
4. Iterated learning with humans
5. Conclusions and open questions


An example: Gaussians
If we assume…
–data, d, is a single real number, x
–hypotheses, h, are means of a Gaussian, μ
–prior, p(μ), is Gaussian(μ_0, σ_0^2)
…then p(x_{n+1} | x_n) is Gaussian(μ_n, σ_x^2 + σ_n^2)
p(x_n | x_0) is Gaussian(μ_0 (1 − c^n) + c^n x_0, (σ_x^2 + σ_0^2)(1 − c^{2n})), with c = σ_0^2 / (σ_0^2 + σ_x^2)
i.e. geometric convergence to the prior

An example: Gaussians
p(x_n | x_0) is Gaussian(μ_0 (1 − c^n) + c^n x_0, (σ_x^2 + σ_0^2)(1 − c^{2n}))
[Figure: the distribution of x_n collapsing onto the prior predictive as n grows]

 0 = 0,  0 2 = 1, x 0 = 20 Iterated learning results in rapid convergence to prior

An example: linear regression
Assume
–data, d, are pairs of real numbers (x, y)
–hypotheses, h, are functions
For linear regression:
–hypotheses have slope θ and pass through the origin
–p(θ) is Gaussian(θ_0, σ_0^2)
[Figure: the predicted value y at x = 1]

[Figure: the distribution of y at x = 1 over iterations, with θ_0 = 1, σ_0^2 = 0.1, y_0 = −1]
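A simulation sketch of the regression example; the observation noise variance (sy_sq) is my assumption, since the slide gives only θ_0, σ_0^2, and y_0:

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, s0_sq = 1.0, 0.1   # prior on slope: Gaussian(theta_0 = 1, sigma_0^2 = 0.1)
sy_sq = 0.1                # noise variance on y (assumed; not on the slide)
x, y = 1.0, -1.0           # each learner sees one (x, y) pair; y_0 = -1

ys = []
for t in range(1000):
    # posterior over slope theta given (x, y), for the model y = theta * x + noise
    prec = 1.0 / s0_sq + x**2 / sy_sq
    post_mean = (theta0 / s0_sq + x * y / sy_sq) / prec
    theta = rng.normal(post_mean, np.sqrt(1.0 / prec))  # sample a hypothesis
    y = rng.normal(theta * x, np.sqrt(sy_sq))           # predict y for the next learner
    ys.append(y)

later = np.array(ys[100:])  # after burn-in
```

Although the chain starts at y_0 = −1, the predictions at x = 1 converge to the prior predictive, Gaussian(θ_0, σ_0^2 + σ_y^2), mirroring the Gaussian example.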

An example: compositionality
[Diagram: a language is a function from events (x) to utterances (y); in a compositional language, “agents” and “actions” map to “nouns” and “verbs”]

An example: compositionality
Data: m event-utterance pairs
Hypotheses: languages, with error rate ε
[Figure: the prior P(h) over compositional and holistic languages]

Analysis technique
1. Compute transition matrix on languages
2. Sample Markov chains
3. Compare language frequencies with prior
(can also compute eigenvalues etc.)

Convergence to priors
[Figure: chain language frequencies vs. the prior across iterations, with ε = 0.05 and m = 3, comparing prior settings 0.50 and 0.01: effect of the prior]

The information bottleneck  = 0.50,  = 0.05, m = 1  = 0.01,  = 0.05, m = 3 ChainPrior Iteration  = 0.50,  = 0.05, m = 10 No effect of bottleneck

The information bottleneck Bottleneck affects relative stability of languages favored by prior

Outline
1. Analyzing iterated learning
2. Iterated Bayesian learning
3. Examples
4. Iterated learning with humans
5. Conclusions and open questions

A method for discovering priors
Iterated learning converges to the prior…
…so the prior can be evaluated by running iterated learning with human learners

Iterated function learning
Each learner sees a set of (x, y) pairs
Makes predictions of y for new x values
Predictions are the data for the next learner
[Diagram: data and hypotheses passed along a chain of learners]

Function learning in the lab
[Figure: a trial consists of a stimulus, a response made with a slider, and feedback]
Examine iterated learning with different initial data

[Figure: human iterated function learning, showing initial data and successive iterations (Kalish, 2004)]

Outline
1. Analyzing iterated learning
2. Iterated Bayesian learning
3. Examples
4. Iterated learning with humans
5. Conclusions and open questions

Conclusions and open questions
Iterated Bayesian learning converges to the prior
–properties of languages are properties of learners
–the information bottleneck doesn’t affect the equilibrium
What about other learning algorithms?
What determines rates of convergence?
–amount and structure of input data
What happens with people?