The dynamics of iterated learning Tom Griffiths UC Berkeley with Mike Kalish, Steve Lewandowsky, Simon Kirby, and Mike Dowman

Learning
[figure: a learner observing data]

Iterated learning (Kirby, 2001) What are the consequences of learners learning from other learners?

Objects of iterated learning
Knowledge communicated across generations through provision of data by learners
Examples:
–religious concepts
–social norms
–myths and legends
–causal theories
–language

Language
The languages spoken by humans are typically viewed as the result of two factors:
–individual learning
–innate constraints (biological evolution)
This limits the possible explanations for different kinds of linguistic phenomena

Linguistic universals
Human languages possess universal properties
–e.g. compositionality (Comrie, 1981; Greenberg, 1963; Hawkins, 1988)
Traditional explanation:
–linguistic universals reflect innate constraints specific to a system for acquiring language (e.g., Chomsky, 1965)

Cultural evolution
Languages are also subject to change via cultural evolution (through iterated learning)
Alternative explanation:
–linguistic universals emerge as the result of the fact that language is learned anew by each generation (using general-purpose learning mechanisms) (e.g., Briscoe, 1998; Kirby, 2001)

Analyzing iterated learning
P_L(h|d): probability of inferring hypothesis h from data d
P_P(d|h): probability of generating data d from hypothesis h
[diagram: a chain of learners, each inferring a hypothesis from the previous learner's data via P_L(h|d) and producing data for the next learner via P_P(d|h)]

Markov chains
Transition matrix T = P(x^(t+1) | x^(t))
Variable x^(t+1) is independent of history given x^(t)
Converges to a stationary distribution under easily checked conditions (i.e., if it is ergodic)
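A minimal simulation sketch of a finite Markov chain (not from the original slides; the 3-state transition matrix and all probabilities below are made up for illustration):

```python
import numpy as np

# Hypothetical 3-state transition matrix: T[i, j] = P(x_{t+1} = j | x_t = i).
# Rows sum to 1; the values are illustrative only.
T = np.array([[0.9, 0.05, 0.05],
              [0.1, 0.8,  0.1 ],
              [0.2, 0.2,  0.6 ]])

rng = np.random.default_rng(0)

def simulate(T, steps, x0=0):
    """Run the chain for `steps` transitions, returning the visited states."""
    x, states = x0, [x0]
    for _ in range(steps):
        x = rng.choice(len(T), p=T[x])   # next state depends only on the current state
        states.append(x)
    return np.array(states)

states = simulate(T, 100_000)
# Empirical occupancy frequencies approximate the stationary distribution.
print(np.bincount(states, minlength=3) / len(states))
```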

Stationary distributions
Stationary distribution: π(x) = Σ_{x'} T(x | x') π(x'), i.e., π = Tπ
–in matrix form, π is the first eigenvector of the matrix T
–easy to compute for finite Markov chains
–can sometimes be found analytically
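A sketch of computing the stationary distribution directly, reusing the illustrative matrix above; since that matrix is stored row-stochastically (rows = current state), π is the leading eigenvector of its transpose:

```python
import numpy as np

# Same illustrative transition matrix as in the previous sketch.
T = np.array([[0.9, 0.05, 0.05],
              [0.1, 0.8,  0.1 ],
              [0.2, 0.2,  0.6 ]])

# With rows as the current state, the stationary distribution satisfies pi = pi @ T,
# so pi is the eigenvector of T.T associated with eigenvalue 1.
vals, vecs = np.linalg.eig(T.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()
print(pi)   # matches the empirical frequencies from the simulation above
```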

Analyzing iterated learning
[diagram: alternating chain d_0 → h_1 → d_1 → h_2 → d_2 → h_3 → …, with each hypothesis drawn via P_L(h|d) and each data set generated via P_P(d|h)]
The hypotheses h_1, h_2, h_3, … form a Markov chain with transition probability Σ_d P_P(d|h) P_L(h'|d)
The data d_0, d_1, d_2, … form a Markov chain with transition probability Σ_h P_L(h|d) P_P(d'|h)
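A small sketch of this construction with made-up finite hypothesis and data spaces (the matrices below are illustrative, not from the talk):

```python
import numpy as np

# Toy setup: 2 hypotheses, 3 possible data points (all numbers illustrative).
# P_P(d | h): row = hypothesis, column = data point; rows sum to 1.
P_P = np.array([[0.7, 0.2, 0.1],
                [0.1, 0.3, 0.6]])
# P_L(h | d): row = data point, column = hypothesis; rows sum to 1.
P_L = np.array([[0.9, 0.1],
                [0.5, 0.5],
                [0.2, 0.8]])

# Chain on hypotheses: Q_h[h, h'] = sum_d P_P(d | h) * P_L(h' | d)
Q_h = P_P @ P_L
# Chain on data: Q_d[d, d'] = sum_h P_L(h | d) * P_P(d' | h)
Q_d = P_L @ P_P

print(Q_h)   # 2x2 transition matrix over hypotheses (rows sum to 1)
print(Q_d)   # 3x3 transition matrix over data (rows sum to 1)
```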

Bayesian inference
[image: Reverend Thomas Bayes]
Rational procedure for updating beliefs
Foundation of many learning algorithms
Lets us make the inductive biases of learners precise

Bayes’ theorem
P(h|d) = P(d|h) P(h) / Σ_{h'} P(d|h') P(h')
–P(h|d): posterior probability
–P(d|h): likelihood
–P(h): prior probability
–denominator: sum over the space of hypotheses
h: hypothesis, d: data
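A tiny numeric illustration of the theorem (the prior and likelihood values are hypothetical):

```python
import numpy as np

# Illustrative discrete example: 2 hypotheses, one observed data point d.
prior      = np.array([0.6, 0.4])   # P(h)
likelihood = np.array([0.2, 0.7])   # P(d | h) for the observed d

# Bayes' theorem: P(h | d) = P(d | h) P(h) / sum_h' P(d | h') P(h')
posterior = likelihood * prior / np.sum(likelihood * prior)
print(posterior)   # -> [0.3, 0.7]
```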

Iterated Bayesian learning
Assume learners sample from their posterior distribution:
P_L(h|d) = P_P(d|h) P(h) / Σ_{h'} P_P(d|h') P(h')

Stationary distributions
Markov chain on h converges to the prior, P(h)
Markov chain on d converges to the “prior predictive distribution”: P(d) = Σ_h P_P(d|h) P(h)
(Griffiths & Kalish, 2005)
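A hedged simulation sketch of the first result, reusing the toy P_P(d|h) and prior from the earlier sketches: each learner samples a hypothesis from its Bayesian posterior, and the long-run frequency of each hypothesis approaches the prior P(h).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy production distribution and prior (illustrative numbers, as before).
P_P   = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])   # P_P(d | h)
prior = np.array([0.6, 0.4])          # P(h)

def posterior(d):
    """P_L(h | d) for a Bayesian learner: proportional to P_P(d | h) * P(h)."""
    p = P_P[:, d] * prior
    return p / p.sum()

h = 0                                   # arbitrary initial hypothesis
counts = np.zeros(2)
for _ in range(200_000):
    d = rng.choice(3, p=P_P[h])         # previous learner produces data
    h = rng.choice(2, p=posterior(d))   # next learner samples h from its posterior
    counts[h] += 1

print(counts / counts.sum())            # approaches the prior [0.6, 0.4]
```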

Explaining convergence to the prior
Intuitively: data acts once, prior many times
Formally: iterated learning with Bayesian agents is a Gibbs sampler for P(d,h)
(Griffiths & Kalish, submitted)

Gibbs sampling
For variables x = x_1, x_2, …, x_n:
Draw x_i^(t+1) from P(x_i | x_-i), where x_-i = x_1^(t+1), x_2^(t+1), …, x_{i-1}^(t+1), x_{i+1}^(t), …, x_n^(t)
Converges to P(x_1, x_2, …, x_n)
(a.k.a. the heat bath algorithm in statistical physics)
(Geman & Geman, 1984)
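A generic two-variable Gibbs sampler sketch (a standard bivariate normal with an illustrative correlation rho, not an example from the talk), cycling through the conditional distributions:

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.8    # illustrative correlation of a standard bivariate normal

x1, x2 = 0.0, 0.0
samples = []
for t in range(50_000):
    # Draw each variable from its conditional given the current value of the other:
    # x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2 | x1.
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    samples.append((x1, x2))

samples = np.array(samples)
print(np.corrcoef(samples.T)[0, 1])   # close to rho: the chain targets the joint distribution
```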

Gibbs sampling (MacKay, 2003)

Explaining convergence to the prior
When the target distribution is P(d,h), the conditional distributions are P_L(h|d) and P_P(d|h)

Implications for linguistic universals
When learners sample from P(h|d), the distribution over languages converges to the prior
–identifies a one-to-one correspondence between inductive biases and linguistic universals

Iterated Bayesian learning
Assume learners sample from their posterior distribution:
P_L(h|d) = P_P(d|h) P(h) / Σ_{h'} P_P(d|h') P(h')

From sampling to maximizing
Assume learners instead choose hypotheses with probability proportional to the posterior raised to a power r:
–r = 1: sampling from the posterior
–r = 2: differences in posterior probability are exaggerated
–r = ∞: maximizing (choosing the most probable hypothesis)

From sampling to maximizing
General analytic results are hard to obtain
–(r = ∞ is a stochastic version of the EM algorithm)
For certain classes of languages, it is possible to show that the stationary distribution gives each hypothesis h probability proportional to P(h)^r
–the ordering identified by the prior is preserved, but not the corresponding probabilities
(Kirby, Dowman, & Griffiths, submitted)
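A rough simulation sketch of the r > 1 regime using the same toy setup as before; this toy example is not one of the language classes covered by the analytic result, so the match to P(h)^r is only approximate, but it illustrates how raising the posterior to a power exaggerates the influence of the prior.

```python
import numpy as np

rng = np.random.default_rng(3)

P_P   = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])   # toy P_P(d | h), as before
prior = np.array([0.6, 0.4])          # toy P(h)
r     = 2.0                           # exponent: r = 1 is sampling, large r approaches maximizing

def choose_h(d):
    """Learner picks h with probability proportional to P(h | d) ** r."""
    p = (P_P[:, d] * prior) ** r      # unnormalized posterior raised to r
    return rng.choice(2, p=p / p.sum())

h, counts = 0, np.zeros(2)
for _ in range(200_000):
    d = rng.choice(3, p=P_P[h])       # previous learner produces data
    h = choose_h(d)                   # next learner exaggerates its posterior
    counts[h] += 1

print(counts / counts.sum())          # empirical stationary distribution over hypotheses
print(prior**r / np.sum(prior**r))    # P(h)^r, normalized, for comparison
```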

Implications for linguistic universals
When learners sample from P(h|d), the distribution over languages converges to the prior
–identifies a one-to-one correspondence between inductive biases and linguistic universals
As learners move towards maximizing, the influence of the prior is exaggerated
–weak biases can produce strong universals
–cultural evolution is a viable alternative to traditional explanations for linguistic universals

Analyzing iterated learning
The outcome of iterated learning is determined by the inductive biases of the learners
–hypotheses with high prior probability ultimately appear with high probability in the population
Clarifies the connection between constraints on language learning and linguistic universals…
…and provides formal justification for the idea that culture reflects the structure of the mind

Conclusions
Iterated learning provides a lens for magnifying the inductive biases of learners
–small effects for individuals are big effects for groups
When cognition affects culture, studying groups can give us better insight into individuals