Analyzing iterated learning
Tom Griffiths, Brown University
Mike Kalish, University of Louisiana
Cultural transmission
Most knowledge is based on secondhand data
Some things can only be learned from others
–cultural objects transmitted across generations
Studying the cognitive aspects of cultural transmission provides unique insights…
Iterated learning (Kirby, 2001)
Each learner sees data, forms a hypothesis, and produces the data given to the next learner
cf. the playground game “telephone”
Objects of iterated learning
It’s not just about languages…
In the wild:
–religious concepts
–social norms
–myths and legends
–causal theories
In the lab:
–functions and categories
Outline 1. Analyzing iterated learning 2. Iterated Bayesian learning 3. Examples 4. Iterated learning with humans 5. Conclusions and open questions
Discrete generations of single learners
P_L(h|d): probability of inferring hypothesis h from data d
P_P(d|h): probability of generating data d from hypothesis h
(Diagram: a chain of learners, each inferring a hypothesis via P_L(h|d) and producing data for the next learner via P_P(d|h).)
Markov chains
Variable x^(t+1) is independent of history given x^(t)
Transition matrix T = P(x^(t+1) | x^(t))
Converges to a stationary distribution under easily checked conditions (ergodicity)
(Diagram: a chain of states x → x → x → …)
Stationary distributions
Stationary distribution: π_j = Σ_i π_i T_ij, i.e. π = πT in matrix form, so π is the first eigenvector of the matrix T (eigenvalue 1)
The second eigenvalue sets the rate of convergence
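A small numerical sketch of this computation (the transition matrix below is made up for illustration): the stationary distribution is read off the first eigenvector, the convergence rate off the second eigenvalue.

```python
# Illustrative example: stationary distribution and convergence rate of a
# small Markov chain from the eigendecomposition of its transition matrix.
import numpy as np

# Row-stochastic transition matrix: T[i, j] = P(x_(t+1) = j | x_(t) = i).
T = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

# Left eigenvectors of T are right eigenvectors of T transposed.
eigvals, eigvecs = np.linalg.eig(T.T)
order = np.argsort(-eigvals.real)            # largest eigenvalue first

# First eigenvector (eigenvalue 1) gives the stationary distribution.
pi = eigvecs[:, order[0]].real
pi = pi / pi.sum()
print("stationary distribution:", pi)

# Second eigenvalue sets the geometric rate of convergence to pi.
print("second eigenvalue:", abs(eigvals[order[1]]))
```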
Analyzing iterated learning
The sequence d_0 → h_1 → d_1 → h_2 → d_2 → h_3 → … (alternating P_L(h|d) and P_P(d|h)) can be viewed as:
–a Markov chain on hypotheses: h_1 → h_2 → h_3 → …, with transitions Σ_d P_P(d|h) P_L(h'|d)
–a Markov chain on data: d_0 → d_1 → d_2 → …
–a Markov chain on hypothesis-data pairs: (h_1, d_1) → (h_2, d_2) → (h_3, d_3) → …
A Markov chain on hypotheses
Transition probabilities sum out the data: Q(h → h') = Σ_d P_P(d|h) P_L(h'|d)
Stationary distribution and convergence rate come from the eigenvectors and eigenvalues of Q
–can be computed numerically for matrices of reasonable size, and analytically in some cases
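A sketch of this computation with made-up production and learning matrices (P_P and P_L below are illustrative, not from the slides): sum out the data to get Q, then take its eigendecomposition as above.

```python
# Illustrative example: build the hypothesis chain Q by summing out data,
# then read off its stationary distribution and second eigenvalue.
import numpy as np

P_P = np.array([[0.8, 0.2],       # P_P[h, d] = P(d | h): production
                [0.3, 0.7]])
P_L = np.array([[0.9, 0.1],       # P_L[d, h] = P(h | d): learning
                [0.25, 0.75]])

# Q[h, h'] = sum_d P_P(d | h) * P_L(h' | d)
Q = P_P @ P_L

eigvals, eigvecs = np.linalg.eig(Q.T)
pi = eigvecs[:, np.argmax(eigvals.real)].real
pi = pi / pi.sum()
print("stationary distribution over hypotheses:", pi)
print("second eigenvalue (rate of convergence):", sorted(abs(eigvals))[-2])
```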
Infinite populations in continuous time
“Language dynamical equation”: dx_j/dt = Σ_i x_i f_i(x) Q_ij − φ(x) x_j, with average fitness φ(x) = Σ_i f_i(x) x_i (Nowak, Komarova, & Niyogi, 2001)
“Neutral model” (f_j(x) constant): dx_j/dt = Σ_i x_i Q_ij − x_j
Stable equilibrium at the first eigenvector of Q (Komarova & Nowak, 2003)
Outline 1. Analyzing iterated learning 2. Iterated Bayesian learning 3. Examples 4. Iterated learning with humans 5. Conclusions and open questions
Bayesian inference
Reverend Thomas Bayes
Rational procedure for updating beliefs
Foundation of many learning algorithms (e.g., MacKay, 2003)
Widely used for language learning (e.g., Charniak, 1993)
Bayes’ theorem
P(h|d) = P(d|h) P(h) / Σ_h′ P(d|h′) P(h′)
posterior probability = likelihood × prior probability, normalized by a sum over the space of hypotheses
h: hypothesis, d: data
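A minimal numeric illustration (the prior and likelihood values are invented): Bayes' theorem over a discrete hypothesis space.

```python
# Illustrative only: posterior over three hypotheses given one observation.
import numpy as np

prior = np.array([0.7, 0.2, 0.1])         # P(h)
likelihood = np.array([0.1, 0.5, 0.9])    # P(d | h) for the observed d

posterior = prior * likelihood
posterior /= posterior.sum()              # divide by the sum over hypotheses
print(posterior)
```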
Iterated Bayesian learning
Learners are Bayesian agents: each learner samples a hypothesis from the posterior, P_L(h|d) ∝ P_P(d|h) p(h), and generates data for the next learner from P_P(d|h)
Markov chains on h and d
Markov chain on h has stationary distribution the prior, p(h)
Markov chain on d has stationary distribution the prior predictive distribution, p(d) = Σ_h p(d|h) p(h)
Markov chain Monte Carlo
A strategy for sampling from complex probability distributions
Key idea: construct a Markov chain which converges to a particular distribution
–e.g. Metropolis algorithm
–e.g. Gibbs sampling
Gibbs sampling
For variables x = (x_1, x_2, …, x_n), draw x_i^(t+1) from P(x_i | x_-i), where x_-i = (x_1^(t+1), …, x_(i-1)^(t+1), x_(i+1)^(t), …, x_n^(t))
Converges to P(x_1, x_2, …, x_n)
(a.k.a. the heat bath algorithm in statistical physics)
(Geman & Geman, 1984)
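A minimal sketch, not from the slides: a Gibbs sampler for a bivariate Gaussian with correlation rho, alternately drawing each variable from its full conditional.

```python
# Illustrative Gibbs sampler for a standard bivariate Gaussian with
# correlation rho; each full conditional is x_i | x_j ~ N(rho*x_j, 1 - rho^2).
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
x1, x2 = 0.0, 0.0
samples = []
for t in range(10000):
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))   # draw x1 | x2
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))   # draw x2 | x1
    samples.append((x1, x2))

samples = np.array(samples[1000:])                      # drop burn-in
print("sample correlation:", np.corrcoef(samples.T)[0, 1])  # close to rho
```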
Gibbs sampling (MacKay, 2003)
Iterated learning is a Gibbs sampler
Iterated Bayesian learning is a sampler for the joint distribution p(d, h) = p(d|h) p(h)
Implies:
–(h, d) converges to this distribution
–convergence rates are known (Liu, Wong, & Kong, 1995)
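A quick numerical check of the stationarity claim (with invented numbers): when P_L is the Bayesian posterior, the prior is a stationary distribution of the induced chain on hypotheses.

```python
# Numerical check (illustrative numbers): with a Bayesian learner, the chain
# Q(h -> h') = sum_d p(d|h) p(h'|d) leaves the prior invariant.
import numpy as np

prior = np.array([0.6, 0.3, 0.1])            # p(h)
P_P = np.array([[0.7, 0.2, 0.1],             # P_P[h, d] = p(d | h)
                [0.1, 0.8, 0.1],
                [0.2, 0.3, 0.5]])

# Bayesian learner: P_L[d, h] = p(h | d), proportional to p(d | h) p(h).
joint = P_P * prior[:, None]                 # joint[h, d] = p(d|h) p(h)
P_L = (joint / joint.sum(axis=0)).T          # normalize over h, index by d

Q = P_P @ P_L                                # chain on hypotheses
print("prior:    ", prior)
print("prior @ Q:", prior @ Q)               # equals the prior (stationarity)
```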
Outline 1. Analyzing iterated learning 2. Iterated Bayesian learning 3. Examples 4. Iterated learning with humans 5. Conclusions and open questions
An example: Gaussians
If we assume…
–data, d, is a single real number, x
–hypotheses, h, are means μ of a Gaussian with known variance σ_x²
–prior, p(μ), is Gaussian(μ_0, σ_0²)
…then p(x_(n+1) | x_n) is Gaussian(μ_n, σ_x² + σ_n²), where μ_n and σ_n² are the posterior mean and variance given x_n
An example: Gaussians
If we assume…
–data, d, is a single real number, x
–hypotheses, h, are means μ of a Gaussian with known variance σ_x²
–prior, p(μ), is Gaussian(μ_0, σ_0²)
…then p(x_(n+1) | x_n) is Gaussian(μ_n, σ_x² + σ_n²)
p(x_n | x_0) is Gaussian(c^n x_0 + (1 − c^n) μ_0, (σ_x² + σ_0²)(1 − c^(2n))), with c = σ_0²/(σ_0² + σ_x²)
i.e. geometric convergence to the prior
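A short worked derivation of these expressions, using standard conjugate-Gaussian algebra (the slide states only the results):

```latex
% Posterior over \mu after observing x_n, with likelihood N(x_n | \mu, \sigma_x^2)
% and prior N(\mu | \mu_0, \sigma_0^2):
\mu_n = \frac{\sigma_x^2 \mu_0 + \sigma_0^2 x_n}{\sigma_x^2 + \sigma_0^2}
      = \mu_0 + c\,(x_n - \mu_0),
\qquad
\sigma_n^2 = \frac{\sigma_x^2 \sigma_0^2}{\sigma_x^2 + \sigma_0^2},
\qquad
c = \frac{\sigma_0^2}{\sigma_0^2 + \sigma_x^2}.

% Predictive distribution for the next learner's datum:
p(x_{n+1} \mid x_n) = \mathcal{N}(\mu_n,\ \sigma_x^2 + \sigma_n^2).

% Iterating, the expected value shrinks toward \mu_0 by a factor c per generation,
\mathbb{E}[x_n \mid x_0] = \mu_0 + c^{\,n}(x_0 - \mu_0),
% so x_n | x_0 converges geometrically (rate c) to the prior predictive
% distribution N(\mu_0, \sigma_x^2 + \sigma_0^2).
```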
An example: Gaussians
p(x_n | x_0) is Gaussian(c^n x_0 + (1 − c^n) μ_0, (σ_x² + σ_0²)(1 − c^(2n)))
(Figure: simulation with μ_0 = 0, σ_0² = 1, x_0 = 20.)
Iterated learning results in rapid convergence to the prior
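A simulation sketch of this example; the likelihood variance σ_x² is not given in the text, so the value below is an assumption.

```python
# Simulate the Gaussian iterated-learning chain: each learner sees one x,
# samples a mean mu from the posterior, then generates the next x.
import numpy as np

rng = np.random.default_rng(0)
mu0, var0 = 0.0, 1.0       # prior over mu: N(mu0, var0), as on the slide
var_x = 1.0                # assumed likelihood variance
n_chains, n_iters = 1000, 20

x = np.full(n_chains, 20.0)                        # x_0 = 20 for every chain
for t in range(n_iters):
    post_var = 1.0 / (1.0 / var0 + 1.0 / var_x)    # posterior given a single x
    post_mean = post_var * (mu0 / var0 + x / var_x)
    mu = rng.normal(post_mean, np.sqrt(post_var))  # learner samples a hypothesis
    x = rng.normal(mu, np.sqrt(var_x))             # and produces the next datum

# x_n quickly becomes distributed as the prior predictive N(mu0, var_x + var0).
print("mean of x_n:", x.mean(), " variance of x_n:", x.var())
```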
An example: linear regression
Assume
–data, d, are pairs of real numbers (x, y)
–hypotheses, h, are functions
An example: linear regression
–hypotheses have slope θ and pass through the origin
–p(θ) is Gaussian(θ_0, σ_0²)
(Diagram: a line through the origin; its value y at x = 1 is the slope.)
(Figure: iterated learning for the linear regression example, with θ_0 = 1, σ_0² = 0.1, and initial data y_0 = −1.)
An example: compositionality
(Diagram: a language maps events to utterances, like a function from x to y; events are built from “actions” and “agents”, utterances from “verbs” and “nouns”; a compositional language maps the parts of an event to the parts of an utterance.)
An example: compositionality
Data: m event-utterance pairs
Hypotheses: languages (compositional or holistic), with error ε
Prior: P(h)
(Diagram: a compositional language maps event parts to utterance parts; a holistic language pairs whole events with whole utterances arbitrarily.)
Analysis technique
1. Compute transition matrix on languages
2. Sample Markov chains
3. Compare language frequencies with prior
(can also compute eigenvalues etc.)
A sketch of steps 2 and 3 appears below.
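This sketch applies steps 2 and 3 to a toy version of the compositionality example; the language space (bijections between four events and four utterances), the error model, and the prior weights are illustrative assumptions rather than the exact model behind the slides. Step 1, the exact transition matrix, could be obtained the same way by enumerating datasets instead of sampling them.

```python
# Illustrative sketch: sample iterated-learning chains over a small space of
# languages and compare language frequencies with the prior.
import itertools
import random
from collections import Counter

EVENTS = list(itertools.product([0, 1], repeat=2))       # (action, agent)
UTTERANCES = list(itertools.product([0, 1], repeat=2))   # (verb, noun)

# Languages: bijections from events to utterances (4! = 24 of them).
LANGUAGES = [dict(zip(EVENTS, perm))
             for perm in itertools.permutations(UTTERANCES)]

def is_compositional(lang):
    # Assumed definition: the verb depends only on the action bit and the
    # noun only on the agent bit.
    return all(lang[(a, g)][0] == lang[(a, 0)][0] and
               lang[(a, g)][1] == lang[(0, g)][1]
               for a, g in EVENTS)

EPSILON = 0.05    # production error rate (assumed)
M = 3             # event-utterance pairs per generation (the bottleneck)
ALPHA = 50.0      # assumed prior weight favouring compositional languages

weights = [ALPHA if is_compositional(l) else 1.0 for l in LANGUAGES]
prior = [w / sum(weights) for w in weights]

def produce(lang):
    # One (event, utterance) pair: the intended utterance with probability
    # 1 - EPSILON, otherwise one of the three other utterances.
    e = random.choice(EVENTS)
    if random.random() < 1 - EPSILON:
        return e, lang[e]
    return e, random.choice([u for u in UTTERANCES if u != lang[e]])

def likelihood(lang, data):
    p = 1.0
    for e, u in data:
        p *= (1 - EPSILON) if lang[e] == u else EPSILON / 3
    return p

def learn(data):
    # Bayesian learner: sample a language from the posterior given the data.
    post = [prior[i] * likelihood(l, data) for i, l in enumerate(LANGUAGES)]
    return random.choices(range(len(LANGUAGES)), weights=post)[0]

# Step 2: sample Markov chains of learners.
counts = Counter()
n_chains, n_generations = 200, 50
for _ in range(n_chains):
    h = random.randrange(len(LANGUAGES))          # arbitrary initial language
    for _ in range(n_generations):
        data = [produce(LANGUAGES[h]) for _ in range(M)]
        h = learn(data)
    counts[h] += 1

# Step 3: compare language frequencies with the prior.
chain_comp = sum(c for i, c in counts.items()
                 if is_compositional(LANGUAGES[i])) / n_chains
prior_comp = sum(p for p, l in zip(prior, LANGUAGES) if is_compositional(l))
print(f"P(compositional): chains ~ {chain_comp:.2f}, prior = {prior_comp:.2f}")
```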
Convergence to priors
(Figure, “Effect of Prior”: chain frequencies plotted against the prior across iterations, for prior parameter 0.50 vs. 0.01, error 0.05, m = 3; the chains converge to whichever prior the learners have.)
The information bottleneck
(Figure: chain frequencies plotted against the prior across iterations, for m = 1, m = 3, and m = 10 pairs per generation (prior parameters 0.50 and 0.01, error 0.05); the equilibrium shows no effect of the bottleneck.)
The information bottleneck Bottleneck affects relative stability of languages favored by prior
Outline 1. Analyzing iterated learning 2. Iterated Bayesian learning 3. Examples 4. Iterated learning with humans 5. Conclusions and open questions
A method for discovering priors
Iterated learning converges to the prior…
…so priors can be evaluated by reproducing iterated learning in the lab
Iterated function learning
Each learner sees a set of (x, y) pairs
Makes predictions of y for new x values
Predictions are the data for the next learner
(Diagram: data are (x, y) pairs; hypotheses are functions.)
Function learning in the lab
(Display: a stimulus is shown, the participant responds with a slider, and feedback is given.)
Examine iterated learning with different initial data
(Figure: human iterated function learning; chains starting from different initial data, plotted over iterations 1–9; Kalish, 2004.)
Outline 1. Analyzing iterated learning 2. Iterated Bayesian learning 3. Examples 4. Iterated learning with humans 5. Conclusions and open questions
Conclusions and open questions
Iterated Bayesian learning converges to the prior
–properties of languages are properties of learners
–information bottleneck doesn’t affect the equilibrium
What about other learning algorithms?
What determines rates of convergence?
–amount and structure of input data
What happens with people?