
1 Analyzing iterated learning
Tom Griffiths (Brown University) and Mike Kalish (University of Louisiana)

2 Cultural transmission
Most knowledge is based on secondhand data
Some things can only be learned from others
–cultural objects transmitted across generations
Studying the cognitive aspects of cultural transmission provides unique insights…

3 Iterated learning (Kirby, 2001)
Each learner sees data, forms a hypothesis, and produces the data given to the next learner
cf. the playground game "telephone"

4 Objects of iterated learning
It's not just about languages…
In the wild:
–religious concepts
–social norms
–myths and legends
–causal theories
In the lab:
–functions and categories

5 Outline
1. Analyzing iterated learning
2. Iterated Bayesian learning
3. Examples
4. Iterated learning with humans
5. Conclusions and open questions

6 Outline
1. Analyzing iterated learning
2. Iterated Bayesian learning
3. Examples
4. Iterated learning with humans
5. Conclusions and open questions

7 Discrete generations of single learners
P_L(h|d): probability of inferring hypothesis h from data d
P_P(d|h): probability of generating data d from hypothesis h
(Diagram: a chain of learners, each applying P_L(h|d) to the previous learner's data and P_P(d|h) to produce data for the next.)

8 Markov chains
Variable x^(t+1) is independent of the history given x^(t)
Transition matrix T = P(x^(t+1) | x^(t))
Converges to a stationary distribution under easily checked conditions for ergodicity

9 Stationary distributions
Stationary distribution: π(x) = Σ_x' P(x^(t+1) = x | x^(t) = x') π(x')
In matrix form, π = Tπ, so π is the first eigenvector of the matrix T
The second eigenvalue sets the rate of convergence
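A minimal sketch of this computation in Python, using a made-up 3-state transition matrix (the states and probabilities are illustrative, not from the talk):

```python
# Stationary distribution and convergence rate of a Markov chain.
import numpy as np

# Column-stochastic convention: T[i, j] = P(x_{t+1} = i | x_t = j)
T = np.array([[0.90, 0.20, 0.10],
              [0.05, 0.70, 0.30],
              [0.05, 0.10, 0.60]])

eigvals, eigvecs = np.linalg.eig(T)
order = np.argsort(-np.abs(eigvals))      # sort by |eigenvalue|, largest first
pi = np.real(eigvecs[:, order[0]])
pi = pi / pi.sum()                        # first eigenvector, normalised: stationary distribution
rate = np.abs(eigvals[order[1]])          # second eigenvalue sets the per-step convergence factor

print("stationary distribution:", pi)
print("|second eigenvalue|:", rate)
```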

10 Analyzing iterated learning
Iterated learning defines three Markov chains:
–a Markov chain on hypotheses, h_1 → h_2 → h_3 → …, with transitions Σ_d P_P(d|h) P_L(h'|d)
–a Markov chain on data, d_0 → d_1 → d_2 → …, with transitions Σ_h P_L(h|d) P_P(d'|h)
–a Markov chain on hypothesis-data pairs, (h_1, d_1) → (h_2, d_2) → (h_3, d_3) → …

11 A Markov chain on hypotheses
Transition probabilities sum out the data: Q(h'|h) = Σ_d P_L(h'|d) P_P(d|h)
Stationary distribution and convergence rate from the eigenvectors and eigenvalues of Q
–can be computed numerically for matrices of reasonable size, and analytically in some cases
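A minimal numerical sketch, with arbitrary made-up matrices standing in for P_L(h|d) and P_P(d|h), showing how Q and its stationary distribution can be computed:

```python
# Build the hypothesis-chain transition matrix Q by summing out the data.
import numpy as np

n_hyp, n_data = 3, 4
rng = np.random.default_rng(0)

# P_prod[d, h] = P_P(d | h): probability hypothesis h produces data d
P_prod = rng.random((n_data, n_hyp))
P_prod /= P_prod.sum(axis=0)

# P_learn[h, d] = P_L(h | d): probability a learner infers h from data d
P_learn = rng.random((n_hyp, n_data))
P_learn /= P_learn.sum(axis=0)

# Q[h_new, h_old] = sum_d P_L(h_new | d) P_P(d | h_old)
Q = P_learn @ P_prod

eigvals, eigvecs = np.linalg.eig(Q)
idx = np.argmax(np.abs(eigvals))
pi = np.real(eigvecs[:, idx])
print("stationary distribution over hypotheses:", pi / pi.sum())
```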

12 Infinite populations in continuous time
"Language dynamical equation" (Nowak, Komarova, & Niyogi, 2001)
"Neutral model": f_j(x) constant (Komarova & Nowak, 2003)
Stable equilibrium at the first eigenvector of Q

13 Outline
1. Analyzing iterated learning
2. Iterated Bayesian learning
3. Examples
4. Iterated learning with humans
5. Conclusions and open questions

14 Bayesian inference
Reverend Thomas Bayes
Rational procedure for updating beliefs
Foundation of many learning algorithms (e.g., MacKay, 2003)
Widely used for language learning (e.g., Charniak, 1993)

15 Bayes' theorem
P(h|d) = P(d|h) P(h) / Σ_h' P(d|h') P(h')
–P(h|d): posterior probability
–P(d|h): likelihood
–P(h): prior probability
–the denominator sums over the space of hypotheses
h: hypothesis, d: data
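A minimal sketch of this computation for a discrete hypothesis space (the numbers are made up for illustration):

```python
# Bayes' theorem over a finite set of hypotheses.
import numpy as np

prior = np.array([0.5, 0.3, 0.2])        # P(h) for three hypotheses
likelihood = np.array([0.1, 0.4, 0.8])   # P(d | h) for one observed datum d

posterior = prior * likelihood
posterior /= posterior.sum()             # divide by the sum over the hypothesis space
print("P(h | d) =", posterior)
```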

16 Iterated Bayesian learning
Learners are Bayesian agents: each learner samples a hypothesis from the posterior, P_L(h|d) = P(h|d), and generates data from the likelihood, P_P(d|h) = P(d|h)

17 Markov chains on h and d
The Markov chain on h has stationary distribution P(h), the prior
The Markov chain on d has stationary distribution Σ_h P(d|h) P(h), the prior predictive distribution

18 Markov chain Monte Carlo
A strategy for sampling from complex probability distributions
Key idea: construct a Markov chain which converges to a particular distribution
–e.g. the Metropolis algorithm
–e.g. Gibbs sampling

19 Gibbs sampling (Geman & Geman, 1984)
For variables x = x_1, x_2, …, x_n
Draw x_i^(t+1) from P(x_i | x_-i), where x_-i = x_1^(t+1), x_2^(t+1), …, x_{i-1}^(t+1), x_{i+1}^(t), …, x_n^(t)
Converges to P(x_1, x_2, …, x_n)
(a.k.a. the heat bath algorithm in statistical physics)
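A minimal sketch of a Gibbs sampler, using a toy bivariate Gaussian target (not an example from the talk):

```python
# Gibbs sampling for a standard bivariate Gaussian with correlation rho.
import numpy as np

rho = 0.8            # correlation between x1 and x2
n_steps = 10_000
rng = np.random.default_rng(1)

x1, x2 = 0.0, 0.0
samples = np.empty((n_steps, 2))
for t in range(n_steps):
    # Draw each variable from its conditional given the current value of the other
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))   # P(x1 | x2)
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))   # P(x2 | x1)
    samples[t] = (x1, x2)

print("sample correlation:", np.corrcoef(samples.T)[0, 1])  # approaches rho
```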

20 Gibbs sampling (MacKay, 2003)

21 Iterated learning is a Gibbs sampler
Iterated Bayesian learning is a Gibbs sampler for the joint distribution P(d|h) P(h)
Implies:
–(h, d) converges to this distribution
–convergence rates are known (Liu, Wong, & Kong, 1995)
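A minimal sketch of this result for a toy discrete model (made-up prior and likelihood): each learner samples a hypothesis from its posterior and generates data for the next learner, and the chain of hypotheses ends up distributed according to the prior:

```python
# Iterated Bayesian learning as a two-step Gibbs sampler.
import numpy as np

rng = np.random.default_rng(2)
prior = np.array([0.6, 0.3, 0.1])             # P(h) over 3 hypotheses
like = np.array([[0.7, 0.2, 0.1],             # like[h, d] = P(d | h), 3 data values
                 [0.2, 0.6, 0.2],
                 [0.1, 0.2, 0.7]])

def sample_posterior(d):
    post = prior * like[:, d]
    return rng.choice(3, p=post / post.sum())

h = 0
counts = np.zeros(3)
for t in range(50_000):
    d = rng.choice(3, p=like[h])              # teacher generates data from P(d | h)
    h = sample_posterior(d)                   # learner samples h from the posterior
    counts[h] += 1

print("empirical distribution over h:", counts / counts.sum())
print("prior:                        ", prior)
```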

22 Outline
1. Analyzing iterated learning
2. Iterated Bayesian learning
3. Examples
4. Iterated learning with humans
5. Conclusions and open questions

23 An example: Gaussians
If we assume…
–data, d, is a single real number, x
–hypotheses, h, are means of a Gaussian, μ
–the prior, p(μ), is Gaussian(μ_0, σ_0^2)
…then p(x_{n+1} | x_n) is Gaussian(μ_n, σ_x^2 + σ_n^2)

24 An example: Gaussians
Under the same assumptions, p(x_n | x_0) is Gaussian(μ_0 + c^n x_0, (σ_x^2 + σ_0^2)(1 - c^(2n))), where c = σ_0^2 / (σ_0^2 + σ_x^2)
i.e. geometric convergence to the prior

25 An example: Gaussians
p(x_n | x_0) is Gaussian(μ_0 + c^n x_0, (σ_x^2 + σ_0^2)(1 - c^(2n)))
(Figure: this distribution plotted over successive iterations.)

26 μ_0 = 0, σ_0^2 = 1, x_0 = 20
Iterated learning results in rapid convergence to the prior
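A minimal simulation sketch using the parameters on this slide; the production noise σ_x^2 = 1 is an assumption, since it is not stated here:

```python
# Iterated learning with Gaussian hypotheses: mu_0 = 0, sigma_0^2 = 1, x_0 = 20.
import numpy as np

mu0, var0 = 0.0, 1.0      # prior over the mean
var_x = 1.0               # assumed production noise
x = 20.0                  # initial data
rng = np.random.default_rng(3)

for n in range(10):
    # Learner: posterior over mu given a single observation x
    post_var = 1.0 / (1.0 / var0 + 1.0 / var_x)
    post_mean = post_var * (mu0 / var0 + x / var_x)
    mu = rng.normal(post_mean, np.sqrt(post_var))   # sample hypothesis from posterior
    # Teacher: produce data for the next learner
    x = rng.normal(mu, np.sqrt(var_x))
    print(f"iteration {n + 1}: x = {x:.2f}")
# x quickly falls back toward the prior predictive distribution (mean 0, variance 2)
```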

27 An example: linear regression
Assume
–data, d, are pairs of real numbers (x, y)
–hypotheses, h, are functions
An example: linear regression
–hypotheses have slope θ and pass through the origin
–p(θ) is Gaussian(θ_0, σ_0^2)
(Figure: a line through the origin; its height y at x = 1.)

28 An example: linear regression
θ_0 = 1, σ_0^2 = 0.1, y_0 = -1
(Figure: iterated learning results for these parameters.)
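A minimal simulation sketch of the regression example with the slide's parameters; the observation noise (0.1) and the probe location x = 1 are assumptions for illustration:

```python
# Iterated learning of a one-parameter linear function (slope through the origin).
import numpy as np

theta0, var0 = 1.0, 0.1   # prior over the slope
var_y = 0.1               # assumed noise on y
x_probe = 1.0             # each learner sees one (x, y) pair at x = 1
y = -1.0                  # initial data
rng = np.random.default_rng(4)

for n in range(10):
    # Posterior over the slope given the single pair (x_probe, y)
    post_var = 1.0 / (1.0 / var0 + x_probe**2 / var_y)
    post_mean = post_var * (theta0 / var0 + x_probe * y / var_y)
    theta = rng.normal(post_mean, np.sqrt(post_var))   # sample slope from posterior
    y = rng.normal(theta * x_probe, np.sqrt(var_y))    # produce data for the next learner
    print(f"iteration {n + 1}: y = {y:.2f}")
# y drifts back toward the prior predictive centred on theta_0 * x_probe = 1
```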

29 An example: compositionality
(Figure: a language is a function from events (x) to utterances (y); in a compositional language, "agents" and "actions" in events map to "nouns" and "verbs" in utterances.)

30 An example: compositionality
Data: m event-utterance pairs
Hypotheses: languages, with error ε
(Figure: the prior P(h) over compositional and holistic languages.)

31 Analysis technique
1. Compute the transition matrix on languages
2. Sample Markov chains
3. Compare language frequencies with the prior
(can also compute eigenvalues etc.)
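A minimal sketch of these three steps, with made-up distributions standing in for the actual language model (the real prior over compositional and holistic languages, ε, and m are not reproduced here):

```python
# Three-step analysis: transition matrix, sampled chain, comparison with the prior.
import numpy as np

rng = np.random.default_rng(5)
n_lang, n_data = 4, 6

prior = np.array([0.4, 0.3, 0.2, 0.1])        # P(h): prior over languages
like = rng.random((n_lang, n_data))           # like[h, d] = P(d | h)
like /= like.sum(axis=1, keepdims=True)

# 1. Compute the transition matrix on languages
joint = prior[:, None] * like                 # joint[h, d] = P(h) P(d | h)
post = joint / joint.sum(axis=0)              # post[h, d] = P(h | d)
Q = post @ like.T                             # Q[h_new, h_old] = sum_d P(h_new | d) P(d | h_old)

# 2. Sample a Markov chain of languages
h, counts = 0, np.zeros(n_lang)
for t in range(50_000):
    d = rng.choice(n_data, p=like[h])         # produce data from the current language
    h = rng.choice(n_lang, p=post[:, d])      # next learner infers a language
    counts[h] += 1

# 3. Compare language frequencies with the prior (eigenvalues also available)
print("chain frequencies:", counts / counts.sum())
print("prior:            ", prior)
print("|eigenvalues| of Q:", np.sort(np.abs(np.linalg.eigvals(Q)))[::-1])
```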

32 Convergence to priors
(Figure: chain and prior language frequencies across iterations, with ε = 0.05 and m = 3; the two panels vary the prior parameter, 0.50 vs. 0.01.)
Effect of the prior

33 The information bottleneck
(Figure: chain and prior language frequencies across iterations for bottleneck sizes m = 1, 3, and 10.)
No effect of the bottleneck

34 The information bottleneck
The bottleneck affects the relative stability of languages favored by the prior

35 Outline
1. Analyzing iterated learning
2. Iterated Bayesian learning
3. Examples
4. Iterated learning with humans
5. Conclusions and open questions

36 A method for discovering priors
Iterated learning converges to the prior…
…so the prior can be evaluated by reproducing iterated learning in the lab

37 Iterated function learning
Each learner sees a set of (x, y) pairs
Makes predictions of y for new x values
Predictions are the data for the next learner

38 Function learning in the lab
(Display: stimulus, response slider, and feedback.)
Examine iterated learning with different initial data

39 (Figure: human iterated function learning across iterations 1-9, starting from different initial data; Kalish, 2004.)

40 Outline
1. Analyzing iterated learning
2. Iterated Bayesian learning
3. Examples
4. Iterated learning with humans
5. Conclusions and open questions

41 Conclusions and open questions
Iterated Bayesian learning converges to the prior
–properties of languages are properties of learners
–the information bottleneck doesn't affect the equilibrium
What about other learning algorithms?
What determines rates of convergence?
–amount and structure of the input data
What happens with people?


