Exploring cultural transmission by iterated learning. Tom Griffiths (Brown University) and Mike Kalish (University of Louisiana). With thanks to: Anu Asnaani, Brian Christian, and Alana Firl.
Cultural transmission Most knowledge is based on secondhand data Some things can only be learned from others –cultural knowledge transmitted across generations What are the consequences of learners learning from other learners?
Iterated learning (Kirby, 2001) Each learner sees data, forms a hypothesis, produces the data given to the next learner
Objects of iterated learning Knowledge communicated through data Examples: –religious concepts –social norms –myths and legends –causal theories –language
Analyzing iterated learning. P_L(h|d): probability of inferring hypothesis h from data d. P_P(d|h): probability of generating data d from hypothesis h. [diagram: a chain of learners, each applying P_L(h|d) to the data it receives and P_P(d|h) to produce data for the next learner]
Analyzing iterated learning. What are the consequences of iterated learning? Prior work offers simulations with complex learning algorithms (Kirby, 2001; Brighton, 2002; Smith, Kirby, & Brighton, 2003) and analytic results for simple algorithms (Komarova, Niyogi, & Nowak, 2002); analytic results for more complex learners remain the open question.
Bayesian inference Reverend Thomas Bayes Rational procedure for updating beliefs Foundation of many learning algorithms Widely used for language learning
Bayes’ theorem: $p(h \mid d) = \frac{p(d \mid h)\, p(h)}{\sum_{h' \in H} p(d \mid h')\, p(h')}$, i.e. the posterior probability is the likelihood times the prior probability, normalized by the sum over the space of hypotheses (h: hypothesis, d: data).
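A minimal numerical sketch of this update; the three-hypothesis space, prior, and likelihood values are hypothetical, chosen only to show the computation.

    import numpy as np

    # Hypothetical three-hypothesis space: prior and the likelihood of one observed datum d.
    prior = np.array([0.5, 0.3, 0.2])        # p(h)
    likelihood = np.array([0.1, 0.6, 0.3])   # p(d|h) for the observed d

    posterior = likelihood * prior           # numerator: likelihood times prior
    posterior /= posterior.sum()             # normalize by the sum over hypotheses
    print(posterior)                         # approximately [0.17, 0.62, 0.21]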
Iterated Bayesian learning: learners are Bayesian agents. [diagram: each learner infers a hypothesis from data via P_L(h|d) and generates data for the next learner via P_P(d|h)]
Markov chains: a sequence of variables in which x^(t+1) is independent of history given x^(t), with transition matrix T = P(x^(t+1) | x^(t)); the chain converges to a stationary distribution under easily checked conditions for ergodicity.
Stationary distributions: the stationary distribution $\pi$ satisfies $\pi = \pi T$, so in matrix form $\pi$ is the first eigenvector of the matrix T (the left eigenvector with eigenvalue 1); the second eigenvalue sets the rate of convergence.
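A minimal sketch of computing both quantities with NumPy; the 3-state transition matrix is hypothetical, used only to illustrate the eigenvector computation.

    import numpy as np

    # Hypothetical transition matrix T; rows index x(t), columns index x(t+1).
    T = np.array([[0.9, 0.05, 0.05],
                  [0.1, 0.80, 0.10],
                  [0.2, 0.20, 0.60]])

    # The stationary distribution pi satisfies pi = pi T, so it is the left eigenvector
    # of T (equivalently, an eigenvector of T transposed) with eigenvalue 1.
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi /= pi.sum()
    print(pi)                                # stationary distribution

    # The magnitude of the second-largest eigenvalue sets the rate of convergence.
    print(np.sort(np.abs(vals))[-2])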
Analyzing iterated learning: the sequence d_0 → h_1 → d_1 → h_2 → d_2 → h_3 → ..., generated by alternating P_L(h|d) and P_P(d|h), can be viewed three ways: as a Markov chain on hypotheses h_1 → h_2 → h_3 with transition probability $\sum_d P_P(d|h)\, P_L(h'|d)$; as a Markov chain on data d_0 → d_1 → d_2 with transition probability $\sum_h P_L(h|d)\, P_P(d'|h)$; and as a Markov chain on hypothesis-data pairs (h_1,d_1) → (h_2,d_2) → (h_3,d_3).
Stationary distributions: the Markov chain on h converges to the prior, p(h); the Markov chain on d converges to the “prior predictive distribution” $p(d) = \sum_h P_P(d|h)\, p(h)$; and the Markov chain on (h,d) is a Gibbs sampler for the joint distribution $p(d,h) = P_P(d|h)\, p(h)$.
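A minimal check of these stationary distributions under an assumed toy setup (small discrete hypothesis and data spaces with a made-up prior and production probabilities; not from the talk):

    import numpy as np

    # Toy setup (assumptions): 3 hypotheses, 4 possible data values.
    prior = np.array([0.6, 0.3, 0.1])                 # p(h)
    P_prod = np.array([[0.7, 0.1, 0.1, 0.1],          # P_P(d|h), rows = h, columns = d
                       [0.1, 0.7, 0.1, 0.1],
                       [0.1, 0.1, 0.4, 0.4]])

    # Bayesian learner: P_L(h|d) proportional to P_P(d|h) p(h), stored as a (d x h) matrix.
    P_learn = (P_prod * prior[:, None]).T
    P_learn /= P_learn.sum(axis=1, keepdims=True)

    # Transition matrix of the chain on hypotheses: Q(h'|h) = sum_d P_P(d|h) P_L(h'|d).
    Q = P_prod @ P_learn

    # The prior is stationary for the chain on h, and the prior predictive distribution
    # p(d) = sum_h P_P(d|h) p(h) is stationary for the chain on d.
    print(np.allclose(prior @ Q, prior))                                        # True
    prior_predictive = prior @ P_prod
    print(np.allclose(prior_predictive @ (P_learn @ P_prod), prior_predictive)) # True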
Implications: the probability that the nth learner entertains the hypothesis h approaches p(h) as n → ∞. Convergence to the prior occurs regardless of: – the properties of the hypotheses themselves – the amount or structure of the data transmitted. The consequences of iterated learning are determined entirely by the biases of the learners.
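A sampling sketch of the same point, again with made-up numbers: many independent chains of Bayesian agents, each inferring a hypothesis from the previous agent's data and producing data for the next, with the distribution of the nth agent's hypothesis approaching the prior whatever the initial data.

    import numpy as np

    rng = np.random.default_rng(0)

    # Same kind of toy setup as above (assumed, not from the talk).
    prior = np.array([0.6, 0.3, 0.1])                 # p(h)
    P_prod = np.array([[0.7, 0.1, 0.1, 0.1],          # P_P(d|h)
                       [0.1, 0.7, 0.1, 0.1],
                       [0.1, 0.1, 0.4, 0.4]])

    def learn(d):
        # Sample a hypothesis from the posterior P_L(h|d).
        post = P_prod[:, d] * prior
        return rng.choice(len(prior), p=post / post.sum())

    def produce(h):
        # Sample data from P_P(d|h).
        return rng.choice(P_prod.shape[1], p=P_prod[h])

    counts = np.zeros(len(prior))
    for _ in range(5000):                             # 5000 independent chains
        d = 3                                         # fixed, arbitrary initial data
        for _ in range(20):                           # 20 generations of learners
            h = learn(d)
            d = produce(h)
        counts[h] += 1
    print(counts / counts.sum())                      # close to the prior [0.6, 0.3, 0.1]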
Identifying inductive biases Many problems in cognitive science can be formulated as problems of induction –learning languages, concepts, and causal relations Such problems are not solvable without bias (e.g., Goodman, 1955; Kearns & Vazirani, 1994; Vapnik, 1995) What biases guide human inductive inferences? If iterated learning converges to the prior, then it may provide a method for investigating biases
Serial reproduction (Bartlett, 1932) Participants see stimuli, then reproduce them from memory Reproductions of one participant are stimuli for the next Stimuli were interesting, rather than controlled –e.g., “War of the Ghosts”
Iterated function learning (heavy lifting by Mike Kalish): each learner sees a set of (x,y) pairs, makes predictions of y for new x values, and those predictions are the data for the next learner. [diagram: data and hypotheses]
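The experiments used human learners; the following is only a toy sketch of the transmission loop itself, with a noisy linear-regression "learner" standing in for a person (all settings assumed).

    import numpy as np

    rng = np.random.default_rng(1)

    x = np.linspace(0, 1, 25)                 # stimulus magnitudes shown to every learner
    y = rng.random(x.shape)                   # arbitrary initial data (random y values)

    for generation in range(10):
        # "Learn": fit a function to the (x, y) pairs seen; here, a straight line.
        slope, intercept = np.polyfit(x, y, 1)
        # "Produce": noisy predictions of y for new x values become the next learner's data.
        y = slope * x + intercept + rng.normal(0, 0.05, size=x.shape)

    print(slope, intercept)                   # the function the final learner passes on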
Function learning experiments. [screenshot of a trial: stimulus, response slider, feedback] Examine iterated learning with different initial data.
[figure: learners' responses across iterations, starting from the initial data]
Iterated concept learning (heavy lifting by Brian Christian) Each learner sees examples from a species Identifies species of four amoebae Iterated learning is run within-subjects data hypotheses
Two positive examples. [diagram: data (d), the two example amoebae; hypotheses (h), sets of four amoebae]
Bayesian model (Tenenbaum, 1999; Tenenbaum & Griffiths, 2001): with d a set of m = 2 example amoebae and h a set of |h| = 4 amoebae, $p(h|d) \propto p(h) / |h|^m$ for hypotheses h containing all of d (and 0 otherwise); since every h has |h| = 4, the posterior is the renormalized prior over the consistent hypotheses. What is the prior?
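A minimal sketch of this size-principle posterior; the hypothesis space and prior below are hypothetical toy values, used only to show the renormalization.

    import numpy as np

    def posterior(d, hypotheses, prior):
        # p(h|d) proportional to prior(h) / |h|^m for hypotheses containing all m examples in d.
        m = len(d)
        consistent = np.array([set(d) <= h for h in hypotheses], dtype=float)
        sizes = np.array([len(h) for h in hypotheses], dtype=float)
        post = consistent * prior / sizes**m
        return post / post.sum()

    # Hypothetical hypotheses (sets of 4 stimuli, labelled 0-7) and prior, for illustration only.
    hypotheses = [{0, 1, 2, 3}, {0, 1, 4, 5}, {2, 3, 6, 7}]
    prior = np.array([0.5, 0.3, 0.2])
    print(posterior([0, 1], hypotheses, prior))   # renormalized prior over consistent hypotheses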
Classes of concepts (Shepard, Hovland, & Jenkins, 1961). [figure: the six classes of concepts (Class 1-6) defined over three binary dimensions: shape, size, and color]
Experiment design (for each subject): 6 iterated learning chains and 6 independent learning “chains”, one of each per concept class (Classes 1-6).
Estimating the prior. [diagram: data (d) and hypotheses (h)]
Estimating the prior. [figure: estimated prior probability for each concept class (Classes 1-6), Bayesian model vs. human subjects, with the correlation r between them]
Two positive examples (n = 20). [figure: probability vs. iteration, human learners and Bayesian model]
Two positive examples (n = 20). [figure: probability, Bayesian model vs. human learners]
Three positive examples. [diagram: data (d), the three example amoebae; hypotheses (h)]
Three positive examples (n = 20). [figure: probability vs. iteration, human learners and Bayesian model]
Three positive examples (n = 20). [figure: Bayesian model vs. human learners]
Conclusions Consequences of iterated learning with Bayesian learners determined by the biases of the learners Consistent results are obtained with human learners Provides an explanation for cultural universals… –universal properties are probable under the prior –a direct connection between mind and culture …and a novel method for evaluating the inductive biases that guide human learning
Discovering the biases of models Generic neural network:
Discovering the biases of models EXAM (DeLosh, Busemeyer, & McDaniel, 1997):
Discovering the biases of models POLE (Kalish, Lewandowsky, & Kruschke, 2004):