Exploring cultural transmission by iterated learning. Tom Griffiths (Brown University) and Mike Kalish (University of Louisiana). With thanks to: Anu Asnaani, Brian Christian, and Alana Firl.
Cultural transmission Most knowledge is based on secondhand data Some things can only be learned from others –cultural knowledge transmitted across generations What are the consequences of learners learning from other learners?
Iterated learning (Kirby, 2001) Each learner sees data, forms a hypothesis, produces the data given to the next learner
Objects of iterated learning Knowledge communicated through data Examples: –religious concepts –social norms –myths and legends –causal theories –language
Analyzing iterated learning. P_L(h|d): probability of inferring hypothesis h from data d. P_P(d|h): probability of generating data d from hypothesis h. [diagram: a chain of learners, each applying P_L(h|d) to the data it receives and P_P(d|h) to produce data for the next learner]
Analyzing iterated learning. What are the consequences of iterated learning? Prior work offers simulations with complex learning algorithms (Kirby, 2001; Brighton, 2002; Smith, Kirby, & Brighton, 2003) and analytic results for simple algorithms (Komarova, Niyogi, & Nowak, 2002); analytic results for more complex learners remain the open question.
Bayesian inference Reverend Thomas Bayes Rational procedure for updating beliefs Foundation of many learning algorithms Widely used for language learning
Bayes’ theorem: $p(h \mid d) = \frac{p(d \mid h)\, p(h)}{\sum_{h' \in H} p(d \mid h')\, p(h')}$, i.e. the posterior probability is the likelihood times the prior probability, normalized by the sum over the space of hypotheses (h: hypothesis, d: data).
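A minimal numerical sketch of this update; the three-hypothesis space, prior, and likelihood values are hypothetical, chosen only to show the computation.

    import numpy as np

    # Hypothetical three-hypothesis space: prior and the likelihood of one observed datum d.
    prior = np.array([0.5, 0.3, 0.2])        # p(h)
    likelihood = np.array([0.1, 0.6, 0.3])   # p(d|h) for the observed d

    posterior = likelihood * prior           # numerator: likelihood times prior
    posterior /= posterior.sum()             # normalize by the sum over hypotheses
    print(posterior)                         # approximately [0.17, 0.62, 0.21]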
Iterated Bayesian learning: learners are Bayesian agents. [diagram: each learner infers a hypothesis from data via P_L(h|d) and generates data for the next learner via P_P(d|h)]
Markov chains: a sequence of variables in which x^(t+1) is independent of history given x^(t), with transition matrix T = P(x^(t+1) | x^(t)); the chain converges to a stationary distribution under easily checked conditions for ergodicity.
Stationary distributions: the stationary distribution $\pi$ satisfies $\pi = \pi T$, so in matrix form $\pi$ is the first eigenvector of the matrix T (the left eigenvector with eigenvalue 1); the second eigenvalue sets the rate of convergence.
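A minimal sketch of computing both quantities with NumPy; the 3-state transition matrix is hypothetical, used only to illustrate the eigenvector computation.

    import numpy as np

    # Hypothetical transition matrix T; rows index x(t), columns index x(t+1).
    T = np.array([[0.9, 0.05, 0.05],
                  [0.1, 0.80, 0.10],
                  [0.2, 0.20, 0.60]])

    # The stationary distribution pi satisfies pi = pi T, so it is the left eigenvector
    # of T (equivalently, an eigenvector of T transposed) with eigenvalue 1.
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi /= pi.sum()
    print(pi)                                # stationary distribution

    # The magnitude of the second-largest eigenvalue sets the rate of convergence.
    print(np.sort(np.abs(vals))[-2])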
Analyzing iterated learning: the sequence d_0 → h_1 → d_1 → h_2 → d_2 → h_3 → ..., generated by alternating P_L(h|d) and P_P(d|h), can be viewed three ways: as a Markov chain on hypotheses h_1 → h_2 → h_3 with transition probability $\sum_d P_P(d|h)\, P_L(h'|d)$; as a Markov chain on data d_0 → d_1 → d_2 with transition probability $\sum_h P_L(h|d)\, P_P(d'|h)$; and as a Markov chain on hypothesis-data pairs (h_1,d_1) → (h_2,d_2) → (h_3,d_3).
Stationary distributions: the Markov chain on h converges to the prior, p(h); the Markov chain on d converges to the “prior predictive distribution” $p(d) = \sum_h P_P(d|h)\, p(h)$; and the Markov chain on (h,d) is a Gibbs sampler for the joint distribution $p(d,h) = P_P(d|h)\, p(h)$.
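A minimal check of these stationary distributions under an assumed toy setup (small discrete hypothesis and data spaces with a made-up prior and production probabilities; not from the talk):

    import numpy as np

    # Toy setup (assumptions): 3 hypotheses, 4 possible data values.
    prior = np.array([0.6, 0.3, 0.1])                 # p(h)
    P_prod = np.array([[0.7, 0.1, 0.1, 0.1],          # P_P(d|h), rows = h, columns = d
                       [0.1, 0.7, 0.1, 0.1],
                       [0.1, 0.1, 0.4, 0.4]])

    # Bayesian learner: P_L(h|d) proportional to P_P(d|h) p(h), stored as a (d x h) matrix.
    P_learn = (P_prod * prior[:, None]).T
    P_learn /= P_learn.sum(axis=1, keepdims=True)

    # Transition matrix of the chain on hypotheses: Q(h'|h) = sum_d P_P(d|h) P_L(h'|d).
    Q = P_prod @ P_learn

    # The prior is stationary for the chain on h, and the prior predictive distribution
    # p(d) = sum_h P_P(d|h) p(h) is stationary for the chain on d.
    print(np.allclose(prior @ Q, prior))                                        # True
    prior_predictive = prior @ P_prod
    print(np.allclose(prior_predictive @ (P_learn @ P_prod), prior_predictive)) # True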
Implications: the probability that the nth learner entertains the hypothesis h approaches p(h) as n → ∞. Convergence to the prior occurs regardless of: – the properties of the hypotheses themselves – the amount or structure of the data transmitted. The consequences of iterated learning are determined entirely by the biases of the learners.
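A sampling sketch of the same point, again with made-up numbers: many independent chains of Bayesian agents, each inferring a hypothesis from the previous agent's data and producing data for the next, with the distribution of the nth agent's hypothesis approaching the prior whatever the initial data.

    import numpy as np

    rng = np.random.default_rng(0)

    # Same kind of toy setup as above (assumed, not from the talk).
    prior = np.array([0.6, 0.3, 0.1])                 # p(h)
    P_prod = np.array([[0.7, 0.1, 0.1, 0.1],          # P_P(d|h)
                       [0.1, 0.7, 0.1, 0.1],
                       [0.1, 0.1, 0.4, 0.4]])

    def learn(d):
        # Sample a hypothesis from the posterior P_L(h|d).
        post = P_prod[:, d] * prior
        return rng.choice(len(prior), p=post / post.sum())

    def produce(h):
        # Sample data from P_P(d|h).
        return rng.choice(P_prod.shape[1], p=P_prod[h])

    counts = np.zeros(len(prior))
    for _ in range(5000):                             # 5000 independent chains
        d = 3                                         # fixed, arbitrary initial data
        for _ in range(20):                           # 20 generations of learners
            h = learn(d)
            d = produce(h)
        counts[h] += 1
    print(counts / counts.sum())                      # close to the prior [0.6, 0.3, 0.1]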
Identifying inductive biases Many problems in cognitive science can be formulated as problems of induction –learning languages, concepts, and causal relations Such problems are not solvable without bias (e.g., Goodman, 1955; Kearns & Vazirani, 1994; Vapnik, 1995) What biases guide human inductive inferences? If iterated learning converges to the prior, then it may provide a method for investigating biases
Serial reproduction (Bartlett, 1932) Participants see stimuli, then reproduce them from memory Reproductions of one participant are stimuli for the next Stimuli were interesting, rather than controlled –e.g., “War of the Ghosts”
Iterated function learning (heavy lifting by Mike Kalish): each learner sees a set of (x,y) pairs, makes predictions of y for new x values, and those predictions are the data for the next learner. [diagram: data and hypotheses]
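The experiments used human learners; the following is only a toy sketch of the transmission loop itself, with a noisy linear-regression "learner" standing in for a person (all settings assumed).

    import numpy as np

    rng = np.random.default_rng(1)

    x = np.linspace(0, 1, 25)                 # stimulus magnitudes shown to every learner
    y = rng.random(x.shape)                   # arbitrary initial data (random y values)

    for generation in range(10):
        # "Learn": fit a function to the (x, y) pairs seen; here, a straight line.
        slope, intercept = np.polyfit(x, y, 1)
        # "Produce": noisy predictions of y for new x values become the next learner's data.
        y = slope * x + intercept + rng.normal(0, 0.05, size=x.shape)

    print(slope, intercept)                   # the function the final learner passes on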
Function learning experiments. [screenshot of a trial: stimulus, response slider, feedback] Examine iterated learning with different initial data.
[figure: learners' responses across iterations, starting from the initial data]
Iterated concept learning (heavy lifting by Brian Christian) Each learner sees examples from a species Identifies species of four amoebae Iterated learning is run within-subjects data hypotheses
Two positive examples. [diagram: data (d), the two example amoebae; hypotheses (h), sets of four amoebae]
Bayesian model (Tenenbaum, 1999; Tenenbaum & Griffiths, 2001): with d a set of m = 2 example amoebae and h a set of |h| = 4 amoebae, $p(h|d) \propto p(h) / |h|^m$ for hypotheses h containing all of d (and 0 otherwise); since every h has |h| = 4, the posterior is the renormalized prior over the consistent hypotheses. What is the prior?
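A minimal sketch of this size-principle posterior; the hypothesis space and prior below are hypothetical toy values, used only to show the renormalization.

    import numpy as np

    def posterior(d, hypotheses, prior):
        # p(h|d) proportional to prior(h) / |h|^m for hypotheses containing all m examples in d.
        m = len(d)
        consistent = np.array([set(d) <= h for h in hypotheses], dtype=float)
        sizes = np.array([len(h) for h in hypotheses], dtype=float)
        post = consistent * prior / sizes**m
        return post / post.sum()

    # Hypothetical hypotheses (sets of 4 stimuli, labelled 0-7) and prior, for illustration only.
    hypotheses = [{0, 1, 2, 3}, {0, 1, 4, 5}, {2, 3, 6, 7}]
    prior = np.array([0.5, 0.3, 0.2])
    print(posterior([0, 1], hypotheses, prior))   # renormalized prior over consistent hypotheses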
Classes of concepts (Shepard, Hovland, & Jenkins, 1961). [figure: the six classes of concepts (Class 1-6) defined over three binary dimensions: shape, size, and color]
Experiment design (for each subject): 6 iterated learning chains and 6 independent learning “chains”, one of each per concept class (Classes 1-6).
Estimating the prior. [diagram: data (d) and hypotheses (h)]
Estimating the prior. [figure: estimated prior probability for each concept class (Classes 1-6), Bayesian model vs. human subjects, with the correlation r between them]
Two positive examples (n = 20). [figure: probability vs. iteration, human learners and Bayesian model]
Two positive examples (n = 20). [figure: probability, Bayesian model vs. human learners]
Three positive examples. [diagram: data (d), the three example amoebae; hypotheses (h)]
Three positive examples (n = 20). [figure: probability vs. iteration, human learners and Bayesian model]
Three positive examples (n = 20). [figure: Bayesian model vs. human learners]
Conclusions Consequences of iterated learning with Bayesian learners determined by the biases of the learners Consistent results are obtained with human learners Provides an explanation for cultural universals… –universal properties are probable under the prior –a direct connection between mind and culture …and a novel method for evaluating the inductive biases that guide human learning
Discovering the biases of models Generic neural network:
Discovering the biases of models EXAM (DeLosh, Busemeyer, & McDaniel, 1997):
Discovering the biases of models POLE (Kalish, Lewandowsky, & Kruschke, 2004):