Everyday inductive leaps Making predictions and detecting coincidences Tom Griffiths Department of Psychology Program in Cognitive Science University of California, Berkeley (joint work with Josh Tenenbaum, MIT)

Inductive problems Inferring structure from data Perception –e.g. structure of 3D world from 2D visual data [Figure: data (a 2D image) and hypotheses (a cube, a shaded hexagon)]

Inductive problems Inferring structure from data Perception –e.g. structure of 3D world from 2D visual data Cognition –e.g. whether a process is random [Figure: data (HHHHH) and hypotheses (a fair coin, two heads)]

Perception is optimal

Cognition is not

Everyday inductive leaps Inferences we make effortlessly every day –making predictions –detecting coincidences –evaluating randomness –learning causal relationships –identifying categories –picking out regularities in language A chance to study induction in microcosm, and compare cognition to optimal solutions

Two everyday inductive leaps Predicting the future Detecting coincidences

Predicting the future How often is Google News updated? t = time since last update, t_total = time between updates What should we guess for t_total given t?

Reverend Thomas Bayes

Bayes’ theorem p(h|d) = p(d|h) p(h) / Σ_h′ p(d|h′) p(h′), where the left side is the posterior probability, p(d|h) is the likelihood, p(h) is the prior probability, and the denominator sums over the space of hypotheses (h: hypothesis, d: data)

Bayes’ theorem p(h|d) = p(d|h) p(h) / Σ_h′ p(d|h′) p(h′) (h: hypothesis, d: data)
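
A minimal numerical sketch of Bayes' rule over a discrete hypothesis space, using the fair-coin vs. two-headed-coin example from the earlier slide (the prior values below are illustrative assumptions, not figures from the talk):

```python
def posterior(hypotheses, prior, likelihood, d):
    """Bayes' rule: normalize likelihood * prior over the hypothesis space."""
    unnorm = {h: likelihood(d, h) * prior[h] for h in hypotheses}
    z = sum(unnorm.values())  # sum over the space of hypotheses
    return {h: v / z for h, v in unnorm.items()}

# p(d | fair coin) = 0.5**len(d) for an all-heads sequence; p(d | two heads) = 1.
lik = lambda d, h: 0.5 ** len(d) if h == "fair coin" else 1.0
prior = {"fair coin": 0.99, "two heads": 0.01}  # assumed prior, for illustration
print(posterior(["fair coin", "two heads"], prior, lik, "HHHHH"))
# -> roughly {'fair coin': 0.76, 'two heads': 0.24}
```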

Bayesian inference p(t_total|t) ∝ p(t|t_total) p(t_total), i.e. posterior probability ∝ likelihood × prior

Bayesian inference p(t_total|t) ∝ p(t|t_total) p(t_total) Assuming a random sample (0 < t < t_total), the likelihood is p(t|t_total) = 1/t_total, so p(t_total|t) ∝ (1/t_total) p(t_total)
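
A minimal sketch of this computation on a grid, assuming illustrative prior shapes (the power-law exponent and Gaussian parameters are placeholders, not the distributions estimated in the study):

```python
import numpy as np

def posterior_median_prediction(t, prior, grid):
    """Posterior median of t_total given elapsed time t, with p(t|t_total) = 1/t_total."""
    likelihood = np.where(grid > t, 1.0 / grid, 0.0)  # random-sampling likelihood
    post = likelihood * prior(grid)
    cdf = np.cumsum(post) / post.sum()
    return grid[np.searchsorted(cdf, 0.5)]

grid = np.linspace(1, 1000, 10_000)
power_law = lambda x: x ** -1.5                           # e.g. movie grosses (assumed exponent)
gaussian = lambda x: np.exp(-0.5 * ((x - 75) / 15) ** 2)  # e.g. life spans (assumed parameters)

print(posterior_median_prediction(50, power_law, grid))  # prediction scales with t
print(posterior_median_prediction(50, gaussian, grid))   # prediction pulled toward the prior mean
```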

The effects of priors Different kinds of priors p(t_total) are appropriate in different domains, e.g. wealth (power-law) and height (Gaussian)

The effects of priors

Evaluating human predictions Different domains with different priors: –a movie has made $60 million [power-law] –your friend quotes from line 17 of a poem [power-law] –you meet a 78 year old man [Gaussian] –a movie has been running for 55 minutes [Gaussian] –a U.S. congressman has served 11 years [Erlang] Prior distributions derived from actual data Use 5 values of t for each People predict t_total

[Figure: people’s predictions compared with the parametric-prior model, the empirical-prior model, and Gott’s rule]

Probability matching [Figure: proportion of judgments below the predicted value plotted against the quantile of the Bayesian posterior p(t_total|t_past)]

Probability matching Average over all prediction tasks (movie run times, movie grosses, poem lengths, life spans, terms in congress, cake baking times): [Figure: proportion of judgments below the predicted value against the quantile of the Bayesian posterior p(t_total|t_past)]
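
A sketch of how such a calibration curve can be computed (the form of the analysis is inferred from the plot's axes): find the quantile of the Bayesian posterior at which each judgment falls; probability matching predicts these quantiles are uniformly distributed, so the curve lies on the diagonal.

```python
import numpy as np

def posterior_quantile(judgment, grid, post):
    """Quantile of the posterior at which a participant's judgment falls."""
    cdf = np.cumsum(post) / post.sum()
    return cdf[np.searchsorted(grid, judgment)]

# Toy posterior for illustration (not from the study).
grid = np.linspace(1, 200, 2000)
post = np.exp(-0.5 * ((grid - 80) / 20) ** 2)
print(posterior_quantile(90.0, grid, post))  # ~0.69 for this toy posterior
```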

Predicting the future People produce accurate predictions for the duration and extent of everyday events Strong prior knowledge –form of the prior (power-law or exponential) –distribution given that form (parameters) Contrast with “base rate neglect” (Kahneman & Tversky, 1973)

Two everyday inductive leaps Predicting the future Detecting coincidences

November 12, 2001: New Jersey lottery results were 5-8-7, the same day that American Airlines flight 587 crashed

"It could be that, collectively, the people in New York caused those lottery numbers to come up 911," says Henry Reed. A psychologist who specializes in intuition, he teaches seminars at the Edgar Cayce Association for Research and Enlightenment in Virginia Beach, VA. "If enough people all are thinking the same thing, at the same time, they can cause events to happen," he says. "It's called psychokinesis."

The bombing of London (Gilovich, 1991)

John Snow and cholera (Snow, 1855)

[Figure: Halley’s comet, appearances separated by 76 years and 75 years (Halley, 1752)]

The paradox of coincidences How can coincidences simultaneously lead us to irrational conclusions and significant discoveries?

A common definition: Coincidences are unlikely events “an event which seems so unlikely that it is worth telling a story about” “we sense that it is too unlikely to have been the result of luck or mere chance”

Coincidences are not just unlikely... HHHHHHHHHH vs. HHTHTHTTHT

Bayesian causal induction Hypotheses: cause (a novel causal relationship exists) vs. chance (no such relationship exists) Priors: p(cause), p(chance) Likelihoods: p(d|cause), p(d|chance) Data: d

Likelihood ratio (evidence) crossed with prior odds: high evidence and high prior odds → cause; low evidence and low prior odds → chance; the remaining cells → ?

Bayesian causal induction The same grid: high evidence and high prior odds → cause; low evidence and low prior odds → chance; high evidence and low prior odds → coincidence; the remaining cell → ?

What makes a coincidence? A coincidence is an event that provides evidence for causal structure, but not enough evidence to make us believe that structure exists

What makes a coincidence? A coincidence is an event that provides evidence for causal structure, but not enough evidence to make us believe that structure exists → the likelihood ratio is high

What makes a coincidence? A coincidence is an event that provides evidence for causal structure, but not enough evidence to make us believe that structure exists → the likelihood ratio is high, the prior odds are low, and the posterior odds are middling

Coincidence and the supernatural The data suggest a connection between events → the likelihood ratio is high Our (physical) theory says no connection exists → the prior odds are low So either our physical theory is wrong, or there are forces outside it → the posterior odds are middling

HHHHHHHHHH vs. HHTHTHTTHT For HHHHHHHHHH: the likelihood ratio is high, the prior odds are low, and the posterior odds are middling

Bayesian causal induction Hypotheses: cause (C → E) vs. chance (no link between C and E) Priors: α for cause, 1 - α for chance Likelihoods: 0 < p(E) < 1 under cause, p(E) = 0.5 under chance Data: frequency of effect in presence of cause
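
Under these assumptions the likelihood ratio for a sequence of flips can be computed in closed form; a minimal sketch, integrating out p(E) under the cause hypothesis with a uniform prior (the uniform prior is an illustrative choice):

```python
from math import comb

def likelihood_ratio(k, n):
    """p(data | cause) / p(data | chance) for a particular sequence with k heads in n flips."""
    p_chance = 0.5 ** n                     # p(E) = 0.5 under chance
    p_cause = 1.0 / ((n + 1) * comb(n, k))  # Beta integral of theta^k (1-theta)^(n-k)
    return p_cause / p_chance

print(likelihood_ratio(10, 10))  # HHHHHHHHHH: ~93, strong evidence for cause
print(likelihood_ratio(5, 10))   # HHTHTHTTHT: ~0.37, evidence for chance
```

Multiplying by low prior odds α/(1 - α), even a likelihood ratio near 93 yields only middling posterior odds, which is exactly the "coincidence" region above.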

HHHHHHHHHH: likelihood ratio is high, prior odds are low, posterior odds are middling → coincidence HHTHTHTTHT: likelihood ratio is low, prior odds are low, posterior odds are low → chance

Empirical tests Is this definition correct? –from coincidence to evidence How do people assess complex coincidences? –the bombing of London –coincidences in date

HHHHHHHHHH: likelihood ratio is high, prior odds are low, posterior odds are middling → coincidence HHHHHHHHHHHHHHHHHHHHHH: likelihood ratio is very high, prior odds are low, posterior odds are high → cause

From coincidence to evidence coincidence → evidence for a causal relation Transition produced by –increase in likelihood ratio (e.g., coin flipping) –increase in prior odds (e.g., genetics vs. ESP)

Testing the definition Provide participants with data from experiments Manipulate: –cover story: genetics vs. ESP (prior) –data: number of heads/males (likelihood) –task: “coincidence or evidence?” vs. “how likely?” Predictions: –coincidences affected by prior and likelihood –relationship between coincidence and posterior

[Figure: proportion of “coincidence” responses and posterior probability as a function of the number of heads/males; the correlation r is reported on the slide]

Empirical tests Is this definition correct? –from coincidence to evidence How do people assess complex coincidences? –the bombing of London –coincidences in date

Complex coincidences Many coincidences involve structure hidden in a sea of noise (e.g., bombing of London) How well do people detect such structure? Strategy: examine correspondence between strength of coincidence and likelihood ratio

The bombing of London

[Figure: people’s judgments for changes in spread, location, ratio, and number of bomb locations, relative to uniform]

Bayesian causal induction Hypotheses: cause (bombing targets a location T) vs. chance Priors: α for cause, 1 - α for chance Likelihoods: uniform + regularity under cause, uniform under chance Data: bomb locations

[Figure: people’s judgments vs. the Bayesian model for changes in spread, location, ratio, and number, relative to uniform; r = 0.98]

[Figure: Halley’s comet intervals again, 76 years and 75 years]

Coincidences in date May 14, July 8, August 21, December 25 vs. August 3, August 3, August 3, August 3

People [Figure: people’s judgments for the date-coincidence stimuli]

Bayesian causal induction Hypotheses: cause (a regularity in dates, e.g. August) vs. chance Priors: α for cause, 1 - α for chance Likelihoods: uniform + regularity under cause, uniform under chance Data: birthdays of those present

People vs. Bayes [Figure: comparison of judgments and model] Regularities: proximity in date, same day of month, same month

Coincidences Provide evidence for causal structure, but not enough to make us believe that structure exists Intimately related to causal induction –an opportunity to revise a theory –a window on the process of discovery Guided by a well-calibrated sense of when an event provides evidence of causal structure

The paradox of coincidences Status of current theory → consequence: if the theory is true, a false conclusion; if it is false, a significant discovery The utility of attending to coincidences depends upon how much you know already

Two everyday inductive leaps Predicting the future Detecting coincidences

Subjective randomness View randomness as an inference about generating processes behind data Analysis similar (but inverse) to coincidences –randomness is evidence against a regular generating process (Griffiths & Tenenbaum, 2003)
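
The inverse relationship can be made concrete with the same coin model used earlier (again assuming a uniform prior over the bias of the regular process, an illustrative choice): subjective randomness can be scored as the log-likelihood ratio favoring chance over a regular generating process.

```python
from math import comb, log2

def randomness(k, n):
    """log2 of p(data | chance) / p(data | regular) for k heads in n flips."""
    p_chance = 0.5 ** n
    p_regular = 1.0 / ((n + 1) * comb(n, k))  # biased-coin process, bias integrated out
    return log2(p_chance / p_regular)

print(randomness(5, 10))   # HHTHTHTTHT: positive, looks random
print(randomness(10, 10))  # HHHHHHHHHH: negative, looks regular
```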

Other cases of causal induction (Griffiths, Baraff, & Tenenbaum, 2004)

Aspects of language acquisition (Goldwater, Griffiths, & Johnson, 2006)

Categorization [Figure: probability distribution over stimulus values x] (Sanborn, Griffiths, & Navarro, 2006)

Conclusions We can learn about cognition (and not just perception) by thinking about optimal solutions to computational problems We can study induction using the inferences that people make every day Bayesian inference offers a way to understand these inductive inferences

Magic tricks Magic tricks are regularly used to identify infants’ ontological commitments Can we use a similar method with adults? (Wynn, 1992)

Ontological commitments (Keil, 1981)

What’s a better magic trick?

Participants rate the quality of 45 transformations, 10 appearances, and 10 disappearances –direction of transformation is randomized between subjects A second group rates similarity Objects are chosen to lie at different points in a hierarchy of applicable predicates [Figure: milk, water, a brick, a vase, a rose, a daffodil, a dove, a blackbird, a man, a girl, ordered by applicable predicates]

What’s a better magic trick? [Figure: the object hierarchy (milk through a girl) repeated with different candidate transformations]

Ontological asymmetries [Figure: asymmetries in rated trick quality across the object hierarchy]

Analyzing asymmetry Build a regression model: –similarity –appearing object –disappearing object –contains people –direction in hierarchy (-1, 0, 1) All factors are significant, and the model explains 90.9% of the variance
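
A sketch of the kind of regression described, with placeholder data (the study's actual ratings and codings are not reproduced here, and object identities are reduced to single numeric codes for brevity where the real analysis would dummy-code them):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 65  # 45 transformations + 10 appearances + 10 disappearances
X = np.column_stack([
    np.ones(n),              # intercept
    rng.uniform(0, 1, n),    # rated similarity of the two objects
    rng.integers(0, 10, n),  # appearing object (numeric code)
    rng.integers(0, 10, n),  # disappearing object (numeric code)
    rng.integers(0, 2, n),   # contains people (0/1)
    rng.integers(-1, 2, n),  # direction in hierarchy (-1, 0, 1)
])
y = rng.uniform(1, 7, n)     # placeholder quality ratings

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r_squared = 1 - ((y - X @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(beta, r_squared)
```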

Summary: magic tricks Certain factors reliably influence the estimated quality of a magic trick Magic tricks might be a way to investigate our ontological assumptions –inviolable laws that are otherwise hard to assess A Bayesian theory of magic tricks? –strong evidence for a novel causal force –causal force is given low prior probability

A reformulation: unlikely kinds Coincidences are events of an unlikely kind –e.g. a sequence with that number of heads Deals with the obvious problem... p(10 heads) < p(5 heads, 5 tails)

Problems with unlikely kinds Defining kinds: August 3, August 3, August 3, August 3 vs. January 12, March 22, March 22, July 19, October 1, December 8

Problems with unlikely kinds Defining kinds Counterexamples P(4 heads) < P(2 heads, 2 tails) P(4 heads) > P(15 heads, 8 tails) HHHH > HHHHTHTTHHHTHTHHTHTTHHH HHHH > HHTT
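
The probabilities behind these counterexamples are easy to check for a fair coin, where the probability of a kind of outcome (k heads in n flips) is C(n, k) / 2^n:

```python
from math import comb

def p_kind(k, n):
    """Probability of the kind 'k heads in n fair-coin flips'."""
    return comb(n, k) / 2 ** n

print(p_kind(4, 4))    # P(4 heads)                       = 0.0625
print(p_kind(2, 4))    # P(2 heads, 2 tails)              = 0.375
print(p_kind(15, 23))  # P(15 heads, 8 tails in 23 flips) ~ 0.058
```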

Sampling from categories [Figure: a “frog” distribution, P(x|c)]

Markov chain Monte Carlo Sample from a target distribution P(x) by constructing Markov chain for which P(x) is the stationary distribution Markov chain converges to its stationary distribution, providing outcomes that can be used similarly to samples
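
One standard construction is the Metropolis-Hastings algorithm illustrated on the next slide; here is a minimal sketch with a symmetric Gaussian proposal (all settings are illustrative), where the target only needs to be known up to a normalizing constant:

```python
import numpy as np

def metropolis_hastings(log_p, x0, n_steps, step_size=1.0, seed=0):
    """Sample from a target density via Metropolis-Hastings with a Gaussian proposal."""
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + step_size * rng.standard_normal()
        # Symmetric proposal, so A(x, x') = min(1, p(x') / p(x)).
        if np.log(rng.uniform()) < log_p(proposal) - log_p(x):
            x = proposal
        samples.append(x)
    return np.array(samples)

# Example: sample from a standard normal target.
chain = metropolis_hastings(lambda x: -0.5 * x ** 2, x0=0.0, n_steps=5000)
print(chain.mean(), chain.std())  # ~0 and ~1 after burn-in
```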

Metropolis-Hastings algorithm [Animation over several slides: proposed moves under the target density p(x); in the example shown, a move to a lower-density point is accepted with probability A(x(t), x(t+1)) = 0.5, and a move to a higher-density point with probability A(x(t), x(t+1)) = 1]

A task Ask subjects which of two alternatives comes from a target category Which animal is a frog?

Collecting the samples Which is the frog? [Figure: choices across Trial 1, Trial 2, Trial 3]

Sampling from natural categories Examined distributions for four natural categories: giraffes, horses, cats, and dogs Presented stimuli with nine-parameter stick figures (Olman & Kersten, 2004)

Choice task

Samples from Subject 3 (projected onto a plane)

Mean animals by subject [Figure: mean giraffe, horse, cat, and dog for subjects S1–S8]

Markov chain Monte Carlo with people Rational models can guide the design of psychological experiments Markov chain Monte Carlo (and other methods) can be used to sample from subjective probability distributions –category distributions –prior distributions