
Prediction and Change Detection
Mark Steyvers, Scott Brown, Mike Yi
University of California, Irvine
This work is supported by a grant from the US Air Force Office of Scientific Research (AFOSR grant number FA )

Perception of Random Sequences
People perceive too much structure:
– Coin tosses: Gambler's fallacy
– Sports scoring sequences: Hot hand belief
Sequences are (mostly) stationary, but people perceive non-stationarity
→ Bias to detect too much change?

Our Approach
Non-stationary random sequences: changes in parameters over time.
– How well can people make inferences about underlying changes?
– How well can people make predictions about future outcomes?
Compare data to:
– Bayesian (ideal observer) models
– Descriptive models

Two Tasks
– Inference task: what caused the latest observation?
– Prediction task: what is the next most likely outcome?
(Diagram: Observed Data → Internal State (Unobserved) → Future Data)

Sequence Generation
– Start with one of four normal distributions
– Draw samples from this distribution
– With probability alpha, switch to a new generating distribution (uniformly chosen)
– Alpha determines the number of change points
(Figure: example trial sequence of generating pipes A–D, with change points marked)
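As a concrete illustration, here is a minimal Python sketch of this generative process. The specific means, the noise standard deviation, and the function and variable names are assumptions for illustration only, not the settings used in the experiments.

```python
import numpy as np

def generate_sequence(n_trials=50, alpha=0.1, means=(-3.0, -1.0, 1.0, 3.0),
                      sigma=0.5, seed=0):
    """Sample a non-stationary sequence: with probability alpha the generating
    distribution switches to a new one chosen uniformly at random."""
    rng = np.random.default_rng(seed)
    states, observations = [], []
    z = rng.integers(len(means))               # start with one of the four distributions
    for t in range(n_trials):
        if t > 0 and rng.random() < alpha:     # change point with probability alpha
            z = rng.integers(len(means))       # new distribution chosen uniformly
                                               # (may equal the old one; an assumption)
        states.append(z)
        observations.append(rng.normal(means[z], sigma))   # noisy observation
    return np.array(states), np.array(observations)

states, y = generate_sequence()
print("switches that changed the pipe:", int(np.sum(np.diff(states) != 0)))
```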

Tomato Cans Experiment
– Cans roll out of pipes A, B, C, or D
– A machine perturbs the position of the cans (normal noise)
(Real experiment has response buttons and is subject-paced)
(Figure: apparatus with pipes A–D)

Tomato Cans Experiment
– Cans roll out of pipes A, B, C, or D
– A machine perturbs the position of the cans (normal noise)
– A curtain obscures the sequence of pipes
(Real experiment has response buttons and is subject-paced)
(Figure: apparatus with pipes A–D hidden behind the curtain)

Tasks
– Inference: what pipe produced the last can? A, B, C, or D?
– Prediction: in what region will the next can arrive? 1, 2, 3, or 4?
(Figure: pipes A–D and response regions)

Experiment 1
– 63 subjects
– 12 blocks:
  – 6 blocks of 50 trials for the inference task
  – 6 blocks of 50 trials for the prediction task
  – Identical trials for inference and prediction
– Alpha = 0.1

Accuracy vs. Number of Perceived Changes
(Figure: scatter plots for the inference and prediction tasks; each dot is a subject, with the ideal observer marked)

(Figure sequence: example trial-by-trial inference and prediction responses, showing the generating pipe sequence, the ideal observer, and individual subjects)

Exp. 1b
– Alpha = .08, .16, …
– … subjects
– Inference judgments only
→ Subjects track changes in alpha
(Figure: results with the ideal observer marked)

Experiment 2: Plinko
(Figure: the Plinko device, shown open and closed)

Familiarization Trials
(View full screen to see the animation)
– The input pipe changes on each trial with probability alpha

Observed Distributions Match Theory
Note: the mode of the output distribution is centered on the input bin

Decision Phase
(View full screen to see the animation)
– The main phase of the experiment uses the closed device
– Inference task: which input pipe was used, A, B, C, or D?
– Prediction task: where will the next ball arrive, A, B, C, or D?

Accuracy vs. Number of Perceived Changes (44 subjects)
(Figure: scatter plots for the inference and prediction tasks)

Main Finding
– Ideal observer: # changes in prediction = # changes in inference
– Subjects: # changes in prediction >> # changes in inference
Explanation?

Variability Matching
– Example output sequence: A B A A B C
– Strategy: match the observed variability in the prediction sequence
– Suboptimal! Part of the variability is due to noise that is useless for prediction
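To see why this is suboptimal, here is a toy Python simulation (the outcome probabilities are made up for illustration and are not experimental values): always predicting the most likely outcome beats reproducing the observed outcome variability.

```python
import numpy as np

# Toy illustration with assumed probabilities: outcome A occurs with
# probability 0.7, outcomes B and C with probability 0.15 each.
rng = np.random.default_rng(1)
p = np.array([0.70, 0.15, 0.15])
outcomes = rng.choice(3, size=100_000, p=p)

mode_predictions = np.zeros_like(outcomes)                     # always predict the mode (A)
matching_predictions = rng.choice(3, size=outcomes.size, p=p)  # match the outcome variability

print("mode accuracy:    ", np.mean(mode_predictions == outcomes))      # about 0.70
print("matching accuracy:", np.mean(matching_predictions == outcomes))  # about 0.535
```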

Conclusion
– Subjects are able to track changes in dynamic decision environments
– Individual differences:
  – Over-reaction: perceiving too much change
  – Under-reaction: perceiving too little change
– More over-reaction in the prediction task

Do the experiments yourself:

LEFT OVER SLIDES

Digital Plinko – open curtain

Digital Plinko – closed curtain

Analogy to Hot Hand Belief
– Inference task: does a player have a hot hand?
– Prediction task: will the player make the next shot?

Process Model
– Memory buffer holds the last K samples
– Calculate the probability of a new sample under a normal distribution fit to the buffer
– If the probability < τ:
  – Assume a change
  – Flush the buffer
  – Put the new sample in the buffer
– Inference responses are based on the buffer mean
– Prediction responses are the same, except the model tries to anticipate changes by making a purely random response on some fraction X of trials
(Figure: model vs. subject responses)
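A minimal Python sketch of this process model is given below. The fixed noise standard deviation, the use of the density as the thresholded probability, the uniform random guess, and all names are assumptions; mapping the buffer mean back onto a pipe label is omitted.

```python
import numpy as np

def process_model(y, K=5, tau=0.05, X=0.2, sigma=0.5, seed=0):
    """Buffer-based change detector: flush the buffer whenever a new sample
    is too unlikely under a normal distribution centred on the buffer mean."""
    rng = np.random.default_rng(seed)
    buffer, inferences, predictions = [], [], []
    for obs in y:
        if buffer:
            mean = np.mean(buffer)
            # density of the new sample under the buffer's normal distribution
            dens = np.exp(-0.5 * ((obs - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
            if dens < tau:               # unlikely sample -> assume a change
                buffer = []              # flush the buffer
        buffer.append(obs)               # put the new sample in the buffer
        buffer = buffer[-K:]             # keep at most K samples
        inferences.append(np.mean(buffer))        # inference response: buffer mean
        if rng.random() < X:                      # anticipate a change on a fraction X of trials
            predictions.append(rng.uniform(np.min(y), np.max(y)))   # purely random response
        else:
            predictions.append(np.mean(buffer))
    return np.array(inferences), np.array(predictions)
```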

Sweeping Alpha and Sigma in the Bayesian Model
(Figure: model behaviour for the inference and prediction tasks as alpha and sigma are varied)

Optimal Prediction Strategy
– Best prediction = last inference
– Example subject responses:
  inference:  A A B B B D …
  prediction: A B A B D C …
– Using shifted inference judgments as predictions, 70% of subjects improve in prediction performance
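The reanalysis can be sketched as below. The function name is an assumption, and for simplicity responses and outcomes are treated as pipe labels rather than arrival regions.

```python
import numpy as np

def prediction_scores(outcomes, inferences, predictions):
    """Compare a subject's own predictions with the strategy of reusing the
    latest inference as the next prediction (all inputs are per-trial labels)."""
    outcomes, inferences, predictions = map(np.asarray, (outcomes, inferences, predictions))
    own = np.mean(predictions[:-1] == outcomes[1:])     # prediction on trial t scored against trial t+1
    shifted = np.mean(inferences[:-1] == outcomes[1:])  # last inference reused as the prediction
    return own, shifted
```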

Locus of the Gambler's Fallacy?
(Diagram: Observed Data → Internal State (Unobserved) → Future Data, with arrows marking the inference and prediction steps)

Generating Model
(Graphical model: change probability → change points → distribution parameters → observed data, unrolled over time)
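In equations, the generating model can be written as follows. This is a sketch in notation chosen to match the later slides (z for the latent distribution, x for the change points, y for the data, α for the change probability); the specific observation density is an assumption based on the cans task.

```latex
\begin{align*}
x_t &\sim \mathrm{Bernoulli}(\alpha) && \text{change point indicator} \\
z_t &= \begin{cases} z_{t-1} & \text{if } x_t = 0 \\ \mathrm{Uniform}\{1,\dots,P\} & \text{if } x_t = 1 \end{cases} && \text{generating distribution} \\
y_t &\sim \mathcal{N}(\mu_{z_t}, \sigma^2) && \text{observed data}
\end{align*}
```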

Bayesian Inference
– Given the observed sequence y, what are the latent states z and change points x?
– This complex posterior distribution cannot be calculated directly.
– Use posterior simulation instead: MCMC with Gibbs sampling.

Gibbs Sampling
– Simulate the high-dimensional distribution by sampling lower-dimensional subsets of variables, where each subset is conditioned on the values of all the others. The sampling is done sequentially and proceeds until the sampled values approximate the target distribution.
– Use the subset {z_t, x_t, x_{t+1}}.
– Why include x_{t+1}? To preserve consistency. For example, suppose that before sampling z_{t+1} ≠ z_t, and therefore x_{t+1} = 1. If the sample leads to z_t = z_{t+1}, then x_{t+1} needs to be updated.

Gibbs Sampling
– Assume α is a constant (for now).
– The set of variables {z_t, x_t, x_{t+1}} is conditionally dependent only on these variables: {y_t, z_{t-1}, z_{t+1}}.
– Sample values of {z_t, x_t, x_{t+1}} from this conditional distribution:
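A plausible form of this conditional, reconstructed from the independencies stated above (not taken from the slide), is:

```latex
P(z_t, x_t, x_{t+1} \mid y_t, z_{t-1}, z_{t+1})
\;\propto\; P(y_t \mid z_t)\; P(z_t, x_t \mid z_{t-1})\; P(z_{t+1}, x_{t+1} \mid z_t)
```

Under the generating model above, the transition terms would be P(z_t, x_t = 0 | z_{t-1}) = (1 − α) for z_t = z_{t-1}, and P(z_t, x_t = 1 | z_{t-1}) = α / P for any z_t.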

Gibbs Sampling
– For the tomato cans experiment, the likelihood P(y_t | z_t) is the normal noise density.
– For the Plinko experiment, the likelihood is looked up from a table (P = number of input pipes).
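Putting the pieces together, a minimal Gibbs sampler for the cans version might look like the sketch below. All names and settings are assumptions; for simplicity it samples only z_t (the change points x are then deterministic functions of consecutive states, which is equivalent here), and it uses the normal likelihood and uniform-switch transition from the generating model above.

```python
import numpy as np

def gibbs_changepoints(y, means, sigma=0.5, alpha=0.1, n_sweeps=500, seed=0):
    """Gibbs sampler over the latent pipes z_t of the change-point model.
    Change points are implied by x_t = (z_t != z_{t-1})."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    P, T = len(means), len(y)
    z = rng.integers(P, size=T)                      # initialise the latent states
    samples = np.zeros((n_sweeps, T), dtype=int)

    def loglik(t, k):                                # log P(y_t | z_t = k), normal noise
        return -0.5 * ((y[t] - means[k]) / sigma) ** 2

    def logtrans(prev, k):                           # log P(z_t = k | z_{t-1} = prev)
        p_stay = (1 - alpha) + alpha / P if k == prev else alpha / P
        return np.log(p_stay)

    for s in range(n_sweeps):
        for t in range(T):
            logp = np.array([loglik(t, k) for k in range(P)])
            if t > 0:
                logp += np.array([logtrans(z[t - 1], k) for k in range(P)])
            if t < T - 1:
                logp += np.array([logtrans(k, z[t + 1]) for k in range(P)])
            probs = np.exp(logp - logp.max())
            z[t] = rng.choice(P, p=probs / probs.sum())   # sample z_t from its full conditional
        samples[s] = z
    return samples   # e.g. posterior change points: (np.diff(samples, axis=1) != 0).mean(axis=0)
```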

Plinko as a Hidden Markov Model
(Figure: HMM unrolled over time from Start to End, with the hidden input pipes as states and the output pipe sequence as observations)

Example comparing HMM Viterbi algorithm to Gibbs sampling algorithm
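For reference, a minimal Viterbi decoder for this HMM formulation is sketched below (the array layout and names are assumptions; the log-likelihoods and log-transition matrix would come from the same quantities used in the Gibbs sketch above). Unlike the Gibbs sampler, which draws samples from the posterior over state sequences, Viterbi returns the single most probable sequence.

```python
import numpy as np

def viterbi(log_lik, log_trans, log_prior):
    """Most probable hidden pipe sequence for an HMM.
    log_lik:   T x P array of log P(y_t | z_t = k)
    log_trans: P x P array of log P(z_t = j | z_{t-1} = i)
    log_prior: length-P array of log P(z_1 = k)"""
    T, P = log_lik.shape
    delta = np.zeros((T, P))                 # best log-probability of paths ending in each state
    back = np.zeros((T, P), dtype=int)       # back-pointers for path recovery
    delta[0] = log_prior + log_lik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans      # score of every i -> j transition
        back[t] = np.argmax(scores, axis=0)             # best predecessor i for each state j
        delta[t] = scores[back[t], np.arange(P)] + log_lik[t]
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):                      # backtrack through the pointers
        path[t] = back[t + 1, path[t + 1]]
    return path
```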